For what it's worth, Ryan & Buck's model of how AI developers source their post-training data suggests it might not be that hard to sneak something malicious in. Alibaba allegedly trains Qwen on random RL environments scraped off the internet.
Relatedly, how much do post-training startups like Scale/Surge/Mercor vet their contractors?
Not sure how important the contractor data is these days (or how much the data itself gets filtered/vetted), but maybe if you compromise enough of their contractor pool for some expert task set, that could be another way to get backdoors without inside access to any of the developers?
For what it's worth, Ryan & Buck's model of how AI developers source their post-training data suggests it might not be that hard to sneak something malicious in. Alibaba allegedly trains Qwen on random RL environments scraped off the internet.
https://blog.redwoodresearch.org/i/183486251/control-is-about-monitoring-right
Relatedly, how much do post-training startups like Scale/Surge/Mercor vet their contractors?
Not sure how important the contractor data is these days (or how much the data itself gets filtered/vetted), but maybe if you compromise enough of their contractor pool for some expert task set, that could be another way to get backdoors without inside access to any of the developers?