2 Comments
Miles Kodama

For what it's worth, Ryan and Buck's model of how AI developers source their post-training data suggests it might not be that hard to sneak something malicious in. Alibaba allegedly trains Qwen on random RL environments scraped off the internet.

https://blog.redwoodresearch.org/i/183486251/control-is-about-monitoring-right

Hewson

Relatedly, how much do post-training startups like Scale/Surge/Mercor vet their contractors?

I'm not sure how important contractor data is these days (or how much the data itself gets filtered and vetted), but if you compromised enough of the contractor pool for some expert task set, that could be another way to insert backdoors without inside access to any of the developers.