Why securing AI model weights isn’t enough

Feb 9

AI will soon be integrated into everything. We should make sure it hasn’t been compromised.

2 Comments

For what it's worth, Ryan & Buck's model of how AI developers source their post-training data suggests it might not be that hard to sneak something malicious in. Alibaba allegedly trains Qwen on random RL environments scraped off the internet.

https://blog.redwoodresearch.org/i/183486251/control-is-about-monitoring-right

Reply (1)

Hewson

Feb 15

Relatedly, how much do post-training startups like Scale/Surge/Mercor vet their contractors?

Not sure how important the contractor data is these days (or how much the data itself gets filtered/vetted), but maybe if you compromise enough of their contractor pool for some expert task set, that could be another way to get backdoors without inside access to any of the developers?