Discussion about this post

Miles Kodama

For what it's worth, Ryan & Buck's model of how AI developers source their post-training data suggests it might not be that hard to sneak something malicious in. Alibaba allegedly trains Qwen on random RL environments scraped off the internet.

https://blog.redwoodresearch.org/i/183486251/control-is-about-monitoring-right
