The loss of control of AI: which training paradigms do or don't encourage advanced AI to take over

Talk
Michael Cohen
Time: 03.30.2026 11:00 to 12:00

Reinforcement learning agents are trained to maximize their long-term reward. This gives them the incentive to secure complete control over their reward, if they can confidently do so. Complete control requires blocking human control. This talk will discuss that issue, how AI companies may be heading toward it, and several ways it can be solved, including pessimism, human imitation, and myopia. Unfortunately, these solutions appear to unavoidably reduce the system's capability. This suggests we will need international coordination to avoid a race to the bottom.