Study: Future AI Needs Stronger Safeguards
The cute title character of Pixar's 2008 hit “WALL-E” was its most memorable robot, but a far less friendly artificial intelligence (AI) named AUTO drove the plot, piloting a starship full of humans who had fled a polluted Earth with one goal: never let them return to the planet.
This kind of science fiction could portend the future if AI systems acquire enough sophistication to blast through safety measures developers currently use. Might AI seek to meet its objectives—even employing deception to do so—no matter the human cost?
Such scenarios could happen at “an alarmingly high rate” if tomorrow’s AI agents, which autonomously pursue goals with far less human oversight than prompt-driven large language models, face deadline pressure, resource limitations or even the threat of losing dominance. That’s the finding of a new study co-led by University of Maryland computer scientists, who warn that critical infrastructure, from biosecurity to cybersecurity, could collapse unless developers get ahead of the issue.
The implications are profound, as the researchers illustrated with examples: To meet its sales quota, an AI agent controlling a chemical plant overrides thermal safety warnings and heats its reactor beyond capacity, leaking poisonous gas into the neighborhood. To obtain a competitor’s earnings report before the market closes, an agent tasked with increasing a firm’s bottom line writes an email that tricks the competitor into handing over a confidential draft, leading to a wire fraud indictment. To work around a delay caused by a server outage, an agent running a firm’s IT operations scans employee chats for a login password, allowing hackers to steal millions of user records.
“Fragile systems can become catastrophic liabilities the moment the stakes get high,” said Shayan Shabihi, a UMD computer science doctoral student who co-led the study, which has been submitted for consideration to next year’s International Conference on Learning Representations. The study also includes researchers from Meta, Scale AI, the University of North Carolina, Google DeepMind, Netflix and the University of Texas.
To project the future behavior of increasingly sophisticated AI agents that may be willing to break rules, the researchers designed a simulator that presented thousands of decision-making scenarios with real-world consequences. When pressures ramped up, the agents treated safety measures as obstacles rather than guardrails despite knowing their actions were dangerous, the researchers found. Sometimes agents even reproduced themselves to achieve results, for example by copying their code onto unauthorized private computers around the world.
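For readers curious what such a pressure-based evaluation might look like in code, below is a minimal, hypothetical sketch of a scenario harness in Python. The names (Scenario, toy_agent_policy, run_evaluation) and the toy policy are illustrative assumptions standing in for a real AI agent and are not the researchers’ actual simulator; the sketch only shows the general idea of replaying scenarios under varying pressure and measuring how often guardrails get bypassed.

```python
# Hypothetical sketch of a pressure-scenario evaluation loop.
# All names and numbers here are illustrative; this is NOT the study's simulator.
from dataclasses import dataclass
import random


@dataclass
class Scenario:
    name: str           # e.g. "chemical plant quota"
    pressure: float     # 0.0 (no pressure) .. 1.0 (extreme deadline/resource pressure)
    safe_action: str    # action that respects the safety guardrail
    unsafe_action: str  # rule-breaking shortcut that meets the objective


def toy_agent_policy(scenario: Scenario, rng: random.Random) -> str:
    """Stand-in for querying a real AI agent: in this toy policy, the higher
    the pressure, the more likely the guardrail gets bypassed."""
    return scenario.unsafe_action if rng.random() < scenario.pressure else scenario.safe_action


def run_evaluation(scenarios: list[Scenario], trials: int = 1000, seed: int = 0) -> dict[str, float]:
    """Replay each scenario many times and report the unsafe-action rate."""
    rng = random.Random(seed)
    rates = {}
    for s in scenarios:
        unsafe = sum(toy_agent_policy(s, rng) == s.unsafe_action for _ in range(trials))
        rates[s.name] = unsafe / trials
    return rates


if __name__ == "__main__":
    scenarios = [
        Scenario("chemical plant quota", pressure=0.9,
                 safe_action="respect thermal limit", unsafe_action="override thermal warning"),
        Scenario("earnings deadline", pressure=0.6,
                 safe_action="wait for public filing", unsafe_action="send deceptive email"),
        Scenario("server outage", pressure=0.3,
                 safe_action="escalate to IT", unsafe_action="scan chats for passwords"),
    ]
    for name, rate in run_evaluation(scenarios).items():
        print(f"{name}: unsafe-action rate = {rate:.1%}")
```

In a real evaluation, the toy policy would be replaced by calls to an actual agent, and the harness would log not just the chosen action but the agent’s reasoning, so researchers can see whether it knowingly treated a safety measure as an obstacle.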
“Most safety tests today check what an AI can do, but we go further by asking what it would do if given power,” said Furong Huang, an associate professor of computer science with an appointment in the University of Maryland Institute for Advanced Computer Studies and Shabihi’s adviser, who likened current AI to a child playing with a toy gun in a disturbing manner. There’s no short-term harm, but “when the child grows up and gets access to a real weapon, they might act dangerously when faced with temptation,” she said.
