Google DeepMind has developed an ‘AI Control Roadmap’, a methodology for supervising, observing, and controlling advanced artificial intelligence agents so that they do not act against human intentions, even if they become dangerous or behave in unintended ways.
The roadmap comes as AI agents take on increasingly complex work, including software engineering, research, cybersecurity, and enterprise management. These autonomous agents promise large productivity gains but also bring new threats.
Rohin Shah, who leads the AGI safety and alignment team at Google DeepMind, told Fortune in an interview: “The AI agent framework borrows heavily from traditional cybersecurity, especially insider-threat prevention. We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting. AI is systematically different from humans.”
The key feature of the roadmap is its cybersecurity-inspired approach. Rather than treating future AI systems solely as software programs, DeepMind views advanced AI agents as potential insider threats to the organization.
According to DeepMind, advanced AI agents could misuse their access to information, bypass security measures, or pursue goals that conflict with those assigned to them. The roadmap draws inspiration from how enterprises address their insider-threat problem. However, this doesn’t mean that AI is the solution to every issue.
DeepMind stresses that truly dangerous autonomous AI agents do not yet exist, but the firm believes the industry needs to start preparing before such systems become a reality. Monitoring and control techniques will need to develop as quickly as the AI agents they are meant to monitor.
Looking ahead, as businesses rapidly adopt AI agent-based systems, DeepMind’s roadmap could serve as a model for industry-wide safety guidelines.