Gemini 3.5 Flash Takes on Anthropic with Native ‘Computer Use’ Capabilities

After Anthropic, Google Launches Computer-Controlling AI Agents in Gemini 3.5 Flash, Letting Developers Build Autonomous Systems That Can Click, Type, Navigate Applications, Automate Workflows, and Execute Tasks Across Devices

Written By : Poulami Saha
Reviewed By : Achu Krishnan

Just days after Anthropic upgraded its computer-controlling AI agent capabilities, Google has introduced a similar feature for the Gemini ecosystem. The company has made ‘Computer Use’ a built-in feature of Gemini 3.5 Flash, allowing developers to create AI agents that can interact with browsers, desktop applications, and mobile environments.

Previously offered as a standalone model, the computer-use capability is now integrated directly into Gemini 3.5 Flash. The integration aims to simplify the development of autonomous AI agents that observe screens, reason through tasks, and perform actions such as clicking buttons, entering text, navigating websites, and interacting with software interfaces.

Google Integrates Computer Use into Gemini 3.5 Flash

Google explains, “The feature lets AI agents interact with software the way a person does: by looking at a screen and deciding where to click, what to type, and when to scroll, rather than relying on rigid, pre-coded integrations with each app.” 

Gemini 3.5 Flash can run agentic workflows in browsers, on mobile phones, and on computers. The model can analyze screenshots, interpret the interface, and generate actions that mirror human interactions within software applications. Use cases include:

  • Automating repetitive form filling and data entry

  • Conducting software and application testing

  • Performing research across websites

  • Managing long-running enterprise workflows

  • Supporting knowledge workers across multiple applications

Its suitability for building AI agents sets Gemini 3.5 Flash apart from conventional chatbots. Google has made it clear that the model is designed for long-horizon tasks.

How Developers Can Build AI Agents

Developers can access the new feature via the Gemini API and the Gemini Enterprise Agent Platform. To activate ‘Computer Use’, a developer enables the tool in Gemini 3.5 Flash and sets the environment to browser, desktop, or mobile.

The model returns actions in a structured format, such as mouse clicks and keystrokes, which the developer's client-side environment then executes. Google has also added an ‘intent’ field, which explains the reason for each action.
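To make the division of labor concrete, here is a minimal Python sketch of the client side of that loop: the model proposes structured actions, and the developer's own code decides how to carry them out. The payload shape, the `kind`, `args`, and `intent` keys, and the `Action` and `RecordingExecutor` names are all assumptions made for illustration; this is not the actual Gemini API schema, and the executor only records what it would do instead of driving a real mouse or keyboard.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Action:
    """One structured action proposed by the model (hypothetical schema)."""
    kind: str             # e.g. "click" or "type_text" (assumed names)
    args: Dict[str, Any]  # parameters for the action
    intent: str           # the model's stated reason for the action


class RecordingExecutor:
    """Stand-in for a client-side environment.

    It records the actions it would perform; a real client would call
    a browser or OS automation layer here instead.
    """

    def __init__(self) -> None:
        self.performed: List[str] = []
        self.audit_log: List[str] = []
        self._handlers: Dict[str, Callable[..., None]] = {
            "click": self._click,
            "type_text": self._type_text,
        }

    def _click(self, x: int, y: int) -> None:
        self.performed.append(f"click({x}, {y})")

    def _type_text(self, text: str) -> None:
        self.performed.append(f"type_text({text!r})")

    def execute(self, action: Action) -> None:
        handler = self._handlers.get(action.kind)
        if handler is None:
            # Refuse anything the client has not explicitly allowed.
            raise ValueError(f"unsupported action kind: {action.kind}")
        # Keep the stated intent so each step can be reviewed later.
        self.audit_log.append(action.intent)
        handler(**action.args)


def parse_actions(payload: List[Dict[str, Any]]) -> List[Action]:
    """Convert a decoded response (assumed shape) into Action objects."""
    return [
        Action(kind=item["kind"], args=item["args"], intent=item["intent"])
        for item in payload
    ]


# Hypothetical model output for a simple search task.
model_output = [
    {"kind": "click", "args": {"x": 120, "y": 48},
     "intent": "Focus the search box"},
    {"kind": "type_text", "args": {"text": "quarterly report"},
     "intent": "Enter the search query"},
]

executor = RecordingExecutor()
for action in parse_actions(model_output):
    executor.execute(action)

print(executor.performed)
print(executor.audit_log)
```

Because the client only runs handlers it has registered, it stays in control of what the agent can actually do, and logging the intent alongside each step gives reviewers a plain-language trail of why an action was taken.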
