
Google's Interactions API Shifts the Agent Orchestration Battleground

By moving state and sandboxed execution server-side, Google targets the orchestration layer—but introduces real architectural trade-offs.

Rachel Goldstein
Dev Tools Editor · Jun 22, 2026 · 5 min read

The era of the stateless, token-shuffling LLM wrapper is drawing to a close. For years, developers building conversational interfaces or multi-step agents have borne the cognitive and computational tax of client-side state management. Every turn of a conversation required appending messages to an array, managing sliding context windows, and sending the entire history back to the model.

With the general availability of the Interactions API, Google is attempting to rewrite this operational model for Gemini. Rather than treating the model as a stateless calculator, Google is positioning the Interactions API as a stateful, agent-first execution environment. By moving conversation state, tool orchestration, and even code execution into remote sandboxes on Google's infrastructure, this release is a direct challenge to client-side orchestration frameworks like LangChain.

But while a server-managed agent layer promises to slash latency and boilerplate code, it also introduces significant architectural trade-offs in data sovereignty, vendor lock-in, and debugging visibility.

The Death of Client-Side State and the "Step" Schema

In the legacy generateContent paradigm, developers had to maintain the illusion of state. If a user asked a follow-up question, the developer had to reconstruct the entire chat history.

The Interactions API replaces this with server-side state retention. By passing a previous_interaction_id, developers can continue or even "fork" conversations without sending the historical token payload back over the wire.

from google import genai

client = genai.Client()

# Turn 1: Establish context
turn_1 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="My name is Alice and I am a software engineer."
)

# Turn 2: Continue the conversation using the server-side state
turn_2 = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What is my job?",
    previous_interaction_id=turn_1.id
)
print(turn_2.outputs[-1].text)  # Model output varies; expect something like "You are a software engineer."

This stateful design enables cheap conversation branching. If you want to test two different prompts against the same conversational context, you simply call the API twice using the same previous_interaction_id.
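To make the branching idea concrete, here is a minimal local sketch of how forking can fall out of a parent-pointer design. It is a toy model under stated assumptions, not Google's implementation: `ToyInteractionStore`, `StoredInteraction`, and the `int_N` ID format are all hypothetical names invented for illustration, and nothing here calls the real API.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredInteraction:
    """One stored turn. previous_id is None for the first turn."""
    id: str
    input: str
    previous_id: Optional[str]


class ToyInteractionStore:
    """In-memory stand-in for server-side state (hypothetical, illustrative only)."""

    def __init__(self) -> None:
        self._items: dict[str, StoredInteraction] = {}
        self._counter = itertools.count(1)

    def create(self, input: str,
               previous_interaction_id: Optional[str] = None) -> StoredInteraction:
        # Reject links to interactions the store has never seen.
        if (previous_interaction_id is not None
                and previous_interaction_id not in self._items):
            raise KeyError(f"unknown interaction: {previous_interaction_id}")
        item = StoredInteraction(
            id=f"int_{next(self._counter)}",
            input=input,
            previous_id=previous_interaction_id,
        )
        self._items[item.id] = item
        return item

    def history(self, interaction_id: str) -> list[str]:
        """Walk parent pointers back to the root and return inputs oldest first."""
        chain: list[str] = []
        current = self._items.get(interaction_id)
        while current is not None:
            chain.append(current.input)
            current = (self._items.get(current.previous_id)
                       if current.previous_id else None)
        return list(reversed(chain))


store = ToyInteractionStore()
root = store.create("My name is Alice and I am a software engineer.")

# Two branches share the same parent; neither resends the first turn.
branch_a = store.create("What is my job?", previous_interaction_id=root.id)
branch_b = store.create("Summarize what you know about me.",
                        previous_interaction_id=root.id)

print(store.history(branch_a.id))
print(store.history(branch_b.id))
```

Because each turn only stores a pointer to its parent, a fork is just a second child of the same node, which is why the client never needs to copy or resend the shared history.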

Accompanying this statefulness is a structural shift in how data is represented. Google is moving away from the traditional role-based message format (user, model, system) toward a structured Step schema. In this new model, every action—whether it is a user_input, a model's internal thought process, a function_call, or a model_output—is its own typed step. This is a subtle but crucial change: it treats the interaction not as a chat log, but as an execution trace of an agentic workflow.
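One way to picture the difference is to model a trace of typed steps locally. The sketch below is an illustration only: the `Step` dataclass, its `content` field, and the `"thought"` type name are assumptions made for this example, and the real schema's field names may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional

# Step kinds named in the article; "thought" is an assumed label for the
# model's internal reasoning step.
StepType = Literal["user_input", "thought", "function_call", "model_output"]


@dataclass
class Step:
    """A single typed entry in an execution trace (hypothetical shape)."""
    type: StepType
    content: dict[str, Any] = field(default_factory=dict)


trace = [
    Step("user_input", {"text": "What's the weather in Tokyo?"}),
    Step("thought", {"summary": "Need current weather; call the tool."}),
    Step("function_call", {"name": "get_weather",
                           "arguments": {"location": "Tokyo"}}),
    Step("model_output", {"text": "It is 22°C and sunny in Tokyo."}),
]


def steps_of_type(steps: list[Step], step_type: str) -> list[Step]:
    """Filter a trace down to one kind of step."""
    return [s for s in steps if s.type == step_type]


def final_text(steps: list[Step]) -> Optional[str]:
    """Return the text of the last model_output step, or None if there is none."""
    outputs = steps_of_type(steps, "model_output")
    return outputs[-1].content["text"] if outputs else None


print([s.type for s in trace])
print(final_text(trace))
```

With a role-based chat log, the tool call and the reasoning would have to be inferred from message contents. With typed steps, filtering the trace by `type` is enough to audit what the agent actually did.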

Managed Agents and the Remote Sandbox Catch

Perhaps the most aggressive feature of the Interactions API is Managed Agents. Instead of just returning text or tool-call payloads for the client to execute, a single API call can now provision a remote Linux sandbox.

Under this model, an agent like the default antigravity-preview-05-2026 can autonomously reason, execute code, browse the web, and manipulate files within its own isolated environment. Developers can also define custom agents with specific instructions, skills, and data sources.

For developers, this solves a massive engineering headache. Setting up secure, sandboxed execution environments (like Docker containers or WASM runtimes) to run LLM-generated code safely is notoriously difficult. Google is offering to handle the infrastructure, security, and scaling of these sandboxes out of the box.

However, this convenience comes with a catch. Moving execution to Google’s servers means your code, data, and intermediate files live in an environment you do not control and cannot fully inspect. For enterprise developers handling sensitive code, regulated user data, or strict compliance requirements (such as SOC 2 or HIPAA), handing the execution sandbox to a third-party cloud provider may be a non-starter, at least until they have audited sandbox logging, retention policies, and network egress controls. That audit should happen before any production workflow is routed through Managed Agents.

Developer Angle: Implementation and the Migration Calculus

For teams already building on Gemini, the transition to the Interactions API is not an immediate emergency, but it is a clear direction of travel. Google has explicitly stated that while the legacy generateContent API will continue to receive mainline models, "frontier capabilities for long-running models and agents" will increasingly land exclusively on the Interactions API.

If you are starting a new project, you should default to the Interactions API. If you are maintaining an existing codebase, the decision to migrate depends on your tool-use and state requirements.

Mixing Tools and Handling Function Calls

The API allows developers to mix built-in tools (like Google Search or Maps) with custom local functions in a single request. When a custom function is triggered, the execution loop returns a function_call step. The developer runs the function locally and sends the result back in a follow-up call that references the original interaction through previous_interaction_id:

from google import genai

client = genai.Client()

# Define a custom tool schema
weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
}

# Initiate the interaction
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="What's the weather in Tokyo?",
    tools=[weather_tool]
)

# Handle the tool call returned by the model
for output in interaction.outputs:
    if output.type == "function_call":
        # Execute local logic (mocked here)
        result = {"temperature": "22°C", "condition": "sunny"}
        
        # Send the result back to the server-side interaction state
        final_response = client.interactions.create(
            model="gemini-3-flash-preview",
            previous_interaction_id=interaction.id,
            input={
                "type": "function_result",
                "name": output.name,
                "call_id": output.id,
                "result": result
            }
        )
        print(final_response.outputs[-1].text)
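The loop above hard-codes a mocked result. In a real application you would probably route each function_call to a registered handler. The following is a minimal, offline sketch of that routing: it uses plain dicts as stand-ins for the SDK's output objects, and the `arguments` key, `TOOL_REGISTRY`, and `build_function_results` are assumptions introduced for illustration rather than part of the API.

```python
from typing import Any, Callable


def get_weather(location: str) -> dict[str, str]:
    """Mocked local tool; a real version would call a weather service."""
    return {"location": location, "temperature": "22°C", "condition": "sunny"}


# Maps the tool name declared in the schema to the local Python callable.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def build_function_results(outputs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn function_call outputs into function_result payloads.

    Non-tool outputs are skipped. Unknown tools and handler exceptions are
    reported back as an error result instead of crashing the loop.
    """
    results = []
    for output in outputs:
        if output.get("type") != "function_call":
            continue
        handler = TOOL_REGISTRY.get(output["name"])
        if handler is None:
            result: Any = {"error": f"unknown tool: {output['name']}"}
        else:
            try:
                result = handler(**output.get("arguments", {}))
            except Exception as exc:  # surface tool failures to the model
                result = {"error": str(exc)}
        results.append({
            "type": "function_result",
            "name": output["name"],
            "call_id": output["id"],
            "result": result,
        })
    return results


# Dict stand-ins for what an interaction might return (assumed shape).
sample_outputs = [
    {"type": "text", "text": "Let me check."},
    {"type": "function_call", "id": "call_1", "name": "get_weather",
     "arguments": {"location": "Tokyo"}},
    {"type": "function_call", "id": "call_2", "name": "not_registered",
     "arguments": {}},
]
payloads = build_function_results(sample_outputs)
print(payloads)
```

Returning errors as results, rather than raising, keeps the server-side interaction resumable: the model sees that a tool failed and can decide how to proceed.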

Cost and Latency Optimization

One of the most practical additions to the GA release is the introduction of Flex and Priority tiers.

  • Flex Tier: Offers a 50% cost reduction compared to standard pricing and is designed for asynchronous, non-latency-sensitive workloads. By combining the Flex tier with the new background=True parameter, developers can trigger long-running agent tasks (like Deep Research) and poll for the results asynchronously, which halves the cost of work that does not need an immediate answer.
  • Priority Tier: Optimized for low-latency, real-time user interactions.
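For background tasks, the client side mostly reduces to a polling loop. Here is a hedged sketch of one with exponential backoff, plus the Flex discount as arithmetic. The `fetch_status` callable is a placeholder for whatever retrieval call the SDK provides, and the status strings in `TERMINAL_STATUSES` are assumptions, not confirmed API values.

```python
import time
from typing import Any, Callable

# Assumed terminal status names; check the API reference for the real ones.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_status until it reports a terminal status.

    The wait doubles after each attempt, capped at max_delay. Raises
    TimeoutError if the next wait would exceed the total timeout budget.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        snapshot = fetch_status()
        if snapshot["status"] in TERMINAL_STATUSES:
            return snapshot
        if waited + delay > timeout:
            raise TimeoutError(f"task still running after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


def flex_cost(standard_cost: float) -> float:
    """Apply the 50% Flex discount described above to a standard-tier cost."""
    return standard_cost * 0.5


# Simulated task that finishes on the third check; sleeps are recorded, not slept.
_states = iter([
    {"status": "in_progress"},
    {"status": "in_progress"},
    {"status": "completed", "text": "report ready"},
])
recorded_delays: list[float] = []
final = poll_until_done(lambda: next(_states), sleep=recorded_delays.append)
print(final["status"], recorded_delays, flex_cost(10.0))
```

Injecting `sleep` as a parameter keeps the loop testable without real waiting; in production you would leave the default `time.sleep` in place.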

The Retention Limit

While server-side state is highly convenient, it is not a permanent database. Paid-tier users get a 55-day retention window on past interactions. This means the Interactions API is excellent for session management and short-to-medium-term context, but developers will still need to maintain their own database for long-term user profiles, vector embeddings, and permanent historical archives.
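In practice that means keeping your own record of each interaction alongside its expiry date. The sketch below shows one possible shape using SQLite from the standard library. The table layout, function names, and payload format are all hypothetical; only the 55-day window comes from the text above.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=55)  # paid-tier window cited above


def expires_at(created_at: datetime) -> datetime:
    """When the server-side copy of an interaction would age out."""
    return created_at + RETENTION


def init_archive(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interaction_archive (
               interaction_id TEXT PRIMARY KEY,
               user_id        TEXT NOT NULL,
               created_at     TEXT NOT NULL,
               expires_at     TEXT NOT NULL,
               payload        TEXT NOT NULL
           )"""
    )


def archive_interaction(conn: sqlite3.Connection, interaction_id: str,
                        user_id: str, created_at: datetime,
                        payload: dict) -> None:
    """Store a durable local copy so history survives past the retention window."""
    conn.execute(
        "INSERT OR REPLACE INTO interaction_archive VALUES (?, ?, ?, ?, ?)",
        (interaction_id, user_id, created_at.isoformat(),
         expires_at(created_at).isoformat(), json.dumps(payload)),
    )
    conn.commit()


def still_resumable(conn: sqlite3.Connection, interaction_id: str,
                    now: datetime) -> bool:
    """True if the interaction is archived and its server-side copy has not expired."""
    row = conn.execute(
        "SELECT expires_at FROM interaction_archive WHERE interaction_id = ?",
        (interaction_id,),
    ).fetchone()
    return row is not None and now < datetime.fromisoformat(row[0])


conn = sqlite3.connect(":memory:")
init_archive(conn)
created = datetime(2026, 6, 22, tzinfo=timezone.utc)
archive_interaction(conn, "int_1", "alice", created,
                    {"input": "What is my job?",
                     "output": "You are a software engineer."})
print(expires_at(created).date())
print(still_resumable(conn, "int_1", created + timedelta(days=54)))
```

When `still_resumable` returns False, the application can no longer rely on previous_interaction_id for that conversation and would need to rebuild context from the archived payload.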

The Orchestration War: Google vs. the Ecosystem

By baking state, tool loops, and sandboxed execution directly into the API, Google is attempting to commoditize the agent orchestration layer. If the API itself can handle state, orchestrate tools, and run code, the need for heavy client-side frameworks like LangChain or CrewAI diminishes.

Google is playing a smart ecosystem game here. Rather than completely freezing out third-party libraries, they have partnered with tools like LiteLLM, Eigent, and Agno to ensure early integration. They have also released the gemini-interactions-api skill to help coding agents stay updated with these new patterns.

Ultimately, the Interactions API represents a major step forward in API design. It acknowledges that LLMs are no longer just text-in, text-out engines, but the central processors of complex, stateful systems. For developers willing to accept lock-in to Google's ecosystem, the reward is a path to production-ready agents with less orchestration code to write, less history to resend on every turn, and cheaper asynchronous workloads on the Flex tier.

Sources & further reading

  1. Interactions API: our primary interface for Gemini models and agents — blog.google
  2. Google makes Interactions API the default way to build with Gemini agents — dev.to
  3. Gemini Interactions API | Gemini API | Google AI for Developers — ai.google.dev
  4. Gemini Interactions API Quick Start — philschmid.de
Written by
Rachel Goldstein · Dev Tools Editor

Rachel has been embedded in the developer tooling ecosystem for nearly eight years, covering everything from IDE wars and package-manager drama to the quiet rise of AI-assisted coding. She has a soft spot for open-source maintainers and an unhealthy number of terminal emulators installed on a single laptop.
