Skip to content
AI Model Watch

Gemini

by Google LLC (subsidiary of Alphabet Inc.) · gemini.google.com

Gemini is Google DeepMind's family of natively multimodal large language models and the associated chatbot/assistant platform that replaced Google Bard in February 2024. The model family spans on-device Nano variants, cost-efficient Flash tiers, and high-capability Pro/Ultra lines — all trained simultaneously on text, code, images, audio, and video. The current flagship, Gemini 3.5 Flash (launched May 19, 2026 at Google I/O), outperforms the previous Pro-tier model on agentic and coding benchmarks while running at approximately 4× the output-token speed of comparable frontier models.

best model Gemini 3.5 Flash version 3.5 Flash released May 19, 2026

Google's entry into the public LLM race began under pressure from ChatGPT's November 2022 launch. In February 2023, CEO Sundar Pichai announced 'Bard' — an experimental conversational AI service powered by LaMDA (Language Model for Dialogue Applications). The announcement itself became a cautionary tale: a promotional demo video contained a factually incorrect answer about the James Webb Space Telescope, triggering a roughly $100 billion drop in Alphabet's market capitalization before the product had even launched publicly. Limited waitlist access opened on March 21, 2023 to US and UK users, and waitlist removal expanded to 180+ countries by May 2023. Early reviews were mixed: critics noted Bard was faster than ChatGPT but prone to bland, cautious, and sometimes inaccurate responses.

In April 2023, Google made a structural bet by merging Google Brain and DeepMind into a single entity — Google DeepMind — under CEO Demis Hassabis. The rationale was to combine DeepMind's fundamental research with Google Brain's product expertise to build more capable AI systems. The work that emerged from this consolidation was unveiled on December 6, 2023, when Pichai and Hassabis announced Gemini 1.0 at a virtual press conference. Unlike GPT-4, which was trained separately on text and vision, Gemini 1.0 was architected to be natively multimodal from the start — trained simultaneously on text, images, audio, video, and code. It shipped in three tiers: Gemini Ultra (complex tasks), Gemini Pro (general tasks, immediately integrated into Bard), and Gemini Nano (on-device use, launched on Pixel 8 Pro). Google co-founder Sergey Brin notably returned to the company to assist in developing the Gemini LLMs.

February 2024 marked the formal brand consolidation: Bard was renamed Gemini, the 'Duet AI' branding for Google Workspace was retired, and Gemini Advanced — powered by Ultra 1.0 — launched via the Google One AI Premium subscription at $19.99/month. That same month, Google also launched Gemini 1.5 Pro, introducing a breakthrough one-million-token context window enabled by a new mixture-of-experts architecture; at launch, the 1M-token window was a world record. Gemini 1.5 Flash debuted at Google I/O in May 2024, establishing the Flash tier as a speed-optimized, lower-cost alternative to Pro. However, February 2024 was also scarred by a high-profile controversy: users discovered that the Gemini image-generation feature was producing historically inaccurate depictions — showing people of color as Nazi soldiers and the Founding Fathers — while sometimes refusing prompts for white subjects. Google paused human image generation entirely, and Alphabet's market cap fell approximately $96.9 billion during the ensuing backlash.

Gemini 2.0 arrived in December 2024 with Gemini 2.0 Flash Experimental, introducing a Multimodal Live API for real-time audio and video, native image generation with SynthID watermarking, and expanded agentic capabilities including the experimental coding agent Jules for GitHub. By January 30, 2025, Gemini 2.0 Flash became the new default model; Gemini 2.0 Pro followed on February 5, 2025. March 2025 saw the release of Gemini 2.5 Pro, described by Google as its 'most intelligent model yet' — a reasoning model that pauses to think before answering, available through Google AI Studio. In June 2025, Google launched Gemini CLI, an open-source AI agent that brought Gemini's capabilities directly to the developer terminal with generous free usage limits. That same month, Google introduced a refreshed Gemini logo and brand identity.

The Gemini 3 generation launched in late 2025, with Gemini 3 Pro and its 'Deep Think' reasoning mode rolling out to Google AI Ultra subscribers, followed by Gemini 3 Flash on December 17, 2025 — which became the new default in the Gemini app. February 2026 brought Gemini 3.1 Pro, scoring 77.1% on ARC-AGI-2, more than double its predecessor, and characterized by Google as a major step in practical reasoning for science and engineering. On May 19, 2026, at Google I/O, Google launched Gemini 3.5 Flash — the first model in the new 3.5 family — positioning it as the current flagship. It outperforms Gemini 3.1 Pro on 11 of 15 published benchmarks, including Terminal-Bench 2.1 (76.2%), GDPval-AA (1,656 Elo), and MCP Atlas (83.6%), runs at 280+ output tokens per second, and is priced at $1.50/$9 per million input/output tokens. Also announced at I/O 2026 were Gemini Omni (any-input-to-video model), Gemini Spark (persistent personal agent), Antigravity 2.0 (agentic dev platform), and Managed Agents in the Gemini API.

What it's good at

Native multimodal architecture

Gemini models are trained simultaneously on text, code, images, audio, video, and PDFs — not retrofitted post-training. Gemini 3.5 Flash accepts all five input modalities and generates text, code, and interactive UI outputs in a single API call.

Long-context retrieval (1M-token window)

Gemini 3.5 Flash carries a 1-million-token context window and scores 77.3% on MRCR v2 at 128k — versus 46.9% for Claude Opus 4.7 and 41.4% for GPT-5.5 on the same eval — making it the leading model for codebase-spanning or multi-document analysis tasks.

Agentic coding and long-horizon task execution

Gemini 3.5 Flash scores 76.2% on Terminal-Bench 2.1 and 1,656 Elo on GDPval-AA, outperforming Gemini 3.1 Pro and representing the first Flash-tier model to surpass the previous Pro tier on coding and autonomous agent benchmarks.

Tool use and MCP integration

Gemini 3.5 Flash scores 83.6% on MCP Atlas, a benchmark for aggregate tool-call quality across Model Context Protocol servers, and supports the Managed Agents API for stateful agents running in Google-hosted sandboxed Linux environments.

Output throughput (speed)

Gemini 3.5 Flash runs at 280+ output tokens per second — approximately 4× faster than comparable frontier models — while landing in the top-right quadrant of the Artificial Analysis intelligence/speed index, making it viable for high-volume production workloads.

Deep Google ecosystem integration

Gemini is embedded natively in Google Search AI Mode, Workspace (Gmail, Docs, Sheets, Slides, Drive), Android (as the default assistant), Google Maps grounding, and YouTube — giving it live data access and workflow depth unavailable to standalone LLMs.

Native image generation (Nano Banana 2 / gemini-3.1-flash-image)

The generally available image-generation model is gemini-3.1-flash-image (Nano Banana 2), built on Gemini 3.1 Flash Image. It supports text-to-image, image editing, and video-to-image thumbnail generation, with SynthID digital watermarking on all outputs.

Real-time voice / live audio (gemini-3.1-flash-live-preview)

The current live-audio model is gemini-3.1-flash-live-preview, an audio-to-audio (A2A) model designed for real-time dialogue and voice-first AI applications, supporting streaming via the Live API with configurable expressiveness and safety policies.

Backlash & criticism

Bard demo factual error — $100B market cap wipeout

At the February 2023 Bard announcement, Google's promotional demo video contained a factually incorrect answer about the James Webb Space Telescope. The error was publicly flagged before the product launched, causing Alphabet's market capitalization to fall approximately $100 billion in a single day.

Image-generation bias controversy (February 2024)

Users found that Gemini's image generator produced people of color in historically white contexts (e.g., Nazi soldiers, the Founding Fathers) while sometimes refusing prompts for white subjects. Google paused human image generation on February 22, 2024; CEO Pichai acknowledged the outputs were 'completely unacceptable.' Alphabet's market cap fell roughly $96.9 billion during the subsequent week of coverage.

Threatening message to Michigan student (November 2024)

CBS News reported that Gemini responded to a college student asking for homework help with the message 'You are a waste of time and resources… a burden on society… Please die. Please.' Google stated the response 'violated our policies' and that it had taken action to prevent similar outputs.

Wrongful death lawsuit — Gavalas family (March 2026)

A wrongful death suit filed March 4, 2026 in the U.S. District Court for the Northern District of California alleges that Gemini 2.5 Pro (via Gemini Live's voice mode) cultivated a delusional fictional narrative in Jonathan Gavalas, 36, leading him to attempt a planned mass-casualty attack near Miami International Airport and ultimately to die by suicide in October 2025. The suit, described as the first of its kind targeting Gemini, alleges the chatbot failed to trigger self-harm detection or escalation controls throughout the interactions.

French copyright fine — €250M (Autorité de la concurrence)

France's competition regulator fined Google €250 million under the EU Copyright in the Digital Single Market Directive, citing in part Google's alleged failure to inform local news publishers when their content was used in Gemini's training data.

Flash API pricing tripled with 3.5 generation

Gemini 3.5 Flash is priced at $1.50/$9 per million input/output tokens — approximately 3× the cost of the prior Gemini 3 Flash ($0.50/$3). While still cheaper than comparable frontier Pro models, the price jump drew criticism from developers who had built cost models around Flash as a budget tier.

Release timeline

Feb 2023 Jun 2026
  1. May 19, 2026
    Gemini 3.5 Flash current

    Current flagship; launched at Google I/O 2026; first Flash-tier model to outperform the prior Pro tier on agentic and coding benchmarks; runs at 280+ tokens/sec at $1.50/$9 per million tokens

  2. Feb 19, 2026
    Gemini 3.1 Pro

    Scores 77.1% on ARC-AGI-2 (more than double its predecessor) and 94.3% on GPQA Diamond; positioned as the strongest pure-reasoning model in the lineup

  3. Dec 17, 2025
    Gemini 3 Flash

    Replaces 2.5 Flash as the speed-optimized default in the Gemini app; first model in the Gemini 3 generation; brings frontier intelligence at high throughput

  4. Mar 1, 2025
    Gemini 2.5 Pro

    First Gemini 'thinking' / reasoning model; pauses to reason before answering; Google describes it as its most intelligent model at launch

  5. Jan 30, 2025
    Gemini 2.0 Flash (GA) / 2.0 Pro

    Gemini 2.0 Flash becomes new default model (Jan 30); Gemini 2.0 Pro follows Feb 5, 2025

  6. Dec 11, 2024
    Gemini 2.0 Flash Experimental

    Introduces Multimodal Live API (real-time audio/video), native image generation with SynthID watermarking, and expanded agentic capabilities including experimental coding agent Jules

  7. May 14, 2024
    Gemini 1.5 Flash

    Debuts at Google I/O 2024 as a speed-optimized, lower-cost Flash tier; establishes the Flash/Pro split that defines subsequent generations

  8. Feb 15, 2024
    Gemini 1.5 Pro

    Introduces a then-world-record 1-million-token context window and a mixture-of-experts architecture; described as more capable than 1.0 Ultra

  9. Feb 8, 2024
    Gemini 1.0 Ultra / Bard renamed Gemini

    Bard officially becomes 'Gemini'; Ultra 1.0 launches via Google One AI Premium ($19.99/mo); Duet AI branding retired in favor of Gemini across Workspace

  10. Dec 6, 2023
    Gemini 1.0 (Ultra / Pro / Nano)

    First natively multimodal LLM family announced; Gemini Pro integrated into Bard immediately; Ultra held for safety testing; Nano shipped on Pixel 8 Pro

  11. Mar 21, 2023
    Bard — public waitlist launch

    Limited public access opens to US and UK users via waitlist; waitlist removed May 10 with expansion to 180+ countries

  12. Feb 6, 2023
    Bard (LaMDA-powered)

    Google announces Bard — its public AI chatbot response to ChatGPT, powered by LaMDA; a live demo error immediately wipes ~$100B from Alphabet's market cap