Google I/O 2026 Recap: Gemini 3.5 Flash and TPU 8t chip architecture launched

Welcome to our Google I/O 2026 recap, covering the products and technical advances Google announced for the AI era, including Gemini and Google Search. Let’s take a look at what was announced and what’s changing.

Contents

Gemini
Inference

Gemini


Google has announced Gemini 3.5 Flash, which is comparable in performance to Gemini 3.1 Pro. It generates code at around 290 tokens per second, according to Google, and our own testing also shows it is faster than both 3.1 Pro and the previous 3 Flash. It reportedly built a working operating system in 12 hours using 2.6 billion tokens, and the resulting OS can run Doom. Its bigger sibling, Gemini 3.5 Pro, is coming next month.
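To put the OS-building figures in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that all 2.6 billion tokens were generated output produced at the quoted 290 tokens per second; in practice the total almost certainly includes input and context tokens, so treat the result as a loose estimate rather than anything Google has stated. The function names are our own.

```python
# Figures quoted in the article.
TOTAL_TOKENS = 2.6e9         # tokens reportedly used to build the OS
WALL_CLOCK_HOURS = 12        # reported build time
STREAM_TOKENS_PER_SEC = 290  # generation speed quoted by Google


def average_tokens_per_second(total_tokens: float, hours: float) -> float:
    """Average token rate needed to consume total_tokens within the given hours."""
    return total_tokens / (hours * 3600)


def implied_parallel_streams(total_tokens: float, hours: float, stream_rate: float) -> float:
    """Number of concurrent streams needed if each runs at stream_rate tokens/sec.

    Assumption (ours): every token is generated output at the quoted rate.
    """
    return average_tokens_per_second(total_tokens, hours) / stream_rate


if __name__ == "__main__":
    avg = average_tokens_per_second(TOTAL_TOKENS, WALL_CLOCK_HOURS)
    streams = implied_parallel_streams(TOTAL_TOKENS, WALL_CLOCK_HOURS, STREAM_TOKENS_PER_SEC)
    print(f"Average rate: {avg:,.0f} tokens/sec")
    print(f"Implied parallel streams: {streams:,.0f}")
```

Under that assumption, the run averages about 60,000 tokens per second, which a single 290 tokens-per-second stream could not sustain. That suggests the work was spread across a couple of hundred parallel agents or requests, or that much of the total was input tokens.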

Antigravity gets an upgrade, too, with a better and cleaner UI. The update adds support for the Antigravity SDK, a native voice mode, and a new Antigravity CLI, which will replace the current Gemini CLI starting June 18th. The transition brings improved performance and a unified architecture: the Antigravity CLI uses the same harness as the Antigravity IDE. The new Antigravity has rolled out to everyone and introduces unified rate limits for Flash and Pro models, with limits aligned to API prices. These changes do not affect non-Google models, which remain subject to a separate fixed rate limit.

Gemini Omni is a model that can generate output in any modality from any type of input. Google is starting with video generation and will enable image and text generation later. The model combines Gemini intelligence with the generative media models, making it a huge leap forward in world understanding. Omni Flash is available in the Gemini app, the Flow website, and YouTube Shorts, and is coming soon to the API.

SynthID has watermarked over 100 billion images and videos, along with 60,000 years of audio. Millions of people have used the SynthID detector in the Gemini app to verify AI-generated assets. C2PA content credentials verification is available in the Gemini app from today; it shows whether an image was generated by AI or captured with a camera, and whether it has been edited with generative AI tools. SynthID and C2PA verification are rolling out to Search and Google Chrome very soon. NVIDIA signed on to SynthID last year, and now OpenAI, Kakao, and ElevenLabs are adopting it.

Ask YouTube entirely reimagines the experience, making information much easier to digest and navigate. It will be available this summer in the US. Docs Live takes document creation to another level. Previously, creating a doc with Gemini meant typing out a precise prompt. With Docs Live, you can simply say whatever is on your mind and let Gemini do the rest.


Google is now processing over 3.2 quadrillion tokens per month, up from the 480 trillion reported at I/O in May 2025, which is roughly 7x year-on-year growth. Its APIs handle 19 billion tokens per minute across first-party (1P) models, a 6x year-on-year increase. On Google Cloud, more than 375 customers each processed over 1 trillion tokens in the past 12 months. AI Overviews has 2.5 billion monthly active users, AI Mode in Search has more than 1 billion monthly users, and Gemini app usage has more than doubled over the year, from 400 million to over 900 million monthly active users.
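The growth multiples above are easy to sanity-check. The sketch below recomputes them from the quoted figures; the helper names and the 30-day month used to convert the per-minute API rate are our own assumptions, not anything Google specified.

```python
def growth_factor(current: float, previous: float) -> float:
    """Ratio of the current figure to the previous one."""
    return current / previous


def tokens_per_month(tokens_per_minute: float, days: int = 30) -> float:
    """Convert a per-minute token rate to a monthly total (assumes a 30-day month)."""
    return tokens_per_minute * 60 * 24 * days


if __name__ == "__main__":
    # Monthly tokens: 3.2 quadrillion now vs 480 trillion in May 2025.
    print(f"Monthly token growth: {growth_factor(3.2e15, 480e12):.2f}x")

    # Gemini app: 900 million vs 400 million monthly active users.
    print(f"Gemini app MAU growth: {growth_factor(900e6, 400e6):.2f}x")

    # API traffic: 19 billion tokens per minute, expressed per month.
    api_monthly = tokens_per_month(19e9)
    print(f"API tokens per month: {api_monthly:.3e}")
    print(f"API share of the monthly total: {api_monthly / 3.2e15:.0%}")
```

The monthly total works out to about 6.7x, which rounds to the quoted 7x, and the Gemini app figure is 2.25x, slightly more than double. If the per-minute API rate held all month, it would come to roughly 820 trillion tokens, about a quarter of the 3.2 quadrillion total.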

Gemini Spark runs on a dedicated VM in the cloud and is active 24/7, so you don’t need to keep your laptop open. It’s powered by Gemini 3.5 and the Antigravity harness, which lets it carry out long-horizon tasks in the background. Spark will integrate with tools, starting with Google’s own and third-party tools through MCP. You can chat with Spark wherever is most convenient: in the Gemini app or, soon, through email and chat. On Android, you will be able to see live updates and task progress from agents like Spark through a new UI space called Android Halo, coming later this year.

AI Mode now includes Gemini 3.5 and introduces a new intelligent search box with autofill capabilities, launching globally today. The updated AI search experience supports contextual follow-up questions and provides sourced responses that improve with each interaction. Information agents in Search will help users track product restocks, monitor stock market updates, and receive automated reminders. These features will begin rolling out this summer to Google AI Pro and Ultra subscribers. Google is also introducing a generative search UI this summer. In addition, a new Daily Brief feature will deliver key updates from users’ inboxes, calendars, and tasks. Antigravity in Search is also scheduled to roll out this summer to AI Pro and Ultra subscribers worldwide across multiple Google platforms.

Google has redesigned the entire Gemini experience from the ground up, introducing a new design language it calls “neural experience.” The interface now features fluid animations, vibrant colors, new typography, and haptic feedback. Gemini Live is also integrated directly into Gemini, so you can switch from typing questions to free-flowing conversation and back again without missing a beat. The new design is rolling out globally today across the web, Android, and iOS.

Google has also announced changes to its AI plans, most notably splitting AI Ultra into two tiers: AI Ultra 5x and AI Ultra 20x. The names are largely self-explanatory. Gemini is getting Claude-style rate limits in the Gemini app, on the website, and elsewhere. Pricing has improved too: AI Ultra 5x costs $100, while the 20x tier, which used to cost $250, is now $200.

Inference

The TPU lineup gets an upgrade as well. Google spent $31 billion annually on capex in 2022 and is now spending roughly six times that, approximately $190 billion. For the first time, it has taken a dual-chip approach with specialized architectures for training and inference: TPU 8t and TPU 8i.
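As a quick check on the capex comparison, this short sketch recomputes the multiple from the two quoted figures. The constant and function names are ours, and the amounts are taken exactly as quoted, in billions.

```python
CAPEX_2022_BILLIONS = 31   # annual capex quoted for 2022
CAPEX_NOW_BILLIONS = 190   # approximate current annual capex quoted


def capex_multiple(current: float, baseline: float) -> float:
    """How many times larger current spending is than the baseline."""
    return current / baseline


if __name__ == "__main__":
    multiple = capex_multiple(CAPEX_NOW_BILLIONS, CAPEX_2022_BILLIONS)
    print(f"Current capex is {multiple:.1f}x the 2022 figure")
    print(f"Six times the 2022 figure: {6 * CAPEX_2022_BILLIONS} billion")
```

Six times 31 billion is 186 billion, so "six times" and "approximately 190 billion" agree once rounding is taken into account.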

TPU 8t is optimized for large-scale pretraining and delivers nearly three times the raw computing power of the previous generation. Training is no longer constrained by the limits of a single, massive data center. Instead, Google can now distribute training seamlessly across multiple sites, scaling to more than 1 million TPUs globally.

TPU 8i is designed for inference, with dramatically improved speed at every step. If Google has learned anything in 27 years of working on Search, it’s that latency matters.

Both of the chips are more energy efficient, delivering up to two times better performance per watt.

