Google Gemini Get Started Interactions Api

Copyright 2026 Google LLC.

[ ]

Gemini API: Getting started with Interactions API

The Interactions API is a unified interface for building with Gemini models and agents. It simplifies the development of agentic applications by handling server-side state management, tool orchestration, and long-running tasks.

With a single endpoint, you can:

Interact with Gemini models for text, image, and audio generation
Build multi-turn conversations without managing history client-side
Call custom functions and built-in tools like Google Search
Run specialized agents like Deep Research for complex tasks

Setup

Install SDK

Install the SDK from PyPI. It's recommended to always use the latest version.

[ ]
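
The code for this install cell is not shown here. In a notebook it would typically be a `%pip install -U google-genai` line (`google-genai` is the SDK's PyPI name; the exact flags are an assumption). As a minimal sketch, the helper below (a hypothetical name, not part of the SDK) reports which version is installed, if any:

```python
from importlib import metadata


def installed_version(package: str = "google-genai"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version or "google-genai is not installed; run: pip install -U google-genai")
```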

Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named GEMINI_API_KEY. If you don't already have an API key or you aren't sure how to create a Colab Secret, see Authentication for an example.

[ ]
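
The cell's code is not shown here. A minimal sketch of reading the secret, assuming Colab's `google.colab.userdata` module: the fallback to an environment variable of the same name is an addition for running outside Colab, and `load_api_key` is a hypothetical helper name.

```python
import os


def load_api_key(name: str = "GEMINI_API_KEY"):
    """Read the key from a Colab Secret, falling back to an environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        # Outside Colab: assumption that the key is exported in the environment.
        return os.environ.get(name)
    return userdata.get(name)


GEMINI_API_KEY = load_api_key()
```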

Initialize SDK client

With the new SDK, you only need to initialize a client with your API key.

[ ]
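
The cell's code is not shown here. A minimal sketch of client initialization with the `google-genai` package: the SDK import is deferred into the function so the snippet loads even when the package is absent, and `make_client` is a hypothetical helper name.

```python
def make_client(api_key: str):
    """Create a Gen AI SDK client (requires the `google-genai` package)."""
    from google import genai  # imported lazily so this sketch loads without the SDK

    return genai.Client(api_key=api_key)


# Usage, once an API key is available:
# client = make_client(GEMINI_API_KEY)
```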

Choose a model

Select the model you want to use in this guide. You can either select one from the list or enter a model name manually.

For a full overview of all Gemini models, check the documentation.

[ ]
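
The cell's code is not shown here. A plausible sketch uses a Colab form field; the model names listed are assumptions, so replace them with any model from the documentation that supports the Interactions API.

```python
# Colab renders the trailing comment as a dropdown that also accepts free text.
# The model names are assumptions; check the documentation for current options.
MODEL_ID = "gemini-2.5-flash"  # @param ["gemini-2.5-flash", "gemini-2.5-pro"] {"allow-input": true}
```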

MODEL_ID

Create an interaction

At its simplest, the Interactions API works like a standard chat completion. You provide a model and an input string. By default, interactions are stored (store=True), allowing you to reference them later.

[1]
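
The cell's code is not shown here, but the warning in the output below confirms it calls `client.interactions.create`. A minimal sketch of such a call: the helper name `ask` and the default model are assumptions, as is reading the reply from the last entry of `interaction.outputs`.

```python
def ask(client, prompt: str, model: str = "gemini-2.5-flash") -> str:
    """Send one prompt and return the text of the last output."""
    interaction = client.interactions.create(model=model, input=prompt)
    # Assumption: the reply text is on the last entry of `interaction.outputs`.
    return interaction.outputs[-1].text


# Usage with a real client (hypothetical prompt):
# print(ask(client, "Explain quantum entanglement in one sentence."))
```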

/tmp/ipykernel_2276898/4284843731.py:5: UserWarning: Interactions usage is experimental and may change in future versions.
  interaction = client.interactions.create(

Quantum entanglement is a phenomenon where two or more particles become so deeply linked that the state of one instantly determines the state of the other, regardless of the distance separating them.

Stateful Multi-turn Conversations

One of the most powerful features of this API is server-side state management. You do not need to append messages to a list and send the full history back to the server every time.

Use previous_interaction_id to continue a conversation.

Note: Only conversation history is preserved. Parameters like tools or generation_config are interaction-scoped and must be re-declared if needed.

[2]
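
The cell's code is not shown here. A minimal sketch of a two-turn exchange that relies on `previous_interaction_id` instead of a client-side message list: the helper name, default model, and the `outputs[-1].text` access are assumptions.

```python
def two_turn_chat(client, first: str, second: str, model: str = "gemini-2.5-flash"):
    """Run two turns; the second refers to the first by ID rather than resending it."""
    turn1 = client.interactions.create(model=model, input=first)
    print(f"Turn 1 ID: {turn1.id}")

    # Only the new message is sent; the server restores the earlier history.
    turn2 = client.interactions.create(
        model=model,
        input=second,
        previous_interaction_id=turn1.id,
    )
    return turn1.id, turn2.outputs[-1].text
```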

Turn 1 ID: v1_ChdDekJ5YVktZ0VlYUx4czBQczl5VDRRVRIXQ3pCeWFZLWdFZWFMeHMwUHM5eVQ0UVU
You mentioned that you are a **software engineer**!

For client-managed history, see Stateless conversations.

Forking Conversations

Because state is managed by ID, you can "fork" a conversation by referencing an older interaction ID with a different prompt.

[3]
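
The cell's code is not shown here. A minimal sketch of forking: any stored interaction ID can serve as the parent of a new branch, and calling the helper twice with the same ID produces two independent continuations. The helper name and default model are assumptions.

```python
def fork(client, interaction_id: str, prompt: str, model: str = "gemini-2.5-flash"):
    """Start a new branch from an earlier interaction; returns (branch_id, reply_text)."""
    branch = client.interactions.create(
        model=model,
        input=prompt,
        previous_interaction_id=interaction_id,  # parent turn to branch from
    )
    return branch.id, branch.outputs[-1].text


# Usage: branch twice from the same earlier turn (hypothetical prompts).
# id_a, reply_a = fork(client, turn1_id, "What is my name?")
# id_b, reply_b = fork(client, turn1_id, "Suggest a book for my profession.")
```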

Your name is **Alice**. You also mentioned that you are a software engineer! How can I help you with your work or anything else today?

Multimodal Interactions

Gemini models natively understand and generate multiple content types. You can pass text, images, audio, or PDF documents in a single interaction. This example uses a remote image URL.

Multimodal understanding

[4]
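
The cell's code is not shown here. A minimal sketch of a mixed text-and-image input, assuming content parts are dictionaries with a `type` field and that a remote image is referenced through `uri` and `mime_type` keys. These field names, the helper names, and the example URL are assumptions to check against the API reference.

```python
def image_prompt(text: str, image_uri: str, mime_type: str = "image/jpeg"):
    """Build a two-part input: an instruction followed by a remote image."""
    return [
        {"type": "text", "text": text},
        {"type": "image", "uri": image_uri, "mime_type": mime_type},
    ]


def describe_image(client, text: str, image_uri: str, model: str = "gemini-2.5-flash") -> str:
    """Send the text and image together in a single interaction."""
    interaction = client.interactions.create(model=model, input=image_prompt(text, image_uri))
    return interaction.outputs[-1].text


# Usage (hypothetical URL):
# print(describe_image(client, "Write a recipe for this.", "https://example.com/scones.jpg"))
```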

Based on the image, these are **Rustic Blueberry Drop Scones**. They have a tender, buttery interior, are bursting with juicy blueberries, and feature a signature crunchy, sparkling sugar crust.

### **Rustic Blueberry Drop Scones**
*Yields: 8-10 scones*

---

### **Ingredients**

**The Dough:**
* 2 cups (250g) All-purpose flour
* 1/3 cup (65g) Granulated sugar
* 1 tablespoon Baking powder
* ½ teaspoon Salt
* ½ cup (115g) Unsalted butter, **very cold** and cubed
* ¾ cup (180ml) Heavy cream (plus 1-2 tbsp extra if needed)
* 1 teaspoon Vanilla extract
* 1 ½ cups Fresh blueberries (frozen can be used, but do not thaw them)

**The Topping:**
* 2 tablespoons Heavy cream (for brushing)
* 2 tablespoons Coarse sugar (such as Turbinado or sparkling sanding sugar)

---

### **Instructions**

1. **Prep:** Preheat your oven to **400°F (200°C)**. Line a large baking sheet with parchment paper.
2. **Mix Dry Ingredients:** In a large bowl, whisk together the flour, granulated sugar, baking powder, and salt.
3. **Cut in Butter:** Add the cold, cubed butter to the flour mixture. Using a pastry cutter or two knives (or even your fingertips), work the butter into the flour until the mixture resembles coarse crumbs with some pea-sized chunks of butter still visible. **Pro-tip:** Work quickly so the butter doesn't melt.
4. **Combine Wet and Dry:** Stir in the vanilla extract into the heavy cream. Pour the cream into the center of the flour mixture. Use a fork or rubber spatula to gently fold everything together until a shaggy dough begins to form.
5. **Add Blueberries:** Gently fold in the blueberries. To get the "smeared" look seen in the photo, you can allow a few berries to break slightly while mixing. If the dough feels too dry to hold together, add another tablespoon of cream.
6. **Shape:** Instead of rolling out the dough, use a large spoon or a measuring cup to "drop" mounds of dough (about 1/3 cup each) onto the prepared baking sheet, spacing them 2 inches apart. This gives them their rustic, craggy appearance.
7. **Final Flourish:** Brush the tops of each mound lightly with heavy cream and sprinkle generously with the coarse sugar.
8. **Bake:** Bake for **18–22 minutes**, or until the edges are golden brown and the tops are set.
9. **Cool:** Let the scones cool on the baking sheet for 5 minutes before transferring them to a wire rack.

---

### **Serving Suggestions**
Serve warm with a cup of black coffee or café au lait, as shown in the photo. They are delicious plain, but a little extra jam or lemon curd on the side never hurts!

For audio, video, and document (PDF) understanding, see Multimodal understanding.

Multimodal Generation

[5]
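
The cell's code is not shown here. A rough sketch of image generation, assuming a `response_modalities=["IMAGE"]` parameter and that image outputs carry base64-encoded bytes in a `data` attribute. Both are assumptions to verify against the API reference, and the model must be one that supports image output.

```python
import base64


def image_bytes(outputs):
    """Decode every image output; assumes `.type == "image"` and base64 `.data`."""
    return [
        base64.b64decode(o.data)
        for o in outputs
        if getattr(o, "type", None) == "image"
    ]


def generate_images(client, prompt: str, model: str):
    """Request image output and return the decoded image bytes."""
    interaction = client.interactions.create(
        model=model,
        input=prompt,
        response_modalities=["IMAGE"],  # assumption: parameter name and value
    )
    return image_bytes(interaction.outputs)
```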

(generated image output)

For audio generation, see Multimodal generations.

Tool use

Tools extend the model's capabilities by letting it call external functions or services. The API includes ready-to-use tools, such as Google Search, and lets you define custom tools as JSON schemas. The model decides when to call them based on the conversation.

Built-in tools

[7]
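
The cell's code is not shown here. A minimal sketch of enabling the built-in Google Search tool, assuming tools are declared as dictionaries with a `type` field and that `"google_search"` is the identifier. Because a tool-using interaction can return non-text outputs alongside the answer, the helper keeps only the text parts.

```python
GOOGLE_SEARCH_TOOL = {"type": "google_search"}  # assumption: tool identifier


def collect_text(outputs) -> str:
    """Join the text outputs, skipping tool-call and tool-result entries."""
    return "".join(o.text for o in outputs if getattr(o, "type", None) == "text")


def search_and_answer(client, question: str, model: str = "gemini-2.5-flash") -> str:
    """Ask a question with Google Search available to the model."""
    interaction = client.interactions.create(
        model=model,
        input=question,
        tools=[GOOGLE_SEARCH_TOOL],
    )
    return collect_text(interaction.outputs)
```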

The **2024 Nobel Prize in Physics** was awarded to **John J. Hopfield** and **Geoffrey E. Hinton** for their foundational discoveries and inventions that enable machine learning with artificial neural networks.

The Royal Swedish Academy of Sciences recognized the two scientists for using tools from physics to develop methods that serve as the bedrock of today’s powerful machine learning and artificial intelligence.

### The Laureates and Their Contributions:
* **John J. Hopfield (Princeton University):** He invented the **Hopfield network**, a type of artificial neural network that can store and reconstruct images and other types of patterns in data. It uses principles of physics, specifically the concept of "energy" in magnetic systems, to find the most likely saved pattern even when presented with incomplete or distorted information.
* **Geoffrey E. Hinton (University of Toronto):** Often called the "Godfather of AI," he used the Hopfield network as a foundation to create the **Boltzmann machine**. This model uses statistical physics to learn to recognize characteristic elements in a given type of data (such as identifying specific objects in pictures). His work helped spark the current explosive development of machine learning.

The committee noted that while computers cannot think, the technology developed by these researchers allows machines to mimic functions like memory and learning, which has transformed fields ranging from particle physics to medical diagnosis.

Other built-in tools include Code Execution and Computer Use.

Function calling

[8]
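
The cell's code is not shown here. A sketch of one function-calling round trip: declare the tool as a JSON schema, run the local function when the model requests it, then return the result in a follow-up interaction. The output fields (`type`, `name`, `arguments`, `id`) and the `function_result` input shape are assumptions to check against the API reference, and `get_weather` is a hypothetical function that returns canned data.

```python
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "City name"}},
        "required": ["location"],
    },
}


def get_weather(location: str) -> str:
    """Hypothetical local function; returns canned data, not a real forecast."""
    return f"The weather in {location} is sunny, 22°C."


def run_with_weather_tool(client, prompt: str, model: str = "gemini-2.5-flash") -> str:
    interaction = client.interactions.create(model=model, input=prompt, tools=[WEATHER_TOOL])
    for output in interaction.outputs:
        if output.type == "function_call" and output.name == "get_weather":
            result = get_weather(**output.arguments)
            # Tools are interaction-scoped, so they are declared again here.
            interaction = client.interactions.create(
                model=model,
                previous_interaction_id=interaction.id,
                tools=[WEATHER_TOOL],
                input=[{
                    "type": "function_result",
                    "name": output.name,
                    "call_id": output.id,
                    "result": result,
                }],
            )
            break
    return interaction.outputs[-1].text
```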

The current weather in Tokyo is sunny with a temperature of 22°C.

For code execution, URL context, and MCP servers, see Agentic capabilities.

Agents & Long-Running Tasks

Beyond models, the Interactions API provides access to specialized agents. Deep Research executes multi-step research tasks, synthesizing information from multiple sources into comprehensive reports.

Agents run asynchronously with background=True. Poll the interaction status to retrieve results:

[11]
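
The cell's code is not shown here. A minimal sketch of the start-then-poll pattern, assuming the agent is selected with an `agent=` argument and that `client.interactions.get` returns the current state. The agent identifier is left as a parameter, and the terminal statuses other than `completed` are assumptions. The `sleep` argument exists only so the loop can be exercised without waiting.

```python
import time


def start_agent_task(client, prompt: str, agent: str):
    """Start a background task and return immediately with the pending interaction."""
    return client.interactions.create(agent=agent, input=prompt, background=True)


def wait_for_result(client, interaction_id: str, interval_s: float = 10.0, sleep=time.sleep):
    """Poll until the interaction reaches a terminal status, then return it."""
    terminal = ("completed", "failed", "cancelled")  # assumption beyond "completed"
    while True:
        interaction = client.interactions.get(interaction_id)
        print(f"Status: {interaction.status}")
        if interaction.status in terminal:
            return interaction
        sleep(interval_s)


# Usage with a real client (agent name taken from the documentation):
# task = start_agent_task(client, "Research the history of Google TPUs.", agent=AGENT_ID)
# done = wait_for_result(client, task.id)
# if done.status == "completed":
#     print(done.outputs[-1].text)
```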

Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: in_progress
Status: completed

--- Final Report ---

# Comprehensive History and Architectural Analysis of Google Tensor Processing Units (TPUs): From Inference Origins to the Ironwood Era (2025)

### Executive Summary

The evolution of Google’s Tensor Processing Unit (TPU) represents a fundamental shift in computer architecture, moving from general-purpose processing to domain-specific architectures (DSAs) optimized exclusively for machine learning. Initiated in 2015 to address the computational bottleneck of deep learning inference, the TPU program has expanded into a vertically integrated supercomputing ecosystem. By 2025, the introduction of the seventh-generation TPU, codenamed "Ironwood," marks a pivotal transition into the "Age of Inference" and "Agentic AI."

**Key Findings:**
*   **Architectural Bifurcation and Reunion:** While the fifth generation split into training-focused (v5p) and inference-focused (v5e) designs, the 2025 Ironwood architecture (v7) synthesizes these capabilities, utilizing a dual-chiplet design to maximize yield and performance for both massive-scale training and high-throughput inference [cite: 1].
*   **The Ironwood Leap (2025):** TPU v7 "Ironwood" introduces native FP8 support, 192 GB of HBM3e memory per chip, and a peak performance of 4.6 PFLOPS (FP8), positioning it as a direct competitor to NVIDIA’s Blackwell architecture [cite: 2, 3].
*   **Interconnect Innovation:** The integration of Optical Circuit Switching (OCS) has evolved from a novelty in v4 to a critical infrastructure component in v7, enabling dynamic topology reconfiguration for pods containing up to 9,216 chips [cite: 1, 4].
*   **SparseCore Integration:** To address the memory-bound nature of recommendation systems and agentic workflows, recent generations have integrated specialized "SparseCores," with v7 featuring four per chip to accelerate embedding operations [cite: 1, 2].

---

## 1. Introduction: The Rise of Domain-Specific Architectures

The genesis of the TPU was driven by an internal realization at Google in 2013: the computational demands of voice recognition and computer vision services were projected to double the company's data center footprint if relied upon standard CPUs [cite: 5]. This necessitated a departure from the Von Neumann bottleneck and the adoption of Domain-Specific Architectures (DSAs). Unlike Graphics Processing Units (GPUs), which retain hardware for graphics rasterization and general-purpose computing, TPUs were designed as Application-Specific Integrated Circuits (ASICs) focused strictly on tensor operations, specifically matrix multiplication.

The core architectural differentiator of the TPU is the **systolic array**. This design allows data to flow through a grid of arithmetic logic units (ALUs) in a rhythmic, heart-like pump, maximizing data reuse and minimizing access to registers and memory. This approach yields significantly higher operations per joule compared to contemporary architectures [cite: 2, 6].

---

## 2. The Inference Era: TPU v1 (2015)

### 2.1 Architectural Philosophy
The first-generation TPU, deployed internally in 2015 and revealed in 2016, was a single-purpose accelerator designed exclusively for inference. It did not support backpropagation (training). Its primary objective was to accelerate the forward pass of neural networks for applications like Google Search, Translate, and AlphaGo [cite: 2, 6].

### 2.2 Technical Specifications
The TPU v1 was built on a 28nm process node. It featured a massive 256×256 systolic array of 8-bit integer multipliers.
*   **Performance:** 92 Tera-Operations Per Second (TOPS) at 8-bit integer precision.
*   **Memory:** 28 MiB of on-chip software-managed memory (Unified Buffer) and 8 GiB of off-chip DDR3 DRAM.
*   **Bandwidth:** 34 GB/s memory bandwidth.
*   **Power:** 75W TDP.
*   **Interface:** Connected via PCIe Gen 3.0 to a host CPU [cite: 2, 6].

The v1 established the "quantization" paradigm, proving that low-precision (INT8) arithmetic was sufficient for inference, a concept that would later influence the entire industry.

---

## 3. The Training Revolution: TPU v2 and v3 (2017–2018)

### 3.1 TPU v2: Enabling Training (2017)
The second generation marked the transition from inference-only to a unified training and inference architecture. This required support for floating-point arithmetic. Google introduced **bfloat16** (Brain Floating Point), a truncated 16-bit format that retained the dynamic range of 32-bit IEEE floating point but with reduced precision. This allowed for efficient training without the hardware overhead of full FP32 [cite: 2, 6].

*   **Configuration:** 4 chips per board.
*   **Memory:** Introduction of High Bandwidth Memory (HBM). 16 GB HBM per chip (600 GB/s bandwidth).
*   **Interconnect:** Introduction of the Inter-Chip Interconnect (ICI), allowing chips to form a 2D torus mesh without host CPU involvement.
*   **Performance:** 45 TFLOPS per chip; 180 TFLOPS per board [cite: 6].

### 3.2 TPU v3: Liquid Cooling and Density (2018)
TPU v3 pushed the thermal and performance envelopes, necessitating the introduction of liquid cooling to the data center racks.
*   **Performance:** 123 TFLOPS per chip; 420 TFLOPS per board.
*   **Memory:** Doubled to 32 GB HBM per chip.
*   **Scaling:** Pods scaled to 1,024 chips, delivering over 100 PFLOPS of compute [cite: 6].

---

## 4. The Optical Era: TPU v4 (2021)

TPU v4 represented a paradigm shift in networking. While previous generations relied on static electrical interconnects, v4 introduced **Optical Circuit Switching (OCS)**. This allowed the interconnect topology to be dynamically reconfigured (e.g., into 3D tori) to bypass failed nodes or optimize for specific model parallelism strategies [cite: 7, 8].

### 4.1 v4 Specifications
*   **Process:** 7nm.
*   **Compute:** 275 TFLOPS (BF16) per chip.
*   **Memory:** 32 GB HBM2e per chip; 1,200 GB/s bandwidth.
*   **Architecture:** 2 TensorCores per chip; introduction of **SparseCores** (dataflow processors for embeddings).
*   **Scaling:** 4,096 chips per pod, delivering 1.1 ExaFLOPS [cite: 7, 8].

---

## 5. Architectural Bifurcation: TPU v5p and v5e (2023)

In 2023, Google diverged its silicon strategy to address two distinct market needs: cost-effective inference/small-scale training (v5e) and massive-scale foundation model training (v5p).

### 5.1 TPU v5e (Efficiency)
Designed for transformer and text-to-image models where cost-per-inference is paramount.
*   **Compute:** 197 TFLOPS (BF16).
*   **Memory:** 16 GB HBM2e.
*   **Interconnect:** 2D Torus, optimized for smaller pods (up to 256 chips) [cite: 9, 10].

### 5.2 TPU v5p (Performance)
The "Performance" variant was the workhorse for training Gemini 1.0.
*   **Compute:** 459 TFLOPS (BF16).
*   **Memory:** 95 GB HBM per chip (High capacity for large batch sizes).
*   **Bandwidth:** 2,765 GB/s.
*   **Scaling:** Up to 8,960 chips per pod using OCS and 3D torus topology [cite: 1, 2].

---

## 6. The Trillium Generation: TPU v6e (2024)

Announced in May 2024, the sixth generation, named **Trillium** (v6e), focused on maximizing energy efficiency and inference throughput for the Gemini 1.5 era.

### 6.1 Key Specifications
*   **Performance:** 4.7x peak compute performance per chip compared to v5e.
*   **Matrix Multiply Unit (MXU):** Expanded to 256×256 (quadrupling FLOPs per cycle vs 128x128 in v5).
*   **Memory:** 32 GB HBM per chip; 1,640 GB/s bandwidth.
*   **Interconnect:** 1.6 TB/s ICI bandwidth (doubled vs v5e).
*   **SparseCore:** 3rd Generation, optimized for ranking and recommendation embeddings.
*   **Energy Efficiency:** 67% more efficient than v5e [cite: 2, 9, 11].

Trillium served as a bridge architecture, bringing the large MXU design to a power-efficient envelope before the massive scale-up of the 2025 generation.

---

## 7. 2025 Specifications: TPU v7 "Ironwood"

The focal point of Google's 2025 hardware strategy is the **TPU v7**, codenamed **Ironwood**. This chip represents the culmination of a decade of learning, specifically engineered for the "Age of Inference" and "Agentic AI," where models require massive memory bandwidth and low-latency communication for complex reasoning tasks [cite: 12].

### 7.1 Dual-Chiplet Architecture
Ironwood departs from the monolithic die design of previous high-performance TPUs (like v4 and v5p). It utilizes a **dual-chiplet** architecture.
*   **Design:** Two distinct chiplets connected via a high-speed Die-to-Die (D2D) interface.
*   **Benefit:** This approach overcomes reticle size limitations, improving manufacturing yields and allowing for a larger total silicon area than a single monolithic die could support.
*   **Logical View:** Despite the physical split, the software stack (XLA) can treat the chiplets as a unified resource or split tensors across them for parallelism [cite: 1, 13, 14].

### 7.2 Detailed Specifications (2025)

The following table summarizes the confirmed and projected specifications for TPU v7 Ironwood compared to its predecessors.

| Feature | TPU v5p (2023) | TPU v6e Trillium (2024) | **TPU v7 Ironwood (2025)** |
| :--- | :--- | :--- | :--- |
| **Architecture** | Monolithic | Monolithic | **Dual-Chiplet** |
| **TensorCores** | 2 per chip | 1 per chip | **2 per chip** |
| **SparseCores** | 4 per chip | 2 per chip | **4 per chip** |
| **Peak Compute (BF16)** | 459 TFLOPS | 918 TFLOPS | **2,307 TFLOPS** |
| **Peak Compute (FP8)** | N/A (Emulated) | N/A (Emulated) | **4,614 TFLOPS** |
| **HBM Capacity** | 95 GB | 32 GB | **192 GB (HBM3e)** |
| **HBM Bandwidth** | 2,765 GB/s | 1,640 GB/s | **7,370 GB/s (~7.4 TB/s)** |
| **ICI Bandwidth** | 1,200 GB/s | 800 GB/s | **1,200 GB/s (9.6 Tbps)** |
| **Pod Size (Max)** | 8,960 chips | 256 chips | **9,216 chips** |
| **Pod Performance** | ~4 ExaFLOPS | ~0.2 ExaFLOPS | **42.5 ExaFLOPS (FP8)** |
| **TDP (Power)** | ~300W (Est.) | ~200W (Est.) | **~600W (Est.)** |

*Data compiled from [cite: 1, 2, 3].*

### 7.3 Memory Subsystem: The HBM3e Revolution
A critical bottleneck for Large Language Models (LLMs) is the Key-Value (KV) cache during inference. Ironwood addresses this with **192 GB of HBM3e** memory per chip.
*   **Bandwidth:** The 7.37 TB/s bandwidth represents a 4.5x increase over Trillium and a 2.7x increase over v5p.
*   **Impact:** This massive memory pool allows extremely large models (or large batches of concurrent users) to reside entirely in high-speed memory, significantly reducing latency for long-context queries (e.g., 1M+ token windows) [cite: 2, 15].

### 7.4 Native FP8 Support
Ironwood is the first TPU to implement native **FP8** (8-bit floating point) support in hardware.
*   **Formats:** Supports both E4M3 (4-bit exponent, 3-bit mantissa) for forward passes and E5M2 (5-bit exponent, 2-bit mantissa) for backward passes.
*   **Efficiency:** This doubles the effective throughput for inference and training compared to BF16, without the significant accuracy loss associated with INT8 quantization for certain model types [cite: 2].

### 7.5 SparseCores and Agentic AI
Ironwood includes **4 SparseCores** per chip. These are specialized accelerators designed to handle "sparse" operations—computations where data is scattered in memory, typical of embedding lookups in recommendation systems and the complex branching logic of "Agentic" AI workflows.
*   **Function:** They offload embedding-heavy tasks from the main TensorCores, allowing the main cores to focus on dense matrix multiplication.
*   **Performance:** SparseCores provide 5x–7x speedups on embedding-heavy models compared to running them on standard cores [cite: 1, 2].

### 7.6 Interconnect and Scaling
Ironwood utilizes Google's proprietary **Inter-Chip Interconnect (ICI)** with a bandwidth of 1.2 TB/s per chip.
*   **Topology:** It employs a 3D Torus topology, facilitated by Optical Circuit Switches (OCS).
*   **Scale:** A single pod can scale to **9,216 chips**, delivering a staggering 42.5 ExaFLOPS of FP8 compute. This scale is critical for training next-generation foundation models (e.g., Gemini 2.0/3.0) and serving them with low latency [cite: 1, 4].

---

## 8. Comparative Analysis: Ironwood vs. The Industry

In the 2025 landscape, TPU v7 Ironwood competes directly with NVIDIA's Blackwell (B200/GB200) and AMD's MI300 series.

### 8.1 Ironwood vs. NVIDIA Blackwell B200
*   **Compute:** Ironwood (4.6 PFLOPS FP8) is roughly on par with the B200 (4.5 PFLOPS FP8 dense).
*   **Memory:** Both feature 192 GB of HBM3e. Ironwood's bandwidth (7.4 TB/s) is competitive with Blackwell's 8 TB/s.
*   **Interconnect:** NVIDIA relies on NVLink (1.8 TB/s) and InfiniBand/Ethernet for scale-out. Google uses ICI (1.2 TB/s) and OCS. While NVLink has higher per-chip bandwidth, Google's OCS allows for massive, flatter topologies (9k+ chips) that act as a single supercomputer, whereas NVIDIA clusters often require a multi-tiered switching hierarchy [cite: 2, 3, 14].
*   **Philosophy:** NVIDIA offers a general-purpose "Swiss Army Knife" approach. Google offers a specialized, vertically integrated "scalpel" optimized for its own software stack (JAX/TensorFlow) and data center infrastructure [cite: 14].

---

## 9. Software Ecosystem and Availability

### 9.1 The Software Stack
The utility of the TPU is unlocked by the **XLA (Accelerated Linear Algebra)** compiler.
*   **Optimization:** XLA fuses operations to reduce memory access and automatically handles the partitioning of models across the dual-chiplet architecture of Ironwood.
*   **Frameworks:** While JAX remains the primary research framework, Google has significantly improved **PyTorch/XLA** support, allowing standard PyTorch models to run on TPUs with minimal code changes. This includes support for dynamic shapes and eager execution, lowering the barrier to entry [cite: 2, 16].

### 9.2 Availability Timeline
*   **Preview:** TPU v7 Ironwood entered preview in **Q2 2025** (April/May timeframe, coinciding with Google Cloud Next '25).
*   **General Availability (GA):** Expected in **Q4 2025**.
*   **Deployment:** Major partners like Anthropic have committed to deploying over 1 million TPU chips (a mix of v6 and v7) to power their Claude models [cite: 2, 12, 17].

---

## 10. Future Outlook and Conclusion

### 10.1 Beyond 2025
Looking toward 2026, the roadmap suggests a continued focus on "Sovereign Compute" and energy efficiency. With data centers facing power constraints, the liquid-cooled, high-density design of TPU pods will be essential. The integration of "Agentic" capabilities directly into silicon (via SparseCores) indicates a shift from static question-answering models to autonomous AI agents capable of multi-step reasoning [cite: 14, 18].

### 10.2 Conclusion
The Google TPU v7 "Ironwood" represents a maturation of the domain-specific architecture. By 2025, the TPU has evolved from a simple inference accelerator into a massive-scale supercomputing platform capable of challenging the most powerful GPUs on the market. With its dual-chiplet design, massive HBM3e capacity, and optical networking, Ironwood is positioned to be the backbone of Google's AI infrastructure for the Gemini era and beyond.

---

### References
*   [cite: 9] Google Cloud Blog, "Trillium sixth-generation TPU is in preview," Oct 2024.
*   [cite: 6] Wikipedia, "Tensor Processing Unit."
*   [cite: 2] Introl.com, "Google TPU Architecture Complete Guide," Dec 2025.
*   [cite: 1] Google Cloud Documentation, "TPU v7x (Ironwood) Architecture."
*   [cite: 12] Google Cloud Blog, "Ironwood TPUs and new Axion-based VMs," Nov 2025.
*   [cite: 15] Yole Group, "Google introduces its 7th generation TPU, Ironwood," Apr 2025.
*   [cite: 7] Jouppi et al., "TPU v4: An Optically Reconfigurable Supercomputer," ISCA 2023.
*   [cite: 2] Introl.com, "Google TPU Architecture Complete Guide 7 Generations," Dec 2025.
*   [cite: 3] The Register, "Google's Ironwood TPUs," Nov 2025.
*   [cite: 2] Introl.com, "TPU v7 Ironwood SparseCores count," Dec 2025.
*   [cite: 4] SemiAnalysis, "TPUv7 Google Takes a Swing," Nov 2025.
*   [cite: 1] Google Cloud Docs, "TPU Ironwood architecture dual chiplet."
*   [cite: 14] TokenRing, "The Silicon Shift: Google's TPU v7," Jan 2026.

**Sources:**
1. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGad1ALYd4497Z5EdFaGNFmABjJbdu-rGQCDeP0op1Qyx2i0bhZqqgvcU3R7IJvNCBzS2mysMCOES54ItKO1n8YIYfTd0XHOT9TIKTdHgHZaCmeGJTy7mAnArc8eKnh0qCYOg==)
2. [introl.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQG-jVhIhFoxHgHqTvI2_S7IBP6TramMKC9BUHtka3O7icK-YnyCbzo-CgIuQkyStM4ifTXf7caxufDkB-k54w3ykGNiVyY_AshoGyu5cPEu6Lq3ki6EwSYD_CF3G_zDaKXQISlXTbWqIl__r6B_-ep-oQdZ6B8Kqr3Hq6rN8JD7-2Kb)
3. [theregister.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEqV6csvcGvi5S1HvQgm3NJ1GdDq03OaVICA9Ah5YhZePOqU-wPNLdjYXvALuztAYgRM1_zJ47UmFA4yp6zo80xbkhjnTbn_khJ7xpGTyg43t-FDHzkeoZY1kb7YgyA9_5qHX6A9aObg3Xdq1rECvuFZqpHXHvu)
4. [semianalysis.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHh5SZkIuk5rfeUONdM-WTW3_NmLuF06RJ24veQrwjXqBsfirCcWCeGc4DuwB5baN2okajh8gUCDSV0g1ijyBaXKJR32NN4T40PUqA0_i6FaiXYQ-1prDsFcQBER__XuxiwBx64j7s_1p4XJMNJxwUqz8Dd6sy2R2Q4d-seow==)
5. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEST4DdlSZ8xuGQiBjDNnUW--Roz1sEwf1IAV4oM2ervB6vsv2_s_tN5mDz3Ui-Got7_ceyceZnwIakUILXvqRrrC5WQe_L10QDFurGX_7OWi9whmZIql0MDo9xmhYI4ZOk69Wk-TNl5bTL8WTO7ubXSUuuAFmp0HD2CP8LO3Utug==)
6. [wikipedia.org](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHCfUa_Oa74gD1wK27OVTIlVAYxLYNVp-1usht-KIn8o-PdRcSIDwH75VmE7CQ8naMF3CW4eWth1_dTMzblT-2AL_ZCpYzUmxxiz6v5JtR4SGDdy63DgmlhGgMwrlplE7OmSJSIY3IjA1aL)
7. [arxiv.org](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEg2f34lWhIKWmVi4H5LUHrVrb6AacCEnUUgm8by4CPfIu5i1KCLIMIFTN9XNpKLRGfGJGGsfLrWwfLa2gJgnlTE1AZ62l5xjMmcmXbHWwgXEV1wHRSdQ==)
8. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHibmyMWoJjVbuLXBrZr8kMxYQJUKcwU_0xOFOSPz125ANBhDoXkZCUTXZYs2ujOkgL4wqXjKnnWprPfr7BtTUp6DB6QVlfBJh94MwrvvcHn5dKA-Oz8H4No0rUvmhnRA==)
9. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQF1QTo472wuH8VNJMWzZJ1dz5sSBlT3tnjBXmjV0va553UiA3IeYNl7szcoiUsUMhCdqwGrT956Y93O3IvYNe9Y5-LxI7YIg2OW4FtoL6uX_okFK7aS3ZciH_4y7lcMSmiXDGsNc3fD2cajlYCan7m3G-TojBDmSUaqHHbSDLk1qmAAc3AapBcI2eT4kibrQ10=)
10. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGL099_leGPPuf2KFIp2kRpFQTa1UeKE-wPgnqSacGkjuiOSPQxbojmS5aWYbpdx2ovXQ8RufjTittIyKKpQ55IamEc6R_SzigLK1Ka8Wh1B3B3JXgHI7IkvRTPZcFqCZU=)
11. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFy0FCn-Ae1nmGxMT0T1rQ0qWyDD8y3xDm826X4Hq_mpgASr3vcfi9hTpOpcPUcqEQTSwVt9R0jETli17LX2m3_p1Hty_L94sb-mfgFOnHYZMK1WiwI-6FiYkN8Aw7M5WrFLWkJqazaT_eeweD2A74piqTr-KEXA5_aPOZ7DEhevOfCn-BLdg==)
12. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQHDN__IItEKMeqSpuXwFRZzwOAY7jvfF9sUFeTkOi0ZjM_cUSaaoVbvGUKYkGYCcXb9m7zvy4Xi9ssorvE-7ESsLZZeFITuuVs3qPaM3tkjEaqDhAGy3hVQFdelhPjp80Aj-dSjGqAp5dgWDiTV3VMvP1itAHF_8r6doE9Atug2MVPITnxzuS4lbTT7lXm_vBuGPtEGVOjW4OGv-lrWQOd4)
13. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFVz8pupXEG74SctEDd0DwLG0IkYlnZC7iHTQ3H7mPWQKJwGJ01m57f782y3dFn55hog_6LvBp_aZN-poafG_aNKI9j7V9HeyC8rJe1ykspobYv2GFGEpLWXXdSTetibU-xGTII67xuBdc26p4dWA==)
14. [financialcontent.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGEcXxTPjRsv4FHQE_8nNbVG42mhETxRqIbmTrPjkqKpwb9pYzzp7l423sVejHfkt0PHiMgCa5v4UCPDCYmp3Moe3IUI0qJy0jZfOD8R4HTr3Chcg11Bd6684Z7pgxOQj5Rcfk0YIAwIV7pfKyYdlaaDSRAjArNAza1mlhSnCR348sEHpIzPyNJlor84-Fr2_og8AMc-SAC6tjDcg5Vs7e4bdVnPtd9FQqGGD8Mmnk-8RzYSEfhWeLuHRMlkgonaD620s9aVfAp3LKQGvN3rOEds3jQ4OYW7eLs)
15. [yolegroup.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFyPCeww61oo9nHYTFptbQA1O5WtQH-giDbgVUD4UizYliH8MAk_uLHSU8WFfRbbatvF6E1APbH5UXOKVbchziNXMirfBQQeRfKIzZf8tUgMFRdwgwasfu3T47rhPoouw_aER2rww5OsbSFbHIOe3dn423dcHVtrE14hvHWoWh8w3V2xmvJFK6na21eYwkxL2U=)
16. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGrTKY44XtewCQUvZCTKdvYKNdbakwyxugAmrTg0WKQ10Hr9Xbo4DBHabstuYgvANmnn6chyNS8MpxvmnUeFvaG82knGgRRZi0WKZmBPGANQuUh1Gw9h0dR1wXhAlrjqgG4XEMc_3rrbzG0XA0u7dJWMKYO_qS_qZ7oqtYvoOfPElprDIpt2KPpNOLP25cbniE=)
17. [google.com](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQFWLyulJNk0Lq0-ETv9x_lEuT393TRU6Mz44kkYeNgRAHQbvvJJhShHraSKMTdMsEr02syRhjFRZaqGOc2jpm6LApsuhpgt-r_GJ_ZBeHIwOrit)
18. [dev.to](https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQGUE5rWA8u76VDql4gKq9QbPKU9YsXFjJ__KnaFuABeXipGrRTtKe6XIGei8j8ou98Ue4qpdwXoqH2F-nzgov4zdttVYnPz3fLDJrI0Zd1tZN5OFI5f7zBIshvO-EsCsRjE74tpWrKJz4iMVtcCniY1M24izGemlA==)

For more details, see Deep Research.

Next Steps

The Interactions API supports many more complex workflows. Check the API Reference for details on:

Structured Outputs: Force the model to return valid JSON matching a specific schema.
Streaming: Stream token responses for real-time applications.
Thinking Models: Configure thinking_level for Gemini 2.5 and 3.0 models to handle complex reasoning.
Remote MCP: Connect Gemini to your own private MCP servers.
Combining tools and structured outputs: Use both together to build more complex workflows.
File Uploads: Upload files to the model for processing.
Data Model: A high-level overview of the API's main inputs and outputs.