Caching REST

quickstartsrestgemini-cookbookgemini-apigemini
Copyright 2025 Google LLC.
[44]

Gemini API: Caching Quickstart with REST

This notebook introduces context caching with the Gemini API and provides examples of interacting with the Apollo 11 transcript using the Python SDK. Context caching is a way to save on requests costs when a substantial initial context is referenced repeatedly by shorter requests. It will use curl commands to call the methods in the REST API.

For a more comprehensive look, check out the caching guide.

This notebook contains curl commands you can run in Google Colab, or copy to your terminal. If you have never used the Gemini REST API, it is strongly recommended to start with the Prompting quickstart first.

Authentication

To run this notebook, your API key must be stored it in a Colab Secret named GOOGLE_API_KEY. If you are running in a different environment, you can store your key in an environment variable. See Authentication to learn more.

[45]

Caching content

Let's start by getting the transcript from the Apollo 11 mission.

[47]
--2024-07-11 17:55:31--  https://storage.googleapis.com/generativeai-downloads/data/a11.txt
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.199.207, 74.125.20.207, 108.177.98.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.199.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 847790 (828K) [text/plain]
Saving to: ‘a11.txt.2’

a11.txt.2           100%[===================>] 827.92K  --.-KB/s    in 0.008s  

2024-07-11 17:55:31 (103 MB/s) - ‘a11.txt.2’ saved [847790/847790]

Now you need to reencode it to base-64, so let's prepare the whole cachedContent while you're at it.

[48]

We can now create the cached content.

[59]
{
  "name": "cachedContents/lf0nt062ulc1",
  "model": "models/gemini-2.5-flash",
  "createTime": "2024-07-11T18:02:48.257891Z",
  "updateTime": "2024-07-11T18:02:48.257891Z",
  "expireTime": "2024-07-11T18:07:47.635193373Z",
  "displayName": "",
  "usageMetadata": {
    "totalTokenCount": 323383
  }
}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1104k    0   307  100 1104k    138   499k  0:00:02  0:00:02 --:--:--  499k

You will need it for the next commands so save the name of the cache.

You're using a text file to save the name here beacuse of colab constrainsts but you could also simply use a variable.

[57]
cachedContents/qidqwuaxdqz4

Listing caches

Since caches have a reccuring cost it's a good idea to keep an eye on them. It can also be useful if you need to find their name.

[60]
{
  "cachedContents": [
    {
      "name": "cachedContents/lf0nt062ulc1",
      "model": "models/gemini-2.5-flash",
      "createTime": "2024-07-11T18:02:48.257891Z",
      "updateTime": "2024-07-11T18:02:48.257891Z",
      "expireTime": "2024-07-11T18:07:47.635193373Z",
      "displayName": "",
      "usageMetadata": {
        "totalTokenCount": 323383
      }
    },
    {
      "name": "cachedContents/qidqwuaxdqz4",
      "model": "models/gemini-2.5-flash",
      "createTime": "2024-07-11T18:02:30.516233Z",
      "updateTime": "2024-07-11T18:02:30.516233Z",
      "expireTime": "2024-07-11T18:07:29.803448998Z",
      "displayName": "",
      "usageMetadata": {
        "totalTokenCount": 323383
      }
    }
  ]
}

Using cached content when prompting

Prompting using cached content is the same as what is illustrated in the Prompting quickstart except you're adding a "cachedContent" value which is the name of the cache you saved earlier.

[51]
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "This is a transcript of the air-to-ground voice transmission between the Apollo 11 crew and Mission Control during their journey to the moon and back. The transcript covers the period from the launch of the Saturn V rocket until the splashdown of the command module in the Pacific Ocean. \n\nThe transcript documents a fascinating conversation between the astronauts and Mission Control, highlighting the various activities of the mission. It covers:\n\n* **Launch and Trans-Lunar Injection:** The launch sequence, staging of the rocket, and the critical Trans-Lunar Injection (TLI) maneuver that sent the Apollo 11 spacecraft towards the moon.\n* **Orbit and Docking:** The crew's actions in earth orbit, their successful docking with the Lunar Module (LM), and the transfer of the LM to the docking port of the command module. \n* **Lunar Surface Preparations:** Communications checks, pre-flight preparations for the Lunar Module, and the successful ejection of the LM from the command module. \n* **Lunar Orbit Insertion:** The burn that sent the spacecraft into lunar orbit and the subsequent activities in lunar orbit. \n* **Lunar Landing and EVA:** The descent of the LM to the surface of the moon, Neil Armstrong’s famous first steps, and the Lunar surface activities. \n* **Lunar Ascent:** The launch of the LM back into lunar orbit and the docking with the command module. \n* **Trans-Earth Injection:** The burn that sent the spacecraft on its return journey to Earth. \n* **Earth Entry and Splashdown:** The re-entry of the command module into the Earth’s atmosphere, the deployment of parachutes, and the splashdown in the Pacific Ocean.\n\nThe transcript provides valuable insight into the complexity and meticulous planning that went into the Apollo 11 mission. It showcases the close communication and coordination between the crew and Mission Control, and the dedication of the many individuals who made this historic mission possible. \n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 323388,
    "candidatesTokenCount": 397,
    "totalTokenCount": 323785,
    "cachedContentTokenCount": 323383
  }
}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3042    0  2817  100   225    117      9  0:00:25  0:00:24  0:00:01   580

As you can see, among the 323699 tokens, 323383 were cached (and thus less expensive) and only 311 were from the prompt.

Since the cached tokens are cheaper than the normal ones, it means this prompt was 75% cheaper that if you had not used caching. Check the pricing here for the up-to-date discount on cached tokens.

Optional: Updating a cache

If you need to update a cache, to chance its content, or just extend its longevity, just use PATCH:

[62]
{
  "name": "cachedContents/qidqwuaxdqz4",
  "model": "models/gemini-2.5-flash",
  "createTime": "2024-07-11T18:02:30.516233Z",
  "updateTime": "2024-07-11T18:05:38.781423Z",
  "expireTime": "2024-07-11T18:10:38.759996261Z",
  "displayName": "",
  "usageMetadata": {
    "totalTokenCount": 323383
  }
}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   322    0   307  100    15    822     40 --:--:-- --:--:-- --:--:--   863

Deleting cached content

The cache has a small recurring storage cost (cf. pricing) so by default it is only saved for an hour. In this case you even set it up for a shorter amont of time (using "ttl") of 10mn.

Still, if you don't need you cache anymore, it is good practice to delete it proactively.

[63]
{}

Next Steps

Useful API references:

If you want to know more about the caching REST APIs, you can check the full API specifications and the caching documentation.

Continue your discovery of the Gemini API

Check the File API notebook to know more about that API. The vision capabilities of the Gemini API are a good reason to use the File API and the caching. The Gemini API also has configurable safety settings that you might have to customize when dealing with big files.