PDF Structured Outputs on Invoices and Forms
Copyright 2025 Google LLC.
1. Set up the environment and create an inference client
The first task is to install the google-genai Python SDK and obtain an API key. If you don't have one, you can get one from Google AI Studio: Get a Gemini API key. If you are new to Google Colab, check out the quickstart.
Once you have the SDK and API key, you can create a client and define the model you are going to use: the new Gemini Flash model, which is available via the free tier with 1,500 requests per day (as of 2025-02-06).
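A minimal sketch of the client setup described above. The environment variable name GEMINI_API_KEY, the helper name create_client, and the model ID gemini-2.0-flash are assumptions made for illustration; check the current model list for the exact ID you want to use.

```python
import os
from typing import Optional

# Assumed model ID for "the new Gemini Flash model"; adjust to the model you use.
MODEL_ID = "gemini-2.0-flash"


def create_client(api_key: Optional[str] = None):
    """Return a google-genai client (hypothetical helper, not part of the SDK)."""
    # Prefer an explicit key, otherwise fall back to the environment.
    key = api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY or pass api_key explicitly.")
    # Imported here so this module still loads before the SDK is installed
    # (pip install google-genai).
    from google import genai
    return genai.Client(api_key=key)
```

In a notebook you would typically call `client = create_client()` once and reuse the client in the later cells.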
Note: If you want to use Vertex AI see here how to create your client
2. Work with PDFs and other files
Gemini models are able to process images and videos, which can be passed as base64 strings or uploaded via the File API. After uploading a file, you can include the file URI in the call directly. The Python API includes an upload and a delete method.
For this example you have 2 sample PDFs: one basic invoice and one form with handwritten values.
You can now upload the files using our client with the upload method. Let's try this for one of the files.
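As a rough sketch, upload and cleanup could be wrapped in two small helpers. The helper names upload_pdf and delete_file are hypothetical, and the keyword `file=` reflects recent google-genai releases; confirm the keyword against the SDK version you have installed.

```python
from pathlib import Path


def upload_pdf(client, path: str):
    """Upload a local file to the File API and return the file handle."""
    # display_name is optional metadata; here it is the file name without
    # its extension, e.g. "invoice" for "invoice.pdf".
    return client.files.upload(
        file=str(path),
        config={"display_name": Path(path).stem},
    )


def delete_file(client, uploaded_file) -> None:
    """Delete an uploaded file by its server-side name."""
    client.files.delete(name=uploaded_file.name)
```

The returned handle can be passed directly in the `contents` of a later request, so there is no need to keep the local bytes around.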
Note: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but they cannot be downloaded. File uploads are available at no cost.
After a file is uploaded, you can check how many tokens it was converted to. This not only helps you understand the context you are working with, it also helps to keep track of the cost.
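A short sketch of the token check, assuming a client and an uploaded file handle already exist. The helper names are illustrative; `count_tokens` and `total_tokens` come from the google-genai SDK.

```python
def count_file_tokens(client, model_id: str, uploaded_file) -> int:
    """Ask the API how many tokens the uploaded file occupies for this model."""
    response = client.models.count_tokens(model=model_id, contents=uploaded_file)
    return response.total_tokens


def describe_tokens(display_name: str, total_tokens: int) -> str:
    """Format the count the same way the notebook output does."""
    return f"File: {display_name} equals to {total_tokens} tokens"
```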
File: invoice equals to 821 tokens
3. Structured outputs with Gemini 2.0 and Pydantic
Structured Outputs is a feature that ensures Gemini always generates responses that adhere to a predefined format, such as a JSON Schema. This means you have more control over the output and how to integrate it into your application, as it is guaranteed to return a valid JSON object with the schema you define.
Gemini 2.0 currently supports 3 different ways to define a JSON schema:
- A single python type, as you would use in a typing annotation.
- A Pydantic BaseModel
- A dict equivalent of genai.types.Schema / Pydantic BaseModel
Let's look at a quick text-based example.
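A sketch of such a text-based call, using a Pydantic BaseModel as the response schema. The field names mirror the sample output in this section; the field description for age and the helper name extract_person are assumptions for illustration.

```python
from pydantic import BaseModel, Field


class Topic(BaseModel):
    name: str


class Person(BaseModel):
    first_name: str
    last_name: str
    # Assumed description; it tells the model what to return when no age is given.
    age: int = Field(description="Age of the person; 0 if it is not stated")
    work_topics: list[Topic]


def extract_person(client, model_id: str, text: str) -> Person:
    """Request JSON that follows the Person schema and validate it locally."""
    response = client.models.generate_content(
        model=model_id,
        contents=text,
        config={
            "response_mime_type": "application/json",
            "response_schema": Person,
        },
    )
    # response.text holds the raw JSON string; validating it yields a typed object.
    return Person.model_validate_json(response.text)
```

With the result in hand, attributes such as `person.first_name` are available as ordinary typed fields.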
{
  "age": 0,
  "first_name": "Philipp",
  "last_name": "Schmid",
  "work_topics": [
    {
      "name": "AI"
    },
    {
      "name": "Gemini"
    },
    {
      "name": "Gemma"
    }
  ]
}
First name is Philipp
4. Extract structured data from PDFs using Gemini 2.0
Now, let's combine the File API and structured output to extract information from the PDFs. You can create a simple method that accepts a local file path and a Pydantic model and returns the structured data. The method will:
- Upload the file to the File API
- Generate a structured response using the Gemini API
- Convert the response to the pydantic model and return it
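The three steps above can be sketched as one generic function. The function name, its parameters, and the prompt wording are assumptions; the SDK calls (`files.upload`, `models.generate_content`) are the ones used elsewhere in this guide.

```python
from pathlib import Path
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)


def extract_structured_data(client, model_id: str, file_path: str, schema: Type[T]) -> T:
    """Upload a PDF, request schema-conforming JSON, and return a validated model."""
    # 1. Upload the file to the File API.
    uploaded = client.files.upload(
        file=file_path,
        config={"display_name": Path(file_path).stem},
    )
    # 2. Generate a structured response; the prompt text is illustrative.
    prompt = "Extract the structured data from the following PDF file"
    response = client.models.generate_content(
        model=model_id,
        contents=[prompt, uploaded],
        config={
            "response_mime_type": "application/json",
            "response_schema": schema,
        },
    )
    # 3. Convert the JSON string into the Pydantic model.
    return schema.model_validate_json(response.text)
```

Passing the client and model ID as arguments keeps the function free of global state, which also makes it easy to exercise with a stub client.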
In this example, each PDF is different from the others, so you want to define a unique Pydantic model for each PDF to show the performance of Gemini 2.0. If you have very similar PDFs and want to extract the same information, you can use the same model for all of them.
- Invoice.pdf: Extract the invoice number, date, and all list items with description, quantity, and gross worth, plus the total gross worth
- handwriting_form.pdf: Extract the form number, plan start date, and the plan liabilities at the beginning of the year and end of the year
Note: Using Pydantic features, you can add more context to the model to make it more accurate, as well as some validation of the data. Adding a comprehensive description can significantly improve the performance of the model. Libraries like instructor add automatic retries based on validation errors, which can be a great help, but come at the cost of additional requests.
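As an illustration of descriptions plus validation, here is one possible model for the invoice. The field names, the description texts, and the total check with its 0.01 tolerance are assumptions for this sketch, not a confirmed part of the original notebook; the validator runs locally when the response is parsed.

```python
from pydantic import BaseModel, Field, model_validator


class Item(BaseModel):
    description: str = Field(description="The description of the item")
    quantity: float = Field(description="The quantity of the item")
    gross_worth: float = Field(description="The gross worth of the item")


class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice number, e.g. 1234567890")
    date: str = Field(description="The date of the invoice, e.g. 2024-01-01")
    items: list[Item] = Field(description="All line items on the invoice")
    total_gross_worth: float = Field(description="The total gross worth of the invoice")

    @model_validator(mode="after")
    def check_total(self) -> "Invoice":
        # Reject results where the line items do not add up to the stated total.
        items_sum = sum(item.gross_worth for item in self.items)
        if abs(items_sum - self.total_gross_worth) > 0.01:
            raise ValueError(
                f"items sum to {items_sum:.2f}, total is {self.total_gross_worth:.2f}"
            )
        return self
```

A failed check surfaces as a pydantic.ValidationError, which is the kind of signal a retry wrapper could act on.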
Invoice.pdf

<class '__main__.Invoice'>
Extracted Invoice: 27301261 on 10/09/2012 with total gross worth 544.46
Item: Lilly Pulitzer dress Size 2 with quantity 5.0 and gross worth 247.5
Item: New ERIN Erin Fertherston Straight Dress White Sequence Lining Sleeveless SZ 10 with quantity 1.0 and gross worth 65.99
Item: Sequence dress Size Small with quantity 3.0 and gross worth 115.5
Item: fire los angeles dress Medium with quantity 3.0 and gross worth 21.45
Item: Eileen Fisher Women's Long Sleeve Fleece Lined Front Pockets Dress XS Gray with quantity 3.0 and gross worth 52.77
Item: Lularoe Nicole Dress Size Small Light Solid Grey/ White Ringer Tee Trim with quantity 2.0 and gross worth 8.25
Item: J.Crew Collection Black & White sweater Dress sz S with quantity 1.0 and gross worth 33.0
Fantastic! The model did a great job extracting the information from the invoice.
handwriting_form.pdf

Extracted Form Number: CA530082 with start date 02/05/2022. Plan liabilities beginning of the year 40000.0 and end of the year 55000.0
Learning more
If you want to learn more about the File API, Structured Outputs and how to use it to process images, audio, and video files, check out the following resources:
- Learn more about the File API with the quickstart.
- Learn more about prompting with media files in the docs, including the supported formats and maximum length.
- Learn more about Structured Outputs in the docs.
Property ordering with Gemini 2.0
Important: Gemini 2.0 models require explicit ordering of keys in structured output schemas. When working with Gemini 2.0, you must define the desired property ordering as a list within the propertyOrdering field as part of your schema configuration.
This ensures consistent and predictable ordering of properties in the JSON response, which is particularly important when the output property order matters for downstream processing.
{
  "invoice_number": "12345",
  "date": "2024-01-15",
  "vendor": "Acme Corp",
  "total_amount": 1250.00
}
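A schema that asks for the key order shown above might look like the following dict. The key spelling propertyOrdering and the upper-case type names follow the REST form of the schema; whether your SDK version expects this spelling or a snake_case variant is something to confirm. The helper check_property_ordering is hypothetical and only guards against a list that has drifted out of sync with the properties.

```python
invoice_schema = {
    "type": "OBJECT",
    "properties": {
        "invoice_number": {"type": "STRING"},
        "date": {"type": "STRING"},
        "vendor": {"type": "STRING"},
        "total_amount": {"type": "NUMBER"},
    },
    "required": ["invoice_number", "date", "vendor", "total_amount"],
    # Desired order of keys in the JSON response.
    "propertyOrdering": ["invoice_number", "date", "vendor", "total_amount"],
}


def check_property_ordering(schema: dict) -> None:
    """Raise if propertyOrdering and properties do not list the same keys."""
    ordering = schema.get("propertyOrdering", [])
    properties = set(schema["properties"])
    missing = properties - set(ordering)
    unknown = set(ordering) - properties
    if missing or unknown:
        raise ValueError(
            f"missing from ordering: {sorted(missing)}, unknown keys: {sorted(unknown)}"
        )


check_property_ordering(invoice_schema)
```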