Notebooks
N
NVIDIA
Llama Nemotron VL Nano 8B

Llama Nemotron VL Nano 8B

gpu-accelerationretrieval-augmented-generationllm-inferencetensorrtnvidia-generative-ai-exampleslarge-language-modelsmicroservicetriton-inference-servernemotronLLMllama_3.1_nemotron_nano_VL_8Bragnemovlm

 Click here to deploy. Open In Colab

[ ]

Llama Nemotron Nano VL 8B - A Simple Notebook Walkthrough!

Explore NVIDIA's 8B Vision Language Model capable of OCR and Document Understanding. You can use this notebook to try out Llama Nemotron Nano VL model - hosted on build.nvidia.com.

All you need to run this model is an NVIDIA API KEY, which you can find on the model page linked above by clicking on "Get API Key". Make sure to login or sign up!

image

NOTE: Don't worry about any credits to use this model, although there is a rate limit of 40 requests per minute.

In case, you are running this notebook in a local environment, let's ensure we install all the required libraries to run this notebook.

[ ]
Requirement already satisfied: pyarrow in /usr/local/lib/python3.11/dist-packages (18.1.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (3.10.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (2.2.2)
Requirement already satisfied: openai in /usr/local/lib/python3.11/dist-packages (1.82.1)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (2.32.3)
Requirement already satisfied: Pillow in /usr/local/lib/python3.11/dist-packages (11.2.1)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (4.58.1)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (1.4.8)
Requirement already satisfied: numpy>=1.23 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.0.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (24.2)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (3.2.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.11/dist-packages (from matplotlib) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas) (2025.2)
Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.11/dist-packages (from openai) (4.9.0)
Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.11/dist-packages (from openai) (1.9.0)
Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.11/dist-packages (from openai) (0.28.1)
Requirement already satisfied: jiter<1,>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from openai) (0.10.0)
Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.11/dist-packages (from openai) (2.11.5)
Requirement already satisfied: sniffio in /usr/local/lib/python3.11/dist-packages (from openai) (1.3.1)
Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.11/dist-packages (from openai) (4.67.1)
Requirement already satisfied: typing-extensions<5,>=4.11 in /usr/local/lib/python3.11/dist-packages (from openai) (4.13.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests) (3.4.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests) (2.4.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests) (2025.4.26)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.11/dist-packages (from httpx<1,>=0.23.0->openai) (1.0.9)
Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.11/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.16.0)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (2.33.2)
Requirement already satisfied: typing-inspection>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=1.9.0->openai) (0.4.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil>=2.7->matplotlib) (1.17.0)

NOTE: Please restart the kernel after installation to avoid import errors.

[ ]
Provide NVDEV API Key: ··········

Set up OpenAI Client

First, we'll need to point our OpenAI client to build.nvidia.com API

[ ]

Let's also create a function that calls Llama Nemotron Nano VL. This function takes image(s) and text prompt as input, and gives out the model response.

We will have to combine the text prompt along with the images to craft a prompt to the model. Dive deeper into the API Reference here.

Best Practice: Order of items in the messages dictionary matter. First put your images, and then your text prompt.

[ ]

Invoice/Receipt Understanding

We are going to demonstrate the capability of this small VLM on an invoice. This model shines with it's OCR capabilities. Let's grab an invoice from a dataset on HuggingFace katanaml-org/invoices-donut-data-v1. We are going to pull an invoice image from the test set of this dataset.

[ ]
--2025-06-06 00:09:50--  https://huggingface.co/datasets/katanaml-org/invoices-donut-data-v1/resolve/main/data/test-00000-of-00001-56af6bd5ff7eb34d.parquet
Resolving huggingface.co (huggingface.co)... 3.166.152.44, 3.166.152.65, 3.166.152.110, ...
Connecting to huggingface.co (huggingface.co)|3.166.152.44|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.hf.co/repos/8b/53/8b532de81ab76db9001c6481db8bba0d5b5ec4539ffe0aaebec9bcfafdadaba2/712a9a65000d6a2cf04d41bc794843ca30452bb87473951b1adafd3610eab08b?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27test-00000-of-00001-56af6bd5ff7eb34d.parquet%3B+filename%3D%22test-00000-of-00001-56af6bd5ff7eb34d.parquet%22%3B&Expires=1749172190&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0OTE3MjE5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy84Yi81My84YjUzMmRlODFhYjc2ZGI5MDAxYzY0ODFkYjhiYmEwZDViNWVjNDUzOWZmZTBhYWViZWM5YmNmYWZkYWRhYmEyLzcxMmE5YTY1MDAwZDZhMmNmMDRkNDFiYzc5NDg0M2NhMzA0NTJiYjg3NDczOTUxYjFhZGFmZDM2MTBlYWIwOGI%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=kzODCOTkdIdrBfnxCar47ay8Ss7OSkSDyAKa29cHvikoBlP6IxLwtU6q7B05mrnzJ6FAuzXJM57Vd0kwJ979wl3CeAS8kBILd766cUnAgkQGOGWr2CwU3KJpq6jcSJsb9jbwXp0R5XqjdcC4ec7NI85WaoqD%7Ex8UCsl3CCfy%7EhjbLELAJABSuCtqMKz4lFx97WODnCHPG6vpvZTUtQkmWf2v%7EPIXMeh8y8a3jO-N8wm0Qp%7EF8f0ZjcpzCxIeC8r7q37mwlZ10WS19ToPvDknwxpeibzh5beo-GfJzu5u7c6pMXAirvRD1nOmXuJkSCd9RHG8LyvU4OFyK12BEVNDyg__&Key-Pair-Id=K3RPWS32NSSJCE [following]
--2025-06-06 00:09:50--  https://cdn-lfs.hf.co/repos/8b/53/8b532de81ab76db9001c6481db8bba0d5b5ec4539ffe0aaebec9bcfafdadaba2/712a9a65000d6a2cf04d41bc794843ca30452bb87473951b1adafd3610eab08b?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27test-00000-of-00001-56af6bd5ff7eb34d.parquet%3B+filename%3D%22test-00000-of-00001-56af6bd5ff7eb34d.parquet%22%3B&Expires=1749172190&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0OTE3MjE5MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5oZi5jby9yZXBvcy84Yi81My84YjUzMmRlODFhYjc2ZGI5MDAxYzY0ODFkYjhiYmEwZDViNWVjNDUzOWZmZTBhYWViZWM5YmNmYWZkYWRhYmEyLzcxMmE5YTY1MDAwZDZhMmNmMDRkNDFiYzc5NDg0M2NhMzA0NTJiYjg3NDczOTUxYjFhZGFmZDM2MTBlYWIwOGI%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qIn1dfQ__&Signature=kzODCOTkdIdrBfnxCar47ay8Ss7OSkSDyAKa29cHvikoBlP6IxLwtU6q7B05mrnzJ6FAuzXJM57Vd0kwJ979wl3CeAS8kBILd766cUnAgkQGOGWr2CwU3KJpq6jcSJsb9jbwXp0R5XqjdcC4ec7NI85WaoqD%7Ex8UCsl3CCfy%7EhjbLELAJABSuCtqMKz4lFx97WODnCHPG6vpvZTUtQkmWf2v%7EPIXMeh8y8a3jO-N8wm0Qp%7EF8f0ZjcpzCxIeC8r7q37mwlZ10WS19ToPvDknwxpeibzh5beo-GfJzu5u7c6pMXAirvRD1nOmXuJkSCd9RHG8LyvU4OFyK12BEVNDyg__&Key-Pair-Id=K3RPWS32NSSJCE
Resolving cdn-lfs.hf.co (cdn-lfs.hf.co)... 99.84.252.15, 99.84.252.38, 99.84.252.37, ...
Connecting to cdn-lfs.hf.co (cdn-lfs.hf.co)|99.84.252.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10441675 (10.0M) [binary/octet-stream]
Saving to: ‘test-00000-of-00001-56af6bd5ff7eb34d.parquet.1’

test-00000-of-00001 100%[===================>]   9.96M  --.-KB/s    in 0.1s    

2025-06-06 00:09:50 (71.5 MB/s) - ‘test-00000-of-00001-56af6bd5ff7eb34d.parquet.1’ saved [10441675/10441675]

Let's select a random image from the data.

[ ]
Output

Starting with simple transcription. But before that we also need to convert our image to base64 encoded string.

Best Practice: When images have tables, ask the model to extract in LaTeX format for an accurate full page OCR.

[ ]

We are now ready to prompt the model.

[ ]
Invoice no: 26343874

Date of issue: 07/17/2013

Seller:

Smith Ltd 290 Jodi Gardens Charlesview, TN 10958

Tax Id: 916-97-6743 IBAN: GB78TZMX55978564099007

Client:

Herring-Floyd 8113 Hansen Cliff Apt. 826 Port Alice, ID 99663

Tax Id: 913-80-4636

ITEMS

SUMMARY

\begin{tabular}{cccccccc} **No.** & **Description** & **Qty** & **UM** & **Net price** & **Net worth** & **VAT [%]** & **Gross worth**\\ 1. & Microsoft Xbox One X 1TB 4k Ultra HD Console black controller works great 1787 & 5,00 & each & 249,00 & 1 245,00 & 10% & 1 369,50\\ 2. & Cables 500gb,Sony PlayStation 4 PS4 CUH 1215A Jet Black Console w/ Controller & 5,00 & each & 100,00 & 500,00 & 10% & 550,00\\ 3. & PS2 Slim Console System SCPH-77000 PINK Playstation 2 Tested & 4,00 & each & 252,00 & 1 008,00 & 10% & 1 108,80\\ 4. & Nintendo Game Boy Micro Console - Black (OXYSFBB) & 2,00 & each & 130,00 & 260,00 & 10% & 286,00\\ 5. & Nintendo GameCube System Console Bundle Mario Sunshine + Genuine controller LOT & 1,00 & each & 159,99 & 159,99 & 10% & 175,99\\ \end{tabular}

\begin{tabular}{ccccc}  & VAT [%] & Net worth & VAT & Gross worth\\  & 10% & 3 172,99 & 317,30 & 3 490,29\\ Total &  & $ 3 172,99 & $ 317,30 & $ 3 490,29\\ \end{tabular}

You can now transcribe invoices for quick extraction of content. This model is also capable at answering questions on an image.

Let's ask a question on the invoice to get a Monetary insight.

Best Practice: Be concise and Don't be vague on your ask.

[ ]
No
[ ]
10%

You can ask a question to do line-level Item Analysis.

[ ]
5

Try to prompt for Entity Detection!

[ ]
No, there are no visible logos or branding that indicate a company identity in the provided image.
[ ]
Based on the image, there are no visible handwritten or stamped elements on the invoice. All text appears to be printed, with no signs of alterations or additional markings that would indicate hand-written or stamped content.

This time, let's grab an example image from the web. Again, first we need to convert the image to a base64-encoded string.

[ ]
Output

Starting with simple transcription.

[ ]
Invoice vs. Receipt

Receipt Example

East Repair Inc.
1912 Harvest Lane
New York, NY 12210

BILL TO
John Smith
2 Court Square
New York, NY 12210

SHIP TO
John Smith
3787 Pineview Drive
Cambridge, MA 12210

RECEIPT #
RECEIPT DATE
P.O.#
US-001
11/02/2019
2312/2019

Transaction Date

Receipt Total

$145.00

QTY
DESCRIPTION
UNIT PRICE
AMOUNT

1
Front and rear brake cables
100.00
100.00

2
New set of pedal arms
15.00
30.00

3
Labor 3hrs
5.00
15.00

Total
145.00

Invoice Example

Bill to:
John Smith
442 Swanssea Street Denver, CO 80303
United States

Invoice:
Invoice Date:
Due Date:

03.01.2021
03.15.2021

Description
Quantity
Unit
Price
Amount

Consultation
1
each
150.00
150.00

Discount 10%
Subtotal without Tax
Total USD

-15.00
150.00
135.00

Amount Paid
0.00

Amount Due (USD)
135.00

Terms & Conditions
10% new customer discount.
Please pay the invoice within 14 days via the payment link below.

Let's prompt to test the model's ability to understand different tables in an image.

[ ]
The total amount paid on the receipt is $145, and on the invoice is $135.

It accurately pulled and mapped the totals to receipt and invoice.

Let's ask an extractive question.

[ ]
The invoice was issued on 03.01.2021.

Llama Nemotron VL is also good at text grounding, accurately telling where text is located in the image.

[ ]
The terms and conditions are written at the bottom of the invoice example.

##📌 Conclusion

In this notebook, we explored the capabilities of Llama Nemotron Nano VL 8B, a lightweight vision-laguage model developed by NVIDIA, through invoice and document understanding tasks.

Here is a quick summary of what we achieved:

  • ✅ Performed OCR-based transcription with strong layout awareness and LaTeX formatting.
  • 💬 Asked semantic questions about discounts, tax rates, and itemized billing.
  • 🔍 Evaluated the model's ability to detect entities and text grounding abilities.

Llama Nemotron Nano VL 8B proves to be an efficient and practical tool for intelligent document processing at scale. It is especially useful for developers looking for affordable and responsive models with OCR + question/answering capabilities.