Notebooks
M
MongoDB
Vision Rag Voyageai Claude

Vision Rag Voyageai Claude

agentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag

Vision RAG: Enabling Search on Any Documents

In this notebook, you will learn how to implement Vision RAG applications using Voyage AI's multimodal embedding models and Anthropic's vision-capable LLMs.

Vision RAG is especially useful for extracting information from complex documents like PDFs, slide decks, and figures. With Voyage AI’s multimodal embeddings, retrieval works on both text and images, including screenshots of those documents. Paired with vision-capable LLMs, this extends traditional text-based RAG into Vision RAG, enabling retrieval and reasoning that go beyond text alone.

To showcase the power of multimodal embeddings and vision-capable LLMs, we’ll extract rich image content containing text, figures, and diagrams from the latest GitHub Octoverse survey. You can find the survey here: GitHub Octoverse 2025.

Open In Colab

If you want a full end-to-end example using MongoDB as a vector store and GCP for storing image files, see this tutorial: Building Multimodal AI Applications with MongoDB, Voyage AI, and Gemini.

Step 1: Install necessary libraries

First, we need to set up our Python environment. We will install the voyageai client for generating embeddings and the anthropic client for the generative model.

[1]

[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: pip install --upgrade pip

Step 2: Initialize API clients

To interact with the models, you must initialize the client objects with your API keys. You will need a Voyage AI API key (for the voyage-multimodal-3 model) and an Anthropic API key (for claude-sonnet-4.5).

Note: It is best practice to use environment variables or a secret manager rather than hardcoding keys in production.

[15]

Step 3: Extract visual content

For this example, we will scrape charts and infographics directly from the GitHub Octoverse blog post. In a production setting, this step might involve converting PDF pages to images or processing a directory of PNGs. We’ll start by importing the standard utilities we need for web requests, image processing, and math operations.

[3]

Next, we define a helper function extract_image_urls to parse the article’s HTML and grab image links, filtering out small icons or logos.

[4]

Now let’s run the extraction on the specific URL.

[5]
Fetching infographic URLs from GitHub Octoverse article...
URL: https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/

Found 38 images:

1. https://github.blog/wp-content/uploads/2024/06/AI-DarkMode-4.png?resize=800%2C425
2. https://github.blog/wp-content/uploads/2024/05/Enterprise-DarkMode-3.png?resize=800%2C425
3. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.42.45 AM.png?resize=800%2C425
4. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.43.47 AM.png?resize=800%2C425
5. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.44.23 AM.png?resize=800%2C425
6. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.46.23 AM.png?resize=800%2C425
7. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.47.04 AM.png?resize=800%2C425
8. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-hero-image.png?resize=1600%2C850
9. https://github.blog/wp-content/uploads/2025/10/Octoverse-2025-top-level-metrics.png?resize=1440%2C810
10. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-year-of-record-growth.png?resize=1152%2C288
11. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-number-of-new-developers-on-github.png?resize=1728%2C972
12. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-80-percent-of-new-devs-use-copilot-in-week-one.png?resize=1440%2C810
13. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-developer-productivity-top-line-metrics.png?resize=1728%2C432
14. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-contributions-by-type.png?resize=1024%2C576
15. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-where-the-world-codes-top-line-metrics.png?resize=1728%2C432
16. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-on-github-1.jpeg?resize=1944%2C1094
17. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-projecting-the-top-developer-populations-2030.png?resize=1728%2C972
18. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-top-metrics.png?resize=1728%2C432
19. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-growth-metrics.png?resize=1584%2C891
20. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-open-source-projects-by-contributors.png?resize=1440%2C810
21. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-projects-by-contributors-2.png?resize=1728%2C972
22. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-projects-attracting-most-first-time-contributors.jpg?resize=1944%2C1094
23. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-by-contributors-contributions-open-source.png?resize=1024%2C576
24. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-security-top-metrics.png?resize=1728%2C432
25. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-dependabot-metrics.png?resize=1440%2C810
26. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-vulnerability-types-codeql.png?resize=1728%2C972
27. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-common-vulnerabilities-in-github-actions-codeql.png?resize=1728%2C972
28. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages-metrics.png?resize=1728%2C432
29. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages.png?resize=1728%2C972
30. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-js-ts-combined-usage.png?resize=1728%2C972
31. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-repos-built-in-the-last-12-months.png?resize=1728%2C972
32. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-ai-projects.png?resize=1440%2C810
33. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-code-environments-in-ai-projects.png?resize=1728%2C972
34. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-generative-ai-top-metrics.png?resize=1728%2C432
35. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-growth-in-number-of-contributors-to-genAI-projects.png?resize=1728%2C972
36. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-number-of-projects-using-genAI-model-SDKs.png?resize=1728%2C972
37. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-contributions-to-generative-AI-projects-by-country.png?resize=1728%2C972
38. https://github.blog/wp-content/uploads/2025/12/Colorway-2.jpg?resize=400%2C212

The scraping might still return general blog assets. To ensure high relevance, we will filter the list to only include images containing “octoverse-2025” in their URL, which targets the report’s charts.

[6]
Found 29 octoverse-2025 images:

1. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-hero-image.png?resize=1600%2C850
2. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-year-of-record-growth.png?resize=1152%2C288
3. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-number-of-new-developers-on-github.png?resize=1728%2C972
4. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-80-percent-of-new-devs-use-copilot-in-week-one.png?resize=1440%2C810
5. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-developer-productivity-top-line-metrics.png?resize=1728%2C432
6. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-contributions-by-type.png?resize=1024%2C576
7. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-where-the-world-codes-top-line-metrics.png?resize=1728%2C432
8. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-on-github-1.jpeg?resize=1944%2C1094
9. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-projecting-the-top-developer-populations-2030.png?resize=1728%2C972
10. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-top-metrics.png?resize=1728%2C432
11. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-growth-metrics.png?resize=1584%2C891
12. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-open-source-projects-by-contributors.png?resize=1440%2C810
13. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-projects-by-contributors-2.png?resize=1728%2C972
14. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-projects-attracting-most-first-time-contributors.jpg?resize=1944%2C1094
15. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-by-contributors-contributions-open-source.png?resize=1024%2C576
16. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-security-top-metrics.png?resize=1728%2C432
17. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-dependabot-metrics.png?resize=1440%2C810
18. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-vulnerability-types-codeql.png?resize=1728%2C972
19. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-common-vulnerabilities-in-github-actions-codeql.png?resize=1728%2C972
20. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages-metrics.png?resize=1728%2C432
21. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages.png?resize=1728%2C972
22. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-js-ts-combined-usage.png?resize=1728%2C972
23. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-repos-built-in-the-last-12-months.png?resize=1728%2C972
24. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-ai-projects.png?resize=1440%2C810
25. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-code-environments-in-ai-projects.png?resize=1728%2C972
26. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-generative-ai-top-metrics.png?resize=1728%2C432
27. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-growth-in-number-of-contributors-to-genAI-projects.png?resize=1728%2C972
28. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-number-of-projects-using-genAI-model-SDKs.png?resize=1728%2C972
29. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-contributions-to-generative-AI-projects-by-country.png?resize=1728%2C972

Step 4: Build the multimodal index

This is the core indexing step. We loop through our filtered URLs, download the images locally, and then pass them to Voyage AI’s voyage-multimodal-3 model. This model converts the visual content into a dense vector embedding.

[7]
First 3 embeddings (out of 38):
[[ 0.01434326 -0.03149414  0.00759888 ... -0.00411987 -0.00897217
  -0.02502441]
 [ 0.01721191  0.00854492  0.01452637 ...  0.0123291  -0.05688477
   0.02380371]
 [ 0.01525879 -0.00958252  0.04370117 ...  0.01385498 -0.02563477
  -0.00300598]]

Step 5: Define RAG components

We need three specific capabilities to make our RAG pipeline work:

  • Image Encoding: Converting images to base64 so they can be sent to the Anthropic API
  • Vector Retrieval: Searching our array of embeddings to find the image most semantically similar to the user’s text query
  • Generation: Sending the retrieved image and the user’s query to a VLM to get a natural language answer.

Let’s define helper functions for each.

[8]

Step 6: Combine the components into a complete pipeline

We can now wrap these steps into a single entry point, vision_rag. This function accepts a user query, performs the retrieval to find the correct chart, displays it, and then answers the question.

[9]

Step 7: Run queries

Let's test our pipeline. We will ask a specific question about developer communities. The system should identify the correct infographic from the report and read the data directly from it.

[10]
Query: What countries has the biggest developer communities?
Most relevant image: img/octoverse-2025-top-10-countries-on-github-1.jpeg
Output
Based on the GitHub developer population data from 2020-2025, the countries with the biggest developer communities are:

1. **United States** - 28M developers (steady at #1)
2. **India** - 21.9M developers (moved up 1 spot)
3. **China** - 10.7M developers (dropped 1 spot)

These three countries have significantly larger developer populations than the rest, with the US having the largest community, followed by India which showed substantial growth (34.36% CAGR), and China in third place.

Now we can try a quantitative question regarding open-source repositories.

[11]
Query: How many open source repositories are there in 2025?
Most relevant image: img/octoverse-2025-open-source-growth-metrics.png
Output
According to the image, in 2025 there are **395M public and open source repositories**.

And finally, a ranking question about programming languages.

[12]
Query: What are the top programming languages?
Most relevant image: img/octoverse-2025-top-programming-languages.png
Output
Based on the GitHub data from 2023-2025 shown in the image, the top 10 programming languages are:

1. **JavaScript**
2. **Python**
3. **TypeScript**
4. **Java**
5. **C#**
6. **PHP**
7. **C++**
8. **Shell**
9. **C**
10. **Go**

Notable observations from the visualization:
- JavaScript, Python, and TypeScript switch positions at the top throughout this period
- JavaScript starts at #1, drops to #2, then falls to #3 by 2025
- Python moves from #2 to #1 and back to #2
- TypeScript rises from #3 to eventually claim the #1 spot by August 2025
- Languages ranked #4-6 (Java, C#, PHP) remain relatively stable
- The bottom positions (#7-10) show more movement, with languages like Go, HCL, Shell, C++, and C shifting positions
[13]
Query: Whats the top programming of 2025 in terms of contributors?
Most relevant image: img/octoverse-2025-top-programming-languages-metrics.png
Output
Based on the image, **TypeScript** is the top programming language in terms of contributors in 2025, with **+1M contributors** and a growth rate of **+66% YOY**, which allowed it to overtake both Python and JavaScript.