Vision Rag Voyageai Claude
Vision RAG: Enabling Search on Any Documents
In this notebook, you will learn how to implement Vision RAG applications using Voyage AI's multimodal embedding models and Anthropic's vision-capable LLMs.
Vision RAG is especially useful for extracting information from complex documents like PDFs, slide decks, and figures. With Voyage AI’s multimodal embeddings, retrieval works on both text and images, including screenshots of those documents. Paired with vision-capable LLMs, this extends traditional text-based RAG into Vision RAG, enabling retrieval and reasoning that go beyond text alone.
To showcase the power of multimodal embeddings and vision-capable LLMs, we’ll extract rich image content containing text, figures, and diagrams from the latest GitHub Octoverse survey. You can find the survey here: GitHub Octoverse 2025.
If you want a full end-to-end example using MongoDB as a vector store and GCP for storing image files, see this tutorial: Building Multimodal AI Applications with MongoDB, Voyage AI, and Gemini.
Step 1: Install necessary libraries
First, we need to set up our Python environment. We will install the voyageai client for generating embeddings and the anthropic client for the generative model.
[notice] A new release of pip is available: 25.1.1 -> 25.3 [notice] To update, run: pip install --upgrade pip
Step 2: Initialize API clients
To interact with the models, you must initialize the client objects with your API keys. You will need a Voyage AI API key (for the voyage-multimodal-3 model) and an Anthropic API key (for claude-sonnet-4.5).
Note: It is best practice to use environment variables or a secret manager rather than hardcoding keys in production.
Step 3: Extract visual content
For this example, we will scrape charts and infographics directly from the GitHub Octoverse blog post. In a production setting, this step might involve converting PDF pages to images or processing a directory of PNGs. We’ll start by importing the standard utilities we need for web requests, image processing, and math operations.
Next, we define a helper function extract_image_urls to parse the article’s HTML and grab image links, filtering out small icons or logos.
Now let’s run the extraction on the specific URL.
Fetching infographic URLs from GitHub Octoverse article... URL: https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/ Found 38 images: 1. https://github.blog/wp-content/uploads/2024/06/AI-DarkMode-4.png?resize=800%2C425 2. https://github.blog/wp-content/uploads/2024/05/Enterprise-DarkMode-3.png?resize=800%2C425 3. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.42.45 AM.png?resize=800%2C425 4. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.43.47 AM.png?resize=800%2C425 5. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.44.23 AM.png?resize=800%2C425 6. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.46.23 AM.png?resize=800%2C425 7. https://github.blog/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-8.47.04 AM.png?resize=800%2C425 8. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-hero-image.png?resize=1600%2C850 9. https://github.blog/wp-content/uploads/2025/10/Octoverse-2025-top-level-metrics.png?resize=1440%2C810 10. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-year-of-record-growth.png?resize=1152%2C288 11. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-number-of-new-developers-on-github.png?resize=1728%2C972 12. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-80-percent-of-new-devs-use-copilot-in-week-one.png?resize=1440%2C810 13. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-developer-productivity-top-line-metrics.png?resize=1728%2C432 14. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-contributions-by-type.png?resize=1024%2C576 15. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-where-the-world-codes-top-line-metrics.png?resize=1728%2C432 16. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-on-github-1.jpeg?resize=1944%2C1094 17. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-projecting-the-top-developer-populations-2030.png?resize=1728%2C972 18. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-top-metrics.png?resize=1728%2C432 19. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-growth-metrics.png?resize=1584%2C891 20. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-open-source-projects-by-contributors.png?resize=1440%2C810 21. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-projects-by-contributors-2.png?resize=1728%2C972 22. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-projects-attracting-most-first-time-contributors.jpg?resize=1944%2C1094 23. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-by-contributors-contributions-open-source.png?resize=1024%2C576 24. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-security-top-metrics.png?resize=1728%2C432 25. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-dependabot-metrics.png?resize=1440%2C810 26. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-vulnerability-types-codeql.png?resize=1728%2C972 27. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-common-vulnerabilities-in-github-actions-codeql.png?resize=1728%2C972 28. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages-metrics.png?resize=1728%2C432 29. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages.png?resize=1728%2C972 30. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-js-ts-combined-usage.png?resize=1728%2C972 31. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-repos-built-in-the-last-12-months.png?resize=1728%2C972 32. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-ai-projects.png?resize=1440%2C810 33. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-code-environments-in-ai-projects.png?resize=1728%2C972 34. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-generative-ai-top-metrics.png?resize=1728%2C432 35. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-growth-in-number-of-contributors-to-genAI-projects.png?resize=1728%2C972 36. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-number-of-projects-using-genAI-model-SDKs.png?resize=1728%2C972 37. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-contributions-to-generative-AI-projects-by-country.png?resize=1728%2C972 38. https://github.blog/wp-content/uploads/2025/12/Colorway-2.jpg?resize=400%2C212
The scraping might still return general blog assets. To ensure high relevance, we will filter the list to only include images containing “octoverse-2025” in their URL, which targets the report’s charts.
Found 29 octoverse-2025 images: 1. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-hero-image.png?resize=1600%2C850 2. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-year-of-record-growth.png?resize=1152%2C288 3. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-number-of-new-developers-on-github.png?resize=1728%2C972 4. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-80-percent-of-new-devs-use-copilot-in-week-one.png?resize=1440%2C810 5. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-developer-productivity-top-line-metrics.png?resize=1728%2C432 6. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-contributions-by-type.png?resize=1024%2C576 7. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-where-the-world-codes-top-line-metrics.png?resize=1728%2C432 8. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-on-github-1.jpeg?resize=1944%2C1094 9. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-projecting-the-top-developer-populations-2030.png?resize=1728%2C972 10. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-top-metrics.png?resize=1728%2C432 11. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-growth-metrics.png?resize=1584%2C891 12. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-open-source-projects-by-contributors.png?resize=1440%2C810 13. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-fastest-growing-projects-by-contributors-2.png?resize=1728%2C972 14. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-open-source-projects-attracting-most-first-time-contributors.jpg?resize=1944%2C1094 15. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-10-countries-by-contributors-contributions-open-source.png?resize=1024%2C576 16. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-security-top-metrics.png?resize=1728%2C432 17. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-dependabot-metrics.png?resize=1440%2C810 18. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-vulnerability-types-codeql.png?resize=1728%2C972 19. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-common-vulnerabilities-in-github-actions-codeql.png?resize=1728%2C972 20. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages-metrics.png?resize=1728%2C432 21. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-top-programming-languages.png?resize=1728%2C972 22. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-js-ts-combined-usage.png?resize=1728%2C972 23. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-repos-built-in-the-last-12-months.png?resize=1728%2C972 24. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-languages-in-ai-projects.png?resize=1440%2C810 25. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-most-common-code-environments-in-ai-projects.png?resize=1728%2C972 26. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-generative-ai-top-metrics.png?resize=1728%2C432 27. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-growth-in-number-of-contributors-to-genAI-projects.png?resize=1728%2C972 28. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-number-of-projects-using-genAI-model-SDKs.png?resize=1728%2C972 29. https://github.blog/wp-content/uploads/2025/10/octoverse-2025-total-contributions-to-generative-AI-projects-by-country.png?resize=1728%2C972
Step 4: Build the multimodal index
This is the core indexing step. We loop through our filtered URLs, download the images locally, and then pass them to Voyage AI’s voyage-multimodal-3 model. This model converts the visual content into a dense vector embedding.
First 3 embeddings (out of 38): [[ 0.01434326 -0.03149414 0.00759888 ... -0.00411987 -0.00897217 -0.02502441] [ 0.01721191 0.00854492 0.01452637 ... 0.0123291 -0.05688477 0.02380371] [ 0.01525879 -0.00958252 0.04370117 ... 0.01385498 -0.02563477 -0.00300598]]
Step 5: Define RAG components
We need three specific capabilities to make our RAG pipeline work:
- Image Encoding: Converting images to base64 so they can be sent to the Anthropic API
- Vector Retrieval: Searching our array of embeddings to find the image most semantically similar to the user’s text query
- Generation: Sending the retrieved image and the user’s query to a VLM to get a natural language answer.
Let’s define helper functions for each.
Step 6: Combine the components into a complete pipeline
We can now wrap these steps into a single entry point, vision_rag. This function accepts a user query, performs the retrieval to find the correct chart, displays it, and then answers the question.
Step 7: Run queries
Let's test our pipeline. We will ask a specific question about developer communities. The system should identify the correct infographic from the report and read the data directly from it.
Query: What countries has the biggest developer communities? Most relevant image: img/octoverse-2025-top-10-countries-on-github-1.jpeg
Based on the GitHub developer population data from 2020-2025, the countries with the biggest developer communities are: 1. **United States** - 28M developers (steady at #1) 2. **India** - 21.9M developers (moved up 1 spot) 3. **China** - 10.7M developers (dropped 1 spot) These three countries have significantly larger developer populations than the rest, with the US having the largest community, followed by India which showed substantial growth (34.36% CAGR), and China in third place.
Now we can try a quantitative question regarding open-source repositories.
Query: How many open source repositories are there in 2025? Most relevant image: img/octoverse-2025-open-source-growth-metrics.png
According to the image, in 2025 there are **395M public and open source repositories**.
And finally, a ranking question about programming languages.
Query: What are the top programming languages? Most relevant image: img/octoverse-2025-top-programming-languages.png
Based on the GitHub data from 2023-2025 shown in the image, the top 10 programming languages are: 1. **JavaScript** 2. **Python** 3. **TypeScript** 4. **Java** 5. **C#** 6. **PHP** 7. **C++** 8. **Shell** 9. **C** 10. **Go** Notable observations from the visualization: - JavaScript, Python, and TypeScript switch positions at the top throughout this period - JavaScript starts at #1, drops to #2, then falls to #3 by 2025 - Python moves from #2 to #1 and back to #2 - TypeScript rises from #3 to eventually claim the #1 spot by August 2025 - Languages ranked #4-6 (Java, C#, PHP) remain relatively stable - The bottom positions (#7-10) show more movement, with languages like Go, HCL, Shell, C++, and C shifting positions
Query: Whats the top programming of 2025 in terms of contributors? Most relevant image: img/octoverse-2025-top-programming-languages-metrics.png
Based on the image, **TypeScript** is the top programming language in terms of contributors in 2025, with **+1M contributors** and a growth rate of **+66% YOY**, which allowed it to overtake both Python and JavaScript.