Llama32 Agentic Rag


πŸ•΅πŸ» Agentic RAG with πŸ¦™ Llama 3.2 3B


In their Llama 3.2 collection, Meta released two small yet powerful Language Models.

In this notebook, we'll use the 3B model to build an Agentic Retrieval Augmented Generation application.

🎯 Our goal is to create a system that answers questions using a knowledge base focused on the Seven Wonders of the Ancient World. If the retrieved documents don't contain the answer, the application will fall back to web search for additional context.

Stack:

Setup

[ ]

Create our knowledge base

In this section, we download a dataset on the Seven Wonders of the Ancient World, enrich each document with a semantic vector and store the documents in an in-memory database.

To better understand this process, you can explore the introductory Haystack tutorial.
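As a rough, library-free illustration of the indexing step described above, the sketch below turns each text into a vector and keeps the records in a plain Python list. The `toy_embed` function, the two-sentence corpus, and the `document_store` list are assumptions made for this example only; the notebook itself uses a real embedding model and an in-memory document store.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Stand-in for a real embedding model: a length-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    vector = [float(counts[word]) for word in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


# Hypothetical mini-corpus; the notebook downloads a real dataset instead.
texts = [
    "the great pyramid of giza is a tomb",
    "the colossus of rhodes was a bronze statue",
]
vocabulary = sorted({word for text in texts for word in text.split()})

# The "in-memory database" here is just a list of records carrying their vectors.
document_store = [
    {"content": text, "embedding": toy_embed(text, vocabulary)} for text in texts
]
print(len(document_store))
```

A bag-of-words vector captures no real semantics; it only makes the shape of the data (text plus vector per document) visible.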

[ ]
151

Load and try Llama 3.2

We will use Hugging Face Transformers to load the model in Colab.

There are many other ways to use open models with Haystack, such as Ollama for local inference or serving with Groq (📕 Choosing the Right Generator).

Authorization

[ ]
Your Hugging Face token··········
[ ]
[ ]
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
{'replies': ['\n\nThe capital of France is Paris.']}

Build the 🕵🏻 Agentic RAG Pipeline

Here's the idea 👇

  • Perform a vector search on our knowledge base using the query.
  • Pass the top 5 documents to Llama, injected into a dedicated prompt.
  • In the prompt, instruct the model to reply with "no_answer" if it cannot infer the answer from the documents; otherwise, it should provide the answer.
  • If "no_answer" is returned, run a web search and inject the results into a new prompt.
  • Let Llama generate a final answer based on the web search results.
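The steps above can be sketched as plain control flow. This is only an illustration under stated assumptions: `answer_with_fallback` and the three `fake_*` stubs are hypothetical names introduced here, and the notebook expresses the same logic as a Haystack pipeline rather than a single function.

```python
def answer_with_fallback(query, retrieve, search_web, generate):
    """Try the knowledge base first; fall back to web search on 'no_answer'."""
    documents = retrieve(query)[:5]  # keep the top 5 documents
    prompt = (
        "Answer the query using only the documents. "
        "If they do not contain the answer, reply with 'no_answer'.\n"
        f"Documents: {documents}\nQuery: {query}"
    )
    reply = generate(prompt)
    if "no_answer" not in reply:
        return "knowledge_base", reply

    # Fallback branch: build a second prompt from web results and ask again.
    web_documents = search_web(query)
    web_prompt = (
        "Answer the query using the web results.\n"
        f"Results: {web_documents}\nQuery: {query}"
    )
    return "web", generate(web_prompt)


# Stub callables so the sketch runs without a model or network access.
def fake_retrieve(query):
    return ["The Colossus of Rhodes was a statue of the sun god Helios."]


def fake_search(query):
    return ["Munich is the capital of Bavaria."]


def fake_generate(prompt):
    # Pretend the model only knows about the Colossus unless web results are supplied.
    if "Results:" in prompt:
        return "Munich is in Bavaria."
    question = prompt.rsplit("Query:", 1)[1]
    return "A statue of Helios." if "Colossus" in question else "no_answer"


print(answer_with_fallback("What was the Colossus?", fake_retrieve, fake_search, fake_generate))
print(answer_with_fallback("Where is Munich?", fake_retrieve, fake_search, fake_generate))
```

Returning the branch name alongside the reply makes it easy to tell which source produced the final answer.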

For a detailed explanation of a similar use case, take a look at this tutorial: Building Fallbacks to Websearch with Conditional Routing.

Retrieval part

Let's initialize the components to use for the initial retrieval phase.
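To make the retrieval phase concrete, here is a minimal sketch of a top-k vector search in plain Python. Cosine similarity is an assumption made for this example (the text only says "vector search"), and `retrieve_top_k` and the two-dimensional toy vectors are hypothetical; in the notebook this work is done by a text embedder and an embedding retriever.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_embedding, documents, top_k=5):
    """Return the top_k documents most similar to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy 2-dimensional embeddings, chosen by hand for illustration.
documents = [
    {"content": "pyramid", "embedding": [1.0, 0.0]},
    {"content": "colossus", "embedding": [0.0, 1.0]},
    {"content": "lighthouse", "embedding": [0.7, 0.7]},
]
top = retrieve_top_k([1.0, 0.1], documents, top_k=2)
print([doc["content"] for doc in top])
```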

[ ]

Prompt template

Let's define the first prompt template, which instructs the model to:

  • answer the query based on the retrieved documents, if possible;
  • otherwise, reply with 'no_answer'.
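A template with these two instructions might look roughly like the sketch below. The wording of `PROMPT_TEMPLATE` and the `build_prompt` helper are assumptions for illustration, and plain `str.format` stands in for the templating that the notebook's prompt builder performs.

```python
PROMPT_TEMPLATE = """Answer the query using only the documents below.
If the documents do not contain the answer, reply with 'no_answer'.

Documents:
{documents}

Query: {query}
Answer:"""


def build_prompt(query, documents):
    """Fill the template with the query and one bullet per retrieved document."""
    joined = "\n".join(f"- {doc}" for doc in documents)
    return PROMPT_TEMPLATE.format(documents=joined, query=query)


prompt = build_prompt(
    "Who was buried in the Great Pyramid?",
    ["The Great Pyramid of Giza was built as the tomb of pharaoh Khufu."],
)
print(prompt)
```

Asking for a fixed sentinel string such as 'no_answer' gives the downstream routing step something simple to test for.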
[ ]

Conditional Router

This component routes data along different branches of the pipeline, depending on the reply given by the Language Model.
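The routing decision itself is small. The following plain-Python sketch shows the idea, assuming the check is a simple substring test on the first reply; the `route` function is a hypothetical stand-in, not the router component used in the notebook.

```python
def route(replies, query):
    """Pick an output branch based on the model's first reply."""
    if "no_answer" in replies[0]:
        # The documents were not enough: forward the query to web search.
        return {"go_to_websearch": query}
    # Otherwise the reply is the final answer.
    return {"answer": replies[0]}


print(route(["this is the answer!"], "my query"))
print(route(["no_answer"], "my query"))
```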

[ ]
[ ]
{'answer': 'this is the answer!'}
[ ]
{'go_to_websearch': 'my query'}

Web search

[ ]
[ ]
Found documents:
Content: Tanzania is a country in East Africa within the African Great Lakes region. It is bordered by Uganda, Kenya, the Indian Ocean, Mozambique, Malawi, Zambia, Rwanda, Burundi, and the Democratic Republic of the Congo.
Content: Tanzania is a country in East Africa's Great Lakes Region, located just below the Equator. It is bordered by eight countries and the Indian Ocean, and has diverse geographical features such as mountains, lakes, rivers, and islands.
Content: Tanzania is an East African country formed by the union of Tanganyika and Zanzibar in 1964. It has diverse landscapes, including Mount Kilimanjaro, Lake Victoria, and the Great Rift Valley, and a rich cultural heritage.
Content: Tanzania is the largest and most populous country in East Africa, with a total area of 947,300 sq km and a coastline of 1,424 km. It has diverse natural features, including mountains, lakes, rivers, and islands, and borders eight other countries.
Content: Tanzania is a country in Eastern Africa, bordering the Indian Ocean, between Kenya and Mozambique. It has many lakes, national parks, and mountains, including Mount Kilimanjaro, the highest point in Africa.

Search Links:
https://en.wikipedia.org/wiki/Tanzania
https://www.worldatlas.com/maps/tanzania
https://www.britannica.com/place/Tanzania
https://www.cia.gov/the-world-factbook/countries/tanzania/
https://en.wikipedia.org/wiki/Geography_of_Tanzania

Prompt template after Web search

[ ]

Assembling the Pipeline

Now that we have all the components, we can assemble the full pipeline.

To handle the different prompt sources, we'll use a BranchJoiner. This allows us to connect multiple output sockets (with prompts) to our language model. In our case, the prompt will either come from the initial prompt_builder or from prompt_builder_after_websearch.
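Conceptually, the joiner forwards whichever prompt actually arrived in a given step. A minimal sketch of that behavior follows; the `join_branches` function and its exactly-one-input rule are assumptions made for this illustration, not the component's real implementation.

```python
def join_branches(**candidates):
    """Forward the single value that arrived; reject zero or multiple values."""
    received = {name: value for name, value in candidates.items() if value is not None}
    if len(received) != 1:
        raise ValueError(f"expected exactly one input, got {sorted(received)}")
    return next(iter(received.values()))


# First pass: only the initial prompt builder has produced a prompt.
first = join_branches(prompt_builder="prompt with documents", prompt_builder_after_websearch=None)

# After a fallback: only the web search prompt builder has produced a prompt.
second = join_branches(prompt_builder=None, prompt_builder_after_websearch="prompt with web results")

print(first)
print(second)
```

Funnelling both prompts through one joining step lets a single language model instance serve both branches.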

[ ]
<haystack.core.pipeline.pipeline.Pipeline object at 0x7cd028903ca0>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - prompt_joiner: BranchJoiner
  - llm: HuggingFaceLocalGenerator
  - router: ConditionalRouter
  - websearch: DuckduckgoApiWebSearch
  - prompt_builder_after_websearch: PromptBuilder
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> prompt_joiner.value (str)
  - prompt_joiner.value -> llm.prompt (str)
  - llm.replies -> router.replies (List[str])
  - router.go_to_websearch -> websearch.query (str)
  - router.go_to_websearch -> prompt_builder_after_websearch.query (str)
  - websearch.documents -> prompt_builder_after_websearch.documents (List[Document])
  - prompt_builder_after_websearch.prompt -> prompt_joiner.value (str)
[ ]

Agentic RAG in action! 🔎

[ ]
[ ]


FROM THE KNOWLEDGE BASE: The Great Pyramid of Giza was built as the tomb of Fourth Dynasty pharaoh Khufu, and its construction is believed to have taken around 27 years to complete.
[ ]


FROM THE WEB: Munich is located in the south of Germany, and is the capital of the federal state of Bavaria. It is connected to other major cities in Germany and Austria, and has direct access to Italy.
[ ]


FROM THE KNOWLEDGE BASE: The head of the Colossus of Rhodes was of a standard rendering at the time, with curly hair and evenly spaced spikes of bronze or silver flame radiating from it, similar to the images found on contemporary Rhodian coins.
[ ]


FROM THE WEB: No, the Leaning Tower of Pisa was one of the Seven Wonders of the Medieval World, but not of the ancient world.
[ ]


FROM THE KNOWLEDGE BASE: Muawiyah I was a Muslim general who conquered Rhodes in 653.

(Notebook by Stefano Fiorucci)