Haystack MongoDB Cooking Advisor Pipeline

Haystack and MongoDB Atlas RAG notebook

Install dependencies:
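
The install cell itself is not shown here. As a rough sketch, the four top-level packages named in the install log (haystack-ai, mongodb-atlas-haystack, tiktoken, datasets) could be installed from Python as follows. The helper names build_pip_command and install are illustrative, and no versions are pinned, so a fresh run may resolve newer releases than the ones in the log.

```python
import subprocess
import sys

# Top-level packages named in the install log.
PACKAGES = ["haystack-ai", "mongodb-atlas-haystack", "tiktoken", "datasets"]


def build_pip_command(packages):
    """Return the pip invocation for the current interpreter (hypothetical helper)."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages=PACKAGES):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.check_call(build_pip_command(packages))


# Uncomment to perform the installation:
# install()
```

Using sys.executable keeps the install tied to the interpreter that runs the notebook kernel.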

[11]
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.
Successfully installed backoff-2.2.1 datasets-2.19.2 dill-0.3.8 dnspython-2.6.1 h11-0.14.0 haystack-ai-2.2.1 httpcore-1.0.5 httpx-0.27.0 lazy-imports-0.3.1 mongodb-atlas-haystack-0.3.0 monotonic-1.6 multiprocess-0.70.16 openai-1.34.0 posthog-3.5.0 pymongo-4.7.3 requests-2.32.3 tiktoken-0.7.0 xxhash-3.4.1

Set up the MongoDB Atlas connection and OpenAI

  • Set the MongoDB connection string. Follow the steps here to get the connection string from the Atlas UI.

  • Set the OpenAI API key. The steps to obtain an API key are described here.

[3]
[8]
Enter your MongoDB connection string:··········
[9]
Enter your Open AI Key:··········
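
The two prompts above suggest that both secrets are read interactively. A minimal sketch of such a cell follows, assuming the values are stored in the environment variables MONGO_CONNECTION_STRING and OPENAI_API_KEY. Those variable names and the helpers set_secret and configure_credentials are assumptions, not taken from the notebook.

```python
import getpass
import os


def set_secret(env_var, prompt, ask=getpass.getpass):
    """Store a secret in os.environ, prompting only if it is not already set."""
    if not os.environ.get(env_var):
        os.environ[env_var] = ask(prompt)
    return os.environ[env_var]


def configure_credentials(ask=getpass.getpass):
    # Environment variable names are assumptions, not taken from the notebook.
    set_secret("MONGO_CONNECTION_STRING", "Enter your MongoDB connection string:", ask)
    set_secret("OPENAI_API_KEY", "Enter your Open AI Key:", ask)


# Uncomment in an interactive session:
# configure_credentials()
```

getpass hides the typed value, which matches the masked input shown in the prompts.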

Create vector search index on collection

Follow this tutorial to create a vector search index on the test_collection collection in the ai_shop database.

Verify that the index is named vector_index and that its definition is:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
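
As an alternative to the Atlas UI, the same definition could be created from Python. This is only a sketch: it assumes PyMongo 4.7 or later (for the type argument of SearchIndexModel), that the collection already exists, and that the connection string is passed in by the caller. The helper names are illustrative. Note that numDimensions is 1536 because that is the vector size produced by text-embedding-ada-002.

```python
def vector_index_definition(path="embedding", dims=1536, similarity="cosine"):
    """Build the index definition shown above as a Python dict."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dims,
                "similarity": similarity,
            }
        ]
    }


def create_vector_index(connection_string, database="ai_shop",
                        collection="test_collection", name="vector_index"):
    """Create the vector search index on an existing collection (assumes PyMongo >= 4.7)."""
    # Imported lazily so vector_index_definition works without pymongo installed.
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    client = MongoClient(connection_string)
    model = SearchIndexModel(
        definition=vector_index_definition(),
        name=name,
        type="vectorSearch",
    )
    return client[database][collection].create_search_index(model)
```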

Set up the vector store to load documents:

[12]
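
The code for this cell is not shown. A minimal sketch of a document store pointing at the database, collection, and index named earlier might look like this. It assumes the 0.3.0 release of mongodb-atlas-haystack from the install log, which is assumed here to read the connection string from the MONGO_CONNECTION_STRING environment variable by default. Later releases may require additional arguments. STORE_CONFIG and build_document_store are illustrative names.

```python
# Names taken from the index section of this notebook.
STORE_CONFIG = {
    "database_name": "ai_shop",
    "collection_name": "test_collection",
    "vector_search_index": "vector_index",
}


def build_document_store(config=STORE_CONFIG):
    """Create the Atlas document store (hypothetical helper, needs a live cluster)."""
    # Imported lazily so this module loads without the integration installed.
    from haystack_integrations.document_stores.mongodb_atlas import (
        MongoDBAtlasDocumentStore,
    )

    return MongoDBAtlasDocumentStore(**config)
```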

Build the writer pipeline to load documents

[ ]
Calculating embeddings: 100%|██████████| 3/3 [00:01<00:00,  2.85it/s]
{'doc_embedder': {'meta': {'model': 'text-embedding-ada-002',
   'usage': {'prompt_tokens': 310, 'total_tokens': 310}}},
 'doc_writer': {'documents_written': 81}}
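
The output above names two components, doc_embedder and doc_writer. A pipeline consistent with that output could be sketched as below. The cell's actual code and dataset are not shown, so the duplicate policy and the helper names are assumptions. The 3/3 progress bar is consistent with 81 documents embedded in batches of 32, assuming that is the embedder's default batch size.

```python
import math


def batch_count(n_documents, batch_size=32):
    """Number of embedding requests needed for n_documents."""
    return math.ceil(n_documents / batch_size)


def build_writer_pipeline(document_store):
    """Embed documents with OpenAI and write them to the store (sketch)."""
    # Imported lazily so batch_count works without Haystack installed.
    from haystack import Pipeline
    from haystack.components.embedders import OpenAIDocumentEmbedder
    from haystack.components.writers import DocumentWriter
    from haystack.document_stores.types import DuplicatePolicy

    pipeline = Pipeline()
    pipeline.add_component(
        "doc_embedder", OpenAIDocumentEmbedder(model="text-embedding-ada-002")
    )
    # SKIP is an assumed policy; it avoids rewriting documents on a rerun.
    pipeline.add_component(
        "doc_writer",
        DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP),
    )
    pipeline.connect("doc_embedder.documents", "doc_writer.documents")
    return pipeline
```

Such a pipeline would be run with a list of Haystack Document objects, for example pipeline.run({"doc_embedder": {"documents": docs}}), where docs is whatever dataset the notebook loads.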

Build a RAG Pipeline

Let's create a pipeline that retrieves relevant documents, augments the prompt with them, and generates a response to user questions.

[ ]
<haystack.core.pipeline.pipeline.Pipeline object at 0x7e7804231bd0>
🚅 Components
  - text_embedder: OpenAITextEmbedder
  - retriever: MongoDBAtlasEmbeddingRetriever
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)
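
The component and connection listing above could be produced by a pipeline along these lines. This is a sketch, not the notebook's code: the prompt template is hypothetical, the embedder and generator use their default models, and build_rag_pipeline is an illustrative name. Only the component names and the three connections come from the listing.

```python
# Hypothetical template; the notebook's actual prompt is not shown.
PROMPT_TEMPLATE = """
Answer the question using the documents below as context.

{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ query }}
Answer:
"""

# Connections exactly as listed in the pipeline output.
CONNECTIONS = [
    ("text_embedder.embedding", "retriever.query_embedding"),
    ("retriever.documents", "prompt_builder.documents"),
    ("prompt_builder.prompt", "llm.prompt"),
]


def build_rag_pipeline(document_store):
    """Wire embedder -> retriever -> prompt builder -> generator (sketch)."""
    # Imported lazily so the constants above load without Haystack installed.
    from haystack import Pipeline
    from haystack.components.builders import PromptBuilder
    from haystack.components.embedders import OpenAITextEmbedder
    from haystack.components.generators import OpenAIGenerator
    from haystack_integrations.components.retrievers.mongodb_atlas import (
        MongoDBAtlasEmbeddingRetriever,
    )

    pipeline = Pipeline()
    pipeline.add_component("text_embedder", OpenAITextEmbedder())
    pipeline.add_component(
        "retriever", MongoDBAtlasEmbeddingRetriever(document_store=document_store)
    )
    pipeline.add_component("prompt_builder", PromptBuilder(template=PROMPT_TEMPLATE))
    pipeline.add_component("llm", OpenAIGenerator())
    for sender, receiver in CONNECTIONS:
        pipeline.connect(sender, receiver)
    return pipeline
```

The query embedder should use the same embedding model as the document embedder so that query and document vectors share one space.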

Let's test the pipeline

[ ]
To cook a lasagne, you can follow this classic recipe:

### Ingredients:
#### For the meat sauce:
- 2 tablespoons olive oil
- 1 onion, finely chopped
- 2 cloves garlic, minced
- 500g ground beef
- 800g canned tomatoes, crushed
- 2 tablespoons tomato paste
- 1 teaspoon dried basil
- 1 teaspoon dried oregano
- Salt and pepper to taste

#### For the béchamel sauce:
- 4 tablespoons butter
- 4 tablespoons all-purpose flour
- 500ml milk
- A pinch of nutmeg
- Salt and pepper to taste

#### For assembly:
- 250g lasagne sheets
- 200g mozzarella cheese, shredded
- 1 cup grated Parmesan cheese
- Fresh basil leaves for garnish (optional)

### Instructions:
1. **Preheat the oven** to 375°F (190°C).

2. **Prepare the meat sauce:**
   - Heat the olive oil in a large skillet over medium heat.
   - Add the chopped onion and cook until soft and translucent, about 5 minutes.
   - Stir in the minced garlic and cook for another minute.
   - Add the ground beef and cook until browned, breaking it up with a spoon as it cooks.
   - Stir in the crushed tomatoes, tomato paste, dried basil, and dried oregano.
   - Season with salt and pepper, then reduce the heat to low.
   - Let the sauce simmer for 30 minutes, stirring occasionally.

3. **Prepare the béchamel sauce:**
   - In a medium saucepan, melt the butter over medium heat.
   - Add the flour and whisk continuously for about 2 minutes to create a roux.
   - Gradually add the milk while whisking to prevent lumps from forming.
   - Cook the mixture, whisking constantly, until it thickens, about 5-7 minutes.
   - Season with a pinch of nutmeg, salt, and pepper.

4. **Assemble the lasagne:**
   - Spread a thin layer of the meat sauce on the bottom of a 9x13 inch baking dish.
   - Place a layer of lasagne sheets over the sauce.
   - Spread another layer of meat sauce over the lasagne sheets, followed by a layer of béchamel sauce.
   - Sprinkle some shredded mozzarella cheese over the béchamel sauce.
   - Repeat the layers until all the ingredients are used, finishing with a layer of béchamel sauce and a generous topping of mozzarella and Parmesan cheese.

5. **Bake the lasagne:**
   - Cover the baking dish with aluminum foil.
   - Bake in the preheated oven for 30 minutes.
   - Remove the foil and bake for an additional 15 minutes, or until the top is golden brown and bubbling.

6. **Rest and serve:**
   - Remove the lasagne from the oven and let it rest for 10-15 minutes before slicing.
   - Garnish with fresh basil leaves if desired, and serve.

Enjoy your delicious homemade lasagne!
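
An answer like the one above would come from running the pipeline with a user question. The cell's code is not shown, so the following is a sketch under assumptions: the prompt template is assumed to expose a query variable, the question text in the usage note is a guess based on the answer, and build_run_inputs and ask are illustrative names.

```python
def build_run_inputs(query):
    """Route the same question to the embedder and to the prompt builder."""
    return {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }


def ask(pipeline, query):
    """Run the RAG pipeline and return the first generated reply."""
    result = pipeline.run(build_run_inputs(query))
    return result["llm"]["replies"][0]
```

With a built pipeline, usage might be print(ask(rag_pipeline, "How can I cook a lasagne?")), where rag_pipeline is the pipeline object from the previous section.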