Build RAG with Milvus and Cognee

Cognee is a developer-first platform that streamlines AI application development with scalable, modular ECL (Extract, Cognify, Load) pipelines. By integrating seamlessly with Milvus, Cognee enables efficient connection and retrieval of conversations, documents, and transcriptions, reducing hallucinations and optimizing operational costs.

With strong support for vector stores like Milvus, graph databases, and LLMs, Cognee provides a flexible and customizable framework for building retrieval-augmented generation (RAG) systems. Its production-ready architecture ensures improved accuracy and efficiency for AI-powered applications.

In this tutorial, we will show you how to build a RAG (Retrieval-Augmented Generation) pipeline with Milvus and Cognee.

If you are using Google Colab, you may need to restart the runtime after installing the dependencies so that the newly installed packages take effect (click the "Runtime" menu at the top of the screen and select "Restart session" from the dropdown menu).

This example uses OpenAI as the LLM by default. Prepare your API key and pass it to the config function set_llm_api_key().

To configure Milvus as the vector database, set the VECTOR_DB_PROVIDER to milvus and specify the VECTOR_DB_URL and VECTOR_DB_KEY. Since we are using Milvus Lite to store data in this demo, only the VECTOR_DB_URL needs to be provided.

As for the environment variables VECTOR_DB_URL and VECTOR_DB_KEY:

  • Setting the VECTOR_DB_URL to a local file, e.g. ./milvus.db, is the most convenient method, as it automatically uses Milvus Lite to store all data in this file.
  • If you have a large amount of data, you can set up a more performant Milvus server on Docker or Kubernetes. In this setup, use the server URI, e.g. http://localhost:19530, as your VECTOR_DB_URL.
  • If you want to use Zilliz Cloud, the fully managed cloud service for Milvus, set the VECTOR_DB_URL and VECTOR_DB_KEY to the Public Endpoint and API key from Zilliz Cloud.
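
The three options above can be captured in a small helper. This is only a minimal sketch: the function milvus_env and its mode names are hypothetical and not part of Cognee or Milvus, and the Zilliz Cloud endpoint and key must be supplied by you. Only the environment variable names and the example values come from the text above.

```python
import os


def milvus_env(mode="lite", url=None, key=None):
    """Return the environment variables for one of the three setups above.

    `milvus_env` and the mode names are illustrative helpers only.
    """
    env = {"VECTOR_DB_PROVIDER": "milvus"}
    if mode == "lite":
        # A local file path makes Milvus Lite store all data in that file.
        env["VECTOR_DB_URL"] = url or "./milvus.db"
    elif mode == "server":
        # A Milvus server running on Docker or Kubernetes.
        env["VECTOR_DB_URL"] = url or "http://localhost:19530"
    elif mode == "cloud":
        # Zilliz Cloud needs both the Public Endpoint and the API key.
        if not url or not key:
            raise ValueError("cloud mode requires both url and key")
        env["VECTOR_DB_URL"] = url
        env["VECTOR_DB_KEY"] = key
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env


# Apply the Milvus Lite settings used in this demo.
os.environ.update(milvus_env("lite"))
```

Returning a plain dict keeps the choice of deployment in one place, so switching from Milvus Lite to a server or to Zilliz Cloud only changes the arguments passed to the helper.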

Prepare the data

We use the FAQ pages from the Milvus Documentation 2.4.x as the private knowledge in our RAG, which is a good data source for a simple RAG pipeline.

Download the zip file and extract documents to the folder milvus_docs.

We load all markdown files from the folder milvus_docs/en/faq. For each document, we simply split the content on "# ", which roughly separates the main sections of the markdown file.
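
The loading step can be sketched as follows. The function names split_markdown and load_faq_chunks are hypothetical, and dropping empty pieces is an added assumption rather than something the text above prescribes.

```python
from pathlib import Path


def split_markdown(text):
    """Split markdown text on "# " and drop pieces that are empty."""
    return [part for part in text.split("# ") if part.strip()]


def load_faq_chunks(folder="milvus_docs/en/faq"):
    """Read every .md file in `folder` and return all text pieces in one list."""
    text_lines = []
    for file_path in sorted(Path(folder).glob("*.md")):
        text_lines += split_markdown(file_path.read_text(encoding="utf-8"))
    return text_lines
```

Because the split is on the literal string "# ", a deeper heading such as "#### Question" leaves a "###" fragment at the end of the preceding piece. This is why the separation is only rough.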

Build RAG

Resetting Cognee Data

With a clean slate ready, we can now add our dataset and process it into a knowledge graph.

Adding Data and Cognifying

The add method loads the dataset (Milvus FAQs) into Cognee, and the cognify method then processes the data to extract entities, relationships, and summaries, constructing a knowledge graph.
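
The two steps can be sketched as a single coroutine. This is a hedged illustration, not the notebook's original cell: the function name build_knowledge_graph and the dataset name "milvus_faq" are hypothetical, and the sketch assumes that add accepts a dataset_name keyword and that both methods are coroutines. The Cognee module is passed in as an argument so the sketch does not depend on a particular installed version.

```python
import asyncio


async def build_knowledge_graph(cognee_module, text_lines, dataset_name="milvus_faq"):
    """Load text pieces, then process them into a knowledge graph.

    `cognee_module` is expected to be the imported `cognee` package.
    The dataset name is a hypothetical label chosen for this sketch.
    """
    # add(): load the raw text into Cognee under a named dataset.
    await cognee_module.add(text_lines, dataset_name=dataset_name)
    # cognify(): extract entities, relationships, and summaries.
    await cognee_module.cognify()


# In a notebook cell, assuming cognee is installed and configured:
#   import cognee
#   await build_knowledge_graph(cognee, text_lines)
# In a plain script, wrap the call with asyncio.run(...) instead.
```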

Querying for Summaries

Now that the data has been processed, let's query the knowledge graph.

{'id': 'de5c6713-e079-5d0b-b11d-e9bacd1e0d73', 'text': 'Milvus stores two data types: inserted data and metadata.'}

This query searches the knowledge graph for a summary related to the query text and prints the most relevant candidate.

Querying for Chunks

Summaries offer high-level insights, but for more granular details, we can query specific chunks of data directly from the processed dataset. These chunks are derived from the original data that was added and analyzed during the knowledge graph creation.

Let's format and display it for better readability!
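
A minimal sketch of such a formatting helper is shown here. It assumes each result is a dict with 'id' and 'text' keys, as in the summary result printed earlier; the function name format_chunk is hypothetical.

```python
def format_chunk(chunk):
    """Return a readable string for a result dict with 'id' and 'text' keys."""
    # Normalize paragraphs: strip whitespace and skip empty ones.
    paragraphs = [p.strip() for p in chunk["text"].split("\n\n") if p.strip()]
    return "ID: {}\n\nText:\n\n{}".format(chunk["id"], "\n\n".join(paragraphs))
```

Calling print(format_chunk(chunk)) on a retrieved chunk yields an ID line, a "Text:" label, and then the paragraphs separated by blank lines.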

ID: 4be01c4b-9ee5-541c-9b85-297883934ab3

Text:

Where does Milvus store data?

Milvus deals with two types of data, inserted data and metadata.

Inserted data, including vector data, scalar data, and collection-specific schema, are stored in persistent storage as incremental log. Milvus supports multiple object storage backends, including [MinIO](https://min.io/), [AWS S3](https://aws.amazon.com/s3/?nc1=h_ls), [Google Cloud Storage](https://cloud.google.com/storage?hl=en#object-storage-for-companies-of-all-sizes) (GCS), [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs), [Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service), and [Tencent Cloud Object Storage](https://www.tencentcloud.com/products/cos) (COS).

Metadata are generated within Milvus. Each Milvus module has its own metadata that are stored in etcd.

###

In our previous steps, we queried the Milvus FAQ dataset for both summaries and specific chunks of data. While this provided detailed insights and granular information, the dataset was large, making it challenging to clearly visualize the dependencies within the knowledge graph.

To address this, we will reset the Cognee environment and work with a smaller, more focused dataset. This will allow us to better demonstrate the relationships and dependencies extracted during the cognify process. By simplifying the data, we can clearly see how Cognee organizes and structures information in the knowledge graph.

Reset Cognee

Adding the Focused Dataset

Here, a smaller dataset with only one line of text is added and processed to ensure a focused and easily interpretable knowledge graph.

Querying for Insights

By focusing on this smaller dataset, we can now clearly analyze the relationships and structure within the knowledge graph.

This output represents the results of a knowledge graph query, showcasing entities (nodes) and their relationships (edges) as extracted from the processed dataset. Each tuple includes a source entity, a relationship type, and a target entity, along with metadata like unique IDs, descriptions, and timestamps. The graph highlights key concepts and their semantic connections, providing a structured understanding of the dataset.
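
To make the tuple structure concrete, the sketch below turns (source, relationship, target) tuples into one readable line each. The key names 'name' and 'relationship_name' and the sample values are assumptions for illustration only; check the actual keys in your own query results.

```python
def describe_triples(results):
    """Render (source, relationship, target) tuples as readable strings.

    The 'name' and 'relationship_name' keys are assumed, not confirmed.
    """
    lines = []
    for source, relationship, target in results:
        lines.append(
            f"{source['name']} --[{relationship['relationship_name']}]--> {target['name']}"
        )
    return lines


# Hypothetical sample, not real Cognee output.
sample = [
    ({"name": "entity A"}, {"relationship_name": "relates_to"}, {"name": "entity B"}),
]
for line in describe_triples(sample):
    print(line)
```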

Congratulations, you have learned the basic usage of Cognee with Milvus. To learn about more advanced usage of Cognee, please refer to its official page.