Build RAG using Zilliz Cloud Pipelines
(Note) Zilliz Cloud Pipelines will be deprecated soon. Please stay tuned for detailed instructions on alternative solutions.
Zilliz Cloud Pipelines is an AI-powered retrieval service. It simplifies the maintenance of information retrieval systems by providing ingestion and search pipelines as an easy-to-use API service. As an AI application developer, with quality optimization and DevOps taken care of, you can focus on building AI applications tailored to your specific use case.
In this notebook, we show how to use Zilliz Cloud Pipelines to build a simple yet scalable Retrieval Augmented Generation (RAG) application. Retrieval is at the heart of a RAG solution, and it typically involves maintaining a knowledge base of document pieces, hosting an embedding model, and using a vector database as the retrieval engine. With Zilliz Cloud Pipelines, you don't need to deal with such a complex tech stack. Everything can be done with an API call.
We first create an Ingestion pipeline for text indexing and a Search pipeline for knowledge retrieval. Then we run the Ingestion pipeline via API call to import the given text and establish the knowledge base. Finally, we build a RAG application that runs the Search pipeline to conduct Retrieval Augmented Generation.

Setup
Prerequisites
Please make sure you have a Serverless cluster in Zilliz Cloud. If you don't have one yet, you can sign up for free.
To learn how to create a Serverless cluster and get your CLOUD_REGION, CLUSTER_ID, API_KEY, and PROJECT_ID, please refer to this page for more details.
With the Serverless cluster created, get the cluster ID, API key, and project ID as shown, and fill them into the following code:
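For illustration, the setup cell can look like the sketch below. Every value is a placeholder, and the controller URL pattern is an assumption based on the Zilliz Cloud Pipelines docs; adjust it for your own region and credentials:

```python
# Placeholder credentials -- fill in the values from your Zilliz Cloud console.
CLOUD_REGION = "gcp-us-west1"   # hypothetical region; use your cluster's region
CLUSTER_ID = "your-cluster-id"
API_KEY = "your-api-key"
PROJECT_ID = "your-project-id"

# All pipeline endpoints share this base URL and authorization header.
BASE_URL = f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```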

Create an ingestion pipeline
Ingestion pipelines can transform unstructured data into searchable vector embeddings and store them in Zilliz Cloud Vector Database.
In the Ingestion pipeline, you can specify functions to customize its behavior. The input data that an Ingestion pipeline expects also depends on the specified functions. Currently, Ingestion pipelines support the following types of functions:
- The INDEX_TEXT function accepts a list of texts as input. It converts each text to a vector embedding and maps an input field (text_list) to two fields (text, embedding) in the corresponding collection (auto-generated if it does not exist).
- The INDEX_DOC function expects a document as input. It splits the input text document into chunks and generates a vector embedding for each chunk. This function maps an input field (doc_url) to four output fields (doc_name, chunk_id, chunk_text, and embedding) in the corresponding collection (auto-generated if it does not exist).
- The INDEX_IMAGE function requires image data and its unique ID as input. It generates the image embedding and maps two given input fields (image_url, image_id) to two output fields (image_id, embedding) in the corresponding collection (auto-generated if it does not exist).
- The PRESERVE function stores a user-defined input as an additional scalar field in the corresponding collection. It is typically used to store meta information about the core inputs, such as publisher info and tags that describe the property.
Please note that an Ingestion pipeline must contain exactly one index function as its core function, while the PRESERVE function is optional.
In the following example, we will create an Ingestion pipeline with an INDEX_TEXT function and a PRESERVE function. As part of creating the Ingestion pipeline, a vector database collection named my_text_collection will be created in the cluster. It contains the following fields:
- id, auto-generated for each entity
- text and embedding, as defined by the INDEX_TEXT function
- title, as defined by the PRESERVE function
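A minimal sketch of the creation call, assuming the RESTful endpoint and the field names visible in the response below (the helper names build_ingestion_payload and create_ingestion_pipeline are our own, and the URL and key are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"  # hypothetical region
API_KEY = "your-api-key"  # placeholder

def build_ingestion_payload(project_id: str, cluster_id: str) -> dict:
    """Assemble the request body: one INDEX_TEXT core function plus one PRESERVE function."""
    return {
        "projectId": project_id,
        "name": "my_ingestion_pipeline",
        "description": "A pipeline that generates text embeddings and stores title information.",
        "type": "INGESTION",
        "functions": [
            {   # core index function: maps text_list -> (text, embedding)
                "name": "index_my_text",
                "action": "INDEX_TEXT",
                "language": "ENGLISH",
                "embedding": "zilliz/bge-base-en-v1.5",
            },
            {   # optional preserve function: stores the title as a scalar field
                "name": "title_info",
                "action": "PRESERVE",
                "inputField": "title",
                "outputField": "title",
                "fieldType": "VarChar",
            },
        ],
        "clusterId": cluster_id,
        "newCollectionName": "my_text_collection",
    }

def create_ingestion_pipeline(project_id: str, cluster_id: str) -> dict:
    """POST the payload to the pipelines endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_ingestion_payload(project_id, cluster_id)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With real credentials, calling create_ingestion_pipeline(PROJECT_ID, CLUSTER_ID) should return a response like the one shown below; keep data['pipelineId'] for the ingestion runs.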
{'code': 200, 'data': {'pipelineId': 'pipe-f119d6273a5ce19f65767f', 'name': 'my_ingestion_pipeline', 'type': 'INGESTION', 'description': 'A pipeline that generates text embeddings and stores title information.', 'status': 'SERVING', 'functions': [{'name': 'index_my_text', 'action': 'INDEX_TEXT', 'inputFields': ['text_list'], 'language': 'ENGLISH', 'embedding': 'zilliz/bge-base-en-v1.5'}, {'name': 'title_info', 'action': 'PRESERVE', 'inputField': 'title', 'outputField': 'title', 'fieldType': 'VarChar'}], 'clusterId': 'in03-d4426aaee81eb7e', 'collectionName': 'my_text_collection', 'totalTokenUsage': 0}}
After successful creation, a pipeline ID is returned. We will later run this pipeline by its ID to ingest text inputs.
Create a search pipeline
Search pipelines enable semantic search by converting a query string into a vector embedding and retrieving the top-K nearest-neighbor vectors. Each vector represents a chunk of an ingested document and carries associated information such as the file name and preserved properties.
A Search pipeline contains a search function of one of the following types, for which you need to set the cluster and collection to search from:
- The SEARCH_DOC_CHUNK function expects a user query as input and returns relevant doc chunks from the knowledge base.
- The SEARCH_TEXT function expects a user query as input and returns relevant text entities from the knowledge base.
- The SEARCH_IMAGE function expects an image URL as input and returns the data entities of the most similar images.
In this example, we will use a SEARCH_TEXT function to enable text retrieval.
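The creation request can be sketched the same way, assuming the field names visible in the response below (build_search_payload and create_search_pipeline are our own helper names; the URL and key are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"  # hypothetical region
API_KEY = "your-api-key"  # placeholder

def build_search_payload(project_id: str, cluster_id: str) -> dict:
    """Request body for a Search pipeline with a single SEARCH_TEXT function."""
    return {
        "projectId": project_id,
        "name": "my_search_pipeline",
        "description": "A pipeline that receives text and search for semantically similar texts.",
        "type": "SEARCH",
        "functions": [
            {
                "name": "search_chunk_text_and_title",
                "action": "SEARCH_TEXT",
                # point the function at the collection built by the Ingestion pipeline
                "clusterId": cluster_id,
                "collectionName": "my_text_collection",
                "embedding": "zilliz/bge-base-en-v1.5",
                "reranker": "zilliz/bge-reranker-base",
            }
        ],
    }

def create_search_pipeline(project_id: str, cluster_id: str) -> dict:
    """POST the payload and return the parsed JSON response (includes the pipeline ID)."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_search_payload(project_id, cluster_id)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```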
{'code': 200, 'data': {'pipelineId': 'pipe-25c2ba7ab0726aa2e11e70', 'name': 'my_search_pipeline', 'type': 'SEARCH', 'description': 'A pipeline that receives text and search for semantically similar texts.', 'status': 'SERVING', 'functions': [{'name': 'search_chunk_text_and_title', 'action': 'SEARCH_TEXT', 'inputFields': ['query_text'], 'clusterId': 'in03-d4426aaee81eb7e', 'collectionName': 'my_text_collection', 'reranker': 'zilliz/bge-reranker-base', 'embedding': 'zilliz/bge-base-en-v1.5'}], 'totalTokenUsage': 0}}
Similarly, a successful creation returns a pipeline ID, which we will use when running this pipeline later.
In addition to creating pipelines through the RESTful API as introduced in this notebook, you can also create pipelines through the Web UI with a few clicks. Check the documentation to learn more about how to ingest, search, and delete different types of data (text, document, image, etc.).
Run ingestion pipeline
The text ingestion pipeline accepts a list of text data as input. In the following demo, we run the ingestion pipeline with text pieces and subheadings from the sample blog: What Milvus version to start with.
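A sketch of one ingestion run, assuming a .../{pipelineId}/run endpoint and the text_list/title input fields defined earlier; the helper names are ours, the URL and IDs are placeholders, and the sample text is an excerpt from the blog's "Milvus Lite" section:

```python
import json
import urllib.request

API_KEY = "your-api-key"                          # placeholder
INGESTION_PIPE_ID = "your-ingestion-pipeline-id"  # returned when the pipeline was created
RUN_URL = (
    "https://controller.api.gcp-us-west1.zillizcloud.com"  # hypothetical region
    f"/v1/pipelines/{INGESTION_PIPE_ID}/run"
)

def build_run_payload(text_list: list, title: str) -> dict:
    """One run ingests a list of text pieces that share the same preserved title."""
    return {"data": {"text_list": text_list, "title": title}}

def run_ingestion(text_list: list, title: str) -> dict:
    """POST one batch of texts to the ingestion pipeline's run endpoint."""
    req = urllib.request.Request(
        RUN_URL,
        data=json.dumps(build_run_payload(text_list, title)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example input, excerpted from the sample blog:
sample_texts = [
    "As the name suggests, Milvus Lite is a lightweight version that integrates "
    "seamlessly with Google Colab and Jupyter Notebook.",
]
# run_ingestion(sample_texts, "Milvus Lite")  # uncomment with real credentials
```

Each run returns the token usage, the number of entities written, and their IDs, as in the outputs below.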
{'code': 200, 'data': {'token_usage': 225, 'num_entities': 3, 'ids': [449431798276845977, 449431798276845978, 449431798276845979]}}
{'code': 200, 'data': {'token_usage': 135, 'num_entities': 2, 'ids': [449431798276845981, 449431798276845982]}}
{'code': 200, 'data': {'token_usage': 136, 'num_entities': 2, 'ids': [449431798276845984, 449431798276845985]}}
Now we have successfully ingested the text pieces with corresponding titles and embeddings into the vector database. If you want to inspect the data in the collection, you can use the Data Preview tool in Zilliz Cloud web UI.
Build RAG application with Search pipeline
Run search pipeline
The first step in building a RAG app is to retrieve the information pieces most relevant to the question from a knowledge base (typically a vector database collection).
This is as simple as running the Search pipeline we just created. The following shows how to run a Search pipeline with a query text and specifications; we wrap this run in a function that can be used in the RAG app shown shortly.
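A sketch of such a helper, assuming the run endpoint and the params names (limit, offset, outputFields, filter) from the Pipelines docs; the URL and key are placeholders, and you should adapt the output field names to your own pipeline:

```python
import json
import urllib.request

BASE_URL = "https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines"  # hypothetical region
API_KEY = "your-api-key"  # placeholder

def build_search_body(question: str, top_k: int) -> dict:
    """Request body for a search run; params names are assumptions from the docs."""
    return {
        "data": {"query_text": question},
        "params": {"limit": top_k, "offset": 0, "outputFields": ["text", "title"], "filter": ""},
    }

def retrieval_with_pipeline(question: str, search_pipeline_id: str, top_k: int = 2) -> list:
    """Run the Search pipeline and return the retrieved entities as small dicts."""
    req = urllib.request.Request(
        f"{BASE_URL}/{search_pipeline_id}/run",
        data=json.dumps(build_search_body(question, top_k)).encode(),
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)["data"]["result"]
    # keep only the fields the RAG prompt needs
    return [{"text": r["text"], "title": r["title"]} for r in results]
```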
{'code': 200,
'data': {'result': [{'distance': 0.8722565174102783,
'id': 449431798276845977,
'text': 'As the name suggests, Milvus Lite is a '
'lightweight version that integrates seamlessly '
'with Google Colab and Jupyter Notebook. It is '
'packaged as a single binary with no additional '
'dependencies, making it easy to install and run '
'on your machine or embed in Python '
'applications. Additionally, Milvus Lite '
'includes a CLI-based Milvus standalone server, '
'providing flexibility for running it directly '
'on your machine. Whether you embed it within '
'your Python code or utilize it as a standalone '
'server is entirely up to your preference and '
'specific application requirements.',
'title': 'Milvus Lite'},
{'distance': 0.3541138172149658,
'id': 449431798276845978,
'text': 'Milvus Lite is ideal for rapid prototyping and '
'local development, offering support for quick '
'setup and experimentation with small-scale '
'datasets on your machine. However, its '
'limitations become apparent when transitioning '
'to production environments with larger datasets '
'and more demanding infrastructure requirements. '
'As such, while Milvus Lite is an excellent tool '
'for initial exploration and testing, it may not '
'be suitable for deploying applications in '
'high-volume or production-ready settings.',
'title': 'Milvus Lite'}],
'token_usage': 34}}
[{'text': 'As the name suggests, Milvus Lite is a lightweight version that integrates seamlessly with Google Colab and Jupyter Notebook. It is packaged as a single binary with no additional dependencies, making it easy to install and run on your machine or embed in Python applications. Additionally, Milvus Lite includes a CLI-based Milvus standalone server, providing flexibility for running it directly on your machine. Whether you embed it within your Python code or utilize it as a standalone server is entirely up to your preference and specific application requirements.',
  'title': 'Milvus Lite'},
 {'text': 'Milvus Lite is ideal for rapid prototyping and local development, offering support for quick setup and experimentation with small-scale datasets on your machine. However, its limitations become apparent when transitioning to production environments with larger datasets and more demanding infrastructure requirements. As such, while Milvus Lite is an excellent tool for initial exploration and testing, it may not be suitable for deploying applications in high-volume or production-ready settings.',
  'title': 'Milvus Lite'}]

We can see that when we ask a question, this search run returns the top-k knowledge fragments we need. This forms the basis of RAG.
Build a chatbot powered by RAG
With the above convenient helper function retrieval_with_pipeline, we can retrieve the knowledge ingested into the vector database.
Below, we show a simple RAG app that can answer questions based on the knowledge we ingested previously. It uses OpenAI gpt-3.5-turbo as the LLM with a simple prompt. To test it, replace the placeholder with your own OpenAI API key.
This implements a RAG chatbot that uses the Search pipeline to retrieve the most relevant text pieces from the database and enhances answer quality with them. Let's see how it works in action!
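As an illustration (this class and prompt are our own sketch, not the notebook's exact code), the chatbot can take the retriever and the LLM call as injected callables, which also makes it easy to test without network access:

```python
class RAGChatbot:
    """Answer questions over the ingested knowledge base.

    retriever: question -> list of {"text": ..., "title": ...} dicts
               (e.g. the retrieval_with_pipeline helper described above)
    llm:       prompt -> answer string (e.g. a gpt-3.5-turbo completion call)
    """

    def __init__(self, retriever, llm):
        self.retriever = retriever
        self.llm = llm

    def build_prompt(self, question: str, chunks: list) -> str:
        # Concatenate the retrieved chunks into a context block for the LLM.
        context = "\n".join(c["text"] for c in chunks)
        return (
            "Use the following pieces of context to answer the question. "
            "If you don't know the answer, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )

    def answer(self, question: str) -> str:
        chunks = self.retriever(question)
        return self.llm(self.build_prompt(question, chunks))
```

With the real pieces plugged in, retriever would run the Search pipeline and llm would be a call such as client.chat.completions.create(model="gpt-3.5-turbo", ...) from the openai package; passing a retriever that returns an empty list reproduces the plain-LLM behavior compared below.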
Chat with RAG
'Based on the context provided, you should choose Milvus Lite if you want to use it in a Jupyter Notebook with a small scale of data. Milvus Lite is specifically designed for rapid prototyping and local development, offering support for quick setup and experimentation with small-scale datasets on your machine. It is lightweight, easy to install, and integrates seamlessly with Google Colab and Jupyter Notebook.'
The ground truth content in the original knowledge text is:
As the name suggests, Milvus Lite is a lightweight version that integrates seamlessly with Google Colab and Jupyter Notebook. It is packaged as a single binary with no additional dependencies, making it easy to install and run on your machine or embed in Python applications. Additionally, Milvus Lite includes a CLI-based Milvus standalone server, providing flexibility for running it directly on your machine. Whether you embed it within your Python code or utilize it as a standalone server is entirely up to your preference and specific application requirements.
We can tell that the RAG app we built successfully answers this question, which requires deep domain knowledge.
'If you are working with a small scale of data in a Jupyter notebook, you may want to consider using Milvus CE (Community Edition). Milvus CE is a free and open-source vector database that is suitable for small-scale projects and experimentation. It is easy to set up and use in a Jupyter notebook environment, making it a good choice for beginners or those working with limited data. Additionally, Milvus CE offers a range of features and functionalities that can help you efficiently store and query your data in a vectorized format.'
In contrast, the LLM without RAG lacks the domain knowledge required for this question and, even worse, outputs an incorrect answer. This is a typical example of the so-called hallucination problem of LLMs.
That's how to use Zilliz Cloud Pipelines to build RAG applications. To learn more, you can refer to https://docs.zilliz.com/docs/pipelines for detailed information.
If you have any questions, feel free to contact us at support@zilliz.com.