Zilliz Pipeline Rag Advanced
Filter by Tags in Zilliz Cloud Pipelines
(Note) Zilliz Cloud Pipelines is about to be deprecated. Please stay tuned for detailed instructions on alternative solutions.
In the previous notebook, we learned the basics of Zilliz Cloud Pipelines. In this notebook, we show an example of filtering retrieval results by tags. The Pipelines operations are wrapped in a helper class to simplify the code.
Setup
Prerequisites
Please make sure you have a Serverless cluster in Zilliz Cloud. If you do not have one yet, you can sign up for free.
To learn how to create a Serverless cluster and get your CLOUD_REGION, CLUSTER_ID, API_KEY and PROJECT_ID, please refer to this page for more details.
Create an ingestion pipeline
Ingestion pipelines can transform unstructured data into searchable vector embeddings and store them in Zilliz Cloud Vector Database.
In the following example, we create an Ingestion pipeline named my_ingestion_pipeline. As part of creating the Ingestion pipeline, a vector database collection named my_rag_collection will be created in the cluster. It contains five fields:
doc_name, chunk_id, chunk_text, embedding, as defined by the INDEX_DOC function
version, as defined by the PRESERVE function
If you run this code and get the error "This collection already exists", it means you have created this collection before. You can change the collection_name or delete the collection manually.
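As a rough illustration of what creating the Ingestion pipeline involves, the sketch below only builds a request body and does not call the service. The helper name build_ingestion_payload, the function names index_doc and keep_version, and the JSON key names are assumptions modeled on the Pipelines REST API, so please check them against the official reference before use.

```python
import json


def build_ingestion_payload(project_id, cluster_id,
                            pipeline_name="my_ingestion_pipeline",
                            collection_name="my_rag_collection"):
    """Build a request body for creating an Ingestion pipeline (assumed schema)."""
    return {
        "projectId": project_id,
        "name": pipeline_name,
        "type": "INGESTION",
        "clusterId": cluster_id,
        "newCollectionName": collection_name,
        "functions": [
            # INDEX_DOC splits a document into chunks and embeds them; it
            # yields doc_name, chunk_id, chunk_text and embedding.
            {
                "name": "index_doc",  # hypothetical function name
                "action": "INDEX_DOC",
                "inputField": "doc_url",
                "language": "ENGLISH",
            },
            # PRESERVE stores the caller-supplied tag in the version field.
            {
                "name": "keep_version",  # hypothetical function name
                "action": "PRESERVE",
                "inputField": "version",
                "outputField": "version",
                "fieldType": "VarChar",
            },
        ],
    }


payload = build_ingestion_payload("YOUR_PROJECT_ID", "YOUR_CLUSTER_ID")
print(json.dumps(payload, indent=2))
```

Changing the collection_name argument is one way to get around the "This collection already exists" error mentioned above.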
Create a search pipeline
Search pipelines enable semantic search by converting a query string into a vector embedding and then retrieving the top-K nearest neighbour vectors and their doc chunks.
In the following example we create a Search pipeline named my_search_pipeline. It searches the collection created by the Ingestion pipeline above.
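A Search pipeline definition can be sketched in the same spirit. The block below only assembles a request body. The helper name build_search_payload, the function name search_chunks, and the JSON key names (including the SEARCH_DOC_CHUNK action) are assumptions modeled on the Pipelines REST API and should be verified against the official reference.

```python
def build_search_payload(project_id, cluster_id,
                         pipeline_name="my_search_pipeline",
                         collection_name="my_rag_collection"):
    """Build a request body for creating a Search pipeline (assumed schema)."""
    return {
        "projectId": project_id,
        "name": pipeline_name,
        "type": "SEARCH",
        "functions": [
            {
                "name": "search_chunks",  # hypothetical function name
                "action": "SEARCH_DOC_CHUNK",
                "inputField": "query_text",
                # Point the search at the collection created by the
                # Ingestion pipeline.
                "clusterId": cluster_id,
                "collectionName": collection_name,
            }
        ],
    }


search_payload = build_search_payload("YOUR_PROJECT_ID", "YOUR_CLUSTER_ID")
print(search_payload["functions"][0]["collectionName"])
```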
Run ingestion pipeline
An Ingestion pipeline accepts files from an object storage service such as AWS S3 or Google Cloud Storage (GCS).
We use two versions of the Milvus (an open-source vector database) documentation, one from Milvus 2.3 and one from Milvus 2.2. Both files are stored on Google Cloud Storage, and each is ingested together with its version info, which we pass as a keyword argument to the run() method.
<Response [200]>
Now we have successfully ingested the documents by splitting them into doc chunks and uploading the generated embeddings into the vector database collection. If you want to inspect the data in the collection, you can use the Data Preview tool in the Zilliz Cloud web UI.
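The ingestion runs described above can be sketched as follows. The helper build_ingestion_run_body is hypothetical, the document URLs are placeholders rather than the real file locations, and the {"data": ...} body shape is an assumption about the run endpoint. The point is only to show how a keyword argument such as version becomes a tag stored next to each chunk.

```python
def build_ingestion_run_body(doc_url, **tags):
    """Request body for one ingestion run; keyword arguments become tag fields."""
    data = {"doc_url": doc_url}
    data.update(tags)
    return {"data": data}


# Placeholder URLs: substitute the real object-storage locations.
docs = {
    "2.3": "https://example.com/your-bucket/milvus_doc_2.3.md",
    "2.2": "https://example.com/your-bucket/milvus_doc_2.2.md",
}

# One run per document, each tagged with the version it came from.
run_bodies = [build_ingestion_run_body(url, version=v) for v, url in docs.items()]
for body in run_bodies:
    print(body)
```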
Build RAG application with Search pipeline
Run search pipeline
We can use the run() method to run a search pipeline. The run() method takes a question as input and returns the top-k knowledge fragments.
To have the version tag returned with each fragment, we also pass other_output_fields=['version']. The filter condition is version == "2.2".
[{'chunk_text': '# Delete Entities\nThis topic describes how to delete entities in Milvus. \nMilvus supports deleting entities by primary key filtered with boolean expression. \nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.',
  'version': '2.2'},
 {'chunk_text': '# Delete Entities\n## Prepare boolean expression\nPrepare the boolean expression that filters the entities to delete. \nMilvus only supports deleting entities with clearly specified primary keys, which can be achieved merely with the term expression in. Other operators can be used only in query or scalar filtering in vector search. See Boolean Expression Rules for more information. \nThe following example filters data with primary key values of 0 and 1. \n```python\nexpr = "book_id in [0,1]"\n```',
  'version': '2.2'}]
Let’s try changing the filter condition to version == "2.3".
[{'chunk_text': '# Delete Entities\nThis topic describes how to delete entities in Milvus. \nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions. \nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance. \nBefore deleting entities by comlpex boolean expressions, make sure the collection has been loaded.\nDeleting entities by complex boolean expressions is not an atomic operation. Therefore, if it fails halfway through, some data may still be deleted.\nDeleting entities by complex boolean expressions is supported only when the consistency is set to Bounded. For details, see Consistency.',
  'version': '2.3'},
 {'chunk_text': '# Delete Entities\n## Prepare boolean expression\nPrepare the boolean expression that filters the entities to delete. \nMilvus supports deleting entities by primary key or complex boolean expressions. For more information on expression rules and supported operators, see Boolean Expression Rules.',
  'version': '2.3'}]
We can see that when we ask a question, the search run returns the top-k knowledge fragments from the version we filtered on. These fragments are the basis for building RAG.
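The two filtered searches above can be sketched as request bodies. The helper build_search_run_body is hypothetical, the question string is only an example, and the key names (limit, outputFields, filter) are assumptions about the run endpoint that should be checked against the official reference. The other_output_fields argument mirrors the keyword used in this notebook.

```python
def build_search_run_body(question, top_k=2, other_output_fields=None, filter_expr=""):
    """Request body for one search run, with optional tag filtering (assumed schema)."""
    # chunk_text is always requested; extra fields such as version are appended.
    output_fields = ["chunk_text"] + list(other_output_fields or [])
    body = {
        "data": {"query_text": question},
        "params": {"limit": top_k, "offset": 0, "outputFields": output_fields},
    }
    if filter_expr:
        # Boolean expression evaluated on scalar fields, e.g. version == "2.2".
        body["params"]["filter"] = filter_expr
    return body


question = "Can users delete entities by filtering non-primary fields?"
search_bodies = {
    version: build_search_run_body(
        question,
        other_output_fields=["version"],
        filter_expr=f'version == "{version}"',
    )
    for version in ("2.2", "2.3")
}
print(search_bodies["2.2"]["params"])
```

Only the filter expression differs between the two bodies, which is why the same question can return fragments from different documentation versions.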
Build a chatbot powered by RAG
Below, we show a simple RAG app that answers questions based on the knowledge we ingested previously. It uses OpenAI gpt-3.5-turbo as the LLM and a simple prompt. To test it, replace the placeholder with your own OpenAI API key.
This implements a RAG chatbot. It uses the Search pipeline to retrieve the most relevant chunks from the ingested documents and uses them to improve answer quality. Let's see how it works in action!
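The retrieve-then-generate flow of such a chatbot can be sketched without any network calls. Everything here is illustrative: RagChatbot, build_prompt, and the prompt wording are hypothetical, and fake_retrieve and fake_complete are stand-ins for the Search pipeline and the OpenAI call so that the sketch runs offline.

```python
def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(chunk["chunk_text"] for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


class RagChatbot:
    """Minimal RAG loop: retrieve chunks for a version tag, then ask the LLM."""

    def __init__(self, retrieve, complete):
        self.retrieve = retrieve  # callable(question, version) -> list of chunks
        self.complete = complete  # callable(prompt) -> answer string

    def chat(self, question, version):
        chunks = self.retrieve(question, version)
        return self.complete(build_prompt(question, chunks))


# Stand-in knowledge base; the real chunks come from the Search pipeline.
KNOWLEDGE = [
    {"chunk_text": "Milvus supports deleting entities by primary key filtered with boolean expression.", "version": "2.2"},
    {"chunk_text": "Milvus supports deleting entities by primary key or complex boolean expressions.", "version": "2.3"},
]


def fake_retrieve(question, version):
    # Stand-in for a filtered search run: keep only chunks with a matching tag.
    return [chunk for chunk in KNOWLEDGE if chunk["version"] == version]


def fake_complete(prompt):
    # Stand-in for the LLM call: report which prompt it received.
    return f"(LLM reply to a {len(prompt)}-character prompt)"


bot = RagChatbot(fake_retrieve, fake_complete)
print(bot.chat("Can users delete entities by filtering non-primary fields?", version="2.2"))
```

Passing the retriever and the LLM as callables keeps the chatbot logic independent of any particular client library, so the stand-ins can be swapped for real calls later.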
Chat with RAG
'No, according to the provided information, Milvus only supports deleting entities by primary key filtered with a boolean expression. Other operators can be used only in query or scalar filtering in vector search.'
The ground truth content in the original knowledge text is:
Milvus supports deleting entities by primary key filtered with boolean expression.
'Yes, users can delete Milvus entities through non-primary key filtering by using complex boolean expressions.'
The ground truth content in the original knowledge text is:
Milvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.
Indeed, Milvus 2.3 enhanced the Delete Entities function: it supports deleting by complex boolean expressions. By filtering on the Milvus version, we can run RAG over different knowledge sources.
That's how to use Zilliz Cloud Pipelines to build RAG applications. To learn more, you can refer to https://docs.zilliz.com/docs/pipelines for detailed information.
If you have any questions, feel free to contact us at support@zilliz.com.