Question answering using vector store search with Qdrant
GPT excels at answering questions, but only on topics it remembers from its training data. What should you do if you want GPT to answer questions about unfamiliar topics? E.g.,
- Recent events after Sep 2021
- Your non-public documents
- Information from past conversations
- etc.
This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer questions using a library of reference text.
- Search: search your library of text for relevant text sections
- Ask: insert the retrieved text sections into a message to GPT and ask it the question
Why search is better than fine-tuning
GPT can learn knowledge in two ways:
- Via model weights (i.e., fine-tune the model on a training set)
- Via model inputs (i.e., insert the knowledge into an input message)
Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.
As an analogy, model weights are like long-term memory. When you fine-tune a model, it's like studying for an exam a week away. When the exam arrives, the model may forget details, or misremember facts it never read.
In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.
One downside of text search relative to fine-tuning is that each model is limited by a maximum amount of text it can read at once:
| Model | Maximum text length |
|---|---|
| gpt-3.5-turbo | 4,096 tokens (~5 pages) |
| gpt-4 | 8,192 tokens (~10 pages) |
| gpt-4-32k | 32,768 tokens (~40 pages) |
Continuing the analogy, you can think of the model like a student who can only look at a few pages of notes at a time, despite potentially having shelves of textbooks to draw upon.
Therefore, to build a system capable of drawing upon large quantities of text to answer questions, we recommend using a Search-Ask approach.
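Because the retrieved sections must fit inside the model's context window alongside the question, the Ask step needs a token budget. A minimal Python sketch, using a naive whitespace word count as a stand-in for a real tokenizer (production code would use an actual tokenizer library):

```python
def select_sections(ranked_sections, token_budget, count_tokens=lambda s: len(s.split())):
    """Greedily keep top-ranked sections until the token budget is spent.

    count_tokens is a naive whitespace word count standing in for a real
    tokenizer; swap in an actual tokenizer for production use.
    """
    chosen, used = [], 0
    for section in ranked_sections:
        cost = count_tokens(section)
        if used + cost > token_budget:
            break  # the next section would overflow the context budget
        chosen.append(section)
        used += cost
    return chosen

# keeps the first two sections (3 + 2 "tokens"), drops the third
select_sections(["alpha bravo charlie", "delta echo", "foxtrot golf hotel india"],
                token_budget=5)
```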
Search
Text can be searched in many ways. E.g.,
- Lexical-based search
- Graph-based search
- Embedding-based search
This example notebook uses embedding-based search. Embeddings are simple to implement and work especially well with questions, as questions often don't lexically overlap with their answers.
Consider embeddings-only search as a starting point for your own system. Better search systems might combine multiple search methods, along with features like popularity, recency, user history, redundancy with prior search results, click rate data, etc. Q&A retrieval performance may also be improved with techniques like HyDE, in which questions are first transformed into hypothetical answers before being embedded. Similarly, GPT can also potentially improve search results by automatically transforming questions into sets of keywords or search terms.
Full procedure
Specifically, this notebook demonstrates the following procedure:
- Prepare search data (once per document)
- Collect: We'll download a few hundred Wikipedia articles about the 2022 Olympics
- Chunk: Documents are split into short, mostly self-contained sections to be embedded
- Embed: Each section is embedded with the OpenAI API
- Store: Embeddings are saved (for large datasets, use a vector database)
- Search (once per query)
- Given a user question, generate an embedding for the query from the OpenAI API
- Using the embeddings, rank the text sections by relevance to the query
- Ask (once per query)
- Insert the question and the most relevant sections into a message to GPT
- Return GPT's answer
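The procedure above can be sketched end to end in a few lines of Python. The `search` stand-in here ranks by naive keyword overlap rather than embeddings, and the chat-completion call is omitted, so this only illustrates the shape of the Search-Ask flow:

```python
def search(query, library):
    # hypothetical stand-in: rank sections by naive keyword overlap with the
    # query; the real notebook ranks by embedding similarity instead
    def overlap(section):
        return len(set(query.lower().split()) & set(section.lower().split()))
    return sorted(library, key=overlap, reverse=True)

def ask(query, library, top_n=1):
    # stuff the most relevant sections into a message; the actual
    # chat-completion call that would answer it is omitted here
    context = "\n\n".join(search(query, library)[:top_n])
    return f"Use the articles below to answer the question.\n\n{context}\n\nQuestion: {query}"

library = ["Curling at the 2022 Winter Olympics ...", "History of chess ..."]
prompt = ask("Which athletes won curling gold in 2022?", library)
```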
Costs
Because GPT is more expensive than embeddings search, a system with a decent volume of queries will have its costs dominated by step 3 (Ask).
- For gpt-3.5-turbo, using ~1,000 tokens per query, it costs ~$0.002 per query, or ~500 queries per dollar (as of Apr 2023)
- For gpt-4, again assuming ~1,000 tokens per query, it costs ~$0.03 per query, or ~30 queries per dollar (as of Apr 2023)

Of course, exact costs will depend on the system specifics and usage patterns.
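The arithmetic behind those figures is a straight linear scaling of tokens against the Apr 2023 list prices (assumed here as ~$0.002 per 1K tokens for gpt-3.5-turbo and ~$0.03 per 1K prompt tokens for gpt-4):

```python
def cost_per_query(tokens, price_per_1k_tokens):
    # cost scales linearly with the number of tokens consumed
    return tokens / 1000 * price_per_1k_tokens

def queries_per_dollar(tokens, price_per_1k_tokens):
    return 1 / cost_per_query(tokens, price_per_1k_tokens)

# assumed Apr 2023 prices: $0.002/1K (gpt-3.5-turbo), $0.03/1K (gpt-4 prompt)
print(cost_per_query(1000, 0.002))  # $0.002/query -> ~500 queries per dollar
print(cost_per_query(1000, 0.03))   # $0.03/query  -> ~30 queries per dollar
```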
Preamble
We'll begin by:
- Importing the necessary libraries
- Selecting models for embeddings search and question answering
Installation
Install the Azure OpenAI SDK using the command below.
Run this cell; it will prompt you for the apiKey, endPoint, embeddingDeployment, and chatDeployment values.
Import namespaces and create an instance of OpenAIClient using the azureOpenAIEndpoint and the azureOpenAIKey
Load embedding data
IMPORTANT: In this sample, we'll be loading wikipedia_embeddings.json. This file is generated by running the Embedding_Wikipedia_articles_for_search.ipynb notebook.
Start Qdrant locally
5d210d1ef33f1b0023ce5b603e15c285e30b15e6dbf7ea05a1604632314122cd
Create collection
Persist data
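As an illustration of what "start, create, persist" looks like against a local Qdrant instance, the following shell commands are a sketch: the collection name wikipedia-articles is a hypothetical choice, and the vector size of 1536 assumes text-embedding-ada-002 embeddings.

```shell
# start a local Qdrant instance; the REST API listens on port 6333
docker run -d -p 6333:6333 qdrant/qdrant

# create a collection sized for text-embedding-ada-002 vectors
# ("wikipedia-articles" is a hypothetical collection name)
curl -X PUT http://localhost:6333/collections/wikipedia-articles \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 1536, "distance": "Cosine"}}'
```

Sections are then persisted by upserting points (one embedding vector plus a text payload each) via the collection's points endpoint.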
2. Search
Now we'll define a search function that:
- Takes a user query and a dataframe with text & embedding columns
- Embeds the user query with the OpenAI API
- Uses distance between query embedding and text embeddings to rank the texts
- Returns two lists:
- The top N texts, ranked by relevance
- Their corresponding relevance scores
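A Python sketch of that ranking logic, with the OpenAI embedding call replaced by an injectable embed function so the example runs offline (the notebook itself performs this step in .NET against the real API):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def rank_by_relatedness(query, texts, embeddings, embed, top_n=5):
    """Return (top_n texts, scores), most query-related first.

    embed maps a string to a vector; in the notebook this is a call to
    the OpenAI embeddings API, injected here so the sketch runs offline.
    """
    query_embedding = embed(query)
    scored = sorted(
        ((text, cosine_similarity(query_embedding, emb))
         for text, emb in zip(texts, embeddings)),
        key=lambda pair: pair[1], reverse=True)
    top = scored[:top_n]
    return [text for text, _ in top], [score for _, score in top]
```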
Define search function
- Generates an embedding for the user query
- Builds query parameters for the Qdrant search client to extract the page_name and content_block fields
- Runs the search
- Transforms the search results into an IEnumerable<SearchResult>
Use SearchAsync to search the data
3. Ask
With the search function above, we can now automatically retrieve relevant knowledge and insert it into messages to GPT.
Below, we define a function AskAsync that:
- Takes a user query
- Searches for text relevant to the query
- Stuffs that text into a message for GPT
- Sends the message to GPT
- Returns GPT's answer
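The message that gets sent can be sketched as a pure function. The system prompt wording and the "admit ignorance" instruction below follow the common cookbook pattern but are assumptions here, and the chat-completion call itself is omitted:

```python
def build_ask_messages(query, relevant_sections):
    # stuff the retrieved sections into the user message, then append the
    # question; telling the model to admit ignorance curbs hallucination
    intro = ('Use the below Wikipedia article sections to answer the question. '
             'If the answer cannot be found, write "I could not find an answer."')
    articles = "\n\n".join(f'Wikipedia article section:\n"""\n{s}\n"""'
                           for s in relevant_sections)
    return [
        {"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
        {"role": "user", "content": f"{intro}\n\n{articles}\n\nQuestion: {query}"},
    ]
```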
The 2022 Winter Olympics took place in Beijing, China.
The 2022 Winter Olympics featured 109 events in 7 different sports, encompassing a total of 15 disciplines. New events included men's and women's big air freestyle skiing, women's monobob, mixed team competitions in freestyle skiing aerials, ski jumping, and snowboard cross, and the mixed relay in short track speed skating.