Question answering using vector store search with Qdrant
GPT excels at answering questions, but only on topics it remembers from its training data. What should you do if you want GPT to answer questions about unfamiliar topics? E.g.,
- Recent events after Sep 2021
- Your non-public documents
- Information from past conversations
- etc.
This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer questions using a library of reference text.
- Search: search your library of text for relevant text sections
- Ask: insert the retrieved text sections into a message to GPT and ask it the question
Why search is better than fine-tuning
GPT can learn knowledge in two ways:
- Via model weights (i.e., fine-tune the model on a training set)
- Via model inputs (i.e., insert the knowledge into an input message)
Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall.
As an analogy, model weights are like long-term memory. When you fine-tune a model, it's like studying for an exam a week away. When the exam arrives, the model may forget details, or misremember facts it never read.
In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it's like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.
One downside of text search relative to fine-tuning is that each model is limited by a maximum amount of text it can read at once:
| Model | Maximum text length |
|---|---|
| gpt-3.5-turbo | 4,096 tokens (~5 pages) |
| gpt-4 | 8,192 tokens (~10 pages) |
| gpt-4-32k | 32,768 tokens (~40 pages) |
Continuing the analogy, you can think of the model like a student who can only look at a few pages of notes at a time, despite potentially having shelves of textbooks to draw upon.
Therefore, to build a system capable of drawing upon large quantities of text to answer questions, we recommend using a Search-Ask approach.
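Because the retrieved sections must fit inside the model's context window alongside the question, the Ask step needs a token budget. A minimal Python sketch, using a naive whitespace word count as a stand-in for a real tokenizer (production code would use an actual tokenizer library):

```python
def select_sections(ranked_sections, token_budget, count_tokens=lambda s: len(s.split())):
    """Greedily keep top-ranked sections until the token budget is spent.

    count_tokens is a naive whitespace word count standing in for a real
    tokenizer; swap in an actual tokenizer for production use.
    """
    chosen, used = [], 0
    for section in ranked_sections:
        cost = count_tokens(section)
        if used + cost > token_budget:
            break  # the next section would overflow the context budget
        chosen.append(section)
        used += cost
    return chosen

# keeps the first two sections (3 + 2 "tokens"), drops the third
select_sections(["alpha bravo charlie", "delta echo", "foxtrot golf hotel india"],
                token_budget=5)
```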
Search
Text can be searched in many ways. E.g.,
- Lexical-based search
- Graph-based search
- Embedding-based search
This example notebook uses embedding-based search. Embeddings are simple to implement and work especially well with questions, as questions often don't lexically overlap with their answers.
Consider embeddings-only search as a starting point for your own system. Better search systems might combine multiple search methods, along with features like popularity, recency, user history, redundancy with prior search results, click rate data, etc. Q&A retrieval performance may also be improved with techniques like HyDE, in which questions are first transformed into hypothetical answers before being embedded. Similarly, GPT can also potentially improve search results by automatically transforming questions into sets of keywords or search terms.
Full procedure
Specifically, this notebook demonstrates the following procedure:
- Prepare search data (once per document)
- Collect: We'll download a few hundred Wikipedia articles about the 2022 Olympics
- Chunk: Documents are split into short, mostly self-contained sections to be embedded
- Embed: Each section is embedded with the OpenAI API
- Store: Embeddings are saved (for large datasets, use a vector database)
- Search (once per query)
- Given a user question, generate an embedding for the query from the OpenAI API
- Using the embeddings, rank the text sections by relevance to the query
- Ask (once per query)
- Insert the question and the most relevant sections into a message to GPT
- Return GPT's answer
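The procedure above can be sketched end to end in a few lines of Python. The `search` stand-in here ranks by naive keyword overlap rather than embeddings, and the chat-completion call is omitted, so this only illustrates the shape of the Search-Ask flow:

```python
def search(query, library):
    # hypothetical stand-in: rank sections by naive keyword overlap with the
    # query; the real notebook ranks by embedding similarity instead
    def overlap(section):
        return len(set(query.lower().split()) & set(section.lower().split()))
    return sorted(library, key=overlap, reverse=True)

def ask(query, library, top_n=1):
    # stuff the most relevant sections into a message; the actual
    # chat-completion call that would answer it is omitted here
    context = "\n\n".join(search(query, library)[:top_n])
    return f"Use the articles below to answer the question.\n\n{context}\n\nQuestion: {query}"

library = ["Curling at the 2022 Winter Olympics ...", "History of chess ..."]
prompt = ask("Which athletes won curling gold in 2022?", library)
```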
Costs
Because GPT is more expensive than embeddings search, a system with a decent volume of queries will have its costs dominated by step 3 (Ask).
- For gpt-3.5-turbo, using ~1,000 tokens per query, it costs ~$0.002 per query, or ~500 queries per dollar (as of Apr 2023)
- For gpt-4, again assuming ~1,000 tokens per query, it costs ~$0.03 per query, or ~30 queries per dollar (as of Apr 2023)

Of course, exact costs will depend on the system specifics and usage patterns.
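The arithmetic behind those figures is a straight linear scaling of tokens against the Apr 2023 list prices (assumed here as ~$0.002 per 1K tokens for gpt-3.5-turbo and ~$0.03 per 1K prompt tokens for gpt-4):

```python
def cost_per_query(tokens, price_per_1k_tokens):
    # cost scales linearly with the number of tokens consumed
    return tokens / 1000 * price_per_1k_tokens

def queries_per_dollar(tokens, price_per_1k_tokens):
    return 1 / cost_per_query(tokens, price_per_1k_tokens)

# assumed Apr 2023 prices: $0.002/1K (gpt-3.5-turbo), $0.03/1K (gpt-4 prompt)
print(cost_per_query(1000, 0.002))  # $0.002/query -> ~500 queries per dollar
print(cost_per_query(1000, 0.03))   # $0.03/query  -> ~30 queries per dollar
```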
Preamble
We'll begin by:
- Importing the necessary libraries
- Selecting models for embeddings search and question answering
Installation
Install the Azure OpenAI SDK using the command below.
Run this cell; it will prompt you for the apiKey, endPoint, embeddingDeployment, and chatDeployment values.
Import namespaces and create an instance of OpenAIClient using the azureOpenAIEndpoint and the azureOpenAIKey
Load embedding data
IMPORTANT: In this sample, we'll be loading wikipedia_embeddings.json. This file is generated by running the Embedding_Wikipedia_articles_for_search.ipynb notebook.
Start Qdrant locally
5d210d1ef33f1b0023ce5b603e15c285e30b15e6dbf7ea05a1604632314122cd
Create collection
Persist data
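As an illustration of what "start, create, persist" looks like against a local Qdrant instance, the following shell commands are a sketch: the collection name wikipedia-articles is a hypothetical choice, and the vector size of 1536 assumes text-embedding-ada-002 embeddings.

```shell
# start a local Qdrant instance; the REST API listens on port 6333
docker run -d -p 6333:6333 qdrant/qdrant

# create a collection sized for text-embedding-ada-002 vectors
# ("wikipedia-articles" is a hypothetical collection name)
curl -X PUT http://localhost:6333/collections/wikipedia-articles \
  -H "Content-Type: application/json" \
  -d '{"vectors": {"size": 1536, "distance": "Cosine"}}'
```

Sections are then persisted by upserting points (one embedding vector plus a text payload each) via the collection's points endpoint.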
2. Search
Now we'll define a search function that:
- Takes a user query and a dataframe with text & embedding columns
- Embeds the user query with the OpenAI API
- Uses distance between query embedding and text embeddings to rank the texts
- Returns two lists:
- The top N texts, ranked by relevance
- Their corresponding relevance scores
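A Python sketch of that ranking logic, with the OpenAI embedding call replaced by an injectable embed function so the example runs offline (the notebook itself performs this step in .NET against the real API):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def rank_by_relatedness(query, texts, embeddings, embed, top_n=5):
    """Return (top_n texts, scores), most query-related first.

    embed maps a string to a vector; in the notebook this is a call to
    the OpenAI embeddings API, injected here so the sketch runs offline.
    """
    query_embedding = embed(query)
    scored = sorted(
        ((text, cosine_similarity(query_embedding, emb))
         for text, emb in zip(texts, embeddings)),
        key=lambda pair: pair[1], reverse=True)
    top = scored[:top_n]
    return [text for text, _ in top], [score for _, score in top]
```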
Define search function
- Generates an embedding for the user query
- Builds query parameters for the Qdrant search client to extract the page_name and content_block fields
- Runs the search
- Transforms the search results into an IEnumerable<SearchResult>
Use SearchAsync to search the data
3. Ask
With the search function above, we can now automatically retrieve relevant knowledge and insert it into messages to GPT.
Below, we define a function AskAsync that:
- Takes a user query
- Searches for text relevant to the query
- Stuffs that text into a message for GPT
- Sends the message to GPT
- Returns GPT's answer
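The message that gets sent can be sketched as a pure function. The system prompt wording and the "admit ignorance" instruction below follow the common cookbook pattern but are assumptions here, and the chat-completion call itself is omitted:

```python
def build_ask_messages(query, relevant_sections):
    # stuff the retrieved sections into the user message, then append the
    # question; telling the model to admit ignorance curbs hallucination
    intro = ('Use the below Wikipedia article sections to answer the question. '
             'If the answer cannot be found, write "I could not find an answer."')
    articles = "\n\n".join(f'Wikipedia article section:\n"""\n{s}\n"""'
                           for s in relevant_sections)
    return [
        {"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
        {"role": "user", "content": f"{intro}\n\n{articles}\n\nQuestion: {query}"},
    ]
```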
The 2022 Winter Olympics took place in Beijing, China.
The 2022 Winter Olympics featured 109 events in 7 different sports, encompassing a total of 15 disciplines. New events included men's and women's big air freestyle skiing, women's monobob, mixed team competitions in freestyle skiing aerials, ski jumping, and snowboard cross, and the mixed relay in short track speed skating.