Question Answering Using Fusion Retriever Architecture
This notebook builds on top of Question answering using embeddings-based search, adding two more concepts:
- query rewriting
- re-ranking
These changes affect the Search function (the Retrieval part of RAG). The resulting architecture is called a fusion retriever.
Installation
Install the Azure OpenAI SDK using the command below.
Run this cell; it will prompt you for the apiKey, endPoint, embeddingDeployment, and chatDeployment values.
Import namespaces and create an instance of OpenAIClient using the azureOpenAIEndpoint and the azureOpenAIKey.
1. Prepare search data
To save you time and expense, we've prepared a pre-embedded dataset of a few hundred Wikipedia articles about the 2022 Winter Olympics. To see how we constructed this dataset, or to modify it yourself, see Embedding Wikipedia articles for search.
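The pre-embedded dataset pairs each text chunk with its embedding vector. As an illustration of how such a dataset can be parsed (the notebook itself is C#; the column names `text` and `embedding` and the stringified-list format are assumptions based on the cookbook-style dataset), here is a minimal Python sketch:

```python
import csv
import io
from ast import literal_eval

def load_embedded_dataset(csv_text: str) -> list:
    """Parse rows of (text, embedding) where the embedding column
    holds a stringified list of floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({"text": row["text"],
                     "embedding": literal_eval(row["embedding"])})
    return rows

# Tiny inline sample standing in for the real CSV file.
sample = 'text,embedding\n"Curling at the 2022 Winter Olympics","[0.1, 0.2, 0.3]"\n'
data = load_embedded_dataset(sample)
```

Parsing the embedding column once up front avoids re-deserializing vectors on every search.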
2. Query generation
The GenerateQueries function is asynchronous and takes two parameters: originalQuery which is a string representing the original search query, and numQueries which is an integer representing the number of search queries to generate.
Inside the function, several operations are performed:
- It creates a prompt string using the `originalQuery` and `numQueries` parameters. This prompt instructs an AI assistant to generate `numQueries` search queries related to the `originalQuery`.
- It creates a new instance of the `ChatCompletionsOptions` class, setting various properties to configure the chat completion request. The `Messages` property is set to a list containing two `ChatMessage` objects: one with the role of `System` and the content "You answer questions about the 2022 Winter Olympics.", and one with the role of `User` and the previously created prompt as content. The `Temperature` property is set to 0, the `MaxTokens` property to 3500, and the `DeploymentName` property to `chatDeployment`.
- It calls the `GetChatCompletionsAsync` method of the `client` object, passing in the `options` object. This method sends a chat completion request to the OpenAI API and returns a response. The `await` keyword is used to asynchronously wait for the method to complete.
- It retrieves the content of the first choice from the response and splits it into an array of strings using `Environment.NewLine` as the separator. This array represents the generated search queries.

Finally, it returns the array of generated search queries.
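The steps above can be mirrored in a short Python sketch. The notebook's actual code is C# using the Azure OpenAI SDK; here `chat` is a stand-in for the chat-completion call (the prompt wording is a hypothetical approximation), illustrating the prompt-build / call / split-on-newlines flow:

```python
def build_prompt(original_query: str, num_queries: int) -> str:
    # Ask the model for one query per line so the caller can split
    # the completion on newlines, as the notebook does.
    return (f"Generate {num_queries} search queries, one per line, "
            f"related to: {original_query}")

def generate_queries(chat, original_query: str, num_queries: int) -> list:
    # `chat` plays the role of GetChatCompletionsAsync (temperature 0
    # in the notebook); blank lines in the completion are dropped.
    content = chat(build_prompt(original_query, num_queries))
    return [q for q in content.splitlines() if q.strip()]

# Stubbed model call for illustration only.
fake_chat = lambda prompt: "Norway medal count 2022\nBeijing 2022 gold medals"
queries = generate_queries(fake_chat, "Which country won the most golds?", 2)
```

Splitting on newlines is fragile if the model numbers its output; a production version would strip list markers as well.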
3. Search
Now we'll define a search function that:
- Takes a user query and a dataframe with text & embedding columns
- Calculates the text embedding of the query using the ADA model
- Generates additional queries using the GPT model
- Calculates text embeddings for the new queries
- Uses the distance between each generated-query embedding and the text embeddings to rank the texts
- Re-ranks the retrieved documents using the original query embedding
- Returns a list with:
  - The top N texts, ranked by relevance
  - Their corresponding relevance scores
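The ranking steps above rest on a similarity measure between embedding vectors. A minimal sketch of cosine-similarity ranking (the helper names are illustrative, not from the notebook):

```python
from math import sqrt

def cosine_similarity(a, b):
    # Dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm

def rank_by_similarity(query_embedding, corpus):
    # corpus: list of (text, embedding) pairs; returns (text, score),
    # most similar first.
    scored = [(text, cosine_similarity(query_embedding, emb))
              for text, emb in corpus]
    return sorted(scored, key=lambda p: p[1], reverse=True)

corpus = [("doc a", [1.0, 0.0]), ("doc b", [0.0, 1.0])]
ranked = rank_by_similarity([1.0, 0.1], corpus)
```

With normalized embeddings (as ADA embeddings are), cosine similarity and dot product give the same ordering.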
The SearchAsync function is used to perform a search given a query and a collection of knowledge. It takes three parameters: query which is a string representing the search query, knowledge which is a collection of PageBlockWithEmbeddings objects representing the knowledge base, and resultCount which is an optional parameter that defaults to 5 and represents the number of search results to return.
This is what happens in the function:
- It retrieves the embedding for the query using the `GetEmbeddingsAsync` method of the `client` object and stores it in `queryEmbedding`.
- It generates alternative queries using the `GenerateQueries` method and stores them in `generatedQueries`.
- It retrieves the embeddings for the generated queries using the `GetEmbeddingsAsync` method of the `client` object and stores them in `generatedQueriesEmbeddings`.
- It initializes an empty list of `RankedText` objects, named `retrievedFacts`.
- It enters a loop that iterates over the `generatedQueriesEmbeddings`. For each embedding, it scores the knowledge base by similarity to the embedding, filters out items with a score below 0.8, takes the top 10 items, and adds them to the `retrievedFacts` list.
- It scores the `retrievedFacts` by similarity to the `queryEmbedding`, filters out items with a score below 0.8, takes the top `resultCount` items, and transforms them into `SearchResult` objects. This step is usually referred to as re-ranking.
Finally, it returns the resulting collection of SearchResult objects.
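The retrieve-then-re-rank flow can be condensed into a Python sketch (the notebook's SearchAsync is C#; the thresholds 0.8, per-query top 10, and default result count of 5 are taken from the description above, while function names here are illustrative):

```python
from math import sqrt

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def fusion_search(query_emb, generated_embs, knowledge,
                  threshold=0.8, per_query=10, result_count=5):
    # knowledge: list of (text, embedding) pairs.
    # Retrieval: each generated query pulls its own top candidates.
    retrieved = {}
    for g in generated_embs:
        scored = sorted(((cos(g, emb), text, emb) for text, emb in knowledge),
                        reverse=True)
        for score, text, emb in scored[:per_query]:
            if score >= threshold:
                retrieved[text] = emb  # dedupe by text
    # Re-ranking: order the pooled candidates by the ORIGINAL query embedding.
    reranked = sorted(((cos(query_emb, emb), text)
                       for text, emb in retrieved.items()), reverse=True)
    return [(text, score) for score, text in reranked[:result_count]
            if score >= threshold]

knowledge = [("norway golds", [1.0, 0.0]),
             ("mascot", [0.0, 1.0]),
             ("medal table", [0.9, 0.1])]
results = fusion_search([1.0, 0.0], [[0.95, 0.05]], knowledge)
```

Deduplicating by text before re-ranking matters: several generated queries often retrieve the same passage, and the final ranking should score each passage once.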
4. Ask
With the search function above, we can now automatically retrieve relevant knowledge and insert it into messages to GPT.
Below, we define a function AskAsync that:
- Takes a user query
- Searches for text relevant to the query
- Stuffs that text into a message for GPT
- Sends the message to GPT
- Returns GPT's answer
The AskAsync method starts by calling the SearchAsync method with the user's question and a dataset about the 2022 Winter Olympics (olympicsData). The SearchAsync method searches the dataset for relevant information and returns a list of search results.
Next, the method constructs a string articles that contains all the search results. Each search result is formatted as a section of a Wikipedia article. The search results are joined together with newline characters in between.
The method then constructs a userQuestion string that instructs the AI to use the articles to answer the question. If the answer cannot be found in the articles, the AI is instructed to respond with "I could not find an answer."
The userQuestion string is then used to create an instance of ChatCompletionsOptions, which specifies the parameters for a chat completion request to the OpenAI API. The Messages property is set to a list containing a system message and a user message: the system message instructs the AI that it answers questions about the 2022 Winter Olympics, and the user message is the userQuestion string. The Temperature property is set to 0 so the AI generates more deterministic responses, MaxTokens is set to 3500 to limit the length of the response, and DeploymentName is set to chatDeployment, which specifies the chat model deployment to use.
The method then makes an asynchronous request to the OpenAI API to get chat completions. The GetChatCompletionsAsync method of the client object is used to make this request. The method takes the ChatCompletionsOptions instance as a parameter.
Finally, the method processes the response from the OpenAI API to extract the AI's answer. The Value.Choices.FirstOrDefault()?.Message?.Content expression is used to get the content of the first choice in the response. The method then returns this answer.
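The search-then-ask flow can be sketched in Python with stubbed retrieval and chat calls (the notebook's AskAsync is C#; the exact prompt wording here is an approximation of the instruction described above, and the stub functions are hypothetical):

```python
def build_user_question(search_results, question):
    # Wrap each retrieved text as a pseudo Wikipedia article section,
    # then tell the model to admit when the articles lack the answer.
    articles = "\n".join(
        f'Wikipedia article section:\n"""\n{text}\n"""' for text in search_results)
    return (f"Use the below articles on the 2022 Winter Olympics to answer "
            f"the question. If the answer cannot be found in the articles, "
            f'write "I could not find an answer."\n\n{articles}\n\n'
            f"Question: {question}")

def ask(chat, search, question):
    results = search(question)                            # retrieval step
    return chat(build_user_question(results, question))   # answer step

# Stubs standing in for SearchAsync and GetChatCompletionsAsync.
fake_search = lambda q: ["Norway won 16 gold medals."]
fake_chat = lambda prompt: ("Norway won 16 golds."
                            if "Norway" in prompt
                            else "I could not find an answer.")
answer = ask(fake_chat, fake_search, "Which country won the most gold medals?")
```

The explicit fallback sentence gives the model a sanctioned way to refuse, which reduces fabricated answers when retrieval comes back empty.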
According to the articles, Norway won a total of 16 gold medals at the 2022 Winter Olympics.
The 2022 Winter Olympics took place in Beijing, China.