Question Answering Using Fusion Retriever Architecture


This notebook builds on top of Question answering using embeddings-based search, adding two more concepts:

  • query rewrite
  • re-ranking

These changes affect the Search function (the retrieval part of RAG).

This architecture is called a fusion retriever.

Installation

Install the Azure OpenAI SDK using the command below.

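In a .NET Interactive notebook, the install cell is typically a NuGet reference directive rather than a shell command. A sketch of what this cell likely contains; the version pin is an assumption, and any release exposing `OpenAIClient` and `ChatCompletionsOptions` as described below should work:

```csharp
// Pull the Azure OpenAI client library from NuGet.
// The exact version is an assumption made for illustration.
#r "nuget: Azure.AI.OpenAI, 1.0.0-beta.9"
```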

Run this cell; it will prompt you for the apiKey, endPoint, embeddingDeployment, and chatDeployment values.
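A sketch of what this cell likely does, assuming the notebook uses .NET Interactive's `Kernel.GetInputAsync` helper to prompt for values (the prompt strings are assumptions):

```csharp
using Microsoft.DotNet.Interactive;

// Prompt interactively for the connection settings; the values are kept
// in notebook variables that the later cells reference.
var apiKey = await Kernel.GetInputAsync("Please provide your apiKey");
var endPoint = await Kernel.GetInputAsync("Please provide your endPoint");
var embeddingDeployment = await Kernel.GetInputAsync("Please provide your embeddingDeployment");
var chatDeployment = await Kernel.GetInputAsync("Please provide your chatDeployment");
```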


Import namespaces and create an instance of OpenAIClient using the azureOpenAIEndpoint and the azureOpenAIKey.
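Assuming the Azure.AI.OpenAI beta SDK, this step looks roughly as follows (variable names taken from the text above):

```csharp
using Azure;
using Azure.AI.OpenAI;

// Create the client once; it is reused for both embedding and chat calls.
var client = new OpenAIClient(
    new Uri(azureOpenAIEndpoint),
    new AzureKeyCredential(azureOpenAIKey));
```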


1. Prepare search data

To save you time and expense, we've prepared a pre-embedded dataset of a few hundred Wikipedia articles about the 2022 Winter Olympics. To see how we constructed this dataset, or to modify it yourself, see Embedding Wikipedia articles for search.

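The later sections rank these chunks by embedding similarity. A minimal sketch of the row type and a similarity helper; `PageBlockWithEmbeddings` is the type named in section 3, but its exact shape, and the `CosineSimilarity` helper name, are assumptions:

```csharp
// Assumed shape of one pre-embedded dataset row: the chunk text plus its
// precomputed ADA embedding vector.
record PageBlockWithEmbeddings(string Text, float[] Embedding);

// Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means same direction.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}
```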

2. Query generation

The GenerateQueries function is asynchronous and takes two parameters: originalQuery, a string containing the original search query, and numQueries, an integer specifying the number of search queries to generate.

Inside the function, several operations are performed:

  1. It creates a prompt string using the originalQuery and numQueries parameters. This prompt is formatted to instruct an AI assistant to generate numQueries search queries related to the originalQuery.

  2. It creates a new instance of the ChatCompletionsOptions class, setting various properties to configure the chat completion request. The Messages property is set to a list containing two ChatMessage objects: one with the role of System and a content of "You answer questions about the 2022 Winter Olympics.", and one with the role of User and a content of the previously created prompt. The Temperature property is set to 0, the MaxTokens property is set to 3500, and the DeploymentName property is set to chatDeployment.

  3. It calls the GetChatCompletionsAsync method of the client object, passing in the options object. This method sends a chat completion request to the OpenAI API and returns a response. The await keyword is used to asynchronously wait for the method to complete.

  4. It retrieves the content of the first choice from the response and splits it into an array of strings using Environment.NewLine as the separator. This array represents the generated search queries.

Finally, it returns the array of generated search queries.
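The steps above can be sketched as follows, assuming the Azure.AI.OpenAI beta SDK and the `client` / `chatDeployment` variables from the earlier cells; the exact prompt wording is an assumption:

```csharp
async Task<string[]> GenerateQueries(string originalQuery, int numQueries)
{
    // 1. Prompt instructing the model to produce alternative search queries.
    string prompt = $"Generate {numQueries} search queries, one per line, " +
                    $"related to the following input query: {originalQuery}";

    // 2. Configure the chat completion request.
    var options = new ChatCompletionsOptions
    {
        DeploymentName = chatDeployment,
        Temperature = 0,
        MaxTokens = 3500,
        Messages =
        {
            new ChatMessage(ChatRole.System,
                "You answer questions about the 2022 Winter Olympics."),
            new ChatMessage(ChatRole.User, prompt),
        }
    };

    // 3. Send the request and await the response.
    var response = await client.GetChatCompletionsAsync(options);

    // 4. Split the first choice's content into one query per line.
    return response.Value.Choices.First().Message.Content
        .Split(Environment.NewLine);
}
```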


3. Search

Now we'll define a search function that:

  • Takes a user query and a dataframe with text & embedding columns
  • Calculates the text embedding of the query using the ADA model
  • Generates additional queries using the GPT model
  • Calculates text embeddings for the new queries
  • Uses the distance between each generated-query embedding and the text embeddings to rank the texts
  • Re-ranks the retrieved documents using the original query's embedding
  • Returns a list with:
    • The top N texts, ranked by relevance
    • Their corresponding relevance scores

The SearchAsync function performs a search given a query and a collection of knowledge. It takes three parameters: query, a string containing the search query; knowledge, a collection of PageBlockWithEmbeddings objects representing the knowledge base; and resultCount, an optional parameter (defaulting to 5) that sets the number of search results to return.

This is what happens in the function:

  1. It retrieves the embedding for the query using the GetEmbeddingsAsync method of the client object and stores it in queryEmbedding.

  2. It generates alternative queries using the GenerateQueries method and stores them in generatedQueries.

  3. It retrieves the embeddings for the generated queries using the GetEmbeddingsAsync method of the client object and stores them in generatedQueriesEmbeddings.

  4. It initializes an empty list of RankedText objects, named retrievedFacts.

  5. It enters a loop that iterates over the generatedQueriesEmbeddings. For each embedding, it scores the knowledge base by similarity to the embedding, filters out items with a score less than 0.8, takes the top 10 items, and adds them to the retrievedFacts list.

  6. It scores the retrievedFacts by similarity to the queryEmbedding, filters out items with a score less than 0.8, takes the top resultCount items, and transforms them into SearchResult objects. This part is usually referred to as re-ranking.

Finally, it returns the resulting collection of SearchResult objects.
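Put together, the function looks roughly like this. RankedText and SearchResult are the notebook's types, but their exact shapes (modeled here as simple records), the `CosineSimilarity` helper, and the batch of 5 generated queries are assumptions:

```csharp
record RankedText(string Text, double Score);
record SearchResult(string Text, double Score);

async Task<IEnumerable<SearchResult>> SearchAsync(
    string query,
    IEnumerable<PageBlockWithEmbeddings> knowledge,
    int resultCount = 5)
{
    // 1. Embed the original query.
    float[] queryEmbedding = (await client.GetEmbeddingsAsync(
            new EmbeddingsOptions(embeddingDeployment, new[] { query })))
        .Value.Data[0].Embedding.ToArray();

    // 2. Generate alternative phrasings of the query.
    string[] generatedQueries = await GenerateQueries(query, 5);

    // 3. Embed all generated queries in one batch.
    var generatedQueriesEmbeddings = (await client.GetEmbeddingsAsync(
            new EmbeddingsOptions(embeddingDeployment, generatedQueries)))
        .Value.Data;

    // 4. + 5. For each generated query, keep the top 10 chunks scoring
    // at least 0.8 (the "fusion" of several retrieval passes).
    var retrievedFacts = new List<PageBlockWithEmbeddings>();
    foreach (var item in generatedQueriesEmbeddings)
    {
        float[] embedding = item.Embedding.ToArray();
        retrievedFacts.AddRange(
            knowledge
                .Select(k => (Block: k, Score: CosineSimilarity(k.Embedding, embedding)))
                .Where(x => x.Score >= 0.8)
                .OrderByDescending(x => x.Score)
                .Take(10)
                .Select(x => x.Block));
    }

    // 6. Re-rank the pooled candidates against the *original* query
    // embedding and keep the overall top results.
    return retrievedFacts
        .Distinct()
        .Select(k => new RankedText(k.Text, CosineSimilarity(k.Embedding, queryEmbedding)))
        .Where(r => r.Score >= 0.8)
        .OrderByDescending(r => r.Score)
        .Take(resultCount)
        .Select(r => new SearchResult(r.Text, r.Score));
}
```

Note the design choice in step 6: re-ranking against the original query embedding keeps the generated queries from dragging in chunks that are related to a rewrite but not to what the user actually asked.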


4. Ask

With the search function above, we can now automatically retrieve relevant knowledge and insert it into messages to GPT.

Below, we define a function AskAsync that:

  • Takes a user query
  • Searches for text relevant to the query
  • Stuffs that text into a message for GPT
  • Sends the message to GPT
  • Returns GPT's answer

The AskAsync method starts by calling the SearchAsync method with the user's question and a dataset about the 2022 Winter Olympics (olympicsData). The SearchAsync method searches the dataset for relevant information and returns a list of search results.

Next, the method constructs a string articles that contains all the search results. Each search result is formatted as a section of a Wikipedia article. The search results are joined together with newline characters in between.

The method then constructs a userQuestion string that instructs the AI to use the articles to answer the question. If the answer cannot be found in the articles, the AI is instructed to respond with "I could not find an answer."

The userQuestion string is then used to create an instance of ChatCompletionsOptions, which specifies the parameters for a chat completion request to the OpenAI API. Its Messages property is set to a list containing a system message and a user message: the system message instructs the AI that it answers questions about the 2022 Winter Olympics, and the user message is the userQuestion string. The Temperature property is set to 0, so the AI generates more deterministic responses; the MaxTokens property is set to 3500, which limits the length of the AI's response; and the DeploymentName property is set to chatDeployment, which selects the chat model deployment to use.

The method then makes an asynchronous request to the OpenAI API to get chat completions. The GetChatCompletionsAsync method of the client object is used to make this request. The method takes the ChatCompletionsOptions instance as a parameter.

Finally, the method processes the response from the OpenAI API to extract the AI's answer. The Value.Choices.FirstOrDefault()?.Message?.Content expression is used to get the content of the first choice in the response. The method then returns this answer.
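A sketch of AskAsync as described, assuming the Azure.AI.OpenAI beta SDK and the notebook variables (`client`, `chatDeployment`, `olympicsData`); the exact prompt wording is an assumption:

```csharp
async Task<string> AskAsync(string question)
{
    // 1. Retrieve the most relevant chunks via the fusion retriever.
    var results = await SearchAsync(question, olympicsData);

    // 2. Stuff the results into one context string, one "article" per hit.
    string articles = string.Join(Environment.NewLine,
        results.Select(r => $"Wikipedia article section:{Environment.NewLine}{r.Text}"));

    // 3. Instruct the model to answer only from the supplied articles.
    string userQuestion =
        "Use the below articles on the 2022 Winter Olympics to answer the question. " +
        "If the answer cannot be found in the articles, write \"I could not find an answer.\"" +
        $"{Environment.NewLine}{articles}{Environment.NewLine}Question: {question}";

    var options = new ChatCompletionsOptions
    {
        DeploymentName = chatDeployment,
        Temperature = 0,
        MaxTokens = 3500,
        Messages =
        {
            new ChatMessage(ChatRole.System,
                "You answer questions about the 2022 Winter Olympics."),
            new ChatMessage(ChatRole.User, userQuestion),
        }
    };

    // 4. Call the service and return the first choice's content.
    var response = await client.GetChatCompletionsAsync(options);
    return response.Value.Choices.FirstOrDefault()?.Message?.Content;
}
```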

Example answers returned by AskAsync:

  • According to the articles, Norway won a total of 16 gold medals at the 2022 Winter Olympics.
  • The 2022 Winter Olympics took place in Beijing, China.