
Using Hypothetical Document Embeddings (HyDE) to Improve Retrieval

📚 This cookbook has an accompanying article with a complete walkthrough: "Optimizing Retrieval with HyDE"

In this cookbook, we build Haystack components that make it easy to incorporate HyDE into our RAG pipelines to optimize retrieval.

To learn more about HyDE and when it's useful, check out our guide to Hypothetical Document Embeddings (HyDE).

Install Requirements

[ ]

In the following sections, we will be using the OpenAIGenerator, so we need to provide our API key 👇

[ ]
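As a minimal sketch of this step, the helper below asks for the key only when it is not already set. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable; the function name `ensure_openai_key` and its `prompt` parameter are hypothetical and not part of Haystack.

```python
import os
from getpass import getpass


def ensure_openai_key(prompt=getpass):
    """Return the OpenAI key, asking for it only if the variable is unset.

    `prompt` is any callable that takes a message and returns a string;
    it defaults to getpass so the key is not echoed to the screen.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        key = prompt("Enter OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

In a notebook, calling `ensure_openai_key()` once before building the pipeline would be enough. Making the prompt function a parameter also lets the helper be exercised without any interactive input.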

Building a Pipeline for Hypothetical Document Embeddings

We will build a Haystack pipeline that generates 'fake' documents. For this part, we are using the OpenAIGenerator with a PromptBuilder that instructs the model to generate paragraphs.

[ ]
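The generation step can be sketched in plain Python to show the idea: fill the question into a prompt, then ask the model several times. This is not the Haystack API. The template wording, the function names, and `fake_llm` (an offline stand-in for a real model call) are all assumptions made for illustration.

```python
from typing import Callable, List

# Assumed prompt wording; the template used in the notebook may differ.
PROMPT_TEMPLATE = (
    "Given a question, generate a paragraph of text that answers the question.\n"
    "Question: {question}\n"
    "Paragraph:"
)


def build_prompt(question: str) -> str:
    """Fill the question into the template (the role PromptBuilder plays)."""
    return PROMPT_TEMPLATE.format(question=question)


def generate_paragraphs(question: str, llm: Callable[[str], str], n: int = 5) -> List[str]:
    """Call the model n times with the same prompt and collect the paragraphs."""
    prompt = build_prompt(question)
    return [llm(prompt) for _ in range(n)]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM so the sketch runs offline."""
    return "A made-up paragraph answering: " + prompt.splitlines()[1]


paragraphs = generate_paragraphs("What is HyDE?", fake_llm, n=3)
```

With a real model and a non-zero sampling temperature, the `n` paragraphs would differ from each other, which is what makes averaging their embeddings worthwhile later on.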

Next, we use the OutputAdapter to transform the generated paragraphs into a list of Documents. This lets us use the SentenceTransformersDocumentEmbedder to create embeddings, since that component expects List[Document].

[ ]
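The conversion itself is small. Here is a sketch in plain Python, with a hypothetical `Doc` dataclass standing in for Haystack's `Document` so that the example runs without the library. Skipping empty replies is a choice made in this sketch, not something the original text specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus an optional embedding."""
    content: str
    embedding: Optional[List[float]] = None


def replies_to_documents(replies: List[str]) -> List[Doc]:
    """Wrap each generated paragraph in a document, skipping empty replies."""
    return [Doc(content=reply.strip()) for reply in replies if reply.strip()]


docs = replies_to_documents(["First paragraph. ", "", "Second paragraph."])
```

The embeddings are still `None` at this point. They are filled in by the embedder in the next step.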

Finally, we create a custom component, HypotheticalDocumentEmbedder, that expects documents and returns a single hypothetical embedding: the average of the embeddings of the "hypothetical" (fake) documents. To learn more about this technique and where it's useful, check out our Guide to HyDE.

[ ]
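The averaging at the heart of this component can be sketched without any dependencies. The function name `average_embeddings` is hypothetical, and the sketch assumes that every document was embedded with the same model, so all vectors have the same length.

```python
from typing import List


def average_embeddings(embeddings: List[List[float]]) -> List[float]:
    """Return the element-wise mean of equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(vector) != dim for vector in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    # zip(*embeddings) groups the i-th value of every vector together.
    return [sum(column) / count for column in zip(*embeddings)]


hyde_vector = average_embeddings([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
# hyde_vector == [2.0, 3.0, 4.0]
```

The result has the same dimension as each input vector, so it can be passed to a retriever in place of a regular query embedding.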

We add all of our components into a pipeline to generate a hypothetical document embedding 🚀👇

[ ]

Build a HyDE Component That Encapsulates the Whole Logic

This section shows you how to create a HypotheticalDocumentEmbedder that instead encapsulates the entire logic and lets us provide the embedding model as an optional parameter.

This "mega" component does a few things:

Allows users to pick the LLM that generates the hypothetical documents
Allows users to define how many documents should be created with nr_completions
Allows users to define the embedding model they want to use to generate the HyDE embeddings.

[ ]
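The three options listed above map naturally onto constructor parameters. The class below is a library-free sketch of that design, not the Haystack component itself. `HyDESketch`, `fake_generate`, and `fake_embed` are hypothetical names, and the LLM and embedding model are passed in as plain callables so the example runs offline.

```python
import itertools
from typing import Callable, List


class HyDESketch:
    """Generate nr_completions hypothetical documents, embed them, average them."""

    def __init__(
        self,
        generate: Callable[[str], str],       # stand-in for the chosen LLM
        embed: Callable[[str], List[float]],  # stand-in for the embedding model
        nr_completions: int = 5,
    ):
        if nr_completions < 1:
            raise ValueError("nr_completions must be at least 1")
        self.generate = generate
        self.embed = embed
        self.nr_completions = nr_completions

    def run(self, query: str) -> List[float]:
        paragraphs = [self.generate(query) for _ in range(self.nr_completions)]
        vectors = [self.embed(paragraph) for paragraph in paragraphs]
        count = len(vectors)
        return [sum(column) / count for column in zip(*vectors)]


# Deterministic fakes: each call returns a longer "paragraph" than the last.
_calls = itertools.count(1)


def fake_generate(query: str) -> str:
    return f"{query} " * next(_calls)


def fake_embed(text: str) -> List[float]:
    return [float(len(text)), float(len(text.split()))]


hyde = HyDESketch(fake_generate, fake_embed, nr_completions=3)
hyde_vector = hyde.run("hi")
# hyde_vector == [6.0, 2.0]
```

Injecting the generator and embedder keeps the averaging logic independent of any particular model, which is the same flexibility the component in this section aims for.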

Use HyDE For Retrieval

Let's see how we can use this component in a full pipeline. First, let's index some documents into an InMemoryDocumentStore.

[ ]

We can now run a retrieval pipeline that, rather than retrieving with the query embedding directly, uses the HypotheticalDocumentEmbedder to create a hypothetical document embedding from our query and retrieves documents with that new embedding.

[ ]
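To make the retrieval step concrete, here is a sketch that ranks stored documents against a HyDE vector. The dictionary `store`, the toy three-dimensional embeddings, and the use of cosine similarity are assumptions of this sketch. The similarity function configured for the document store in the notebook is not shown in the text.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_embedding: List[float],
    store: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (text, score) pairs, most similar first."""
    scored = [(text, cosine(query_embedding, emb)) for text, emb in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy "document store": text mapped to a made-up embedding.
store = {
    "HyDE embeds a generated answer instead of the query.": [0.9, 0.1, 0.0],
    "Bananas are rich in potassium.": [0.0, 0.2, 0.9],
    "Dense retrieval compares embedding vectors.": [0.7, 0.6, 0.1],
}

hyde_vector = [0.8, 0.3, 0.0]  # would come from the HyDE step in practice
results = retrieve(hyde_vector, store, top_k=2)
```

The only difference from ordinary embedding retrieval is where `query_embedding` comes from: it is the averaged embedding of the hypothetical documents rather than the embedding of the raw query.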