
Advanced RAG: Query Expansion

by Tuana Celik (LI, Twitter/X)

This is part two of the Advanced Use Cases series:

1️⃣ Extract Metadata from Queries to Improve Retrieval cookbook & full article

2️⃣ Query Expansion & the full article

3️⃣ Query Decomposition cookbook & full article

4️⃣ Automated Metadata Enrichment

In this cookbook, you'll learn how to implement query expansion for RAG. Query expansion consists of asking an LLM to produce a number of similar queries to a user query. We are then able to use each of these queries in the retrieval process, increasing the number and relevance of retrieved documents.

📚 Read the full article


The Process of Query Expansion

First, let's import the QueryExpander from Haystack Experimental.

Next, we’ll create a QueryExpander instance. This component generates a specified number (default is 4) of additional queries that are similar to the original user query. It returns queries, which include the original query plus the generated similar ones.

{'queries': ['natural language processing tools',
  'free nlp libraries',
  'open-source NLP platforms',
  'public domain language processing frameworks',
  'open source nlp frameworks']}
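
To make the idea concrete without relying on the QueryExpander internals, here is a minimal sketch in plain Python. The function name `expand_query`, its parameters, the prompt wording, and the `fake_llm` stub are all assumptions made for illustration; they are not the Haystack implementation. The sketch asks a model for a JSON list of similar queries and appends the original query at the end, mirroring the shape of the output above.

```python
import json
from typing import Callable, List


def expand_query(query: str, llm: Callable[[str], str], number: int = 4) -> List[str]:
    """Return up to `number` LLM-generated variants followed by the original query."""
    # Hypothetical prompt wording; a real component may phrase this differently.
    prompt = (
        f"Generate {number} search queries similar to the query below. "
        "Answer with a JSON list of strings only.\n"
        f"Query: {query}"
    )
    try:
        candidates = json.loads(llm(prompt))
    except json.JSONDecodeError:
        candidates = []  # unparseable reply: fall back to the original query only
    if not isinstance(candidates, list):
        candidates = []

    queries: List[str] = []
    for candidate in candidates:
        if not isinstance(candidate, str):
            continue
        cleaned = candidate.strip()
        # Skip blanks, repeats, and copies of the original query.
        if cleaned and cleaned != query and cleaned not in queries:
            queries.append(cleaned)

    return queries[:number] + [query]


def fake_llm(prompt: str) -> str:
    """Stand-in for a chat model call (assumption: the model replies with JSON)."""
    return json.dumps([
        "natural language processing tools",
        "free nlp libraries",
        "open-source NLP platforms",
        "public domain language processing frameworks",
    ])


if __name__ == "__main__":
    print(expand_query("open source nlp frameworks", fake_llm))
```

Falling back to the original query when the reply cannot be parsed keeps retrieval working even if the model ignores the requested format.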

Retrieval Without Query Expansion

{'keyword_retriever': {'documents': [Document(id=8b306c8303c59508a53e5139b4e688c3817fa0211b095bcc77ab3823defa0b32, content: 'Air travel is one of the core contributors to climate change.', score: 2.023895027544814),
   Document(id=aa996058ca5b30d8b469d33e992e094058e707bfb0cf057ee1d5b55ac4320234, content: 'The impact of climate change is evident in the melting of the polar ice caps.', score: 1.8661960327485192),
   Document(id=4d5f7ef8df12c93cb5728cc0247bf95282a14017ce9d0b35486091f8972347a5, content: 'The effects of climate are many including loss of biodiversity', score: 1.5532314806726806)]}}

Retrieval With Query Expansion

Now let's look at which documents we can retrieve when we include query expansion in the process. For this step, let's create a MultiQueryInMemoryBM25Retriever that runs BM25 retrieval for each (expanded) query in turn.

This component also handles the same document being retrieved for multiple queries and will not return duplicates.

<haystack.core.pipeline.pipeline.Pipeline object at 0x10b2e9c10>
🚅 Components
  - expander: QueryExpander
  - keyword_retriever: MultiQueryInMemoryBM25Retriever
🛤️ Connections
  - expander.queries -> keyword_retriever.queries (List[str])

{'expander': {'queries': ['global warming',
   'environmental climate shifts',
   "changes in Earth's climate",
   'effects of climate variability',
   'climate change']},
 'keyword_retriever': {'documents': [Document(id=0901b034998c7263f74ac60cad5d9d520df524e59b045f3afab8e6cf1710791d, content: 'Consequences of global warming include the rise in sea levels.', score: 3.5574227469366644),
   Document(id=40fcd5a4a3670b7e105db664783c076167bb699cade9aa7fd6d409fac2efb49e, content: 'There is a global call to reduce the amount of air travel people take.', score: 2.409793821666657),
   Document(id=395d2da61fff546098eec2838da741033d71fef84dfa7a91fc40b1d275631933, content: 'One of the effects of environmental changes is the change in weather patterns.', score: 2.185936012403085),
   Document(id=8b306c8303c59508a53e5139b4e688c3817fa0211b095bcc77ab3823defa0b32, content: 'Air travel is one of the core contributors to climate change.', score: 2.023895027544814),
   Document(id=aa996058ca5b30d8b469d33e992e094058e707bfb0cf057ee1d5b55ac4320234, content: 'The impact of climate change is evident in the melting of the polar ice caps.', score: 1.8661960327485192),
   Document(id=4d5f7ef8df12c93cb5728cc0247bf95282a14017ce9d0b35486091f8972347a5, content: 'The effects of climate are many including loss of biodiversity', score: 1.5532314806726806)]}}
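
The multi-query retrieval step with deduplication can be sketched in plain Python as follows. This is not the MultiQueryInMemoryBM25Retriever itself: the function names are hypothetical, the document ids are made up, and a simple term-overlap count stands in for BM25 scoring. Keeping the highest score when a document is returned for several queries is also an assumption of this sketch.

```python
import re
from typing import Dict, List, Tuple

# Sentences taken from the outputs above; the short ids are illustrative.
DOCS: Dict[str, str] = {
    "air_travel": "Air travel is one of the core contributors to climate change.",
    "ice_caps": "The impact of climate change is evident in the melting of the polar ice caps.",
    "sea_levels": "Consequences of global warming include the rise in sea levels.",
    "weather": "One of the effects of environmental changes is the change in weather patterns.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(query: str, docs: Dict[str, str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Toy keyword retrieval: score = number of shared terms (stand-in for BM25)."""
    query_terms = tokenize(query)
    scored = [(doc_id, float(len(query_terms & tokenize(text)))) for doc_id, text in docs.items()]
    scored = [pair for pair in scored if pair[1] > 0]  # drop documents with no match
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def multi_query_retrieve(
    queries: List[str], docs: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Run retrieval once per query and merge the results without duplicates."""
    best: Dict[str, float] = {}
    for query in queries:
        for doc_id, score in retrieve(query, docs, top_k):
            # A document found by several queries is kept once, with its best score.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(multi_query_retrieve(["climate change"], DOCS))
    print(multi_query_retrieve(["global warming", "climate change"], DOCS))
```

With only the original query, the sentence about global warming shares no terms with "climate change" and is never retrieved; adding the expanded query "global warming" brings it in.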

Query Expansion for RAG

Let's start off by populating a document store with chunks of context from various Wikipedia pages.


RAG without Query Expansion

<haystack.core.pipeline.pipeline.Pipeline object at 0x10c433a50>
🚅 Components
  - keyword_retriever: InMemoryBM25Retriever
  - chat_prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - keyword_retriever.documents -> chat_prompt_builder.documents (List[Document])
  - chat_prompt_builder.prompt -> llm.messages (List[ChatMessage])

{'keyword_retriever': {'documents': [Document(id=b5544970917394145729c23e6447d57ca42aad99fd68d17641682b4726b326b5, content: 'Renewable energy (also called green energy) is energy made from renewable natural resources that are...', meta: {'title': 'Renewable energy', 'url': 'https://en.wikipedia.org/wiki/Renewable_energy', 'source_id': '2089cf2a38231b23fe28c73d1b8c847fdb84190acf6ea60fac87c733fad5694d', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.5931591814370085),
   Document(id=30a6ac048f1c1b62b11b64b08bc62a416e26751a0209718426970a03279aa4f0, content: 'Wind power is the use of wind energy to generate useful work. Historically, wind power was used by s...', meta: {'title': 'Wind power', 'url': 'https://en.wikipedia.org/wiki/Wind_power', 'source_id': 'bf171fa3a08865668abc6b95bfa89aeeef6520aa05263e2c3166121553262c41', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.1698503702912926),
   Document(id=6111c991130070bb618386d1028eaabe822aacec6c6d971570856a7e6f74240c, content: 'A fossil fuel is a flammable carbon compound- or hydrocarbon-containing material formed naturally in...', meta: {'title': 'Fossil fuel', 'url': 'https://en.wikipedia.org/wiki/Fossil_fuel', 'source_id': '518ec040573081a250fa14f8696301f2a7ea3fc3be40ad3f561c927fd5b8a8a3', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.9462904069647091)]},
 'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Green energy sources refer to renewable energy, which is "energy made from renewable natural resources" (Document: [Renewable energy](https://en.wikipedia.org/wiki/Renewable_energy)). One notable example of a green energy source is wind power, which is utilized to generate useful work from wind energy (Document: [Wind power](https://en.wikipedia.org/wiki/Wind_power)). In contrast, fossil fuels, which are non-renewable and consist of flammable carbon compounds, represent a different category of energy source (Document: [Fossil fuel](https://en.wikipedia.org/wiki/Fossil_fuel)).')], _name=None, _meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 125, 'prompt_tokens': 563, 'total_tokens': 688, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}})]}}
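
The pipeline above passes the retrieved documents into a prompt builder before calling the LLM. As a rough illustration of that step, the sketch below formats documents and a question into one prompt string. The function `build_rag_prompt` and the template wording are assumptions, not the template used in this cookbook; the instruction to cite titles and URLs is inferred from the style of the reply shown above.

```python
from typing import Dict, List


def build_rag_prompt(question: str, documents: List[Dict[str, str]]) -> str:
    """Format retrieved documents and the question into a single prompt string."""
    # Hypothetical template; the real prompt may be worded differently.
    lines = [
        "Answer the question using only the documents below.",
        "Cite the title and URL of each document you rely on.",
        "",
        "Documents:",
    ]
    for doc in documents:
        lines.append(f"- {doc['title']} ({doc['url']}): {doc['content']}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


# Two of the retrieved chunks shown above, with content truncated as in the output.
retrieved = [
    {
        "title": "Renewable energy",
        "url": "https://en.wikipedia.org/wiki/Renewable_energy",
        "content": "Renewable energy (also called green energy) is energy made from "
                   "renewable natural resources that are...",
    },
    {
        "title": "Wind power",
        "url": "https://en.wikipedia.org/wiki/Wind_power",
        "content": "Wind power is the use of wind energy to generate useful work. "
                   "Historically, wind power was used by s...",
    },
]

if __name__ == "__main__":
    print(build_rag_prompt("green energy sources", retrieved))
```

Because every retrieved document is written into the prompt, retrieving more documents produces a longer prompt.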

RAG with Query Expansion

<haystack.core.pipeline.pipeline.Pipeline object at 0x10c2c1710>
🚅 Components
  - expander: QueryExpander
  - keyword_retriever: MultiQueryInMemoryBM25Retriever
  - chat_prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - expander.queries -> keyword_retriever.queries (List[str])
  - keyword_retriever.documents -> chat_prompt_builder.documents (List[Document])
  - chat_prompt_builder.prompt -> llm.messages (List[ChatMessage])

{'expander': {'queries': ['renewable energy sources',
   'sustainable power generation',
   'eco-friendly energy options',
   'clean energy resources',
   'green energy sources']},
 'keyword_retriever': {'documents': [Document(id=bc480a9fbfd7d1e9bbd2deb6f714b6533a62673064f6d015b674b9121e128f67, content: 'An electric battery is a source of electric power consisting of one or more electrochemical cells wi...', meta: {'title': 'Electric battery', 'url': 'https://en.wikipedia.org/wiki/Electric_battery', 'source_id': '03300aed831e8407b7c7db4d7ead042d2c1c6858133809b03424185910318241', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 4.3266776407237),
   Document(id=219c74b79bdebd2288deb0846466baa963cc685d90b824f91d2e3c1b9788aec5, content: 'An electric vehicle (EV) is a motor vehicle whose propulsion is powered fully or mostly by electrici...', meta: {'title': 'Electric vehicle', 'url': 'https://en.wikipedia.org/wiki/Electric_vehicle', 'source_id': '3f37bb285d46a2d30be6f670e4ad22ee6f99088f031768f32cd63ffd08b468f6', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 4.125736125230999),
   Document(id=8a0c9682962456496887522a3ec9552057a67f7166ef50b311a3adc038ed58af, content: 'Nuclear power is the use of nuclear reactions to produce electricity. Nuclear power can be obtained ...', meta: {'title': 'Nuclear power', 'url': 'https://en.wikipedia.org/wiki/Nuclear_power', 'source_id': 'fad4070db55e2e0028d5eaf097b2ab15945c08653a321eaf4ab28668b998553f', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.640043921499558),
   Document(id=b5544970917394145729c23e6447d57ca42aad99fd68d17641682b4726b326b5, content: 'Renewable energy (also called green energy) is energy made from renewable natural resources that are...', meta: {'title': 'Renewable energy', 'url': 'https://en.wikipedia.org/wiki/Renewable_energy', 'source_id': '2089cf2a38231b23fe28c73d1b8c847fdb84190acf6ea60fac87c733fad5694d', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.5931591814370085),
   Document(id=30a6ac048f1c1b62b11b64b08bc62a416e26751a0209718426970a03279aa4f0, content: 'Wind power is the use of wind energy to generate useful work. Historically, wind power was used by s...', meta: {'title': 'Wind power', 'url': 'https://en.wikipedia.org/wiki/Wind_power', 'source_id': 'bf171fa3a08865668abc6b95bfa89aeeef6520aa05263e2c3166121553262c41', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.1698503702912926),
   Document(id=6111c991130070bb618386d1028eaabe822aacec6c6d971570856a7e6f74240c, content: 'A fossil fuel is a flammable carbon compound- or hydrocarbon-containing material formed naturally in...', meta: {'title': 'Fossil fuel', 'url': 'https://en.wikipedia.org/wiki/Fossil_fuel', 'source_id': '518ec040573081a250fa14f8696301f2a7ea3fc3be40ad3f561c927fd5b8a8a3', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.9462904069647091),
   Document(id=d726f9d1598e52a199ad3949700e1bc68de4d60d0a11a1c78fd6eae44f906458, content: 'Coal is a combustible black or brownish-black sedimentary rock, formed as rock strata called coal se...', meta: {'title': 'Coal', 'url': 'https://en.wikipedia.org/wiki/Coal', 'source_id': 'cdd0037a9e7d1d67721f1cf1a598b01a6d3a35780dc1a9211215a8a6636fb0d4', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.321711298057089)]},
 'llm': {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Green energy sources refer to renewable energy that is derived from natural resources which are replenished over time. As stated, "Renewable energy (also called green energy) is energy made from renewable natural resources that are..." (Document: [Renewable energy](https://en.wikipedia.org/wiki/Renewable_energy)). \n\nAdditionally, specific examples of green energy sources include wind power, which "is the use of wind energy to generate useful work" (Document: [Wind power](https://en.wikipedia.org/wiki/Wind_power)). Other green energy sources may include solar power, hydroelectric power, and geothermal energy, although these were not directly mentioned in the retrieved documents.\n\nPlease refer to the detailed article on renewable energy for more comprehensive insights (source: [Renewable energy](https://en.wikipedia.org/wiki/Renewable_energy)).')], _name=None, _meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 167, 'prompt_tokens': 1212, 'total_tokens': 1379, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}})]}}