Query Expansion Haystack Weaviate

vector-search, vector-database, retrieval-augmented-generation, Haystack, llm-frameworks, function-calling, weaviate-recipes, integrations, Python, generative-ai, llm-agent-frameworks

Advanced RAG: Query Expansion

by Tuana Celik (LI, Twitter/X)

In this cookbook, you'll learn how to implement query expansion for RAG. Query expansion consists of asking an LLM to produce a number of queries that are similar to the user's query. Each of these queries can then be used in the retrieval process, which can increase both the number and the relevance of the retrieved documents.

📚 Read the full article

Your OpenAI API Key: ··········
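
The setup cells are not reproduced here. As a minimal sketch of the API key step, assuming the key is read interactively and stored in the `OPENAI_API_KEY` environment variable, something like the following would produce the prompt shown above. The helper name `ensure_api_key` and its parameters are hypothetical.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    var_name: str = "OPENAI_API_KEY",
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting for it only if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed characters, which is why the output shows dots.
        key = prompt_fn("Your OpenAI API Key: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once near the top of the notebook would be enough; later cells can then read the key from the environment.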

The Process of Query Expansion

First, let's create a QueryExpander. This component generates a number (5 by default) of additional queries that are similar to the original user query. It returns a list of queries containing the generated similar queries plus the original query.
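
The component's code is not reproduced here, so the following is only a framework-independent sketch of the idea, not the notebook's QueryExpander. It assumes the LLM is available as a plain callable that takes a prompt string and returns a JSON list of strings. The prompt wording, the function name `expand_query`, and the fallback behaviour are all assumptions.

```python
import json
from typing import Callable, List

# Hypothetical prompt wording; the notebook's actual prompt is not shown here.
PROMPT_TEMPLATE = (
    "You expand a given query into {number} queries that are similar in meaning.\n"
    "Reply with a JSON list of strings and nothing else.\n"
    'Query: "{query}"\n'
    "Expanded queries:"
)


def expand_query(query: str, llm: Callable[[str], str], number: int = 5) -> List[str]:
    """Ask `llm` for `number` similar queries and append the original query."""
    prompt = PROMPT_TEMPLATE.format(number=number, query=query)
    reply = llm(prompt)
    try:
        similar = json.loads(reply)
    except json.JSONDecodeError:
        similar = []  # assumption: fall back to the original query only
    if not isinstance(similar, list):
        similar = []
    # Keep only string entries and cap the list at the requested number.
    queries = [q for q in similar if isinstance(q, str)][:number]
    # The original query goes last, after the generated ones.
    return queries + [query]
```

Passing the LLM in as a callable keeps the sketch testable with a stub; in the notebook, this role is played by an OpenAI-backed generator.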

{'queries': ['natural language processing tools',
  'free nlp libraries',
  'open-source language processing software',
  'nlp frameworks with open-source licensing',
  'open source nlp frameworks']}

Retrieval Without Query Expansion

INFO:weaviate-client:Binary /root/.cache/weaviate-embedded did not exist. Downloading binary from https://github.com/weaviate/weaviate/releases/download/v1.26.1/weaviate-v1.26.1-Linux-amd64.tar.gz
INFO:weaviate-client:Started /root/.cache/weaviate-embedded: process ID 6895
7
{'keyword_retriever': {'documents': [Document(id=8b306c8303c59508a53e5139b4e688c3817fa0211b095bcc77ab3823defa0b32, content: 'Air travel is one of the core contributors to climate change.', score: 0.747502326965332),
   Document(id=aa996058ca5b30d8b469d33e992e094058e707bfb0cf057ee1d5b55ac4320234, content: 'The impact of climate change is evident in the melting of the polar ice caps.', score: 0.7204996347427368),
   Document(id=4d5f7ef8df12c93cb5728cc0247bf95282a14017ce9d0b35486091f8972347a5, content: 'The effects of climate are many including loss of biodiversity', score: 0.4040358066558838)]}}

Retrieval With Query Expansion

Now let's look at which documents we can retrieve if we include query expansion in the process. For this step, let's create a MultiQueryInMemoryBM25Retriever that runs BM25 retrieval for each (expanded) query in turn.

This component also handles the same document being retrieved for multiple queries and will not return duplicates.
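
The per-query retrieval and de-duplication described above can be sketched without any framework. This is an illustration under stated assumptions, not the notebook's component: the single-query retriever is passed in as a callable, a hit is modelled as a `(doc_id, content, score)` tuple, and a duplicate keeps the score from the first query that returned it. The final sort by descending score is consistent with the ordering of the output shown in this section.

```python
from typing import Callable, List, Tuple

# Assumed shape of a retrieved hit: (doc_id, content, score).
Hit = Tuple[str, str, float]


def multi_query_retrieve(
    queries: List[str],
    retrieve: Callable[[str, int], List[Hit]],
    top_k: int = 3,
) -> List[Hit]:
    """Run `retrieve` once per query, keep each document once, sort by score."""
    seen_ids = set()
    results: List[Hit] = []
    for query in queries:
        for hit in retrieve(query, top_k):
            doc_id = hit[0]
            if doc_id not in seen_ids:
                # First occurrence wins; later duplicates are dropped.
                seen_ids.add(doc_id)
                results.append(hit)
    results.sort(key=lambda hit: hit[2], reverse=True)
    return results
```

Note that scores from different queries are compared directly in this sketch, so a document that matches one expanded query strongly can outrank a document that matches the original query.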

<haystack.core.pipeline.pipeline.Pipeline object at 0x7fe9f242c970>
🚅 Components
  - expander: QueryExpander
  - keyword_retriever: MultiQueryInMemoryBM25Retriever
🛤️ Connections
  - expander.queries -> keyword_retriever.queries (List[str])
{'keyword_retriever': {'documents': [Document(id=aa996058ca5b30d8b469d33e992e094058e707bfb0cf057ee1d5b55ac4320234, content: 'The impact of climate change is evident in the melting of the polar ice caps.', score: 1.4499847888946533),
   Document(id=0901b034998c7263f74ac60cad5d9d520df524e59b045f3afab8e6cf1710791d, content: 'Consequences of global warming include the rise in sea levels.', score: 1.3326431512832642),
   Document(id=8b306c8303c59508a53e5139b4e688c3817fa0211b095bcc77ab3823defa0b32, content: 'Air travel is one of the core contributors to climate change.', score: 0.747502326965332),
   Document(id=4d5f7ef8df12c93cb5728cc0247bf95282a14017ce9d0b35486091f8972347a5, content: 'The effects of climate are many including loss of biodiversity', score: 0.5684852600097656),
   Document(id=395d2da61fff546098eec2838da741033d71fef84dfa7a91fc40b1d275631933, content: 'One of the effects of environmental changes is the change in weather patterns.', score: 0.5258742570877075)]},
 'expander': {'queries': ['global warming effects',
   'environmental impact of climate change',
   'rising temperatures consequences',
   'ecosystem changes due to climate change',
   'carbon footprint reduction strategies',
   'climate change']}}

Query Expansion for RAG

Let's start off by populating a document store with chunks of context from various Wikipedia pages.
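
The indexing cells are not reproduced here. As a rough sketch of what "chunks of context" could look like, assuming each page's text is already available as a string and is split into fixed-size word windows, the helper below produces one record per chunk. The function name `chunk_page`, the chunk size, and the record layout are assumptions; only the `title`, `url`, and `split_id` metadata keys mirror the retriever output shown in this section.

```python
from typing import Dict, List


def chunk_page(title: str, url: str, text: str, words_per_chunk: int = 50) -> List[Dict]:
    """Split one page into word-based chunks, each carrying its source metadata."""
    words = text.split()
    chunks = []
    for split_id, start in enumerate(range(0, len(words), words_per_chunk)):
        chunks.append(
            {
                "content": " ".join(words[start:start + words_per_chunk]),
                # Metadata lets the LLM cite the page a chunk came from.
                "meta": {"title": title, "url": url, "split_id": split_id},
            }
        )
    return chunks
```

Each resulting record would then be written to the document store so that the keyword retriever can search over chunks rather than whole pages.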

RAG without Query Expansion

<haystack.core.pipeline.pipeline.Pipeline object at 0x7fe9f2d86560>
🚅 Components
  - keyword_retriever: InMemoryBM25Retriever
  - prompt: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - keyword_retriever.documents -> prompt.documents (List[Document])
  - prompt.prompt -> llm.prompt (str)
{'llm': {'replies': ['Green energy sources refer to renewable energy derived from natural resources that are replenished over time. Examples include wind power, which harnesses wind energy to generate useful work. Unlike fossil fuels, green energy sources are sustainable and environmentally friendly. (Sources: Renewable energy - https://en.wikipedia.org/wiki/Renewable_energy, Wind power - https://en.wikipedia.org/wiki/Wind_power)'],
  'meta': [{'model': 'gpt-3.5-turbo-0125',
    'index': 0,
    'finish_reason': 'stop',
    'usage': {'completion_tokens': 77,
     'prompt_tokens': 581,
     'total_tokens': 658}}]},
 'keyword_retriever': {'documents': [Document(id=9330ab45666b8880b10296cdee81ed3fd49e4dc1c621dc09ce499491d5108a7d, content: 'Renewable energy (or green energy) is energy from renewable natural resources that are replenished o...', meta: {'title': 'Renewable energy', 'url': 'https://en.wikipedia.org/wiki/Renewable_energy', 'source_id': 'fef99a734366aa5973ff0b2dcf9b595b02760e00830c9ff053ba968afaa47715', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 2.4197480497119446),
   Document(id=0236e33198dc1f0bd81b9a5dcc9247efa55fc93ed82e1b2cf7c2087463f19b0d, content: 'Wind power is the use of wind energy to generate useful work. Historically, wind power was used by s...', meta: {'title': 'Wind power', 'url': 'https://en.wikipedia.org/wiki/Wind_power', 'source_id': 'e68ab3cc0e06d528fcf5e25bb7fa5f861d30a97f32057641016c69806db52b7d', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.9808224721291219),
   Document(id=a51abe626d5271983d7e0f46e1eabd0893c914da92d97a6723d9fa456e70098e, content: 'A fossil fuel is a carbon compound- or hydrocarbon-containing material such as coal, oil, and natura...', meta: {'title': 'Fossil fuel', 'url': 'https://en.wikipedia.org/wiki/Fossil_fuel', 'source_id': 'a63254a0211209430abe6a3db5b40de61ec2d92faf7feceee5fa0c38e2185823', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.763127778347316)]}}

RAG with Query Expansion

<haystack.core.pipeline.pipeline.Pipeline object at 0x7fe9f5170eb0>
🚅 Components
  - expander: QueryExpander
  - keyword_retriever: MultiQueryInMemoryBM25Retriever
  - prompt: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - expander.queries -> keyword_retriever.queries (List[str])
  - keyword_retriever.documents -> prompt.documents (List[Document])
  - prompt.prompt -> llm.prompt (str)
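
The connections listed above describe a simple chain: expanded queries feed the retriever, retrieved documents feed the prompt, and the prompt feeds the LLM. A framework-independent sketch of that data flow is shown below, with each stage passed in as a plain callable. The function names, the prompt wording, and the dictionary-based document shape are assumptions for illustration and are not taken from the notebook.

```python
from typing import Callable, Dict, List


def build_prompt(query: str, documents: List[Dict]) -> str:
    """Render the retrieved documents and the user query into one prompt string."""
    lines = ["Answer the question using only the documents below.", "", "Documents:"]
    for doc in documents:
        meta = doc.get("meta", {})
        lines.append(f"- {doc['content']} (source: {meta.get('url', 'unknown')})")
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)


def rag_with_expansion(
    query: str,
    expand: Callable[[str], List[str]],
    retrieve: Callable[[List[str]], List[Dict]],
    llm: Callable[[str], str],
) -> Dict:
    """Chain the stages: expander -> retriever -> prompt -> llm."""
    queries = expand(query)                   # expander.queries
    documents = retrieve(queries)             # keyword_retriever.documents
    prompt = build_prompt(query, documents)   # prompt.prompt
    return {"replies": [llm(prompt)], "queries": queries, "documents": documents}
```

Because expansion returns more documents, the prompt grows as well, which matches the higher `prompt_tokens` count reported for this pipeline compared with the one without expansion.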
{'llm': {'replies': ['"Renewable energy (or green energy) is energy from renewable natural resources that are replenished o..." (source: Wikipedia)'],
  'meta': [{'model': 'gpt-3.5-turbo-0125',
    'index': 0,
    'finish_reason': 'stop',
    'usage': {'completion_tokens': 27,
     'prompt_tokens': 1070,
     'total_tokens': 1097}}]},
 'expander': {'queries': ['renewable energy sources',
   'sustainable energy options',
   'clean energy resources',
   'alternative energy sources',
   'eco-friendly power sources',
   'green energy sources']},
 'keyword_retriever': {'documents': [Document(id=46f5d23e6803bb4c3d616f660bf99d055c632854cc2dd9a28de742564fcef660, content: 'An electric vehicle (EV) is a vehicle that uses one or more electric motors for propulsion. The vehi...', meta: {'title': 'Electric vehicle', 'url': 'https://en.wikipedia.org/wiki/Electric_vehicle', 'source_id': '581cb767709a5312641573d87ec4fe58220095255f24c297520ee1070bf473e0', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 3.349688622783147),
   Document(id=d95897b1c4f2cf5c9a3536bd211c79a50b56ce320008fbf4bc26b5bd83d02049, content: 'An electric battery is a source of electric power consisting of one or more electrochemical cells wi...', meta: {'title': 'Electric battery', 'url': 'https://en.wikipedia.org/wiki/Electric_battery', 'source_id': '94fb9aed1508db9f1a53d4b6f9dc8b0fe45dcf2ff372c92424372f44d00439da', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 3.278746749054649),
   Document(id=0236e33198dc1f0bd81b9a5dcc9247efa55fc93ed82e1b2cf7c2087463f19b0d, content: 'Wind power is the use of wind energy to generate useful work. Historically, wind power was used by s...', meta: {'title': 'Wind power', 'url': 'https://en.wikipedia.org/wiki/Wind_power', 'source_id': 'e68ab3cc0e06d528fcf5e25bb7fa5f861d30a97f32057641016c69806db52b7d', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 1.9248484931817393),
   Document(id=9330ab45666b8880b10296cdee81ed3fd49e4dc1c621dc09ce499491d5108a7d, content: 'Renewable energy (or green energy) is energy from renewable natural resources that are replenished o...', meta: {'title': 'Renewable energy', 'url': 'https://en.wikipedia.org/wiki/Renewable_energy', 'source_id': 'fef99a734366aa5973ff0b2dcf9b595b02760e00830c9ff053ba968afaa47715', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 0.8383441344504454),
   Document(id=a51abe626d5271983d7e0f46e1eabd0893c914da92d97a6723d9fa456e70098e, content: 'A fossil fuel is a carbon compound- or hydrocarbon-containing material such as coal, oil, and natura...', meta: {'title': 'Fossil fuel', 'url': 'https://en.wikipedia.org/wiki/Fossil_fuel', 'source_id': 'a63254a0211209430abe6a3db5b40de61ec2d92faf7feceee5fa0c38e2185823', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 0.764201225127407),
   Document(id=7099fafc118346207fb04dc74149f6de8e13cd36d0b5407e22d7067f25e3f2b0, content: 'Nuclear power is the use of nuclear reactions to produce electricity. Nuclear power can be obtained ...', meta: {'title': 'Nuclear power', 'url': 'https://en.wikipedia.org/wiki/Nuclear_power', 'source_id': '19675f5020f2a9604d99a79c71221c7f305707a421412a4df6e5d4895b75c696', 'page_number': 1, 'split_id': 0, 'split_idx_start': 0}, score: 0.7409425118003157)]}}