Keyword Extraction with LLM Chat Generator

This notebook demonstrates how to extract keywords and key phrases from text using Haystack’s ChatPromptBuilder together with an LLM via OpenAIChatGenerator. We will:

Define a prompt that instructs the model to identify single- and multi-word keywords.
Capture each keyword’s character offsets.
Assign a relevance score (0–1).
Parse and display the results as JSON.

Install packages and set up the OpenAI API key
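
A minimal sketch of this setup step, assuming the `haystack-ai` package has been installed (for example with `pip install haystack-ai`) and that the key is supplied through the `OPENAI_API_KEY` environment variable, which `OpenAIChatGenerator` reads by default. The helper name `ensure_openai_api_key` is hypothetical and not part of Haystack.

```python
import os
from getpass import getpass


def ensure_openai_api_key(env=os.environ, prompt=getpass):
    """Return the OpenAI API key, prompting for it only when it is not already set."""
    if not env.get("OPENAI_API_KEY"):
        # getpass hides the typed key so it does not end up in the notebook output.
        env["OPENAI_API_KEY"] = prompt("Enter your OpenAI API key: ")
    return env["OPENAI_API_KEY"]
```

In a notebook, calling `ensure_openai_api_key()` once before creating the generator should be enough.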

Import Required Libraries

Prepare Text

Collect the text you want to analyze.
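
For illustration, the input can be a plain Python string. The passage below is a hypothetical stand-in, not the text that produced the results shown later in this notebook.

```python
# Hypothetical sample passage; replace it with the text you want to analyze.
text = (
    "Large language models are used in healthcare, finance, and education. "
    "Keyword extraction identifies the significant terms in unstructured data."
)

print(f"{len(text)} characters to analyze")
```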

Build the Prompt

We construct a single-message template that instructs the model to extract keywords together with their positions and scores, and to return the output as a JSON object.
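
A sketch of what such a template could look like. The prompt wording, the JSON field names (`keywords`, `keyword`, `positions`, `score`), and the helper `build_messages` are assumptions made for illustration; only `ChatPromptBuilder` and `ChatMessage` come from Haystack.

```python
# Hypothetical prompt wording; ChatPromptBuilder fills in the {{ text }} placeholder.
PROMPT_TEMPLATE = """You are a keyword extractor.
Extract the most relevant single- and multi-word keywords from the text below.
For each keyword, report the character offsets where it starts in the text
and a relevance score between 0 and 1.
Return only a JSON object of the form:
{"keywords": [{"keyword": "...", "positions": [0], "score": 0.0}]}

Text: {{ text }}
"""


def build_messages(text):
    """Render the template into a list containing one user ChatMessage."""
    # Imported here so the template can be inspected without Haystack installed.
    from haystack.components.builders import ChatPromptBuilder
    from haystack.dataclasses import ChatMessage

    builder = ChatPromptBuilder(template=[ChatMessage.from_user(PROMPT_TEMPLATE)])
    # run() returns a dict whose "prompt" entry holds the rendered messages.
    return builder.run(text=text)["prompt"]
```

Spelling out the expected JSON shape in the prompt keeps the later parsing step simple, because the field names are known in advance.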

Initialize the Generator and Extract Keywords

We use OpenAIChatGenerator with a model such as gpt-4o-mini to send our prompt and request a JSON-formatted response.
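
One way this step might look, assuming JSON output is requested through OpenAI's `response_format` option passed via `generation_kwargs`, and a recent Haystack 2.x release in which a reply's content is exposed as `.text`. The function name `extract_keywords_json` and its optional `generator` argument are hypothetical; the argument is there so the call can be tried with a stand-in object instead of a live API.

```python
def extract_keywords_json(messages, generator=None):
    """Send the rendered messages to a chat generator and return the reply text."""
    if generator is None:
        # Imported here so the function can be exercised with a stand-in generator.
        from haystack.components.generators.chat import OpenAIChatGenerator

        generator = OpenAIChatGenerator(
            model="gpt-4o-mini",
            # Ask the OpenAI API to constrain the reply to a JSON object.
            generation_kwargs={"response_format": {"type": "json_object"}},
        )
    result = generator.run(messages=messages)
    # "replies" is a list of ChatMessage objects; the first one carries the answer.
    return result["replies"][0].text
```

OpenAI's JSON mode expects the prompt itself to mention JSON, so the template should ask for a JSON object explicitly.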

Parse and Display Results

Finally, convert the returned JSON string into a Python object and iterate over the extracted keywords.
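
A minimal parsing sketch using the standard `json` module. The reply string and its field names (`keywords`, `keyword`, `positions`, `score`) are assumed for illustration, and `format_keywords` is a hypothetical helper.

```python
import json

# Hypothetical reply; the field names are assumed, not taken from the notebook.
raw_reply = '{"keywords": [{"keyword": "healthcare", "positions": [63], "score": 0.9}]}'


def format_keywords(raw_json):
    """Parse the JSON reply and return one printable block per keyword."""
    data = json.loads(raw_json)
    blocks = []
    for item in data.get("keywords", []):
        blocks.append(
            f"Keyword: {item['keyword']}\n"
            f" Positions: {item['positions']}\n"
            f" Score: {item['score']}\n"
        )
    return blocks


for block in format_keywords(raw_reply):
    print(block)
```

`json.loads` raises `json.JSONDecodeError` on malformed input, so wrapping the call in a `try` block may be worthwhile when the model is not constrained to JSON output.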

Keyword: artificial intelligence
 Positions: [0]
 Score: 1.0

Keyword: large language models
 Positions: [18]
 Score: 0.95

Keyword: healthcare
 Positions: [63]
 Score: 0.9

Keyword: finance
 Positions: [72]
 Score: 0.9

Keyword: education
 Positions: [81]
 Score: 0.9

Keyword: customer service
 Positions: [91]
 Score: 0.9

Keyword: natural language
 Positions: [108]
 Score: 0.85

Keyword: unstructured data
 Positions: [162]
 Score: 0.85

Keyword: key word extraction
 Positions: [193]
 Score: 0.8

Keyword: significant terms
 Positions: [215]
 Score: 0.8

Keyword: technical terminology
 Positions: [290]
 Score: 0.75

Keyword: domain-specific jargon
 Positions: [311]
 Score: 0.75

Keyword: named entities
 Positions: [334]
 Score: 0.7

Keyword: action verbs
 Positions: [352]
 Score: 0.7

Keyword: contextual relevance
 Positions: [367]
 Score: 0.7

Keyword: tokenization
 Positions: [406]
 Score: 0.65

Keyword: stopword removal
 Positions: [420]
 Score: 0.65

Keyword: part-of-speech tagging
 Positions: [437]
 Score: 0.65

Keyword: frequency analysis
 Positions: [457]
 Score: 0.65

Keyword: semantic relationship mapping
 Positions: [476]
 Score: 0.65

Keyword: essential information
 Positions: [508]
 Score: 0.6

Keyword: main topics
 Positions: [529]
 Score: 0.6
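
Character offsets reported by a language model are not guaranteed to be exact. In the output above, for instance, "artificial intelligence" (23 characters) is reported at offset 0 and "large language models" at offset 18, which cannot both be exact start offsets. One possible safeguard is to recompute the positions from the source text. The sketch below uses a hypothetical helper, `find_offsets`, and a hypothetical sample sentence rather than the text analyzed in this notebook.

```python
import re


def find_offsets(text, keyword):
    """Return every start offset of keyword in text, ignoring case."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


# Hypothetical sample sentence, not the text analyzed in this notebook.
sample = "Keyword extraction finds significant terms; keyword lists summarize a text."
print(find_offsets(sample, "keyword"))
print(find_offsets(sample, "significant terms"))
```

Comparing these recomputed offsets with the model's `positions` values gives a quick way to flag keywords whose reported positions do not match the text.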