Notebooks
d
deepset
Keyword Extraction

Keyword Extraction

agentic-aiagenticagentsgenaiAIhaystack-cookbookgenai-usecaseshaystack-ainotebooksPythonragai-tools

Keyword Extraction with LLM Chat Generator

This notebook demonstrates how to extract keywords and key phrases from text using Haystack’s ChatPromptBuilder together with an LLM via OpenAIChatGenerator. We will:

  • Define a prompt that instructs the model to identify single- and multi-word keywords.

  • Capture each keyword’s character offsets.

  • Assign a relevance score (0–1).

  • Parse and display the results as JSON.

Install packages and setup OpenAI API key

[ ]
[8]

Import Required Libraries

[9]

Prepare Text

Collect your text you want to analyze.

[16]

Build the Prompt

We construct a single-message template that instructs the model to extract keywords, their positions and scores and return the output as JSON object.

[17]

Initialize the Generator and Extract Keywords

We use OpenAIChatGenerator (e.g., gpt-4o-mini) to send our prompt and request a JSON-formatted response.

[ ]

Parse and Display Results

Finally, convert the returned JSON string into a Python object and iterate over the extracted keywords.

[19]
Keyword: artificial intelligence
 Positions: [0]
 Score: 1.0

Keyword: large language models
 Positions: [18]
 Score: 0.95

Keyword: healthcare
 Positions: [63]
 Score: 0.9

Keyword: finance
 Positions: [72]
 Score: 0.9

Keyword: education
 Positions: [81]
 Score: 0.9

Keyword: customer service
 Positions: [91]
 Score: 0.9

Keyword: natural language
 Positions: [108]
 Score: 0.85

Keyword: unstructured data
 Positions: [162]
 Score: 0.85

Keyword: key word extraction
 Positions: [193]
 Score: 0.8

Keyword: significant terms
 Positions: [215]
 Score: 0.8

Keyword: technical terminology
 Positions: [290]
 Score: 0.75

Keyword: domain-specific jargon
 Positions: [311]
 Score: 0.75

Keyword: named entities
 Positions: [334]
 Score: 0.7

Keyword: action verbs
 Positions: [352]
 Score: 0.7

Keyword: contextual relevance
 Positions: [367]
 Score: 0.7

Keyword: tokenization
 Positions: [406]
 Score: 0.65

Keyword: stopword removal
 Positions: [420]
 Score: 0.65

Keyword: part-of-speech tagging
 Positions: [437]
 Score: 0.65

Keyword: frequency analysis
 Positions: [457]
 Score: 0.65

Keyword: semantic relationship mapping
 Positions: [476]
 Score: 0.65

Keyword: essential information
 Positions: [508]
 Score: 0.6

Keyword: main topics
 Positions: [529]
 Score: 0.6