Keyword Extraction with LLM Chat Generator
This notebook demonstrates how to extract keywords and key phrases from text using Haystack’s ChatPromptBuilder together with an LLM via OpenAIChatGenerator. We will:
- Define a prompt that instructs the model to identify single- and multi-word keywords.
- Capture each keyword’s character offsets.
- Assign a relevance score (0–1).
- Parse and display the results as JSON.
Install Packages and Set Up the OpenAI API Key
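A minimal sketch of the key setup, assuming the package is installed with `pip install haystack-ai` and that the key is read from the `OPENAI_API_KEY` environment variable. The helper name `ensure_openai_key` is hypothetical and only used for illustration.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> None:
    """Prompt for the API key only when it is not already in the environment."""
    if not os.environ.get(env_var):
        # getpass hides the key while it is typed into the notebook prompt.
        os.environ[env_var] = getpass(f"Enter {env_var}: ")


# In a notebook, install the package first, for example: %pip install haystack-ai
# Then call ensure_openai_key() once before creating the generator.
```

Keeping the key in an environment variable avoids hard-coding it in the notebook.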
Import Required Libraries
Prepare Text
Collect the text you want to analyze.
Build the Prompt
We construct a single-message template that instructs the model to extract keywords together with their positions and scores, and to return the output as a JSON object.
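One way such a template might look is sketched below. The prompt wording and the JSON field names (`keywords`, `keyword`, `positions`, `score`) are assumptions chosen to match the output format shown in this notebook, and `build_messages` is a hypothetical helper. The Haystack imports sit inside the function so the template itself can be inspected without the library installed.

```python
# Jinja-style template; {{ text }} is filled in by ChatPromptBuilder at run time.
# The exact wording and field names are assumptions, not the original prompt.
KEYWORD_TEMPLATE = """\
You are a keyword extractor. Extract the most relevant single- and multi-word
keywords from the text below.

For every keyword return:
- "keyword": the keyword exactly as it appears in the text
- "positions": a list of 0-based character offsets where the keyword starts
- "score": a relevance score between 0 and 1

Respond with a JSON object only, shaped like:
{"keywords": [{"keyword": "...", "positions": [0], "score": 0.9}]}

Text:
{{ text }}
"""


def build_messages(text: str) -> list:
    """Render the template into a list of Haystack ChatMessage objects."""
    from haystack.components.builders import ChatPromptBuilder
    from haystack.dataclasses import ChatMessage

    builder = ChatPromptBuilder(template=[ChatMessage.from_user(KEYWORD_TEMPLATE)])
    # run() returns a dict whose "prompt" entry holds the rendered messages.
    return builder.run(text=text)["prompt"]
```

Mentioning JSON explicitly in the prompt matters if the generator is later configured to request a JSON-formatted response.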
Initialize the Generator and Extract Keywords
We use OpenAIChatGenerator with a model such as gpt-4o-mini to send our prompt and request a JSON-formatted response.
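The call could be sketched as follows, assuming `OPENAI_API_KEY` is set in the environment and a recent Haystack 2.x release in which replies expose their content through `.text`. The function name `run_keyword_extraction` is hypothetical, and the `response_format` setting is passed through `generation_kwargs` to the OpenAI API.

```python
def run_keyword_extraction(messages: list, model: str = "gpt-4o-mini") -> str:
    """Send chat messages to OpenAI and return the raw text of the first reply."""
    # Imported lazily so defining this helper does not require Haystack.
    from haystack.components.generators.chat import OpenAIChatGenerator

    generator = OpenAIChatGenerator(
        model=model,
        # Ask the API for a JSON object; the prompt must also mention JSON.
        generation_kwargs={"response_format": {"type": "json_object"}},
    )
    result = generator.run(messages=messages)
    # "replies" is a list of ChatMessage objects; take the first one.
    return result["replies"][0].text
```

The `messages` argument is expected to be the list of ChatMessage objects produced by the prompt builder.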
Parse and Display Results
Finally, convert the returned JSON string into a Python object and iterate over the extracted keywords.
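The parsing step can be sketched with the standard library alone. The sample text and reply below are hypothetical stand-ins, and the field names are assumed to match the prompt. Because character offsets reported by a model are not guaranteed to be exact, the sketch also includes an optional `find_offsets` helper that recomputes them from the text.

```python
import json


def parse_keywords(raw: str) -> list[dict]:
    """Decode the reply and return its keyword entries (empty list if absent)."""
    data = json.loads(raw)
    return data.get("keywords", [])


def find_offsets(text: str, keyword: str) -> list[int]:
    """Case-insensitively locate every start offset of keyword in text."""
    haystack, needle = text.lower(), keyword.lower()
    offsets, start = [], haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


# Hypothetical input and model reply, used only to exercise the helpers.
sample_text = "Artificial intelligence is changing healthcare."
sample_reply = (
    '{"keywords": ['
    '{"keyword": "artificial intelligence", "positions": [0], "score": 1.0}, '
    '{"keyword": "healthcare", "positions": [36], "score": 0.9}]}'
)

for item in parse_keywords(sample_reply):
    print(f"Keyword: {item['keyword']}")
    print(f" Positions: {item['positions']}")
    print(f" Score: {item['score']}")
    # Optional sanity check: compare reported offsets with recomputed ones.
    if item["positions"] != find_offsets(sample_text, item["keyword"]):
        print(" Warning: reported positions differ from the text")
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which a production pipeline would want to catch.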
Keyword: artificial intelligence Positions: [0] Score: 1.0
Keyword: large language models Positions: [18] Score: 0.95
Keyword: healthcare Positions: [63] Score: 0.9
Keyword: finance Positions: [72] Score: 0.9
Keyword: education Positions: [81] Score: 0.9
Keyword: customer service Positions: [91] Score: 0.9
Keyword: natural language Positions: [108] Score: 0.85
Keyword: unstructured data Positions: [162] Score: 0.85
Keyword: key word extraction Positions: [193] Score: 0.8
Keyword: significant terms Positions: [215] Score: 0.8
Keyword: technical terminology Positions: [290] Score: 0.75
Keyword: domain-specific jargon Positions: [311] Score: 0.75
Keyword: named entities Positions: [334] Score: 0.7
Keyword: action verbs Positions: [352] Score: 0.7
Keyword: contextual relevance Positions: [367] Score: 0.7
Keyword: tokenization Positions: [406] Score: 0.65
Keyword: stopword removal Positions: [420] Score: 0.65
Keyword: part-of-speech tagging Positions: [437] Score: 0.65
Keyword: frequency analysis Positions: [457] Score: 0.65
Keyword: semantic relationship mapping Positions: [476] Score: 0.65
Keyword: essential information Positions: [508] Score: 0.6
Keyword: main topics Positions: [529] Score: 0.6