Keyword querying and filtering

This interactive notebook introduces basic Elasticsearch queries, using the official Elasticsearch Python client. Before starting this section, work through our quick start, because you will be using the same dataset.

Install and import libraries


Create the client instance

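The cell that creates the client is not reproduced here. As a minimal sketch, assuming an Elastic Cloud deployment that you reach with a Cloud ID and an API key, the client could be built as follows. The function name `make_client` and its two parameters are hypothetical, and the import sits inside the function only so that the snippet loads before the `elasticsearch` package is installed.

```python
def make_client(cloud_id: str, api_key: str):
    """Return a client for an Elastic Cloud deployment (assumed setup)."""
    # Imported here so this sketch loads even before `pip install elasticsearch`.
    from elasticsearch import Elasticsearch

    return Elasticsearch(cloud_id=cloud_id, api_key=api_key)


# Hypothetical usage; supply your own credentials, for example with getpass:
# from getpass import getpass
# client = make_client(getpass("Cloud ID: "), getpass("API key: "))
```

If you run a self-managed cluster instead, you would pass the cluster URL and your own authentication settings rather than a Cloud ID.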

Enable Telemetry

Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. Please run the following code to let us gather anonymous usage statistics. See telemetry.py for details. Thank you!


Test the Client

Before you continue, confirm that the client has connected with this test.

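One simple way to run this test is to call the cluster info API, which only succeeds when the client can reach the deployment. The sketch below assumes a `client` object like the one created above; `check_connection` is a hypothetical helper name.

```python
def check_connection(client) -> str:
    """Call the cluster info API and return the reported Elasticsearch version."""
    info = client.info()  # raises an exception if the cluster is unreachable
    return info["version"]["number"]


# With a connected client:
# print(check_connection(client))
```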

Pretty printing Elasticsearch responses

Let's add a helper function to print Elasticsearch responses in a readable format. This function is similar to the one that was used in the quickstart guide.

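The helper itself is not shown here, so the following is a sketch that reproduces the layout of the outputs in this notebook. The function names and the `_source` field names (`publish_date`, `title`, `summary`, `publisher`, `num_reviews`, `authors`) are assumptions about the quick start dataset; adjust them if your mapping differs.

```python
def format_hit(hit: dict) -> str:
    """Render one search hit in the layout used by the outputs in this notebook."""
    source = hit["_source"]  # field names below are assumed, see the note above
    return (
        f"\nID: {hit['_id']}"
        f"\nPublication date: {source['publish_date']}"
        f"\nTitle: {source['title']}"
        f"\nSummary: {source['summary']}"
        f"\nPublisher: {source['publisher']}"
        f"\nReviews: {source['num_reviews']}"
        f"\nAuthors: {source['authors']}"
        f"\nScore: {hit['_score']}"
    )


def pretty_response(response) -> None:
    """Print every hit of a search response, one block per document."""
    for hit in response["hits"]["hits"]:
        print(format_hit(hit))
```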

Querying

🔍 NOTE: to run the queries that follow you need the book_index dataset from our quick start. If you haven't worked through the quick start, please follow the steps described there to create an Elasticsearch deployment with the dataset in it, and then come back to run the queries here.

In the query context, a query clause answers the question “How well does this document match this query clause?”. In addition to deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.

Full text queries

Full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.

  • match. The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
  • multi_match. The multi-field version of the match query.

Match query

Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.

The match query is the standard query for performing a full-text search, including options for fuzzy matching.

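A minimal sketch of a match query follows. Every book in the output below has the word "guide" in its summary, so the example searches for that word; the field name `summary` is an assumption about the quick start dataset.

```python
# Full-text match on the (assumed) `summary` field. The search text is analyzed
# with the field's analyzer before it is compared with the indexed terms.
match_query = {"match": {"summary": {"query": "guide"}}}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=match_query)
```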

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 0.7042277

ID: IAOa7osBiUNHLMdf3q2r
Publication date: 2019-05-03
Title: Python Crash Course
Summary: A fast-paced, no-nonsense guide to programming in Python
Publisher: no starch press
Reviews: 42
Authors: ['eric matthes']
Score: 0.7042277

ID: JgOa7osBiUNHLMdf3q2r
Publication date: 2011-05-13
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Summary: A guide to professional conduct in the field of software engineering
Publisher: prentice hall
Reviews: 20
Authors: ['robert c. martin']
Score: 0.6771651

ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.62883455

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 0.62883455

Multi-match query

The multi_match query builds on the match query to allow multi-field queries.

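As a sketch, the query below looks for "javascript" in two fields at once. The field names `summary` and `title` are assumptions about the quick start dataset.

```python
# Search the same text in several fields; a document matches if any field matches.
multi_match_query = {
    "multi_match": {
        "query": "javascript",
        "fields": ["summary", "title"],  # assumed field names
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=multi_match_query)
```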

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 2.0307527

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.7064086

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 1.6360576

Individual fields can be boosted with the caret (^) notation. Note in the following query how the score of the results that have "JavaScript" in their title is multiplied.

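The sketch below boosts the title field. The boost factor of 3 is an inference: it is the value that is consistent with the scores printed in this notebook, as the small calculation at the end shows. The field names are assumptions about the quick start dataset.

```python
# "title^3" multiplies the score contribution of matches in the title by 3.
boosted_query = {
    "multi_match": {
        "query": "javascript",
        "fields": ["summary", "title^3"],  # assumed field names and boost
    }
}

# Scores of the two title matches, copied from the outputs without and with the boost.
unboosted_scores = [2.0307527, 1.7064086]
boosted_scores = [6.0922585, 5.1192265]
ratios = [round(b / u, 3) for b, u in zip(boosted_scores, unboosted_scores)]
print(ratios)

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=boosted_query)
```

The third result keeps its score of 1.6360576 because it matches only in the summary, which is not boosted.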

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 6.0922585

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 5.1192265

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 1.6360576

Term-level Queries

You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.

Term search

Returns documents that contain exactly the search term.

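A term query is not analyzed, so it is usually run against a keyword field. The sketch below assumes the publisher is also indexed as a `publisher.keyword` subfield, which is what default dynamic mapping produces for strings; check your own mapping before relying on it.

```python
# Exact match on the (assumed) keyword subfield of `publisher`.
# The value must equal the stored term exactly, including case and punctuation.
term_query = {"term": {"publisher.keyword": "addison-wesley"}}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=term_query)
```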

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 1.4816045

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.4816045

Range search

Returns documents that contain terms within a provided range.

The following example returns books that have at least 45 reviews.

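The sketch below expresses "at least 45 reviews" with the `gte` (greater than or equal) bound. The field name `num_reviews` is an assumption about the quick start dataset. To make the bound concrete, the same condition is also applied locally to the review counts that appear in this notebook's outputs, which is only a subset of the index.

```python
# Range query on the (assumed) numeric field `num_reviews`.
range_query = {"range": {"num_reviews": {"gte": 45}}}

# Review counts copied from the outputs in this notebook (not the full index).
reviews = {
    "The Pragmatic Programmer": 30,
    "Python Crash Course": 42,
    "The Clean Coder": 20,
    "Clean Code": 55,
    "Design Patterns": 45,
    "Eloquent JavaScript": 38,
    "JavaScript: The Good Parts": 51,
    "You Don't Know JS: Up & Going": 36,
}

# Local illustration of the same bound; Elasticsearch evaluates it on the index.
lower_bound = range_query["range"]["num_reviews"]["gte"]
at_least_45 = sorted(title for title, n in reviews.items() if n >= lower_bound)
print(at_least_45)

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=range_query)
```

Note that the book with exactly 45 reviews is included, because `gte` is an inclusive bound; `gt` would exclude it.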

ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 1.0

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.0

Prefix search

Returns documents that contain a specific prefix in a provided field.

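A sketch of a prefix query follows. The prefix "java" and the field name `title` are assumptions chosen to be consistent with the output below. The prefix is written in lowercase because a prefix query is not analyzed, while the terms of an analyzed text field are typically stored in lowercase.

```python
# Match documents whose (assumed) `title` field contains a term starting with "java".
prefix_query = {"prefix": {"title": {"value": "java"}}}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=prefix_query)
```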

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 1.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.0

Fuzzy search

Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.

An edit distance is the number of one-character changes needed to turn one term into another. These changes can include:

  • Changing a character (box → fox)
  • Removing a character (black → lack)
  • Inserting a character (sic → sick)
  • Transposing two adjacent characters (act → cat)

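To make the edit distance concrete, the sketch below first computes it in plain Python for the four examples listed above, counting a transposition of adjacent characters as a single change, and then builds a fuzzy query. The misspelling "pyvascript" and the field name `title` are illustrative assumptions. The function is only an illustration of the measure, not the code Elasticsearch runs.

```python
def edit_distance(a: str, b: str) -> int:
    """Number of one-character changes (substitution, removal, insertion or
    transposition of adjacent characters) needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # remove every character of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # removing a character
                dist[i][j - 1] + 1,         # inserting a character
                dist[i - 1][j - 1] + cost,  # changing (or keeping) a character
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                # transposing two adjacent characters
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


# The four examples from the list above are each one change apart.
examples = [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]
distances = [edit_distance(a, b) for a, b in examples]
print(distances)

# A misspelled search term that is two changes away from "javascript".
typo_distance = edit_distance("pyvascript", "javascript")
print(typo_distance)

# Fuzzy query on the (assumed) `title` field, using the misspelled term.
fuzzy_query = {"fuzzy": {"title": {"value": "pyvascript"}}}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=fuzzy_query)
```

With the default fuzziness setting, terms longer than five characters may differ by up to two changes, which is why a term two changes away from "javascript" can still match titles containing it.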

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 1.6246022

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.3651271

Combining Query Conditions

Compound queries wrap other compound or leaf queries, either to combine their results and scores, or to change their behaviour. They also allow you to switch from query to filter context, but that will be covered later in the Filtering section.

bool.must (AND)

The clauses must appear in matching documents and will contribute to the score. This effectively performs an "AND" logical operation on the given sub-queries.

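The sketch below combines two term clauses under `must`, so a book has to satisfy both. The publisher and author values are taken from the output below; the field names `publisher.keyword` and `authors.keyword` are assumptions about the quick start dataset.

```python
# Both clauses must match, and each one contributes to the relevance score.
must_query = {
    "bool": {
        "must": [
            {"term": {"publisher.keyword": "addison-wesley"}},  # assumed field
            {"term": {"authors.keyword": "richard helm"}},      # assumed field
        ]
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=must_query)
```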

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 3.788629

bool.should (OR)

The clause should appear in the matching document. This performs an "OR" logical operation on the given sub-queries.

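The sketch below places two term clauses under `should`, so a book is returned when it satisfies either of them. The publisher and author values are taken from the output below; the field names `publisher.keyword` and `authors.keyword` are assumptions about the quick start dataset.

```python
# At least one clause has to match when `should` is the only clause type present.
should_query = {
    "bool": {
        "should": [
            {"term": {"publisher.keyword": "addison-wesley"}},    # assumed field
            {"term": {"authors.keyword": "douglas crockford"}},   # assumed field
        ]
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=should_query)
```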

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 2.3070245

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 1.4816045

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.4816045

Filtering

In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No; no scores are calculated. Filter context is mostly used for filtering structured data, for example:

  • Does this timestamp fall into the range 2015 to 2016?
  • Is the status field set to "published"?

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query.

bool.filter

The clause (query) must appear for the document to be included in the results. Unlike query context searches such as term, bool.must or bool.should, a matching score isn't calculated because filter clauses are executed in filter context.

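As a sketch, the query below keeps only the books from one publisher. The publisher value is taken from the output below, and the field name `publisher.keyword` is an assumption about the quick start dataset.

```python
# A term clause in filter context: documents either pass or fail, with no scoring.
filter_query = {
    "bool": {
        "filter": [
            {"term": {"publisher.keyword": "prentice hall"}},  # assumed field
        ]
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=filter_query)
```

Because the query contains nothing but a filter, every hit in the output has a score of 0.0.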

ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.0

ID: JgOa7osBiUNHLMdf3q2r
Publication date: 2011-05-13
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Summary: A guide to professional conduct in the field of software engineering
Publisher: prentice hall
Reviews: 20
Authors: ['robert c. martin']
Score: 0.0

bool.must_not

The clause (query) must not appear in the matching documents. Because this query also runs in filter context, no scores are calculated; the filter just determines if a document is included in the results or not.

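The sketch below excludes every book with 45 reviews or fewer, which is consistent with the output below, where both remaining books have more than 45 reviews. The field name `num_reviews` and the exact bound are assumptions.

```python
# Exclude documents that match the range clause; must_not runs in filter context.
must_not_query = {
    "bool": {
        "must_not": [
            {"range": {"num_reviews": {"lte": 45}}},  # assumed field and bound
        ]
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=must_not_query)
```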

ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 0.0

Using Filters with Queries

Filters are often added to search queries with the intention of limiting the search to a subset of the documents. A filter can cleanly eliminate documents from a search, without altering the relevance scores of the results.

The next example returns books that have the word "javascript" in their title, only among the books that have more than 45 reviews.

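One way to write this search, as a sketch, is a scored `match` clause under `must` combined with a `range` clause under `filter`. The field names `title` and `num_reviews` are assumptions about the quick start dataset, and the notebook's own cell may express the same restriction differently, for example with `must_not`.

```python
# The match clause is scored; the range clause only narrows the candidate set.
filtered_query = {
    "bool": {
        "must": [
            {"match": {"title": {"query": "javascript"}}},  # assumed field
        ],
        "filter": [
            {"range": {"num_reviews": {"gt": 45}}},         # assumed field
        ],
    }
}

# With the client created earlier in this notebook:
# response = client.search(index="book_index", query=filtered_query)
```

The score in the output below, 1.7064086, is the same value this book received in the unboosted multi-match example, which illustrates that the filter removes documents without changing the scores of those that remain.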

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.7064086