Keyword querying and filtering
This interactive notebook introduces basic Elasticsearch queries, using the official Elasticsearch Python client. Before starting this section, work through our quick start, because this notebook uses the same dataset.
Install and import libraries
Create the client instance
Enable Telemetry
Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. Please run the following code to let us gather anonymous usage statistics. See telemetry.py for details. Thank you!
Test the Client
Before you continue, confirm that the client has connected with this test.
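A minimal sketch of such a connectivity check, assuming `client` is an already-created instance of the official Python client. The helper name `check_connection` is hypothetical; `client.info()` asks the cluster for its basic metadata and raises an exception if the cluster cannot be reached.

```python
def check_connection(client):
    """Ask the cluster for its metadata; raises if the cluster is unreachable."""
    info = client.info()
    # The info response includes the cluster name and the server version.
    print(
        f"Connected to cluster '{info['cluster_name']}' "
        f"running version {info['version']['number']}"
    )
    return info


# Usage, assuming a client created earlier in the notebook:
# check_connection(client)
```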
Pretty printing Elasticsearch responses
Let's add a helper function to print Elasticsearch responses in a readable format. This function is similar to the one that was used in the quickstart guide.
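One way such a helper could look is sketched below. The `_source` field names (`publish_date`, `title`, `summary`, `publisher`, `num_reviews`, `authors`) are assumptions about how the quick start dataset is mapped, and the function names are illustrative; adjust them to match your index.

```python
def format_hit(hit):
    """Render one search hit as a multi-line string."""
    source = hit["_source"]  # field names below are assumed, not confirmed
    return (
        f"ID: {hit['_id']}\n"
        f"Publication date: {source['publish_date']}\n"
        f"Title: {source['title']}\n"
        f"Summary: {source['summary']}\n"
        f"Publisher: {source['publisher']}\n"
        f"Reviews: {source['num_reviews']}\n"
        f"Authors: {source['authors']}\n"
        f"Score: {hit['_score']}"
    )


def pretty_response(response):
    """Print every hit of a search response, one block per hit."""
    hits = response["hits"]["hits"]
    if not hits:
        print("Your search returned no results.")
        return
    for hit in hits:
        print("\n" + format_hit(hit))
```

The helper only reads `response["hits"]["hits"]`, so it works equally well on the client's response object and on a plain dictionary with the same shape.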
Querying
NOTE: To run the queries that follow, you need the book_index dataset from our quick start. If you haven't worked through the quick start, please follow the steps described there to create an Elasticsearch deployment with the dataset in it, and then come back to run the queries here.
In the query context, a query clause answers the question "How well does this document match this query clause?". In addition to deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.
Full text queries
Full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.
- match. The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
- multi_match. The multi-field version of the match query.
Match query
Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
The match query is the standard query for performing a full-text search, including options for fuzzy matching.
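As a sketch, a match query is a small dictionary naming the field and the text to search for. The field name `summary` and the search text `guide` are assumptions inferred from the results shown next, not confirmed details of the original cell.

```python
# Query body for a match query (field name and text are assumed).
query = {
    "match": {
        "summary": {
            "query": "guide",
        }
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```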
ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 0.7042277

ID: IAOa7osBiUNHLMdf3q2r
Publication date: 2019-05-03
Title: Python Crash Course
Summary: A fast-paced, no-nonsense guide to programming in Python
Publisher: no starch press
Reviews: 42
Authors: ['eric matthes']
Score: 0.7042277

ID: JgOa7osBiUNHLMdf3q2r
Publication date: 2011-05-13
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Summary: A guide to professional conduct in the field of software engineering
Publisher: prentice hall
Reviews: 20
Authors: ['robert c. martin']
Score: 0.6771651

ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.62883455

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 0.62883455
Multi-match query
The multi_match query builds on the match query to allow multi-field queries.
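A multi_match query takes the search text once plus a list of fields to search. The sketch below assumes the fields are named `summary` and `title` and that the search text is `javascript`; both are inferred from the results that follow.

```python
# Query body for a multi_match query (field names and text are assumed).
query = {
    "multi_match": {
        "query": "javascript",
        "fields": ["summary", "title"],
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```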
ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 2.0307527

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.7064086

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 1.6360576
Individual fields can be boosted with the caret (^) notation. Note in the following query how the score of the results that have "JavaScript" in their title is multiplied.
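A boosted variant could look like the sketch below. The field names are assumptions, and the boost factor of 3 is inferred from the scores in this notebook (the title matches score about three times higher than in the unboosted query), not taken from the original cell.

```python
# Boost matches in the (assumed) "title" field by a factor of 3.
boost = 3
query = {
    "multi_match": {
        "query": "javascript",
        "fields": ["summary", f"title^{boost}"],
    }
}

# Sanity check of the inferred boost against the scores shown in this notebook:
unboosted_score = 2.0307527  # "Eloquent JavaScript" without a boost
boosted_score = 6.0922585    # the same book with the title boosted
ratio = boosted_score / unboosted_score  # approximately 3

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

Books that match only in the summary keep their original score, because the boost applies to the title field alone.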
ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 6.0922585

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 5.1192265

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Publisher: oreilly
Reviews: 36
Authors: ['kyle simpson']
Score: 1.6360576
Term-level Queries
You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.
Term search
Returns documents that contain exactly the search term.
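A term query compares the search term against the indexed value without analyzing it, so it is usually run against a keyword field. The sketch below assumes a `publisher.keyword` sub-field exists and searches for `addison-wesley`, a value taken from the results that follow.

```python
# Query body for a term query (the "publisher.keyword" field is assumed).
query = {
    "term": {
        "publisher.keyword": "addison-wesley",
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```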
ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 1.4816045

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.4816045
Range search
Returns documents that contain terms within a provided range.
The following example returns books that have at least 45 reviews.
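A range query for "at least 45 reviews" can be sketched with the `gte` (greater than or equal) bound. The field name `num_reviews` is an assumption about the dataset mapping.

```python
# Query body for a range query (the "num_reviews" field name is assumed).
query = {
    "range": {
        "num_reviews": {
            "gte": 45,  # greater than or equal to 45
        }
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

The other supported bounds are `gt`, `lt` and `lte`, and two bounds can be combined to express a closed interval.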
ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 1.0

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.0

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 1.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.0
Fuzzy search
Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
An edit distance is the number of one-character changes needed to turn one term into another. These changes can include:
- Changing a character (box → fox)
- Removing a character (black → lack)
- Inserting a character (sic → sick)
- Transposing two adjacent characters (act → cat)
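The four operations above can be illustrated with a small edit-distance function. This is a sketch of the optimal string alignment variant, which counts an adjacent transposition as a single edit; it is meant to illustrate the concept and is not Elasticsearch's internal implementation. The fuzzy query at the end uses an assumed field name (`title`) and a deliberately misspelled search term chosen for illustration.

```python
def edit_distance(a, b):
    """Number of single-character edits (including adjacent swaps) from a to b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # change a character (or keep it)
            )
            # Transpose two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Each example from the list above is exactly one edit apart.
examples = [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]
distances = [edit_distance(a, b) for a, b in examples]

# A fuzzy query for a misspelled term (field name and term are assumed).
query = {"fuzzy": {"title": {"value": "javascritp"}}}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

The misspelling `javascritp` is one transposition away from `javascript`, which is why a fuzzy query can still find titles containing the correctly spelled word.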
ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Publisher: no starch press
Reviews: 38
Authors: ['marijn haverbeke']
Score: 1.6246022

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.3651271
Combining Query Conditions
Compound queries wrap other compound or leaf queries, either to combine their results and scores, or to change their behaviour. They also allow you to switch from query to filter context, but that will be covered later in the Filtering section.
bool.must (AND)
The clauses must appear in matching documents and will contribute to the score. This effectively performs an "AND" logical operation on the given sub-queries.
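A bool query with two `must` clauses could be sketched as follows. The field names (`publisher.keyword`, `authors.keyword`) and the values are assumptions inferred from the single result shown next.

```python
# Both term clauses must match (field names and values are assumed).
query = {
    "bool": {
        "must": [
            {"term": {"publisher.keyword": "addison-wesley"}},
            {"term": {"authors.keyword": "richard helm"}},
        ]
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

Because both clauses run in query context, each one contributes to the final score of a matching document.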
ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 3.788629
bool.should (OR)
The clause should appear in the matching document. This performs an "OR" logical operation on the given sub-queries.
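A sketch of the same idea with `should` clauses, where a document is returned if it matches either clause. The field names and values are assumptions inferred from the results that follow.

```python
# A document matches if either term clause matches (names and values assumed).
query = {
    "bool": {
        "should": [
            {"term": {"publisher.keyword": "addison-wesley"}},
            {"term": {"authors.keyword": "douglas crockford"}},
        ]
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```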
ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 2.3070245

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Publisher: addison-wesley
Reviews: 30
Authors: ['andrew hunt', 'david thomas']
Score: 1.4816045

ID: JQOa7osBiUNHLMdf3q2r
Publication date: 1994-10-31
Title: Design Patterns: Elements of Reusable Object-Oriented Software
Summary: Guide to design patterns that can be used in any object-oriented language
Publisher: addison-wesley
Reviews: 45
Authors: ['erich gamma', 'richard helm', 'ralph johnson', 'john vlissides']
Score: 1.4816045
Filtering
In a filter context, a query clause answers the question "Does this document match this query clause?" The answer is a simple Yes or No; no scores are calculated. Filter context is mostly used for filtering structured data, for example:
- Does this timestamp fall into the range 2015 to 2016?
- Is the status field set to "published"?
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query.
bool.filter
The clause (query) must appear for the document to be included in the results. Unlike query context searches such as term, bool.must or bool.should, a matching score isn't calculated because filter clauses are executed in filter context.
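A minimal sketch of a bool query that uses only a `filter` clause. The field name `publisher.keyword` and the value `prentice hall` are assumptions inferred from the results that follow.

```python
# Keep only documents from one publisher; no score is computed (names assumed).
query = {
    "bool": {
        "filter": [
            {"term": {"publisher.keyword": "prentice hall"}},
        ]
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

Every hit comes back with a score of 0.0, since nothing in the query runs in query context.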
ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.0

ID: JgOa7osBiUNHLMdf3q2r
Publication date: 2011-05-13
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Summary: A guide to professional conduct in the field of software engineering
Publisher: prentice hall
Reviews: 20
Authors: ['robert c. martin']
Score: 0.0
bool.must_not
The clause (query) must not appear in the matching documents. Because this query also runs in filter context, no scores are calculated; the filter just determines if a document is included in the results or not.
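A sketch of a `must_not` clause that excludes documents by a range condition. The field name `num_reviews` and the bound (45 reviews or fewer) are assumptions inferred from the results that follow, where only books with more than 45 reviews remain.

```python
# Exclude every book with 45 reviews or fewer (field name and bound assumed).
query = {
    "bool": {
        "must_not": [
            {"range": {"num_reviews": {"lte": 45}}},
        ]
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```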
ID: IgOa7osBiUNHLMdf3q2r
Publication date: 2008-08-11
Title: Clean Code: A Handbook of Agile Software Craftsmanship
Summary: A guide to writing code that is easy to read, understand and maintain
Publisher: prentice hall
Reviews: 55
Authors: ['robert c. martin']
Score: 0.0

ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 0.0
Using Filters with Queries
Filters are often added to search queries with the intention of limiting the search to a subset of the documents. A filter can cleanly eliminate documents from a search, without altering the relevance scores of the results.
The next example returns books that have the word "javascript" in their title, only among the books that have more than 45 reviews.
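Combining the two contexts could look like the sketch below: the `must` clause is scored, while the `filter` clause only narrows the candidate set. The field names `title` and `num_reviews` are assumptions about the dataset mapping, and `gt` is used to express "more than 45".

```python
# Scored full-text match, restricted to well-reviewed books (names assumed).
query = {
    "bool": {
        "must": [
            {"match": {"title": {"query": "javascript"}}},
        ],
        "filter": [
            {"range": {"num_reviews": {"gt": 45}}},
        ],
    }
}

# With a connected client (not created here), the search would be run as:
# response = client.search(index="book_index", query=query)
```

The score of the remaining hit comes entirely from the match clause, which is why it equals the score the same title received in the unboosted multi_match example.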
ID: JwOa7osBiUNHLMdf3q2r
Publication date: 2008-05-15
Title: JavaScript: The Good Parts
Summary: A deep dive into the parts of JavaScript that are essential to writing maintainable code
Publisher: oreilly
Reviews: 51
Authors: ['douglas crockford']
Score: 1.7064086