Esql Getting Started
Getting started with ES|QL
In this notebook you'll learn the basics of the Elasticsearch Query Language (ES|QL). You'll be using the official Elasticsearch Python client.
You'll learn how to:
- Run an ES|QL query
- Use processing commands
- Sort a table
- Query data
- Chain processing commands
- Compute values
- Calculate statistics
- Access columns
- Create a histogram
- Enrich data
- Process data
ℹ️ ES|QL is generally available as of Elastic stack version 8.14.0.
Create Elastic Cloud deployment
If you don't have an Elastic Cloud deployment, sign up here for a free trial.
Once logged in to your Elastic Cloud account, go to the Create deployment page and select Create deployment. Leave all settings with their default values.
Install packages and import modules
To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.
First we need to install the elasticsearch Python client.
Collecting elasticsearch
Downloading elasticsearch-8.14.0-py3-none-any.whl (480 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 480.2/480.2 kB 3.0 MB/s eta 0:00:00
Collecting elastic-transport<9,>=8.13 (from elasticsearch)
Downloading elastic_transport-8.13.1-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.5/64.5 kB 5.3 MB/s eta 0:00:00
Requirement already satisfied: urllib3<3,>=1.26.2 in /usr/local/lib/python3.10/dist-packages (from elastic-transport<9,>=8.13->elasticsearch) (2.0.7)
Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from elastic-transport<9,>=8.13->elasticsearch) (2024.6.2)
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-8.13.1 elasticsearch-8.14.0
Initialize the Elasticsearch client
Now we can instantiate the Elasticsearch Python client, providing the CLOUD ID and API key for your deployment.
ℹ️ If you're running Elasticsearch locally or on self-managed infrastructure, you'll need to pass in the Elasticsearch host instead. Read the docs about how to connect to Elasticsearch locally.
Elastic Cloud ID: ·········· Elastic API Key: ··········
Add sample data to Elasticsearch
Before we index our sample dataset, let's create an index named sample_data with the correct mappings.
Now we can index the data using the Elasticsearch Python client's bulk helpers.
Successfully indexed 7 documents.
Your first ES|QL query
Each ES|QL query starts with a source command. A source command produces a table, typically with data from Elasticsearch.
The FROM source command returns a table with documents from a data stream, index, or alias. Each row in the resulting table represents a document. This query returns up to 500 documents from the sample_data index:
@timestamp | client_ip | event_duration | message -------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1
Each column corresponds to a field, and can be accessed by the name of that field.
ℹ️ ES|QL keywords are case-insensitive. FROM sample_data is identical to from sample_data.
Processing commands
A source command can be followed by one or more processing commands, separated by a pipe character: |. Processing commands change an input table by adding, removing, or changing rows and columns. Processing commands can perform filtering, projection, aggregation, and more.
For example, you can use the LIMIT command to limit the number of rows that are returned, up to a maximum of 10,000 rows:
@timestamp | client_ip | event_duration | message -------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected
Sort a table
Another processing command is the SORT command. By default, the rows returned by FROM don’t have a defined sort order. Use the SORT command to sort rows on one or more columns:
@timestamp | client_ip | event_duration | message -------------------------------------------------------------------------------- 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3
Query the data
Use the WHERE command to query the data. For example, to find all events with a duration longer than 5ms:
@timestamp | client_ip | event_duration | message -------------------------------------------------------------------------- 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error
@timestamp | client_ip | event_duration | message -------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1
Chain processing commands
You can chain processing commands, separated by a pipe character: |. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.
The following example first sorts the table on @timestamp, and next limits the
result set to 3 rows:
@timestamp | client_ip | event_duration | message ------------------------------------------------------------------------------- 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error
ℹ️ The order of processing commands is important. First limiting the result set to 3 rows before sorting those 3 rows would most likely return a result that is different than this example, where the sorting comes before the limit.
Compute values
Use the EVAL command to append columns to a table, with calculated values. For example, the following query appends a duration_ms column. The values in the column are computed by dividing event_duration by 1,000,000. In other words: event_duration converted from nanoseconds to milliseconds.
@timestamp | client_ip | event_duration | message | duration_ms ---------------------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.450233 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.764889 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.232382 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.725448 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.268153 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.033755 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.756467
@timestamp | client_ip | event_duration | message | duration_ms ---------------------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.5 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.8 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.2 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.7 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.3 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.0 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.8
Calculate statistics
You can also use ES|QL to aggregate your data. Use the STATS ... BY command to calculate statistics.
For example, to calculate the median duration:
median_duration --------------- 2764889.0
You can calculate multiple stats with one command:
median_duration | max_duration ------------------------------ 2764889.0 | 8268153
Use BY to group calculated stats by one or more columns. For example, to calculate the median duration per client IP:
median_duration | client_ip ------------------------------ 1232382.0 | 172.21.0.5 2764889.0 | 172.21.2.113 3450233.0 | 172.21.2.162 3395111.0 | 172.21.3.15
Access columns
You can access columns by their name. If a name contains special characters, it needs to be quoted with backticks (`).
Assigning an explicit name to a column created by EVAL or STATS is optional. If you don’t provide a name, the new column name is equal to the function expression. For example:
@timestamp | client_ip | event_duration | message | event_duration/1000000.0 ----------------------------------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 3.450233 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 2.764889 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | 1.232382 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | 0.725448 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | 8.268153 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | 5.033755 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 1.756467
In this query, EVAL adds a new column named event_duration/1000000.0. Because its name contains special characters, to access this column, quote it with backticks:
MEDIAN(`event_duration/1000000.0`) ---------------------------------- 2.764889
Create a histogram
To track statistics over time, ES|QL enables you to create histograms using the BUCKET function. BUCKET creates human-friendly bucket sizes and returns a value for each row that corresponds to the resulting bucket the row falls into.
ℹ️ The
BUCKETfunction must be used together with theSTATS ... BYcommand. It replaces theAUTO_BUCKETfunction which was removed in 8.14.0.
For example, the following query creates hourly buckets for the data on October 23rd and creates a histogram to count the number of events per hour.
COUNT(*) | bucket ----------------------------------- 2 | 2023-10-23T12:00:00.000Z 5 | 2023-10-23T13:00:00.000Z
Or the median duration per hour:
median_duration | bucket ------------------------------------------ 3107561.0 | 2023-10-23T12:00:00.000Z 1756467.0 | 2023-10-23T13:00:00.000Z
Enrich data
ES|QL enables you to enrich a table with data from indices in Elasticsearch, using the ENRICH command.
ℹ️ Before you can use
ENRICH, you first need to create and execute an enrich policy.
The following requests create and execute a policy called clientip_policy. The policy links an IP address to an environment ("Development", "QA", or "Production").
ObjectApiResponse({'status': {'phase': 'COMPLETE'}}) After creating and executing a policy, you can use it with the ENRICH command:
@timestamp | event_duration | client_ip | env ---------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 3450233 | 172.21.2.162 | QA 2023-10-23T12:27:28.948Z | 2764889 | 172.21.2.113 | QA 2023-10-23T13:33:34.937Z | 1232382 | 172.21.0.5 | Development 2023-10-23T13:51:54.732Z | 725448 | 172.21.3.15 | Production 2023-10-23T13:52:55.015Z | 8268153 | 172.21.3.15 | Production 2023-10-23T13:53:55.832Z | 5033755 | 172.21.3.15 | Production 2023-10-23T13:55:01.543Z | 1756467 | 172.21.3.15 | Production
You can use the new env column that’s added by the ENRICH command in subsequent commands. For example, to calculate the median duration per environment:
median_duration | env ----------------------------- 3107561.0 | QA 1232382.0 | Development 3395111.0 | Production
For more about data enrichment with ES|QL, refer to Data enrichment.
Process data
Your data may contain unstructured strings that you want to structure to make it easier to analyze the data. For example, the sample data contains log messages like:
"Connected to 10.1.0.3"
By extracting the IP address from these messages, you can determine which IP has accepted the most client connections.
To structure unstructured strings at query time, you can use the ES|QL DISSECT and GROK commands. DISSECT works by breaking up a string using a delimiter-based pattern. GROK works similarly, but uses regular expressions. This makes GROK more powerful, but generally also slower.
In this case, no regular expressions are needed, as the message is straightforward: "Connected to ", followed by the server IP. To match this string, you can use the following DISSECT command:
@timestamp | client_ip | event_duration | message | server_ip -------------------------------------------------------------------------------------------- 2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233 | Connected to 10.1.0.3 | 10.1.0.3 2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889 | Connected to 10.1.0.2 | 10.1.0.2 2023-10-23T13:33:34.937Z | 172.21.0.5 | 1232382 | Disconnected | None 2023-10-23T13:51:54.732Z | 172.21.3.15 | 725448 | Connection error | None 2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153 | Connection error | None 2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755 | Connection error | None 2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467 | Connected to 10.1.0.1 | 10.1.0.1
This adds a server_ip column to those rows that have a message that matches this pattern. For other rows, the value of server_ip is null.
You can use the new server_ip column that’s added by the DISSECT command in subsequent commands. For example, to determine how many connections each server has accepted:
COUNT(*) | server_ip -------------------- 1 | 10.1.0.3 1 | 10.1.0.2 1 | 10.1.0.1
ℹ️ To learn more about data processing with ES|QL, refer to Data processing with DISSECT and GROK.