08 Learning To Rank
How to train and deploy a Learning To Rank model
In this notebook, we'll:
- Connect to an Elasticsearch deployment using the official Python client.
- Import and index a movie dataset into Elasticsearch.
- Extract features from our dataset using Elasticsearch's Query DSL, including custom script_score queries.
- Build a training dataset by combining extracted features with a human-curated judgment list.
- Train a Learning To Rank model using XGBoost.
- Deploy the trained model to Elasticsearch using Eland.
- Use the model as a rescorer for second stage re-ranking.
- Evaluate the impact of the LTR model on search relevance by comparing search results before and after applying the model.
NOTE:
- Learning To Rank is generally available for Elastic Stack versions 8.15.0 and newer and requires an Enterprise subscription or higher.
Install required packages
First we must install the packages we need for this notebook.
Configure your Elasticsearch deployment
For this example, we will be using an Elastic Cloud deployment (available with a free trial).
Enable Telemetry
Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We'd like to ask you to run the following code so we can gather anonymous usage statistics. See telemetry.py for details. Thank you!
Test the Client
Before you continue, confirm that the client has connected with this test.
Configure the dataset
We'll use a dataset derived from the MSRD (Movie Search Ranking Dataset).
The dataset is available here and contains the following files:
- movies_corpus.jsonl.gz: Movie dataset to be indexed.
- movies_judgements.tsv.gz: Judgment list of relevance judgments for a set of queries.
- movies_index_settings.json: Settings to be applied to the index and its documents.
Import the document corpus
This step imports the documents of the corpus into the movies index.
Each document contains the following fields:
| Field name | Description |
|---|---|
| id | ID of the document |
| title | Movie title |
| overview | A short description of the movie |
| actors | List of actors in the movie |
| director | Director of the movie |
| characters | List of characters that appear in the movie |
| genres | Genres of the movie |
| year | Year the movie was released |
| budget | Budget of the movie in USD |
| votes | Number of votes received by the movie |
| rating | Average rating of the movie |
| popularity | Numeric measure of the movie's popularity |
| tags | A list of tags for the movie |
Deleting index if it already exists: movies Creating index: movies Loading the corpus from https://raw.githubusercontent.com/elastic/elasticsearch-labs/ltr-notebook/notebooks/search/sample_data/learning-to-rank/movies-corpus.jsonl.gz Indexing the corpus into movies ... Indexed 9750 documents into movies
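The corpus import above can be sketched as a generator of bulk-index actions. This is an illustrative sketch, not the notebook's exact helper: the gzipped-JSONL reading and the action shape (suitable for `elasticsearch.helpers.bulk`) are assumptions based on the corpus file format described above.

```python
import gzip
import json


def corpus_actions(path, index="movies"):
    """Yield one bulk-index action per JSONL document in a gzipped corpus file.

    The resulting dicts can be fed to elasticsearch.helpers.bulk(client, actions).
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            yield {
                "_index": index,
                "_id": doc.get("id"),  # assumes each document carries its own id field
                "_source": doc,
            }
```

Streaming the file through a generator keeps memory usage flat even for large corpora, since documents are read and sent in batches rather than loaded all at once.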
Loading the judgment list
The judgment list contains human evaluations that we'll use to train our Learning To Rank model.
Each row represents a query-document pair with an associated relevance grade and contains the following columns:
| Column | Description |
|---|---|
| query_id | Unique ID of the query; pairs for the same query are grouped together. |
| query | Actual text of the query. |
| doc_id | ID of the document. |
| grade | Relevance grade of the document for the query. |
Note:
In this example the relevance grade is a binary value (relevant or not relevant).
You could also use a number that represents the degree of relevance (e.g. from 0 to 4).
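Parsing the judgment list can be sketched with the standard library alone. The column names follow the table above; grouping rows by query_id is what the training step below relies on. This is a stand-in for whatever TSV loader the notebook actually uses (e.g. pandas).

```python
import csv
import gzip
from collections import defaultdict


def load_judgments(path):
    """Parse a gzipped TSV judgment list into {query_id: [(query, doc_id, grade), ...]}."""
    judgments = defaultdict(list)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            judgments[row["query_id"]].append(
                (row["query"], row["doc_id"], int(row["grade"]))
            )
    return judgments
```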
Configure feature extraction
Features are the inputs to our model. They represent information about the query alone, a result document alone, or a result document in the context of a query, such as BM25 scores.
Features are defined using standard templated queries and the Query DSL.
To streamline the process of defining and refining feature extraction during training, we have incorporated a number of primitives directly in eland.
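As a sketch of what such feature definitions look like, the function below returns named templated queries as plain Query DSL dicts. The feature names title_bm25 and popularity appear later in this notebook; overview_bm25 is a hypothetical example. In eland these definitions would be wrapped in its LTR primitives rather than kept as raw dicts.

```python
def feature_queries(query_text):
    """Return named Query DSL feature definitions for a given query string.

    Illustrative only: eland's LTR primitives wrap queries like these so that
    each named query's score is extracted as one feature value per document.
    """
    return {
        # BM25 score of the query against the title field
        "title_bm25": {"match": {"title": query_text}},
        # BM25 score against the overview field (hypothetical feature)
        "overview_bm25": {"match": {"overview": query_text}},
        # Query-independent feature: expose a document field via script_score
        "popularity": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {"source": "doc['popularity'].value"},
            }
        },
    }
```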
Building the training dataset
Now that we have our basic datasets loaded, and feature extraction configured, we'll use our judgment list to come up with the final dataset for training. The dataset will consist of rows containing <query, document> pairs, as well as all of the features we need to train the model. To generate this dataset, we'll run each query from the judgment list and add the extracted features as columns for each of the labelled result documents.
For example, if we have a query q1 with two labelled documents d3 and d9, the training dataset will end up with two rows — one for each of the pairs <q1, d3> and <q1, d9>.
Note that because this executes queries on your Elasticsearch cluster, the time to run this operation will vary depending on where the cluster is hosted and where this notebook runs. For example, if you run the notebook on the same server or host as the Elasticsearch cluster, this operation tends to run very quickly on the sample dataset (< 2 mins).
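The join described above can be sketched as follows. The per-query Elasticsearch feature extraction is stubbed out as a callable argument, since the real extraction runs queries against the cluster; the judgment structure matches the loader shape {query_id: [(query, doc_id, grade), ...]}.

```python
def build_training_rows(judgments, extract_features):
    """Join the judgment list with extracted features into flat training rows.

    `judgments` maps query_id -> list of (query, doc_id, grade) tuples.
    `extract_features(query, doc_ids)` stands in for the per-query Elasticsearch
    feature extraction and must return {doc_id: [feature values]}.
    """
    rows = []
    for query_id, pairs in judgments.items():
        query = pairs[0][0]  # all pairs in a group share the same query text
        features_by_doc = extract_features(query, [doc_id for _, doc_id, _ in pairs])
        for _, doc_id, grade in pairs:
            rows.append(
                {
                    "query_id": query_id,
                    "doc_id": doc_id,
                    "grade": grade,
                    "features": features_by_doc.get(doc_id, []),
                }
            )
    return rows
```

One row per labelled query-document pair, exactly as in the <q1, d3> / <q1, d9> example above; grouping by query_id is preserved because rankers train on per-query groups.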
100%|██████████| 16279/16279 [01:38<00:00, 165.18it/s]
Create and train the model
The LTR rescorer supports XGBRanker trained models.
Learn more in the XGBoost documentation.
[0] validation_0-ndcg@10:0.85757 [1] validation_0-ndcg@10:0.86397 [2] validation_0-ndcg@10:0.86582 [3] validation_0-ndcg@10:0.86694 [4] validation_0-ndcg@10:0.86738 [5] validation_0-ndcg@10:0.86704 [6] validation_0-ndcg@10:0.86777 [7] validation_0-ndcg@10:0.86823 [8] validation_0-ndcg@10:0.86925 [9] validation_0-ndcg@10:0.86903 [10] validation_0-ndcg@10:0.86973 [11] validation_0-ndcg@10:0.87008 [12] validation_0-ndcg@10:0.86990 [13] validation_0-ndcg@10:0.87030 [14] validation_0-ndcg@10:0.87067 [15] validation_0-ndcg@10:0.87027 [16] validation_0-ndcg@10:0.87144 [17] validation_0-ndcg@10:0.87159 [18] validation_0-ndcg@10:0.87195 [19] validation_0-ndcg@10:0.87159 [20] validation_0-ndcg@10:0.87171 [21] validation_0-ndcg@10:0.87234 [22] validation_0-ndcg@10:0.87243 [23] validation_0-ndcg@10:0.87256 [24] validation_0-ndcg@10:0.87294 [25] validation_0-ndcg@10:0.87327 [26] validation_0-ndcg@10:0.87371 [27] validation_0-ndcg@10:0.87406 [28] validation_0-ndcg@10:0.87410 [29] validation_0-ndcg@10:0.87426 [30] validation_0-ndcg@10:0.87455 [31] validation_0-ndcg@10:0.87485 [32] validation_0-ndcg@10:0.87482 [33] validation_0-ndcg@10:0.87499 [34] validation_0-ndcg@10:0.87505 [35] validation_0-ndcg@10:0.87557 [36] validation_0-ndcg@10:0.87594 [37] validation_0-ndcg@10:0.87592 [38] validation_0-ndcg@10:0.87618 [39] validation_0-ndcg@10:0.87623 [40] validation_0-ndcg@10:0.87648 [41] validation_0-ndcg@10:0.87632 [42] validation_0-ndcg@10:0.87657 [43] validation_0-ndcg@10:0.87670 [44] validation_0-ndcg@10:0.87724 [45] validation_0-ndcg@10:0.87766 [46] validation_0-ndcg@10:0.87765 [47] validation_0-ndcg@10:0.87744 [48] validation_0-ndcg@10:0.87800 [49] validation_0-ndcg@10:0.87824 [50] validation_0-ndcg@10:0.87822 [51] validation_0-ndcg@10:0.87838 [52] validation_0-ndcg@10:0.87867 [53] validation_0-ndcg@10:0.87869 [54] validation_0-ndcg@10:0.87873 [55] validation_0-ndcg@10:0.87878 [56] validation_0-ndcg@10:0.87899 [57] validation_0-ndcg@10:0.87907 [58] validation_0-ndcg@10:0.87891 
[59] validation_0-ndcg@10:0.87909 [60] validation_0-ndcg@10:0.87914 [61] validation_0-ndcg@10:0.87934 [62] validation_0-ndcg@10:0.87920 [63] validation_0-ndcg@10:0.87930 [64] validation_0-ndcg@10:0.87915 [65] validation_0-ndcg@10:0.87913 [66] validation_0-ndcg@10:0.87956 [67] validation_0-ndcg@10:0.87952 [68] validation_0-ndcg@10:0.88009 [69] validation_0-ndcg@10:0.88007 [70] validation_0-ndcg@10:0.87995 [71] validation_0-ndcg@10:0.87988 [72] validation_0-ndcg@10:0.88003 [73] validation_0-ndcg@10:0.88031 [74] validation_0-ndcg@10:0.88023 [75] validation_0-ndcg@10:0.88025 [76] validation_0-ndcg@10:0.88039 [77] validation_0-ndcg@10:0.88038 [78] validation_0-ndcg@10:0.88064 [79] validation_0-ndcg@10:0.88053 [80] validation_0-ndcg@10:0.88062 [81] validation_0-ndcg@10:0.88067 [82] validation_0-ndcg@10:0.88077 [83] validation_0-ndcg@10:0.88131 [84] validation_0-ndcg@10:0.88132 [85] validation_0-ndcg@10:0.88128 [86] validation_0-ndcg@10:0.88164 [87] validation_0-ndcg@10:0.88171 [88] validation_0-ndcg@10:0.88180 [89] validation_0-ndcg@10:0.88206 [90] validation_0-ndcg@10:0.88209 [91] validation_0-ndcg@10:0.88195 [92] validation_0-ndcg@10:0.88197 [93] validation_0-ndcg@10:0.88209 [94] validation_0-ndcg@10:0.88189 [95] validation_0-ndcg@10:0.88240 [96] validation_0-ndcg@10:0.88259 [97] validation_0-ndcg@10:0.88265 [98] validation_0-ndcg@10:0.88268 [99] validation_0-ndcg@10:0.88272
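The validation metric reported in the training log above is NDCG@10 (normalized discounted cumulative gain over the top 10 results). A minimal stdlib sketch of the computation, using the standard linear-gain form (note that XGBoost's built-in ndcg may use an exponential gain, 2^rel - 1, so exact values can differ):

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain: graded relevance discounted by log2 of rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the ranking normalized by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Because it is normalized against the ideal ordering, NDCG@10 lies in [0, 1], so the final value of about 0.88 in the log means the model's top-10 ordering is close to the ideal one implied by the judgment grades.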
Import the model into Elasticsearch
Once the model is trained we can use Eland to load it into Elasticsearch.
Note that the MLModel.import_ltr_model method takes the LTRModelConfig object, which defines how features should be extracted for the model being imported.
<eland.ml.ml_model.MLModel at 0x2ae5734c0>
Using the rescorer
Once the model is uploaded to Elasticsearch, you will be able to use it as a rescorer in the _search API, as shown in this example:
GET /movies/_search
{
"query" : {
"multi_match" : {
"query": "star wars",
"fields": ["title", "overview", "actors", "director", "tags", "characters"]
}
},
"rescore" : {
"window_size" : 50,
"learning_to_rank" : {
"model_id": "ltr-model-xgboost",
"params": {
"query": "star wars"
}
}
}
}
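The same request can be issued from the Python client by building the request body as a dict. The model id and fields below match the DSL example above; the search call itself is shown commented out since it requires a connected client.

```python
def ltr_search_body(query_text, model_id="ltr-model-xgboost", window_size=50):
    """Build a _search body with a learning_to_rank rescorer for second-stage re-ranking."""
    return {
        "query": {
            "multi_match": {
                "query": query_text,
                "fields": ["title", "overview", "actors", "director", "tags", "characters"],
            }
        },
        "rescore": {
            # Only the top `window_size` first-stage hits are re-scored by the model
            "window_size": window_size,
            "learning_to_rank": {
                "model_id": model_id,
                "params": {"query": query_text},
            },
        },
    }


# With a connected client this would be executed as:
# client.search(index="movies", body=ltr_search_body("star wars"))
```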
[('Star Wars', 10.971989, '11'),
 ('Star Wars: The Clone Wars', 9.923633, '12180'),
 ('Andor: A Disney+ Day Special Look', 8.9880295, '1022100'),
 ("Family Guy Presents: It's a Trap!", 8.845748, '278427'),
 ('Star Wars: The Rise of Skywalker', 8.053349, '181812'),
 ('Star Wars: The Force Awakens', 8.053349, '140607'),
 ('Star Wars: The Last Jedi', 8.053349, '181808'),
 ('Solo: A Star Wars Story', 8.053349, '348350'),
 ('The Star Wars Holiday Special', 8.053349, '74849'),
 ('Phineas and Ferb: Star Wars', 8.053349, '392216')]

[('Star Wars', 4.1874104, '11'),
 ('Star Wars: The Clone Wars', 2.3627238, '12180'),
 ('Star Wars: The Rise of Skywalker', 1.7667875, '181812'),
 ('Star Wars: The Force Awakens', 1.3336482, '140607'),
 ('Star Wars: The Last Jedi', 1.3336482, '181808'),
 ('Rogue One: A Star Wars Story', 1.1134433, '330459'),
 ('LEGO Star Wars Summer Vacation', 1.082971, '980804'),
 ("Doraemon: Nobita's Little Star Wars 2021", 0.9138395, '782054'),
 ('LEGO Star Wars Terrifying Tales', 0.89640737, '857702'),
 ('Solo: A Star Wars Story', 0.65811557, '348350')]

As also shown in the feature importance graph above, we can see from these result lists that the title_bm25 and popularity features are weighted highly in our trained model. All rescored results now include the query terms in the title, showing the importance of the title_bm25 feature. Similarly, more popular movies now rank higher; for example, Rogue One: A Star Wars Story has moved up to sixth position.