Reranking with an Elasticsearch-Hosted Model
Reranking with a locally hosted reranker model from HuggingFace
Set up the notebook
Install the required libraries
Import the required Python libraries
Create an Elasticsearch Python client
Free Trial
If you don't have an Elasticsearch cluster, or want to test one out, head over to cloud.elastic.co and sign up. You can have a serverless project up and running in only a few minutes!
We are using an Elastic Cloud cloud_id and a deployment (cluster) API key.
See this guide for finding the `cloud_id` and creating an `api_key`.
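As a sketch, creating the client looks like the following. The environment-variable names are assumptions, and the actual `Elasticsearch` call requires the `elasticsearch` package and a live deployment, so it is shown as a comment:

```python
import os

# Hypothetical environment-variable names; the real values come from your
# Elastic Cloud deployment (see the guide linked above).
cloud_id = os.environ.get("ELASTIC_CLOUD_ID", "<your cloud_id>")
api_key = os.environ.get("ELASTIC_API_KEY", "<your api_key>")

# Creating the client itself requires `pip install elasticsearch`:
# from elasticsearch import Elasticsearch
# es = Elasticsearch(cloud_id=cloud_id, api_key=api_key)
# es.info()  # raises an error if the credentials or cluster are wrong
```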
Ready Elasticsearch
Hugging Face Reranking Model
Run this cell to:
- Use Eland's `eland_import_hub_model` command to upload the reranking model to Elasticsearch.

For this example we've chosen the cross-encoder/ms-marco-MiniLM-L-6-v2 text similarity model.
Note:
While we are importing the model for use as a reranker, Eland and Elasticsearch do not have a dedicated rerank task type, so we still use `text_similarity`.
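The upload step is a CLI invocation; here is a sketch of the command and its flags, built as a Python string (`$CLOUD_ID` and `$API_KEY` are placeholders for your own credentials; in the notebook the command runs as a shell cell):

```python
import shlex

# Eland's CLI (installed via `pip install 'eland[pytorch]'`) uploads the
# Hugging Face model into Elasticsearch. Note --task-type text_similarity:
# there is no dedicated rerank task type.
cmd = (
    "eland_import_hub_model "
    "--cloud-id $CLOUD_ID "
    "--es-api-key $API_KEY "
    "--hub-model-id cross-encoder/ms-marco-MiniLM-L-6-v2 "
    "--task-type text_similarity"
)
args = shlex.split(cmd)  # in the notebook this runs as a shell command
```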
Create Inference Endpoint
Run this cell to:
- Create an inference endpoint
- Deploy the reranking model we imported in the previous section

We need to create an endpoint that queries can use for reranking.
Key points about the `model_config`:
- `service` - in this case `elasticsearch` tells the inference API to use a model hosted locally (in Elasticsearch)
- `num_allocations` - sets the number of allocations to 1. Allocations are independent units of work for NLP tasks; scaling this increases concurrent throughput.
- `num_threads` - sets the number of threads per allocation to 1. This affects the number of threads each allocation uses during inference; scaling this generally increases the speed of individual inference requests (to a point).
- `model_id` - the ID of the model as it is named in Elasticsearch
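Putting those settings together, a sketch of the `model_config` (matching the endpoint output shown in this section; the commented-out client call is an assumption and requires a live cluster):

```python
# Configuration for the inference endpoint backing the reranker.
model_config = {
    "service": "elasticsearch",  # model is hosted inside Elasticsearch
    "service_settings": {
        "num_allocations": 1,  # independent units of work; scale for throughput
        "num_threads": 1,      # threads per allocation; scale for request speed
        "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",  # Eland's name for the model
    },
    "task_settings": {"return_documents": True},
}

# Sketch of creating the endpoint named "semantic-reranking":
# es.inference.put(
#     task_type="rerank",
#     inference_id="semantic-reranking",
#     inference_config=model_config,
# )
```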
{'inference_id': 'semantic-reranking',
 'task_type': 'rerank',
 'service': 'elasticsearch',
 'service_settings': {'num_allocations': 1,
  'num_threads': 1,
  'model_id': 'cross-encoder__ms-marco-minilm-l-6-v2'},
 'task_settings': {'return_documents': True}}

Verify it was created
Run the two cells in this section to verify that:
- The inference endpoint has been created
- The model has been deployed

You should see JSON output with information about the semantic reranking endpoint.
{'endpoints': [{'inference_id': 'semantic-reranking',
  'task_type': 'rerank',
  'service': 'elasticsearch',
  'service_settings': {'num_allocations': 1,
   'num_threads': 1,
   'model_id': 'cross-encoder__ms-marco-minilm-l-6-v2'},
  'task_settings': {'return_documents': True}}]}

Create the index mapping
We are going to index the title and abstract from the dataset.
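A minimal sketch of the mapping, assuming plain `text` fields for both columns (the `es.indices.create` call requires the client from the setup section, so it is shown as a comment):

```python
# Index name and mapping for the two fields we keep from the dataset.
index_name = "arxiv-papers"
mappings = {
    "properties": {
        "title": {"type": "text"},
        "abstract": {"type": "text"},
    }
}

# With a live cluster:
# es.indices.create(index=index_name, mappings=mappings)
```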
Index 'arxiv-papers' created successfully.
Ready the dataset
We are going to use the CShorten/ML-ArXiv-Papers dataset.
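Loading is a single `datasets` call (shown as a comment since it needs the package and network access); the sample row below is hypothetical and only illustrates the two fields we keep:

```python
# Full download (requires `pip install datasets`):
# from datasets import load_dataset
# dataset = load_dataset("CShorten/ML-ArXiv-Papers", split="train")

# Hypothetical row shape; we index only the title and abstract.
sample_row = {
    "title": "Quantum Sparse Support Vector Machines",
    "abstract": "We analyze the computational complexity of ...",
}
doc = {"title": sample_row["title"], "abstract": sample_row["abstract"]}
```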
Download Dataset
Note: You may get a warning that the secret `HF_TOKEN` does not exist in your Colab secrets. You can safely ignore it.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets. warnings.warn(
Index into Elasticsearch
We will loop through the dataset and send batches of rows to Elasticsearch.
- This may take a couple of minutes depending on your cluster sizing.
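The batching can be sketched with the client's bulk helper; the action shape below is standard, while the chunk size is an assumption:

```python
def build_actions(rows, index_name="arxiv-papers"):
    """Yield one bulk-indexing action per dataset row."""
    for row in rows:
        yield {
            "_index": index_name,
            "_source": {"title": row["title"], "abstract": row["abstract"]},
        }

# Tiny in-memory example of the action shape:
rows = [
    {"title": "Paper A", "abstract": "Abstract A"},
    {"title": "Paper B", "abstract": "Abstract B"},
]
actions = list(build_actions(rows))

# Against a live cluster:
# from elasticsearch.helpers import bulk
# bulk(es, build_actions(dataset), chunk_size=500)
```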
Query with Reranking
This contains a `text_similarity_reranker` retriever, which:
- Uses a `standard` retriever to perform a lexical query against the `title` field
- Performs reranking, which:
  - Takes as input the top 100 results from the previous search (`rank_window_size: 100`)
  - Takes as input the query text (`inference_text: query`)
  - Uses our previously created reranking inference endpoint and model
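The retriever described above can be written as the following request body (a sketch; `semantic-reranking` is the inference endpoint created earlier, and the example query text is an assumption):

```python
query = "sparse vector embedding"  # example search text

search_body = {
    "retriever": {
        "text_similarity_reranker": {
            "retriever": {
                "standard": {
                    "query": {"match": {"title": query}}  # lexical first pass
                }
            },
            "field": "abstract",               # text sent to the reranker
            "rank_window_size": 100,           # rerank the top 100 hits
            "inference_id": "semantic-reranking",
            "inference_text": query,
        }
    }
}

# With a live cluster:
# response = es.search(index="arxiv-papers", body=search_body)
```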
Print the table comparing the scored and reranked results
Print out Title and Abstract
This will print the title and the abstract for the top 10 results after semantic reranking.
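The printing loop can be sketched like this, using a hypothetical response dict in place of a live `es.search(...)` result:

```python
# Hypothetical response shape; the real one comes from es.search(...).
response = {
    "hits": {
        "hits": [
            {"_source": {"title": "Compact Speaker Embedding: lrx-vector",
                         "abstract": "Deep neural networks (DNN) ..."}},
        ]
    }
}

for hit in response["hits"]["hits"][:10]:  # top 10 after reranking
    print("Title:", hit["_source"]["title"])
    print("Abstract:", hit["_source"]["abstract"])
    print()
```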
Title: Compact Speaker Embedding: lrx-vector
Abstract: Deep neural networks (DNN) have recently been widely used in speaker recognition systems, achieving state-of-the-art performance on various benchmarks. The x-vector architecture is especially popular in this research community, due to its excellent performance and manageable computational complexity. In this paper, we present the lrx-vector system, which is the low-rank factorized version of the x-vector embedding network. The primary objective of this topology is to further reduce the memory requirement of the speaker recognition system. We discuss the deployment of knowledge distillation for training the lrx-vector system and compare against low-rank factorization with SVD. On the VOiCES 2019 far-field corpus we were able to reduce the weights by 28% compared to the full-rank x-vector system while keeping the recognition rate constant (1.83% EER).

Title: Quantum Sparse Support Vector Machines
Abstract: We analyze the computational complexity of Quantum Sparse Support Vector Machine, a linear classifier that minimizes the hinge loss and the $L_1$ norm of the feature weights vector and relies on a quantum linear programming solver instead of a classical solver. Sparse SVM leads to sparse models that use only a small fraction of the input features in making decisions, and is especially useful when the total number of features, $p$, approaches or exceeds the number of training samples, $m$. We prove a $\Omega(m)$ worst-case lower bound for computational complexity of any quantum training algorithm relying on black-box access to training samples; quantum sparse SVM has at least linear worst-case complexity. However, we prove that there are realistic scenarios in which a sparse linear classifier is expected to have high accuracy, and can be trained in sublinear time in terms of both the number of training samples and the number of features.

Title: Sparse Support Vector Infinite Push
Abstract: In this paper, we address the problem of embedded feature selection for ranking on top of the list problems. We pose this problem as a regularized empirical risk minimization with $p$-norm push loss function ($p=\infty$) and sparsity inducing regularizers. We leverage the issues related to this challenging optimization problem by considering an alternating direction method of multipliers algorithm which is built upon proximal operators of the loss function and the regularizer. Our main technical contribution is thus to provide a numerical scheme for computing the infinite push loss function proximal operator. Experimental results on toy, DNA microarray and BCI problems show how our novel algorithm compares favorably to competitors for ranking on top while using fewer variables in the scoring function.

Title: The Sparse Vector Technique, Revisited
Abstract: We revisit one of the most basic and widely applicable techniques in the literature of differential privacy - the sparse vector technique [Dwork et al., STOC 2009]. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. It allows to ask an unbounded number of queries as long as the answer is close to what we expect, and halts following the first query for which this is not the case. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries whose answer deviates substantially form what we expect. Our analysis is subtle and some of its ingredients may be more widely applicable. In some cases our new algorithm allows to privately extract much more information from the database than the original. We demonstrate this by applying our algorithm to the shifting heavy-hitters problem: On every time step, each of $n$ users gets a new input, and the task is to privately identify all the current heavy-hitters. That is, on time step $i$, the goal is to identify all data elements $x$ such that many of the users have $x$ as their current input. We present an algorithm for this problem with improved error guarantees over what can be obtained using existing techniques. Specifically, the error of our algorithm depends on the maximal number of times that a single user holds a heavy-hitter as input, rather than the total number of times in which a heavy-hitter exists.

Title: L-Vector: Neural Label Embedding for Domain Adaptation
Abstract: We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. With NLE method, we distill the knowledge from a powerful source-domain DNN into a dictionary of label embeddings, or l-vectors, one for each senone class. Each l-vector is a representation of the senone-specific output distributions of the source-domain DNN and is learned to minimize the average L2, Kullback-Leibler (KL) or symmetric KL distance to the output vectors with the same label through simple averaging or standard back-propagation. During adaptation, the l-vectors serve as the soft targets to train the target-domain model with cross-entropy loss. Without parallel data constraint as in the teacher-student learning, NLE is specially suited for the situation where the paired target-domain data cannot be simulated from the source-domain data. We adapt a 6400 hours multi-conditional US English acoustic model to each of the 9 accented English (80 to 830 hours) and kids' speech (80 hours). NLE achieves up to 14.1% relative word error rate reduction over direct re-training with one-hot labels.

Title: Spaceland Embedding of Sparse Stochastic Graphs
Abstract: We introduce a nonlinear method for directly embedding large, sparse, stochastic graphs into low-dimensional spaces, without requiring vertex features to reside in, or be transformed into, a metric space. Graph data and models are prevalent in real-world applications. Direct graph embedding is fundamental to many graph analysis tasks, in addition to graph visualization. We name the novel approach SG-t-SNE, as it is inspired by and builds upon the core principle of t-SNE, a widely used method for nonlinear dimensionality reduction and data visualization. We also introduce t-SNE-$\Pi$, a high-performance software for 2D, 3D embedding of large sparse graphs on personal computers with superior efficiency. It empowers SG-t-SNE with modern computing techniques for exploiting in tandem both matrix structures and memory architectures. We present elucidating embedding results on one synthetic graph and four real-world networks.

Title: Sparse Signal Recovery in the Presence of Intra-Vector and Inter-Vector Correlation
Abstract: This work discusses the problem of sparse signal recovery when there is correlation among the values of non-zero entries. We examine intra-vector correlation in the context of the block sparse model and inter-vector correlation in the context of the multiple measurement vector model, as well as their combination. Algorithms based on the sparse Bayesian learning are presented and the benefits of incorporating correlation at the algorithm level are discussed. The impact of correlation on the limits of support recovery is also discussed highlighting the different impact intra-vector and inter-vector correlations have on such limits.

Title: Stable Sparse Subspace Embedding for Dimensionality Reduction
Abstract: Sparse random projection (RP) is a popular tool for dimensionality reduction that shows promising performance with low computational complexity. However, in the existing sparse RP matrices, the positions of non-zero entries are usually randomly selected. Although they adopt uniform sampling with replacement, due to large sampling variance, the number of non-zeros is uneven among rows of the projection matrix which is generated in one trial, and more data information may be lost after dimension reduction. To break this bottleneck, based on random sampling without replacement in statistics, this paper builds a stable sparse subspace embedded matrix (S-SSE), in which non-zeros are uniformly distributed. It is proved that the S-SSE is stabler than the existing matrix, and it can maintain Euclidean distance between points well after dimension reduction. Our empirical studies corroborate our theoretical findings and demonstrate that our approach can indeed achieve satisfactory performance.

Title: Auto-weighted Mutli-view Sparse Reconstructive Embedding
Abstract: With the development of multimedia era, multi-view data is generated in various fields. Contrast with those single-view data, multi-view data brings more useful information and should be carefully excavated. Therefore, it is essential to fully exploit the complementary information embedded in multiple views to enhance the performances of many tasks. Especially for those high-dimensional data, how to develop a multi-view dimension reduction algorithm to obtain the low-dimensional representations is of vital importance but chanllenging. In this paper, we propose a novel multi-view dimensional reduction algorithm named Auto-weighted Mutli-view Sparse Reconstructive Embedding (AMSRE) to deal with this problem. AMSRE fully exploits the sparse reconstructive correlations between features from multiple views. Furthermore, it is equipped with an auto-weighted technique to treat multiple views discriminatively according to their contributions. Various experiments have verified the excellent performances of the proposed AMSRE.

Title: Embedding Words in Non-Vector Space with Unsupervised Graph Learning
Abstract: It has become a de-facto standard to represent words as elements of a vector space (word2vec, GloVe). While this approach is convenient, it is unnatural for language: words form a graph with a latent hierarchical structure, and this structure has to be revealed and encoded by word embeddings. We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end. In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes. We adopt a recent method learning a representation of data in the form of a differentiable weighted graph and use it to modify the GloVe training algorithm. We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks. Our analysis reveals that the structure of the learned graphs is hierarchical and similar to that of WordNet, the geometry is highly non-trivial and contains subgraphs with different local topology.