Re Ranking Elasticsearch Hosted

Reranking with a locally hosted reranker model from HuggingFace

Set up the notebook

Install required libs

[ ]

Import the required python libraries

[ ]

Create an Elasticsearch Python client

Free Trial

If you don't have an Elasticsearch cluster, or want to test one out, head over to cloud.elastic.co and sign up. You can have a serverless project up and running in only a few minutes!

We are using an Elastic Cloud cloud_id and deployment (cluster) API key.

See this guide for finding the cloud_id and creating an api_key.

[ ]
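A minimal sketch of the client setup, assuming the elasticsearch Python package and hypothetical `ELASTIC_CLOUD_ID` / `ELASTIC_API_KEY` environment variables:

```python
import os

# Hypothetical environment variables -- substitute your own cloud_id and API key.
cloud_id = os.environ.get("ELASTIC_CLOUD_ID", "<your-cloud-id>")
api_key = os.environ.get("ELASTIC_API_KEY", "<your-api-key>")

# Only connect when real credentials are present; the import is kept inside
# the guard so the sketch can be read and run without a live cluster.
if cloud_id != "<your-cloud-id>":
    from elasticsearch import Elasticsearch  # pip install elasticsearch
    es = Elasticsearch(cloud_id=cloud_id, api_key=api_key)
```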

Ready Elasticsearch

Hugging Face Reranking Model

Run this cell to:

  • Use Eland's eland_import_hub_model command to upload the reranking model to Elasticsearch.

For this example we've chosen the cross-encoder/ms-marco-MiniLM-L-6-v2 text similarity model.

Note: While we are importing the model for use as a reranker, Eland and Elasticsearch do not have a dedicated rerank task type, so we still use text_similarity.

[ ]
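The import step above can be sketched by assembling the CLI invocation in Python; the flag names follow Eland's eland_import_hub_model command, and the credential placeholders are hypothetical:

```python
import os

# Placeholder credentials -- substitute your own cloud_id and API key.
cmd = [
    "eland_import_hub_model",
    "--cloud-id", os.environ.get("ELASTIC_CLOUD_ID", "<your-cloud-id>"),
    "--es-api-key", os.environ.get("ELASTIC_API_KEY", "<your-api-key>"),
    "--hub-model-id", "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "--task-type", "text_similarity",  # no dedicated rerank task type exists
    "--start",                         # deploy the model after upload
]

# import subprocess; subprocess.run(cmd, check=True)  # uncomment to run against a cluster
```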

Create Inference Endpoint

Run this cell to:

  • Create an inference endpoint
  • Deploy the reranking model we imported in the previous section

We need to create an endpoint that queries can use for reranking.

Key points about the model_config

  • service - in this case, elasticsearch, which tells the inference API to use a model hosted locally (in Elasticsearch)
  • num_allocations - sets the number of allocations to 1
    • Allocations are independent units of work for NLP tasks. Scaling this allows for an increase in concurrent throughput.
  • num_threads - sets the number of threads per allocation to 1
    • Threads per allocation affect the number of threads used by each allocation during inference. Scaling this generally increases the speed of inference requests (to a point).
  • model_id - the id of the model as it is named in Elasticsearch
[ ]
{'inference_id': 'semantic-reranking',
 'task_type': 'rerank',
 'service': 'elasticsearch',
 'service_settings': {'num_allocations': 1,
  'num_threads': 1,
  'model_id': 'cross-encoder__ms-marco-minilm-l-6-v2'},
 'task_settings': {'return_documents': True}}
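The model_config described in the key points can be written out directly; the es.inference.put call is left commented since it needs a live cluster, and assumes a recent elasticsearch-py client:

```python
model_config = {
    "service": "elasticsearch",  # use a model hosted inside Elasticsearch
    "service_settings": {
        "num_allocations": 1,  # independent units of work for NLP tasks
        "num_threads": 1,      # threads used by each allocation during inference
        "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",  # id as named in Elasticsearch
    },
}

# Assuming an `es` client:
# es.inference.put(task_type="rerank", inference_id="semantic-reranking",
#                  inference_config=model_config)
```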

Verify it was created

Run the two cells in this section to verify that:
  • The inference endpoint has been created
  • The model has been deployed

You should see JSON output with information about the semantic endpoint

[ ]
{'endpoints': [{'inference_id': 'semantic-reranking',
   'task_type': 'rerank',
   'service': 'elasticsearch',
   'service_settings': {'num_allocations': 1,
    'num_threads': 1,
    'model_id': 'cross-encoder__ms-marco-minilm-l-6-v2'},
   'task_settings': {'return_documents': True}}]}

Create the index mapping

We are going to index the title and abstract from the dataset.

[ ]
Index 'arxiv-papers' created successfully.
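A mapping along these lines would produce the output above; indexing title and abstract as plain text fields is an assumption beyond what the section states:

```python
# Map only the fields we index from the dataset (field types assumed).
mappings = {
    "properties": {
        "title": {"type": "text"},
        "abstract": {"type": "text"},
    }
}

# Assuming an `es` client:
# es.indices.create(index="arxiv-papers", mappings=mappings)
# print("Index 'arxiv-papers' created successfully.")
```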

Ready the dataset

We are going to use the CShorten/ML-ArXiv-Papers dataset.

Download Dataset

Note: You may get the warning The secret HF_TOKEN does not exist in your Colab secrets.

You can safely ignore this.

[ ]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(

Index into Elasticsearch

We will loop through the dataset and send batches of rows to Elasticsearch.

  • This may take a couple of minutes depending on your cluster sizing.
[ ]
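The batching loop can be sketched with a small generator plus the bulk helper; the helper name to_bulk_actions and the chunk size are assumptions:

```python
def to_bulk_actions(rows, index_name="arxiv-papers"):
    """Convert dataset rows into bulk-index actions for Elasticsearch."""
    for row in rows:
        yield {
            "_index": index_name,
            "_source": {"title": row["title"], "abstract": row["abstract"]},
        }

# Assuming an `es` client and the dataset loaded as `ds`:
# from elasticsearch import helpers
# helpers.bulk(es, to_bulk_actions(ds["train"]), chunk_size=500)
```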

Query with Reranking

This query contains a text_similarity_reranker retriever, which:

  • Uses a standard retriever to:
    • Perform a lexical query against the title field
  • Performs reranking:
    • Takes as input the top 100 results from the previous search
      • rank_window_size: 100
    • Takes as input the query text
      • inference_text: query
  • Uses our previously created inference endpoint and model
[ ]
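The retriever described above can be sketched as a search body; the example query text and the choice of abstract as the reranked field are assumptions:

```python
query = "sparse vector representations"  # example query text (assumed)

search_body = {
    "retriever": {
        "text_similarity_reranker": {
            # Standard retriever: lexical match against the title field
            "retriever": {
                "standard": {"query": {"match": {"title": query}}}
            },
            "rank_window_size": 100,               # rerank the top 100 results
            "inference_id": "semantic-reranking",  # endpoint created earlier
            "inference_text": query,               # the query is the rerank input
            "field": "abstract",                   # field sent to the reranker (assumed)
        }
    }
}

# Assuming an `es` client:
# resp = es.search(index="arxiv-papers", body=search_body)
```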

Print the table comparing the scored and reranked results

[ ]

Print out Title and Abstract

This will print the title and the abstract for the top 10 results after semantic reranking.
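A small helper along these lines would produce that output; the response shape follows the standard Elasticsearch search response:

```python
def print_top(resp, n=10):
    """Print title and abstract for the top n hits of a search response."""
    for hit in resp["hits"]["hits"][:n]:
        src = hit["_source"]
        print(f"Title {src['title']} \n  Abstract:   {src['abstract']}\n")
```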

[ ]
Title Compact Speaker Embedding: lrx-vector 
  Abstract:   Deep neural networks (DNN) have recently been widely used in speaker
recognition systems, achieving state-of-the-art performance on various
benchmarks. The x-vector architecture is especially popular in this research
community, due to its excellent performance and manageable computational
complexity. In this paper, we present the lrx-vector system, which is the
low-rank factorized version of the x-vector embedding network. The primary
objective of this topology is to further reduce the memory requirement of the
speaker recognition system. We discuss the deployment of knowledge distillation
for training the lrx-vector system and compare against low-rank factorization
with SVD. On the VOiCES 2019 far-field corpus we were able to reduce the
weights by 28% compared to the full-rank x-vector system while keeping the
recognition rate constant (1.83% EER).

Title Quantum Sparse Support Vector Machines 
  Abstract:   We analyze the computational complexity of Quantum Sparse Support Vector
Machine, a linear classifier that minimizes the hinge loss and the $L_1$ norm
of the feature weights vector and relies on a quantum linear programming solver
instead of a classical solver. Sparse SVM leads to sparse models that use only
a small fraction of the input features in making decisions, and is especially
useful when the total number of features, $p$, approaches or exceeds the number
of training samples, $m$. We prove a $\Omega(m)$ worst-case lower bound for
computational complexity of any quantum training algorithm relying on black-box
access to training samples; quantum sparse SVM has at least linear worst-case
complexity. However, we prove that there are realistic scenarios in which a
sparse linear classifier is expected to have high accuracy, and can be trained
in sublinear time in terms of both the number of training samples and the
number of features.

Title Sparse Support Vector Infinite Push 
  Abstract:   In this paper, we address the problem of embedded feature selection for
ranking on top of the list problems. We pose this problem as a regularized
empirical risk minimization with $p$-norm push loss function ($p=\infty$) and
sparsity inducing regularizers. We leverage the issues related to this
challenging optimization problem by considering an alternating direction method
of multipliers algorithm which is built upon proximal operators of the loss
function and the regularizer. Our main technical contribution is thus to
provide a numerical scheme for computing the infinite push loss function
proximal operator. Experimental results on toy, DNA microarray and BCI problems
show how our novel algorithm compares favorably to competitors for ranking on
top while using fewer variables in the scoring function.

Title The Sparse Vector Technique, Revisited 
  Abstract:   We revisit one of the most basic and widely applicable techniques in the
literature of differential privacy - the sparse vector technique [Dwork et al.,
STOC 2009]. This simple algorithm privately tests whether the value of a given
query on a database is close to what we expect it to be. It allows to ask an
unbounded number of queries as long as the answer is close to what we expect,
and halts following the first query for which this is not the case.
  We suggest an alternative, equally simple, algorithm that can continue
testing queries as long as any single individual does not contribute to the
answer of too many queries whose answer deviates substantially form what we
expect. Our analysis is subtle and some of its ingredients may be more widely
applicable. In some cases our new algorithm allows to privately extract much
more information from the database than the original.
  We demonstrate this by applying our algorithm to the shifting heavy-hitters
problem: On every time step, each of $n$ users gets a new input, and the task
is to privately identify all the current heavy-hitters. That is, on time step
$i$, the goal is to identify all data elements $x$ such that many of the users
have $x$ as their current input. We present an algorithm for this problem with
improved error guarantees over what can be obtained using existing techniques.
Specifically, the error of our algorithm depends on the maximal number of times
that a single user holds a heavy-hitter as input, rather than the total number
of times in which a heavy-hitter exists.

Title L-Vector: Neural Label Embedding for Domain Adaptation 
  Abstract:   We propose a novel neural label embedding (NLE) scheme for the domain
adaptation of a deep neural network (DNN) acoustic model with unpaired data
samples from source and target domains. With NLE method, we distill the
knowledge from a powerful source-domain DNN into a dictionary of label
embeddings, or l-vectors, one for each senone class. Each l-vector is a
representation of the senone-specific output distributions of the source-domain
DNN and is learned to minimize the average L2, Kullback-Leibler (KL) or
symmetric KL distance to the output vectors with the same label through simple
averaging or standard back-propagation. During adaptation, the l-vectors serve
as the soft targets to train the target-domain model with cross-entropy loss.
Without parallel data constraint as in the teacher-student learning, NLE is
specially suited for the situation where the paired target-domain data cannot
be simulated from the source-domain data. We adapt a 6400 hours
multi-conditional US English acoustic model to each of the 9 accented English
(80 to 830 hours) and kids' speech (80 hours). NLE achieves up to 14.1%
relative word error rate reduction over direct re-training with one-hot labels.

Title Spaceland Embedding of Sparse Stochastic Graphs 
  Abstract:   We introduce a nonlinear method for directly embedding large, sparse,
stochastic graphs into low-dimensional spaces, without requiring vertex
features to reside in, or be transformed into, a metric space. Graph data and
models are prevalent in real-world applications. Direct graph embedding is
fundamental to many graph analysis tasks, in addition to graph visualization.
We name the novel approach SG-t-SNE, as it is inspired by and builds upon the
core principle of t-SNE, a widely used method for nonlinear dimensionality
reduction and data visualization. We also introduce t-SNE-$\Pi$, a
high-performance software for 2D, 3D embedding of large sparse graphs on
personal computers with superior efficiency. It empowers SG-t-SNE with modern
computing techniques for exploiting in tandem both matrix structures and memory
architectures. We present elucidating embedding results on one synthetic graph
and four real-world networks.

Title Sparse Signal Recovery in the Presence of Intra-Vector and Inter-Vector
  Correlation 
  Abstract:   This work discusses the problem of sparse signal recovery when there is
correlation among the values of non-zero entries. We examine intra-vector
correlation in the context of the block sparse model and inter-vector
correlation in the context of the multiple measurement vector model, as well as
their combination. Algorithms based on the sparse Bayesian learning are
presented and the benefits of incorporating correlation at the algorithm level
are discussed. The impact of correlation on the limits of support recovery is
also discussed highlighting the different impact intra-vector and inter-vector
correlations have on such limits.

Title Stable Sparse Subspace Embedding for Dimensionality Reduction 
  Abstract:   Sparse random projection (RP) is a popular tool for dimensionality reduction
that shows promising performance with low computational complexity. However, in
the existing sparse RP matrices, the positions of non-zero entries are usually
randomly selected. Although they adopt uniform sampling with replacement, due
to large sampling variance, the number of non-zeros is uneven among rows of the
projection matrix which is generated in one trial, and more data information
may be lost after dimension reduction. To break this bottleneck, based on
random sampling without replacement in statistics, this paper builds a stable
sparse subspace embedded matrix (S-SSE), in which non-zeros are uniformly
distributed. It is proved that the S-SSE is stabler than the existing matrix,
and it can maintain Euclidean distance between points well after dimension
reduction. Our empirical studies corroborate our theoretical findings and
demonstrate that our approach can indeed achieve satisfactory performance.

Title Auto-weighted Mutli-view Sparse Reconstructive Embedding 
  Abstract:   With the development of multimedia era, multi-view data is generated in
various fields. Contrast with those single-view data, multi-view data brings
more useful information and should be carefully excavated. Therefore, it is
essential to fully exploit the complementary information embedded in multiple
views to enhance the performances of many tasks. Especially for those
high-dimensional data, how to develop a multi-view dimension reduction
algorithm to obtain the low-dimensional representations is of vital importance
but chanllenging. In this paper, we propose a novel multi-view dimensional
reduction algorithm named Auto-weighted Mutli-view Sparse Reconstructive
Embedding (AMSRE) to deal with this problem. AMSRE fully exploits the sparse
reconstructive correlations between features from multiple views. Furthermore,
it is equipped with an auto-weighted technique to treat multiple views
discriminatively according to their contributions. Various experiments have
verified the excellent performances of the proposed AMSRE.

Title Embedding Words in Non-Vector Space with Unsupervised Graph Learning 
  Abstract:   It has become a de-facto standard to represent words as elements of a vector
space (word2vec, GloVe). While this approach is convenient, it is unnatural for
language: words form a graph with a latent hierarchical structure, and this
structure has to be revealed and encoded by word embeddings. We introduce
GraphGlove: unsupervised graph word representations which are learned
end-to-end. In our setting, each word is a node in a weighted graph and the
distance between words is the shortest path distance between the corresponding
nodes. We adopt a recent method learning a representation of data in the form
of a differentiable weighted graph and use it to modify the GloVe training
algorithm. We show that our graph-based representations substantially
outperform vector-based methods on word similarity and analogy tasks. Our
analysis reveals that the structure of the learned graphs is hierarchical and
similar to that of WordNet, the geometry is highly non-trivial and contains
subgraphs with different local topology.