Question Answering with Similarity Search

This notebook demonstrates how Pinecone's similarity search as a service helps you build a question answering application. We will index a set of questions and retrieve the most similar stored questions for a new (unseen) question. That way, we can link a new question to answers we might already have.

You can build a question answering application with Pinecone in three steps:

  • Represent questions as vector embeddings so that semantically similar questions are in close proximity within the same vector space.
  • Index vectors using Pinecone.
  • Given a new question, query the index to fetch the most similar stored questions. If answers are stored alongside those questions, they can be returned as answers to the new question.
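The following toy, in-memory sketch makes the three steps concrete without an API key or a model download. It is an illustrative stand-in, not Pinecone's API. `embed` is a hypothetical word-count "embedding", and `InMemoryIndex` is a hypothetical brute-force index that ranks stored questions by cosine similarity.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryIndex:
    """Toy stand-in for a vector index: stores vectors, answers queries by brute force."""

    def __init__(self) -> None:
        self._items: Dict[int, Tuple[Counter, str]] = {}

    def upsert(self, qid: int, question: str) -> None:
        # Steps 1 and 2: embed the question and store the vector under its id.
        self._items[qid] = (embed(question), question)

    def query(self, question: str, top_k: int = 3) -> List[Tuple[int, str, float]]:
        # Step 3: embed the new question and rank stored questions by similarity.
        q = embed(question)
        scored = [(qid, text, cosine(q, vec)) for qid, (vec, text) in self._items.items()]
        scored.sort(key=lambda row: row[2], reverse=True)
        return scored[:top_k]

index = InMemoryIndex()
for qid, text in [
    (1, "How do I learn to cook?"),
    (2, "What is the best way to make money online?"),
    (3, "How can I earn money on the internet?"),
]:
    index.upsert(qid, text)

results = index.query("best way to make money online", top_k=2)
print(results[0])  # the stored question with the highest cosine score
```

Word counts only reward exact word overlap. "How can I earn money on the internet?" shares just one word with the query even though it means nearly the same thing. That is why the notebook uses a learned embedding model instead.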

In this notebook, we focus on indexing a set of questions and retrieving the most similar ones for a new, unseen question.

Dependencies


Pinecone Installation and Setup

Now we need a place to store these embeddings and run an efficient vector search over them. For that we use Pinecone: get a free API key and enter it below, where we initialize the connection to Pinecone and create a new index.
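You can also read the key from an environment variable instead of pasting it into the notebook. The following is a minimal sketch that assumes the v3+ `pinecone` Python client (`from pinecone import Pinecone`). The variable name `PINECONE_API_KEY` and the helper names `load_api_key` and `connect` are our own choices.

```python
import os

def load_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Read the Pinecone API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Pinecone API key before connecting.")
    return key

def connect(env_var: str = "PINECONE_API_KEY"):
    """Create a Pinecone client (assumes the v3+ `pinecone` package is installed)."""
    # Imported lazily so this cell can be loaded even without the package.
    from pinecone import Pinecone
    return Pinecone(api_key=load_api_key(env_var))

# Usage (assumed): pc = connect()
```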


Next, we set up the index specification, which defines the cloud provider and region where the index will be deployed. You can find a list of all available providers and regions here.


Create the index:
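The following minimal sketch creates the index only if it does not already exist, then waits until it reports ready. It assumes the v3+ `pinecone` client methods `list_indexes().names()`, `create_index`, `describe_index(...).status["ready"]` and `Index`. The client is passed in so the helper can be exercised with a stand-in. The index name, `dimension=300` (which assumes a 300-dimensional word-vector model), the `cosine` metric and the `aws`/`us-east-1` spec in the usage comment are assumptions, not values confirmed by this notebook.

```python
import time

def ensure_index(pc, name: str, dimension: int, spec, metric: str = "cosine",
                 poll_seconds: float = 1.0, timeout: float = 300.0):
    """Create the index if missing, wait until it is ready, and return a handle to it."""
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    deadline = time.monotonic() + timeout
    # Poll the index status until Pinecone reports it ready (or we give up).
    while not pc.describe_index(name).status["ready"]:
        if time.monotonic() > deadline:
            raise TimeoutError(f"index {name!r} not ready after {timeout} s")
        time.sleep(poll_seconds)
    return pc.Index(name)

# Usage (assumed values):
# from pinecone import ServerlessSpec
# spec = ServerlessSpec(cloud="aws", region="us-east-1")
# index = ensure_index(pc, "question-answering", dimension=300, spec=spec)
```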


Uploading Questions

The dataset used in this notebook is the Quora Question Pairs Dataset.

Let's download the dataset and load the data.
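The notebook's exact loading code is not shown here. This sketch assumes the commonly distributed Quora Question Pairs CSV, with columns `id, qid1, qid2, question1, question2, is_duplicate`. The hypothetical helper `unique_questions` keeps one entry per `qid1` from the `qid1`/`question1` columns (the same columns as the preview below) and skips blank questions. The three-row sample is made up for illustration.

```python
import csv
import io
from typing import Iterable, List, Optional, Tuple

def unique_questions(rows: Iterable[dict], limit: Optional[int] = None) -> List[Tuple[str, str]]:
    """Collect (qid, question) pairs from qid1/question1, skipping repeated ids and blanks."""
    seen = set()
    out: List[Tuple[str, str]] = []
    for row in rows:
        qid = row["qid1"].strip()
        text = (row.get("question1") or "").strip()
        if not text or qid in seen:
            continue
        seen.add(qid)
        out.append((qid, text))
        if limit is not None and len(out) >= limit:
            break
    return out

# Made-up sample in the assumed column layout.
sample = """id,qid1,qid2,question1,question2,is_duplicate
0,1,2,How do I learn Python?,What is the best way to learn Python?,1
1,3,4,What is AI?,,0
2,1,5,How do I learn Python?,Where can I learn Python?,1
"""
questions = unique_questions(csv.DictReader(io.StringIO(sample)))
print(questions)
```

With pandas, `df[["qid1", "question1"]].dropna().drop_duplicates(subset="qid1")` gives a similar result.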

     qid1  question1
0  216488  I would love to give a TED talk. What do I do?
1  424959  Do all caps titles on YouTube videos attract more viewers than normal titles?
2  300233  How do I start self-learning ethical hacking?
3  302677  Should learning musical instruments in schools be made compulsory?
4  468590  Does the success of a self proclaimed Acharya Pankaj Pathak in Assam prove that we, as a state, are regressing back instead of progressing?

Define the model

We will use the Average Word Embeddings Model for this example. This model is fast to compute but produces relatively low-quality embeddings. To improve quality, you can look into other sentence embedding models, such as the Sentence Embeddings Models trained on Paraphrases.
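With "average word embeddings", a sentence vector is simply the mean of the vectors of its known words. This explains why the model is fast but coarse. The 3-dimensional vectors below are hypothetical and the whitespace tokenizer is simplified. A pretrained model, presumably something like sentence-transformers' `average_word_embeddings_glove.6B.300d` (an assumption), uses real word vectors instead.

```python
from typing import Dict, List

def average_word_embedding(text: str, word_vectors: Dict[str, List[float]]) -> List[float]:
    """Sentence vector = mean of the vectors of its in-vocabulary words (zero vector if none)."""
    dim = len(next(iter(word_vectors.values())))
    # Simplified tokenizer: lowercase + whitespace split; unknown words are skipped.
    tokens = [t for t in text.lower().split() if t in word_vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(word_vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]

# Hypothetical 3-d word vectors; a real model ships pretrained vectors (e.g. 300-d).
toy_vectors = {
    "money": [1.0, 0.0, 0.0],
    "online": [0.0, 1.0, 0.0],
    "internet": [0.0, 0.9, 0.1],
    "cook": [0.0, 0.0, 1.0],
}

v = average_word_embedding("make money online", toy_vectors)  # "make" is out of vocabulary
print(v)  # [0.5, 0.5, 0.0]
```

Averaging ignores word order, so "make money online" and "online money make" map to the same vector. This is one reason the embeddings are of relatively low quality.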


Creating Vector Embeddings
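`SentenceTransformer.encode` already splits its input into batches (see its `batch_size` argument). The following sketch spells out that loop for any encoder passed in as a callable and adds sanity checks before the vectors are indexed. The helper `embed_in_batches` and the stand-in encoder are hypothetical.

```python
from typing import Callable, List, Sequence

def embed_in_batches(texts: Sequence[str],
                     encode_batch: Callable[[List[str]], List[List[float]]],
                     batch_size: int = 32) -> List[List[float]]:
    """Encode texts batch by batch, then check the count and dimension of the result."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(encode_batch(list(texts[start:start + batch_size])))
    if len(vectors) != len(texts):
        raise ValueError("encoder returned a different number of vectors than inputs")
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("vectors have inconsistent dimensions")
    return vectors

# Stand-in encoder for illustration: returns [length of text, 1.0] per input.
calls = []
def fake_encoder(batch: List[str]) -> List[List[float]]:
    calls.append(len(batch))
    return [[float(len(t)), 1.0] for t in batch]

vectors = embed_in_batches(["a", "bb", "ccc"], fake_encoder, batch_size=2)
```

With sentence-transformers, `encode_batch` could be `lambda batch: model.encode(batch).tolist()`, assuming `model` is the loaded model.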


Index the Vectors
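The following sketch prepares records and upserts them in batches. It assumes the v3+ `pinecone` client, whose `Index.upsert(vectors=...)` accepts dicts with `id`, `values` and optional `metadata`. We assume a batch size of 100 and store the question text under a `question` metadata key so that query results can carry the text back. Both choices are ours.

```python
from typing import Dict, List, Sequence, Tuple

def make_records(questions: Sequence[Tuple[str, str]],
                 vectors: Sequence[Sequence[float]]) -> List[Dict]:
    """Pair each (qid, question) with its vector; keep the text as metadata for display."""
    if len(questions) != len(vectors):
        raise ValueError("questions and vectors must have the same length")
    return [
        {"id": str(qid), "values": [float(x) for x in vec], "metadata": {"question": text}}
        for (qid, text), vec in zip(questions, vectors)
    ]

def upsert_in_batches(index, records: List[Dict], batch_size: int = 100) -> int:
    """Send records to the index in fixed-size batches; returns the number of upsert calls."""
    calls = 0
    for start in range(0, len(records), batch_size):
        index.upsert(vectors=records[start:start + batch_size])
        calls += 1
    return calls

# Usage with a real client (assumed): upsert_in_batches(pc.Index("question-answering"), records)
```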


Search

Once the vectors are indexed, querying the index is straightforward. Follow these steps:

  • Select a set of questions to query with.
  • Use the Average Word Embeddings Model to transform the questions into embeddings.
  • Send each question vector to the Pinecone index and retrieve the most similar indexed questions.
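The three search steps can be sketched as a single function. It assumes the v3+ `pinecone` client's `Index.query(vector=..., top_k=..., include_metadata=True)` and that the question text was stored as `question` metadata (a hypothetical key). `encode` is whatever turns one question into a list of floats.

```python
from typing import Callable, Dict, List

def search(index, encode: Callable[[str], List[float]], question: str,
           top_k: int = 5) -> List[Dict]:
    """Embed one question, query the index, and flatten matches into id/question/score rows."""
    res = index.query(vector=encode(question), top_k=top_k, include_metadata=True)
    return [
        {"id": m["id"], "question": m["metadata"]["question"], "score": m["score"]}
        for m in res["matches"]
    ]

# Usage (assumed): rows = search(index, lambda q: model.encode(q).tolist(), "How can i build an e-commerce website?")
```

`pandas.DataFrame(rows)` turns the rows into a table like the ones printed below.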

 Original question : What is best way to make money online?

 Most similar questions based on pinecone vector search: 

       id                                             question     score
0      57               What is best way to make money online?  1.000000
1  297469           What is the best way to make money online?  1.000000
2   55585        What is the best way for making money online?  0.989930
3   28280         What are the best ways to make money online?  0.981526
4  157045  What is the best way to make money on the internet?  0.978538



 Original question : How can i build an e-commerce website?

 Most similar questions based on pinecone vector search: 

       id                                                   question     score
0  119383                   How can I develop an e-commerce website?  0.925466
1    1713                 How would I develop an e-commerce website?  0.925466
2    1714                     How do I create an e-commerce website?  0.919407
3   79063             How do I build and host an e-commerce website?  0.918379
4  245780  What is the best platform to build an e-commerce website?  0.894444

Delete the Index

Delete the index once you are sure you no longer need it. Once it is deleted, it cannot be reused.
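The following guarded delete is sketched under the same assumption of the v3+ `pinecone` client (`list_indexes().names()` and `delete_index(name)`). Checking for the index first avoids a not-found error when it is already gone. The helper name is hypothetical.

```python
def delete_index_if_exists(pc, name: str) -> bool:
    """Delete the named index if it exists; returns True when a delete was issued."""
    if name in pc.list_indexes().names():
        pc.delete_index(name)
        return True
    return False

# Usage (assumed): delete_index_if_exists(pc, "question-answering")
```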
