Pinecone Quickstart
Install an SDK
Pinecone provides SDKs in multiple languages.
For this quickstart, install the Python SDK and a library that makes it easy to sign up with Pinecone:
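The install itself is a shell one-liner, assumed here to be `pip install pinecone pinecone-notebooks` (the SDK plus the notebook sign-up helper). The Python sketch below only checks whether both packages are importable. The mapping from distribution names to module names, in particular `pinecone_notebooks`, is an assumption.

```python
import importlib.util

# Assumed mapping of PyPI distribution names to importable module names.
REQUIRED_PACKAGES = {
    "pinecone": "pinecone",                      # Pinecone Python SDK
    "pinecone-notebooks": "pinecone_notebooks",  # notebook sign-up/auth helper
}


def missing_packages(required=REQUIRED_PACKAGES):
    """Return the distribution names whose top-level modules cannot be found."""
    return [
        dist for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Run: pip install " + " ".join(missing))
    else:
        print("All quickstart packages are installed.")
```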
Get an API key
You need an API key to make calls to your Pinecone project.
Use the Pinecone sign-up widget to generate a key. If you don't have a Pinecone account, the widget signs you up for the free Starter plan.
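A common pattern is to keep the key out of source code by storing it in an environment variable. This sketch assumes the variable is named `PINECONE_API_KEY`, the name the Python SDK falls back to when no key is passed explicitly. The helper `get_api_key` is hypothetical.

```python
import os

# In a Colab notebook, the sign-up widget is assumed to be launched with:
#   from pinecone_notebooks.colab import Authenticate
#   Authenticate()


def get_api_key(env_var: str = "PINECONE_API_KEY") -> str:
    """Read the Pinecone API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; generate a key and export it, "
            f"e.g. export {env_var}=<your-key>"
        )
    return key
```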
Initialize a client
Use the generated API key to initialize a Pinecone client:
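A minimal sketch, assuming the SDK's `Pinecone` class takes the key through its `api_key` argument. The wrapper `init_client` is hypothetical. It imports the SDK lazily so the file can be loaded before the package is installed.

```python
import os


def init_client(api_key=None):
    """Create a Pinecone client, falling back to the PINECONE_API_KEY variable."""
    # Imported inside the function so this module loads even without the SDK.
    from pinecone import Pinecone

    key = api_key or os.environ.get("PINECONE_API_KEY")
    if not key:
        raise RuntimeError("Pass api_key or set PINECONE_API_KEY.")
    return Pinecone(api_key=key)


# Usage (requires the SDK and a valid key):
# pc = init_client("YOUR_API_KEY")
```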
Create an index
In Pinecone, there are two types of indexes for storing vector data: Dense indexes store dense vectors for semantic search, and sparse indexes store sparse vectors for lexical/keyword search.
For this quickstart, create a dense index that is integrated with an embedding model hosted by Pinecone. With integrated models, you upsert and search with text and have Pinecone generate vectors automatically.
Note: If you prefer to use external embedding models, see Bring your own vectors.
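The sketch below shows the create step with the Python SDK and assumes `pc` is an initialized `Pinecone` client. The index name `quickstart-py`, the `aws`/`us-east-1` location, and the `llama-text-embed-v2` model are illustrative choices, not requirements. The `field_map` entry tells the hosted model to embed the text stored in each record's `chunk_text` field.

```python
def create_quickstart_index(pc, index_name="quickstart-py"):
    """Create a dense index with an integrated embedding model if it does not exist."""
    if not pc.has_index(index_name):
        pc.create_index_for_model(
            name=index_name,
            cloud="aws",            # assumed cloud and region; use ones your plan supports
            region="us-east-1",
            embed={
                "model": "llama-text-embed-v2",       # Pinecone-hosted embedding model
                "field_map": {"text": "chunk_text"},  # embed each record's chunk_text
            },
        )
    # Return a handle for data-plane operations (upsert, search, stats).
    return pc.Index(index_name)
```

Checking `has_index` first makes the function safe to rerun, because a second call skips creation and returns a handle to the existing index.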
Upsert records
Prepare a sample dataset of factual statements from different domains, such as history, physics, technology, and music. Format the data as records with an ID, text, and category. Each record must contain a chunk_text field, because the field_map set when creating the index tells the embedding model to embed the text in chunk_text.
Fields that are not listed in the field map, such as category, are stored as metadata on the upserted records.
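The records below come from the sample dataset; only a few of the 50 are shown. The two helpers are hypothetical. They check that every record has an ID and the mapped `chunk_text` field, and they show which remaining fields become metadata, assuming a field map of `{"text": "chunk_text"}`.

```python
FIELD_MAP = {"text": "chunk_text"}  # assumed to match the index's field map

# A few records from the sample dataset; the full set has 50.
records = [
    {"_id": "rec1", "chunk_text": "The Eiffel Tower was completed in 1889 and stands in Paris, France.", "category": "history"},
    {"_id": "rec5", "chunk_text": "Shakespeare wrote many famous plays, including Hamlet and Macbeth.", "category": "literature"},
    {"_id": "rec15", "chunk_text": "Leonardo da Vinci painted the Mona Lisa.", "category": "art"},
    {"_id": "rec50", "chunk_text": "Renewable energy sources include wind, solar, and hydroelectric power.", "category": "energy"},
]


def validate_records(records, field_map=FIELD_MAP):
    """Raise ValueError if a record lacks an ID or a mapped text field."""
    for record in records:
        if "_id" not in record:
            raise ValueError(f"record without _id: {record!r}")
        for field in field_map.values():
            if field not in record:
                raise ValueError(f"record {record['_id']} is missing {field!r}")


def metadata_fields(record, field_map=FIELD_MAP):
    """Return the fields that are neither the ID nor mapped, i.e. the metadata."""
    mapped = set(field_map.values())
    return {k: v for k, v in record.items() if k != "_id" and k not in mapped}
```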
Upsert the sample dataset into a new namespace in your index.
Because your index is integrated with an embedding model, you provide the textual statements and Pinecone converts them to dense vectors automatically.
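This minimal upsert sketch assumes `index` is a handle returned by `pc.Index(...)` for the integrated index, and that `upsert_records` takes the namespace followed by a list of record dicts. Sending records in batches keeps each request small. The default `batch_size` of 96 is an assumed per-request limit for text records, so check the current limits before relying on it. Both function names are hypothetical.

```python
def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_in_batches(index, records, namespace="example-namespace", batch_size=96):
    """Upsert text records; Pinecone embeds each record's chunk_text server-side."""
    sent = 0
    for batch in batched(records, batch_size):
        index.upsert_records(namespace, batch)
        sent += len(batch)
    return sent
```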
Check index stats
Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can view index stats to check if the current vector count matches the number of vectors you upserted (50):
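Because of this delay, a script can poll the stats instead of sleeping for a fixed time. The sketch below is hypothetical. It assumes the object returned by `describe_index_stats()` exposes `namespaces` as a mapping from namespace name to a summary with a `vector_count` attribute, matching the fields in the stats output.

```python
import time


def wait_for_vector_count(index, expected, namespace="example-namespace",
                          timeout=60.0, interval=2.0):
    """Poll index stats until `namespace` holds at least `expected` vectors."""
    deadline = time.monotonic() + timeout
    while True:
        stats = index.describe_index_stats()
        summary = stats.namespaces.get(namespace)
        count = summary.vector_count if summary is not None else 0
        if count >= expected:
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"namespace {namespace!r} has {count} vectors, expected {expected}"
            )
        time.sleep(interval)


# Usage: stats = wait_for_vector_count(index, expected=50); print(stats)
```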
{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'example-namespace': {'vector_count': 50}},
 'total_vector_count': 50,
 'vector_type': 'dense'}
Semantic search
Search the dense index for the ten records that are most semantically similar to the query "Famous historical structures and monuments".
Again, because your index is integrated with an embedding model, you provide the query as text and Pinecone converts the text to a dense vector automatically.
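A sketch of the search and of printing results in the format shown below. It assumes `index` is the handle for the integrated index, that `index.search` accepts `namespace` and a `query` dict with `top_k` and `inputs`, and that the response can be indexed as `results["result"]["hits"]`, with each hit carrying `_id`, `_score`, and a `fields` dict. The function names are hypothetical.

```python
QUERY = "Famous historical structures and monuments"


def semantic_search(index, query=QUERY, namespace="example-namespace", top_k=10):
    """Search with raw text; the integrated model embeds the query."""
    return index.search(
        namespace=namespace,
        query={"top_k": top_k, "inputs": {"text": query}},
    )


def format_hits(results):
    """Render each hit as 'id | score | category | text', one per line."""
    lines = []
    for hit in results["result"]["hits"]:
        fields = hit["fields"]
        lines.append(
            f"id: {hit['_id']} | score: {round(hit['_score'], 3)} | "
            f"category: {fields['category']} | text: {fields['chunk_text']}"
        )
    return lines


# Usage: print("\n".join(format_hits(semantic_search(index))))
```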
id: rec17 | score: 0.252 | category: history | text: The Pyramids of Giza are among the Seven Wonders of the Ancient World.
id: rec5 | score: 0.186 | category: literature | text: Shakespeare wrote many famous plays, including Hamlet and Macbeth.
id: rec38 | score: 0.186 | category: history | text: The Taj Mahal is a mausoleum built by Emperor Shah Jahan.
id: rec50 | score: 0.098 | category: energy | text: Renewable energy sources include wind, solar, and hydroelectric power.
id: rec15 | score: 0.096 | category: art | text: Leonardo da Vinci painted the Mona Lisa.
id: rec26 | score: 0.084 | category: history | text: Rome was once the center of a vast empire.
id: rec1 | score: 0.078 | category: history | text: The Eiffel Tower was completed in 1889 and stands in Paris, France.
id: rec47 | score: 0.072 | category: history | text: The Industrial Revolution transformed manufacturing and transportation.
id: rec7 | score: 0.072 | category: history | text: The Great Wall of China was built to protect against invasions.
id: rec21 | score: 0.061 | category: history | text: The Statue of Liberty was a gift from France to the United States.
Notice that most of the results are about historical structures and monuments. However, a few unrelated statements also appear, and some of them rank near the top of the list, such as the statements about Shakespeare (second) and renewable energy (fourth).
To get a more accurate ranking, search again but this time rerank the initial results based on their relevance to the query.
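A sketch of the reranked search, assuming `index.search` accepts a `rerank` dict with `model`, `top_n`, and `rank_fields` keys. `bge-reranker-v2-m3` is assumed here as the Pinecone-hosted reranking model. The function names are hypothetical.

```python
def reranked_search(index, query, namespace="example-namespace", top_k=10, top_n=10):
    """Retrieve top_k candidates, then rerank them by relevance to the query text."""
    return index.search(
        namespace=namespace,
        query={"top_k": top_k, "inputs": {"text": query}},
        rerank={
            "model": "bge-reranker-v2-m3",  # hosted reranking model (assumed choice)
            "top_n": top_n,                 # number of reranked hits to return
            "rank_fields": ["chunk_text"],  # record field the reranker reads
        },
    )


def hit_ids(results):
    """Return the record IDs in ranked order."""
    return [hit["_id"] for hit in results["result"]["hits"]]


# Usage:
# results = reranked_search(index, "Famous historical structures and monuments")
# for hit in results["result"]["hits"]:
#     print(hit["_id"], round(hit["_score"], 3))
```

The reranker only reorders the `top_k` candidates retrieved by the vector search, so a relevant record that the first stage misses cannot be recovered by reranking.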
id: rec1 | score: 0.107 | category: history | text: The Eiffel Tower was completed in 1889 and stands in Paris, France.
id: rec38 | score: 0.064 | category: history | text: The Taj Mahal is a mausoleum built by Emperor Shah Jahan.
id: rec7 | score: 0.063 | category: history | text: The Great Wall of China was built to protect against invasions.
id: rec21 | score: 0.019 | category: history | text: The Statue of Liberty was a gift from France to the United States.
id: rec17 | score: 0.015 | category: history | text: The Pyramids of Giza are among the Seven Wonders of the Ancient World.
id: rec26 | score: 0.011 | category: history | text: Rome was once the center of a vast empire.
id: rec15 | score: 0.008 | category: art | text: Leonardo da Vinci painted the Mona Lisa.
id: rec5 | score: 0.0 | category: literature | text: Shakespeare wrote many famous plays, including Hamlet and Macbeth.
id: rec47 | score: 0.0 | category: history | text: The Industrial Revolution transformed manufacturing and transportation.
id: rec50 | score: 0.0 | category: energy | text: Renewable energy sources include wind, solar, and hydroelectric power.
Notice that all of the most relevant results about historical structures and monuments are now ranked highest.
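To see the effect of reranking at a glance, you can compare the two rankings. The hypothetical helper below takes the ID order from the two result listings and reports each record's 1-based position before and after reranking.

```python
# Ranked IDs copied from the two result listings.
before = ["rec17", "rec5", "rec38", "rec50", "rec15", "rec26", "rec1", "rec47", "rec7", "rec21"]
after = ["rec1", "rec38", "rec7", "rec21", "rec17", "rec26", "rec15", "rec5", "rec47", "rec50"]


def rank_moves(before, after):
    """Map each ID to (position before, position after), both 1-based."""
    pos_after = {rid: i for i, rid in enumerate(after, start=1)}
    return {rid: (i, pos_after.get(rid)) for i, rid in enumerate(before, start=1)}


# Print records in their reranked order with their original positions.
for rid, (old, new) in sorted(rank_moves(before, after).items(), key=lambda kv: kv[1][1]):
    print(f"{rid}: {old} -> {new}")
```

For this data, rec1 (the Eiffel Tower) moves from 7 to 1. The Shakespeare statement (rec5) drops from 2 to 8, and the renewable energy statement (rec50) drops from 4 to 10.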
Improve results
Reranking results is one of the most effective ways to improve search accuracy and relevance, but there are many other techniques to consider. For example:
- Filtering by metadata: When records contain additional metadata, you can limit the search to records matching a filter expression.
- Hybrid search: You can add lexical search to capture precise keyword matches (e.g., product SKUs, email addresses, domain-specific terms) in addition to semantic matches.
- Chunking strategies: You can chunk your content in different ways to get better results. Consider factors like the length of the content, the complexity of queries, and how results will be used in your application.
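Two of these techniques lend themselves to short sketches. The first function adds a metadata filter to a text search. It assumes the search `query` dict accepts a `filter` key with MongoDB-style operators such as `$eq`. The second function splits long text into overlapping word windows before upsert, which is only one simple chunking strategy among many. Both function names and the default chunk sizes are hypothetical; tune them for your content.

```python
def filtered_search(index, query, category, namespace="example-namespace", top_k=10):
    """Semantic search restricted to records whose category metadata equals `category`."""
    return index.search(
        namespace=namespace,
        query={
            "top_k": top_k,
            "inputs": {"text": query},
            "filter": {"category": {"$eq": category}},  # metadata filter expression
        },
    )


def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap repeats a few words at each boundary, so a sentence split between two chunks still appears whole in at least one of them when it is shorter than the overlap.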
Clean up
When you no longer need your example index, you can delete it to save resources. After you delete an index, you cannot use it again or recover it.
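This guarded delete assumes `pc` is an initialized `Pinecone` client and that the index was named `quickstart-py`, an illustrative name. The function name is hypothetical.

```python
def delete_quickstart_index(pc, index_name="quickstart-py"):
    """Delete the index if it exists; return True when a delete was issued."""
    if not pc.has_index(index_name):
        return False
    pc.delete_index(index_name)  # irreversible: the index and its records are gone
    return True
```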