Namespacing
Namespacing with Pinecone
Namespacing is a feature in Pinecone that allows you to partition your data in an index. When you read from or write to a namespace in an index, you only access data in that particular namespace. Namespacing is useful when you want to reuse the same data processing pipeline but maintain strict separation between subsets of your data.
If your use case tempts you to create multiple indexes programmatically, consider whether the multitenancy provided by namespaces would be a better way to isolate different parts of your data.
For example, if you were building a movie recommender system, you could use namespacing to separate recommendations by genre. But if you need more flexibility in how you group and search records, putting genre information into metadata and using metadata filtering would probably be a better fit.
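The difference between the two approaches shows up in the arguments passed to a query. The sketch below only builds those keyword arguments and does not call Pinecone. The helper names and the genre metadata field are hypothetical, chosen for illustration.

```python
def namespace_query_kwargs(vector, genre, top_k=3):
    """Search only the records stored in the namespace named after the genre."""
    return {"vector": vector, "top_k": top_k, "namespace": genre}


def metadata_query_kwargs(vector, genres, top_k=3):
    """Search one shared namespace, keeping records whose genre metadata matches."""
    return {
        "vector": vector,
        "top_k": top_k,
        # $in matches any of the listed values, so one query can span genres.
        "filter": {"genre": {"$in": list(genres)}},
    }


print(namespace_query_kwargs([1.0, 1.0], "romantic-comedy"))
print(metadata_query_kwargs([1.0, 1.0], ["romantic-comedy", "animation"]))
```

Either dictionary could be unpacked into a query call on an index client. The namespace form restricts the search to one partition, while the filter form lets a single query cover several genres at once.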
Prerequisites
Install dependencies. This walkthrough uses the Pinecone Python client, published on PyPI as the pinecone package.
Creating an Index
We begin by instantiating the Pinecone client. To do this we need a free API key.
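A minimal sketch of this step, assuming the API key is stored in an environment variable named PINECONE_API_KEY. The helper name make_client is hypothetical.

```python
import os


def make_client(api_key=None):
    """Return a Pinecone client built from an explicit key or PINECONE_API_KEY."""
    api_key = api_key or os.environ.get("PINECONE_API_KEY")
    if not api_key:
        raise ValueError("Set PINECONE_API_KEY or pass api_key explicitly.")
    # Imported here so a missing key is reported before the SDK is loaded.
    from pinecone import Pinecone

    return Pinecone(api_key=api_key)
```

With the key exported in your shell, pc = make_client() gives the client object used for index management in the rest of the walkthrough.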
Creating a Pinecone Index
When creating the index we need to define several configuration properties.
- name can be anything we like. The name is used as an identifier for the index when performing other operations such as describe_index, delete_index, and so on.
- metric specifies the similarity metric that will be used later when you make queries to the index.
- dimension should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data, so a small value is simplest.
- spec holds a specification that tells Pinecone how you would like to deploy your index. The Pinecone documentation lists the available providers and regions.
There are more configurations available, but this minimal set will get us started.
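The configuration above might be applied as in the following sketch. The values are taken from the index description printed in this walkthrough. The helper name is hypothetical, pc is assumed to be an already constructed Pinecone client, and the spec is written in plain-dictionary form on the assumption that the installed SDK accepts it in place of a ServerlessSpec object.

```python
INDEX_NAME = "pinecone-namespacing"


def create_index_if_missing(pc, name=INDEX_NAME):
    """Create a 2-dimensional serverless index unless one with this name exists."""
    if name not in pc.list_indexes().names():
        pc.create_index(
            name=name,
            dimension=2,  # matches the made-up 2-d movie vectors
            metric="euclidean",
            # Plain-dict serverless spec (assumed to be accepted by the SDK).
            spec={"serverless": {"cloud": "aws", "region": "us-east-1"}},
        )
    description = pc.describe_index(name=name)
    print(f"The index host is {description.host}")
    return description
```

Checking for an existing index first makes the cell safe to re-run, since creating an index whose name is already taken would otherwise fail.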
{
  "name": "pinecone-namespacing",
  "metric": "euclidean",
  "host": "pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "cloud": "aws",
      "region": "us-east-1"
    }
  },
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "vector_type": "dense",
  "dimension": 2,
  "deletion_protection": "disabled",
  "tags": null
}
The index host is pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io
Working with the Index
Data operations such as upsert and query are sent directly to the index host instead of api.pinecone.io, so we use a different client object for these operations. When you construct this client object with the .Index() helper method, it automatically inherits your API key and any other configuration from the parent Pinecone instance.
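As a sketch, the index client can be built from the host reported by describe_index. The helper name connect_to_index is hypothetical, and pc is assumed to be an existing Pinecone client.

```python
def connect_to_index(pc, name="pinecone-namespacing"):
    """Look up the index host, then build a data-plane client bound to it."""
    host = pc.describe_index(name=name).host
    return pc.Index(host=host)
```

The returned object is the one that upsert, query, and describe_index_stats are called on in the following steps.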
Generate movie data
For this simple example scenario, we will make up some small vectors to represent different movies.
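One possible data set is sketched below. The vector values are assumptions chosen to be consistent with the query scores shown later (which match squared Euclidean distance from Wall-E), and the fourth title, Toy Story, is a hypothetical placeholder because only three titles appear in the results.

```python
# Each record has an id and a 2-dimensional vector, matching the index dimension.
movies = [
    {"id": "Wall-E", "values": [1.0, 1.0]},
    {"id": "Up", "values": [2.0, 2.0]},
    {"id": "Ratatouille", "values": [3.0, 3.0]},
    {"id": "Toy Story", "values": [4.0, 4.0]},  # hypothetical fourth movie
]


def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Distances from Wall-E, for comparison with the query scores later on.
query = movies[0]["values"]
for movie in movies:
    print(movie["id"], squared_euclidean(query, movie["values"]))
```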
Insert vectors without specifying a namespace
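A sketch of this step, assuming index is the client returned by .Index() and records is a list of dictionaries with id and values keys. The helper name is hypothetical.

```python
def upsert_without_namespace(index, records):
    """Upsert records into the default namespace and report index statistics."""
    # Omitting the namespace argument writes to the default namespace ''.
    response = index.upsert(vectors=records)
    stats = index.describe_index_stats()
    print(response)
    print(stats)
    return response, stats
```

The statistics may take a moment to reflect a fresh upsert, so a count that looks low right after writing is not necessarily an error.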
{'upserted_count': 4}
{'dimension': 2,
 'index_fullness': 0.0,
 'metric': 'euclidean',
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4,
 'vector_type': 'dense'}
Insert vectors into a namespace
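The namespaced upsert might look like the sketch below. The choice of Wall-E and Ratatouille follows the query results shown for the romantic-comedy namespace. The helper name is hypothetical, and records is assumed to be a list of dictionaries with id and values keys.

```python
ROMANTIC_COMEDIES = {"Wall-E", "Ratatouille"}


def upsert_romantic_comedies(index, records):
    """Copy the romantic comedies into their own namespace."""
    subset = [r for r in records if r["id"] in ROMANTIC_COMEDIES]
    # The namespace is created implicitly by the first upsert that names it.
    index.upsert(vectors=subset, namespace="romantic-comedy")
    return subset
```

Because the same ids now exist in both the default namespace and romantic-comedy, the total vector count rises from 4 to 6 rather than staying at 4.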
{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4},
                'romantic-comedy': {'vector_count': 2}},
 'total_vector_count': 6}
Query top-3 results, without a namespace
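A sketch of the query, assuming index is the index client and vector is the 2-dimensional vector for the movie being searched (Wall-E, judging by the 0.0 score in the results). The helper name is hypothetical.

```python
def query_without_namespace(index, vector, top_k=3):
    """Return the top_k nearest records from the default namespace."""
    # With no namespace argument, only the default namespace '' is searched.
    return index.query(vector=vector, top_k=top_k)
```

With the euclidean metric, lower scores mean closer matches, which is why the query vector's own record comes first with a score of 0.0.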
{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Up', 'score': 1.99999905, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': ''}
Query top-3 results, with a namespace
We should expect to see only romantic comedies in the query results.
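The namespaced query differs only in its namespace argument. In this sketch the helper name is hypothetical, and the response is assumed to expose its matches through a matches attribute whose items each have an id.

```python
def query_namespace_ids(index, vector, namespace="romantic-comedy", top_k=3):
    """Return the ids of the nearest records within a single namespace."""
    response = index.query(vector=vector, top_k=top_k, namespace=namespace)
    return [match.id for match in response.matches]
```

Although top_k is 3, the romantic-comedy namespace holds only two vectors, so at most two ids come back.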
{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': 'romantic-comedy'}
Delete the index
Once we're done, delete the index to save resources.
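A cleanup sketch, assuming pc is the Pinecone client used to create the index. The helper name is hypothetical.

```python
def delete_index_if_present(pc, name="pinecone-namespacing"):
    """Delete the index if it exists and report whether anything was removed."""
    if name in pc.list_indexes().names():
        pc.delete_index(name=name)
        return True
    return False
```

Deleting the index removes every namespace in it, including the default one, so no per-namespace cleanup is needed first.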