Pinecone Namespacing

Namespacing with Pinecone

Namespacing is a feature in Pinecone that allows you to partition your data in an index. When you read from or write to a namespace in an index, you only access data in that particular namespace. Namespacing is useful when you want to reuse the same data processing pipeline but maintain strict separation between subsets of your data.

If your use case tempts you to create multiple indexes programmatically, consider whether the multitenancy provided by namespaces would be a better way to isolate different parts of your data.

For example, if you were building a movie recommender system, you could use namespacing to separate recommendations by genre. But if you need more flexibility in how you group and search records, putting genre information into metadata and using metadata filtering would probably be a better fit.

Prerequisites

Install dependencies.

[12]

Creating an Index

We begin by creating an instance of the Pinecone client. To do this we need an API key, which is available for free.

[13]
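The code for this cell is not shown here. A minimal sketch of client construction might look like the following, assuming the `pinecone` Python package is installed and the key is stored in the `PINECONE_API_KEY` environment variable. The helper name `make_client` is hypothetical, not part of the SDK.

```python
import os


def make_client(api_key=None):
    """Return a Pinecone client, reading the key from the environment if needed."""
    # Assumption: the key lives in PINECONE_API_KEY unless passed explicitly.
    key = api_key or os.environ.get("PINECONE_API_KEY")
    if not key:
        raise ValueError("Set PINECONE_API_KEY or pass api_key explicitly.")
    # Imported lazily so the missing-key check runs before the SDK is needed.
    from pinecone import Pinecone

    return Pinecone(api_key=key)
```

With a key in place, `pc = make_client()` would give the parent client used in the rest of the walkthrough.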

Creating a Pinecone Index

When creating the index we need to define several configuration properties.

name can be anything we like. The name is used as an identifier for the index when performing other operations such as describe_index, delete_index, and so on.
metric specifies the similarity metric that will be used later when you make queries to the index.
dimension should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data so a small value is simplest.
spec holds a specification that tells Pinecone how you would like to deploy your index. A list of all available providers and regions is available in the Pinecone documentation.

There are more configurations available, but this minimal set will get us started.

[14]
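The properties described above could be passed to `create_index` roughly as follows. This is a sketch rather than the notebook's original cell: `ensure_index` is a hypothetical helper, and the default values mirror the index description printed in this section.

```python
INDEX_NAME = "pinecone-namespacing"


def ensure_index(pc, spec, name=INDEX_NAME, dimension=2, metric="euclidean"):
    """Create the index if it does not already exist; return True if created."""
    # list_indexes().names() gives the names of indexes in the current project.
    if name in pc.list_indexes().names():
        return False
    pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True
```

With the real client, `spec` would plausibly be `ServerlessSpec(cloud="aws", region="us-east-1")` imported from `pinecone`, which matches the cloud and region in the description output.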

[15]

[16]

{
    "name": "pinecone-namespacing",
    "metric": "euclidean",
    "host": "pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 2,
    "deletion_protection": "disabled",
    "tags": null
}

[18]
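One way to obtain the host is to poll `describe_index` until the index reports ready and then read its `host` field. The sketch below assumes the description object exposes `status["ready"]` and `host`, as the printed description suggests; `wait_for_host` and its timeout parameters are hypothetical.

```python
import time


def wait_for_host(pc, name, timeout_s=60.0, poll_s=1.0):
    """Poll describe_index until the index is ready, then return its host."""
    deadline = time.monotonic() + timeout_s
    while True:
        description = pc.describe_index(name)
        if description.status["ready"]:
            return description.host
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index {name!r} not ready after {timeout_s}s")
        time.sleep(poll_s)
```

A call such as `print(f"The index host is {wait_for_host(pc, 'pinecone-namespacing')}")` would then produce a line like the one that follows.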

The index host is pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io

Working with the Index

Data operations such as upsert and query are sent directly to the index host instead of api.pinecone.io, so we use a different client object for these operations. A client object constructed with the .Index() helper method automatically inherits your API key and any other configuration from the parent Pinecone instance.

[19]
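A sketch of building the index client from the parent client is shown below. The wrapper `connect_to_index` is hypothetical and only adds a guard against an empty host; the underlying call is the `.Index()` helper described above.

```python
def connect_to_index(pc, host):
    """Build a data-plane client for one index from the parent client."""
    if not host:
        raise ValueError("host must be a non-empty string")
    # The returned object inherits the API key from `pc`.
    return pc.Index(host=host)
```

Assuming `host` holds the value printed earlier, `index = connect_to_index(pc, host)` gives the object used for upserts and queries.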

Generate movie data

For this simple example scenario, we will make up some small vectors to represent different movies.

[20]
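The original data cell is not shown, so the values below are assumptions. The ids Wall-E, Up, and Ratatouille appear in the query results later in this walkthrough; the fourth id and all vector values are made up here, chosen to have dimension 2 and to be consistent with the scores printed later.

```python
# Made-up 2-D vectors. "Toy Story" and every coordinate are assumptions.
MOVIES = {
    "Wall-E": [1.0, 1.0],
    "Up": [2.0, 2.0],
    "Ratatouille": [3.0, 3.0],
    "Toy Story": [4.0, 4.0],
}


def to_records(movies):
    """Convert {id: values} into a list of {"id", "values"} dicts for upsert."""
    return [{"id": movie_id, "values": values} for movie_id, values in movies.items()]


records = to_records(MOVIES)
```

Each record carries an `id` and a `values` list, which is one of the shapes the upsert operation accepts.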

Insert vectors without specifying a namespace

[21]
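Omitting the `namespace` argument sends records to the default namespace, which appears as the empty string in the stats output. The following sketch assumes `records` is a list of `{"id", "values"}` dicts; `upsert_default` is a hypothetical helper that also checks vector length against the index dimension before sending anything.

```python
def upsert_default(index, records, dimension=2):
    """Upsert into the default namespace by omitting the namespace argument."""
    for record in records:
        if len(record["values"]) != dimension:
            raise ValueError(
                f"{record['id']!r} has {len(record['values'])} values, expected {dimension}"
            )
    return index.upsert(vectors=records)
```

With four records, the response would be expected to report an upserted count of 4, as in the output that follows.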

{'upserted_count': 4}

[22]

{'dimension': 2,
 'index_fullness': 0.0,
 'metric': 'euclidean',
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4,
 'vector_type': 'dense'}

Insert vectors into a namespace

[8]
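Writing to a namespace only requires passing `namespace` to the upsert call. The sketch below selects a subset of records by id and upserts them into `romantic-comedy`; `upsert_subset` is a hypothetical helper, and the choice of which movies count as romantic comedies is taken from the namespaced query output later in this walkthrough.

```python
ROMCOM_NAMESPACE = "romantic-comedy"


def upsert_subset(index, records, ids, namespace=ROMCOM_NAMESPACE):
    """Upsert only the records whose id is in `ids` into `namespace`."""
    wanted = set(ids)
    subset = [record for record in records if record["id"] in wanted]
    index.upsert(vectors=subset, namespace=namespace)
    return subset
```

For example, `upsert_subset(index, records, ["Wall-E", "Ratatouille"])` would place two vectors in the namespace while leaving the default namespace untouched.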

[9]

{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4},
                'romantic-comedy': {'vector_count': 2}},
 'total_vector_count': 6}
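The total of 6 is the sum of the per-namespace counts: records written to a namespace are stored separately from the records in the default namespace, even when they share an id. The check below works on a plain dict shaped like the printed stats; `counts_by_namespace` is a hypothetical helper.

```python
# A plain-dict copy of the stats printed above.
stats = {
    "namespaces": {
        "": {"vector_count": 4},
        "romantic-comedy": {"vector_count": 2},
    },
    "total_vector_count": 6,
}


def counts_by_namespace(stats):
    """Map each namespace name to its vector count, labelling '' as default."""
    return {
        name or "(default)": info["vector_count"]
        for name, info in stats["namespaces"].items()
    }


counts = counts_by_namespace(stats)
assert sum(counts.values()) == stats["total_vector_count"]
print(counts)  # {'(default)': 4, 'romantic-comedy': 2}
```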

Query top-3 results, without a namespace

[10]
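A query sketch follows. The helper `query_top_k` is hypothetical; it forwards `vector` and `top_k` to the index client's `query` method and adds `namespace` only when one is given, so that leaving it out searches the default namespace.

```python
def query_top_k(index, vector, top_k=3, namespace=None):
    """Return the top_k nearest matches, optionally scoped to one namespace."""
    kwargs = {"vector": vector, "top_k": top_k}
    if namespace is not None:
        kwargs["namespace"] = namespace
    return index.query(**kwargs)
```

Assuming the query vector is the one stored for Wall-E, `query_top_k(index, [1.0, 1.0])` searches the default namespace, and `query_top_k(index, [1.0, 1.0], namespace="romantic-comedy")` restricts the search to that namespace.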

{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Up', 'score': 1.99999905, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': ''}

Query top-3 results, with a namespace

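We should expect to see only romantic comedies in the query results. Because the romantic-comedy namespace holds just two vectors, a top-3 query returns only two matches.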

[11]

{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': 'romantic-comedy'}
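To make the two result sets concrete, here is a small in-memory model of namespace-scoped search. It is not how Pinecone is implemented; it only illustrates the partitioning. The vectors, the query vector `[1.0, 1.0]`, and the id "Toy Story" are assumptions, chosen so that squared Euclidean distance gives scores close to those printed above (0, about 2, about 8).

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Each namespace is its own partition; a query only sees one of them.
STORE = {
    "": {
        "Wall-E": [1.0, 1.0],
        "Up": [2.0, 2.0],
        "Ratatouille": [3.0, 3.0],
        "Toy Story": [4.0, 4.0],
    },
    "romantic-comedy": {
        "Wall-E": [1.0, 1.0],
        "Ratatouille": [3.0, 3.0],
    },
}


def local_query(store, vector, top_k, namespace=""):
    """Rank the vectors of a single namespace by distance to `vector`."""
    candidates = store[namespace]
    scored = sorted(
        (squared_euclidean(vector, values), movie_id)
        for movie_id, values in candidates.items()
    )
    return [(movie_id, score) for score, movie_id in scored[:top_k]]


print(local_query(STORE, [1.0, 1.0], top_k=3))
# [('Wall-E', 0.0), ('Up', 2.0), ('Ratatouille', 8.0)]
print(local_query(STORE, [1.0, 1.0], top_k=3, namespace="romantic-comedy"))
# [('Wall-E', 0.0), ('Ratatouille', 8.0)]
```

The second call returns two matches even though `top_k` is 3, because only two vectors exist in that partition.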

Delete the index

Once we're done, delete the index to save resources.

[12]
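Cleanup could be sketched as follows. `delete_index_if_exists` is a hypothetical helper that checks for the index first, so that rerunning the cell does not fail when the index is already gone.

```python
def delete_index_if_exists(pc, name="pinecone-namespacing"):
    """Delete the named index if present; return True if a delete was issued."""
    if name not in pc.list_indexes().names():
        return False
    pc.delete_index(name)
    return True
```

Deleting an index removes every namespace in it, so this should be the last step of the walkthrough.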