Pinecone Read Units Demonstrated

Demonstrating Read Units (RUs)

Pinecone serverless indexes price reads and writes separately. Reads are metered in Read Units (RUs).

This notebook demonstrates how to build a Pinecone serverless index, populate it with data, and observe the Read Units (RUs) associated with different types of queries.

Install the Pinecone SDK

We'll install the gRPC version of the Pinecone SDK to maximize performance of upserts and other data operations.

[3]

Requirement already satisfied: pinecone-client==3.0.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (3.0.0)
Requirement already satisfied: certifi>=2019.11.17 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client==3.0.0->pinecone-client[grpc]==3.0.0) (2023.7.22)
Requirement already satisfied: tqdm>=4.64.1 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client==3.0.0->pinecone-client[grpc]==3.0.0) (4.66.1)
Requirement already satisfied: typing-extensions>=3.7.4 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client==3.0.0->pinecone-client[grpc]==3.0.0) (4.7.1)
Requirement already satisfied: urllib3>=1.26.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client==3.0.0->pinecone-client[grpc]==3.0.0) (1.26.16)
Requirement already satisfied: googleapis-common-protos>=1.53.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (1.61.0)
Requirement already satisfied: grpc-gateway-protoc-gen-openapiv2==0.1.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (0.1.0)
Requirement already satisfied: grpcio>=1.44.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (1.57.0)
Requirement already satisfied: lz4>=3.1.3 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (4.3.2)
Requirement already satisfied: protobuf<3.21.0,>=3.20.0 in /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages (from pinecone-client[grpc]==3.0.0) (3.20.3)

[2]

Python 3.10.12

Connect to Pinecone and create an index

[ ]

[9]

Now we set up our index specification, which defines the cloud provider and region where the index will be deployed. A list of all available providers and regions is in the Pinecone documentation.
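A minimal sketch of the create-and-wait step, assuming a pinecone-client 3.x client object is passed in as `pc`. The helper name `ensure_index`, the `cosine` metric, the `PINECONE_API_KEY` environment variable, and the `aws` / `us-west-2` values in the commented wiring are illustrative assumptions, not taken from this notebook; only the index name `rus-demo` and the dimension 1536 come from the outputs shown here.

```python
import time


def ensure_index(pc, name, dimension, spec, metric="cosine", poll_seconds=1.0):
    """Create the index if it is missing, wait until ready, return a handle.

    `pc` is expected to behave like a pinecone-client 3.x client.
    `metric="cosine"` is an assumed default, not confirmed by the notebook.
    """
    if name not in pc.list_indexes().names():
        print(f'Creating index "{name}"...')
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    # Index creation is asynchronous, so poll until it reports ready.
    while not pc.describe_index(name).status["ready"]:
        time.sleep(poll_seconds)
    return pc.Index(name)


# Example wiring (needs network access and an API key, so it is commented out):
# import os
# from pinecone import ServerlessSpec
# from pinecone.grpc import PineconeGRPC
# pc = PineconeGRPC(api_key=os.environ["PINECONE_API_KEY"])  # assumed env var
# spec = ServerlessSpec(cloud="aws", region="us-west-2")     # assumed values
# index = ensure_index(pc, "rus-demo", dimension=1536, spec=spec)
```

Passing the client in as a parameter keeps the helper independent of whether the REST or gRPC client is used.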

[ ]

[12]

Index name:  rus-demo

[13]

Creating index "rus-demo"...

Successfully created index "rus-demo"!

[14]

[15]

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 0}},
 'total_vector_count': 0}

[Skip this section if your index exists already]

Batch upsert vectors into different namespaces

We'll create and populate three namespaces with 50k, 100k, and 200k vectors, respectively. Namespaces are optional, but they are a best practice for limiting queries to relevant records, which both speeds up queries and reduces the RUs consumed.
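The population step can be sketched roughly as follows, assuming random vectors with UUID ids (the ids in the query output look like UUIDs) and an `index` object exposing `upsert(vectors=..., namespace=...)` as in pinecone-client 3.x. The helper names `random_batches` and `populate_namespace` and the batch size of 100 are hypothetical choices for illustration.

```python
import random
import uuid


def random_batches(total, batch_size, dimension):
    """Yield lists of (id, values) tuples that together hold `total` vectors."""
    for start in range(0, total, batch_size):
        size = min(batch_size, total - start)
        yield [
            (str(uuid.uuid4()), [random.random() for _ in range(dimension)])
            for _ in range(size)
        ]


def populate_namespace(index, namespace, total, batch_size=100, dimension=1536):
    """Upsert `total` random vectors into `namespace`; return the number sent."""
    sent = 0
    for batch in random_batches(total, batch_size, dimension):
        index.upsert(vectors=batch, namespace=namespace)
        sent += len(batch)
    return sent


# Example wiring against a real index handle (commented out: needs network):
# for name, count in {"50k": 50_000, "100k": 100_000, "200k": 200_000}.items():
#     print(f'Populating namespace "{name}":')
#     populate_namespace(index, name, count)
```

Generating each batch lazily avoids holding all 350,000 vectors in memory at once.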

[ ]

Populating namespace "50k":
100%|██████████████████████████████████████████████████████████████████████████████████| 5/5 [03:34<00:00, 42.93s/it]
Populating namespace "100k":
 60%|████████████████████████████████████████████████▌                                | 6/10 [04:16<02:50, 42.63s/it]

Validate everything looks as expected.
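One way to make this check explicit is to compare per-namespace counts against the intended sizes. The sketch below works on a plain dict shaped like the `describe_index_stats()` output in this section; the helper name `namespace_counts` is hypothetical, and treating the stats response as a dict is an assumption about how it is accessed.

```python
def namespace_counts(stats):
    """Map each namespace name to its vector count."""
    return {name: ns["vector_count"] for name, ns in stats["namespaces"].items()}


# Stats shaped like the describe_index_stats() output in this section.
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "namespaces": {
        "100k": {"vector_count": 100000},
        "200k": {"vector_count": 200000},
        "50k": {"vector_count": 50000},
    },
    "total_vector_count": 350000,
}

expected = {"50k": 50_000, "100k": 100_000, "200k": 200_000}
counts = namespace_counts(stats)
print(counts == expected)
print(sum(counts.values()) == stats["total_vector_count"])
```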

[13]

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'100k': {'vector_count': 100000},
                '200k': {'vector_count': 200000},
                '50k': {'vector_count': 50000}},
 'total_vector_count': 350000}

Inspect Read Costs

We'll now execute a simple query on the first namespace ('50k') and inspect its response.

You should see 'usage': {'read_units': 5} at the very bottom. Those are our RUs!

[49]

[50]

{'matches': [{'id': '7ba8b4aa-f883-4c4e-b0b9-b7fc43179750',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'e83c6d3e-4706-4ada-acbe-e825d11bcdf4',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'd334f5f6-e62d-47ac-ad7b-3124dcc4cb4d',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '303aa521-b7b0-45d8-9cef-8ab3c8c23f9f',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '9fe84421-3ee1-46f5-9af6-a158983791f5',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '0ec0005e-32a0-431a-b0d9-b31fd7d02e7c',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '729eda28-9032-4b7b-b3b3-d3ff0efb603b',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'f44c5eda-a92c-42dc-8461-aff5bd1cb968',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '6e21d46d-895e-45e2-b4b6-f69e74eac19a',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '522fa5ed-f918-4fe4-b720-cd279134adbd',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []}],
 'namespace': '50k',
 'usage': {'read_units': 5}}

Since every query consumes read units, every query's response will have a usage field. This usage field contains the exact number of RUs your query incurred.

We can drill down to only our query's corresponding cost in RUs by doing the following:
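A minimal sketch of that drill-down, assuming the response can be indexed like the dict printed above and that `index.query` accepts `vector`, `top_k`, `namespace`, and `include_metadata` keyword arguments as in pinecone-client 3.x. The helper name `query_read_units` and the random query vector are illustrative assumptions.

```python
import random


def query_read_units(index, namespace, top_k=10, include_metadata=False,
                     dimension=1536):
    """Run one query with a random vector and return only its RU cost."""
    query_vector = [random.random() for _ in range(dimension)]
    response = index.query(
        vector=query_vector,
        top_k=top_k,
        namespace=namespace,
        include_metadata=include_metadata,
    )
    # The usage field sits at the top level of the response.
    return response["usage"]["read_units"]


# Example against a real index handle (commented out: needs network):
# print(query_read_units(index, "50k"))
```

The same helper can be reused for the later experiments by changing `namespace`, `top_k`, and `include_metadata`.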

[51]

Querying the "50k" namespace consumed 5 RUs, which is the minimum value a query can use.

Let's query the "100k" namespace to see how the result changes:

[53]

When we queried the "50k" namespace, we consumed 5 RUs. When we now query a namespace that has 2x the vectors (the "100k" namespace), we see that we only consumed 1 extra RU.

Let's see what happens when we 2x the size again, querying the "200k" namespace:

[55]

When we query the "200k" namespace, our RU cost goes from 6 to 8. Note that this is sub-linear scaling in action!
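To make the sub-linear trend concrete, the three observations above can be put side by side. This is plain arithmetic on the numbers reported in this notebook; the helper name `growth_table` is hypothetical.

```python
# Namespace size -> RUs observed for a top_k=10 query in this notebook.
observed = {50_000: 5, 100_000: 6, 200_000: 8}


def growth_table(observed):
    """Return (size, size multiple, RU multiple) relative to the smallest size."""
    base_size = min(observed)
    base_rus = observed[base_size]
    return [
        (size, size / base_size, rus / base_rus)
        for size, rus in sorted(observed.items())
    ]


for size, size_x, ru_x in growth_table(observed):
    print(f"{size:>7,} vectors: {size_x:.1f}x the data, {ru_x:.1f}x the RUs")
# Prints:
#  50,000 vectors: 1.0x the data, 1.0x the RUs
# 100,000 vectors: 2.0x the data, 1.2x the RUs
# 200,000 vectors: 4.0x the data, 1.6x the RUs
```

Four times the data cost only 1.6 times the RUs in these runs.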

Toggling `top_k`

Let's keep querying the "200k" namespace, but increase top_k from 10 to 100 to see its effect:

[56]

Increasing our top_k from 10 to 100 in the "200k" namespace has not changed the number of RUs incurred.

This is because Pinecone's initial scan of the "200k" namespace was enough to produce the IDs of the top_k results for both 10 and 100.

Toggling `include_metadata`

But what if we set include_metadata to True? This should trigger a "post-scan" Fetch stage with an additional cost of 1 RU per 10 items in our result set:

[57]

That worked! Setting include_metadata=True raised the cost from 8 RUs to 18 RUs. The fetch stage adds 1 RU per 10 items returned, and with top_k set to 100 we returned 100 items, so the overhead is 100 / 10 = 10 RUs on top of the original 8 RUs.
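The arithmetic above can be written as a small estimator. This is a sketch of the rule as described in this notebook, not an official pricing formula: the function name `estimate_read_units` is hypothetical, and rounding partial groups of 10 items up to a whole RU is an assumption.

```python
import math


def estimate_read_units(scan_rus, top_k, include_metadata):
    """Estimate query cost: scan cost plus 1 RU per 10 fetched items.

    Assumption: a partial group of 10 items is rounded up to a full RU.
    """
    if not include_metadata:
        return scan_rus
    return scan_rus + math.ceil(top_k / 10)


print(estimate_read_units(8, 100, include_metadata=False))  # 8
print(estimate_read_units(8, 100, include_metadata=True))   # 18
```

Here `scan_rus` is the cost observed without metadata for the same namespace, which depends on namespace size rather than on `top_k`.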

Putting it all together

Now let's increase the top_k even more to see how it affects the RU cost:

[58]

By increasing our top_k from 100 to 1000, and continuing to include metadata in our response, we are now at a cost of 108 RUs: the same 8 RUs for the scan, plus 100 RUs for fetching 1,000 items at 1 RU per 10 items.

Play with the cost of your queries on your own and let us know what you find!

Cleanup

Delete the index to avoid incurring ongoing storage costs.
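A minimal sketch of the cleanup, assuming `pc` is a pinecone-client 3.x client exposing `list_indexes().names()` and `delete_index(name)`. The helper name `delete_index_if_exists` is hypothetical; the existence check is only there so the cell can be rerun safely.

```python
def delete_index_if_exists(pc, name):
    """Delete the named index if present; return True if a delete was issued."""
    if name in pc.list_indexes().names():
        pc.delete_index(name)
        return True
    return False


# Example against a real client (commented out: needs network and an API key):
# delete_index_if_exists(pc, "rus-demo")
```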

[ ]