Cardinal Weaviate
For this demo, we're using version 4.6.5 of the Weaviate Python client and the Cardinal API.
Author: Jianna Liu from Cardinal
Jianna's X handle: @jianna_liu
Jianna's LinkedIn: https://www.linkedin.com/in/jianna-liu-90747413b/
Cardinal's site: https://trycardinal.ai/
Cardinal ↔ Weaviate RAG Demo
We will:
- Pull PDFs from S3 (or use URLs)
- Send each file to Cardinal /rag
- Convert Cardinal’s inch-based boxes → points → normalized percentages
- Upsert chunks to Weaviate
- Run aggregate, hybrid, vector, and generative queries
Requires: Cardinal API key, Weaviate (Cloud/Embedded/Local), AWS creds (if using S3)
If you enjoyed this, follow our socials!
0) Install deps
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
firebase-admin 6.9.0 requires httpx[http2]==0.28.1, but you have httpx 0.27.0 which is incompatible.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 6.32.1 which is incompatible.
tensorflow 2.19.0 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.3, but you have protobuf 6.32.1 which is incompatible.
mcp 1.13.1 requires httpx>=0.27.1, but you have httpx 0.27.0 which is incompatible.
google-ai-generativelanguage 0.6.15 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2, but you have protobuf 6.32.1 which is incompatible.
google-genai 1.33.0 requires httpx<1.0.0,>=0.28.1, but you have httpx 0.27.0 which is incompatible.
1) Load environment variables
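The notebook's own cell for this step is not shown here, so the following is only a minimal sketch of how the required credentials could be read and validated. The variable names `CARDINAL_API_KEY`, `WEAVIATE_URL`, and `WEAVIATE_API_KEY` and the helper `load_settings` are assumptions made for illustration (a Weaviate Cloud setup is assumed); `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the standard AWS SDK names and are only needed when reading from S3.

```python
import os

# Hypothetical variable names: adjust them to match your own environment.
REQUIRED = ("CARDINAL_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY")
# Only needed when pulling PDFs from S3.
OPTIONAL = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_settings(env=None):
    """Return a dict of settings, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # Copy optional values only when they are actually set.
    settings.update({name: env[name] for name in OPTIONAL if env.get(name)})
    return settings
```

Failing fast here tends to be easier to debug than an authentication error several cells later.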
2) Connect to Weaviate
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Weaviate client version: 4.6.5
Connected to Weaviate: True
3) Create a Weaviate collection (CardinalDemo)
Deleted existing CardinalDemo collection
Created CardinalDemo collection
4) Import helper functions
5) Get S3 URLs
6) Process files and collect objects
(1, ['s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf'])
Processing files: 0%| | 0/1 [00:00<?, ?it/s]
Processing: https://public-cardinal-bucket.s3.us-east-2.amazonaws.com/menus/Butterflake%20Croissant%20Sandwiches.pdf
Processing files: 100%|██████████| 1/1 [00:20<00:00, 20.89s/it]
Extracted 25 objects from Butterflake Croissant Sandwiches.pdf
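The log above shows the `s3://` URI from step 5 being processed as a virtual-hosted-style HTTPS URL with the spaces percent-encoded. The notebook's helper for this is not shown, so here is a minimal sketch of one way to do the mapping; the function name `s3_uri_to_https` is hypothetical, and the region is passed in explicitly because it cannot be derived from the URI.

```python
from urllib.parse import quote


def s3_uri_to_https(uri: str, region: str) -> str:
    """Map s3://bucket/key to https://bucket.s3.<region>.amazonaws.com/key."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Split manually so that '?' or '#' inside a key is not misparsed.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    # quote() keeps '/' by default and encodes spaces as %20.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


url = s3_uri_to_https(
    "s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf",
    region="us-east-2",
)
print(url)
```

This only works for publicly readable objects; private buckets would need a presigned URL instead.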
First object structure:
{
  "properties": {
    "text": "BUTTER\nFLAKE\nCROISSANT SANDWICHES",
    "type": "paragraph",
    "element_id": "s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf#p1:999507151",
    "page_number": 1,
    "page_width_pts": 612.0,
    "page_height_pts": 792.0,
    "bbox_in": {
      "min_x": 0.6523,
      "min_y": 0.5002,
      "max_x": 2.8522,
      "max_y": 3.1314
    },
    "bbox_pts": {
      "x": 46.9656,
      "y": 36.014399999999995,
      "w": 158.3928,
      "h": 18...
Total objects to insert: 25
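The `bbox_in` and `bbox_pts` values in the first object are consistent with the usual 72 points per inch (for example, 0.6523 in × 72 = 46.9656 pt). The sketch below reproduces that step and then normalizes to page percentages. The function names are hypothetical, and the percentage step assumes a top-left origin and simple division by the page dimensions, which the notebook's own helper may or may not do.

```python
POINTS_PER_INCH = 72.0


def bbox_inches_to_points(bbox_in):
    """Convert a min/max box in inches to an x/y/w/h box in points."""
    return {
        "x": bbox_in["min_x"] * POINTS_PER_INCH,
        "y": bbox_in["min_y"] * POINTS_PER_INCH,
        "w": (bbox_in["max_x"] - bbox_in["min_x"]) * POINTS_PER_INCH,
        "h": (bbox_in["max_y"] - bbox_in["min_y"]) * POINTS_PER_INCH,
    }


def bbox_points_to_percent(bbox_pts, page_width_pts, page_height_pts):
    """Normalize a box in points to percentages of the page size.

    Assumes the origin is the top-left corner of the page.
    """
    return {
        "left": 100.0 * bbox_pts["x"] / page_width_pts,
        "top": 100.0 * bbox_pts["y"] / page_height_pts,
        "width": 100.0 * bbox_pts["w"] / page_width_pts,
        "height": 100.0 * bbox_pts["h"] / page_height_pts,
    }


# Values taken from the first object shown above.
bbox_in = {"min_x": 0.6523, "min_y": 0.5002, "max_x": 2.8522, "max_y": 3.1314}
bbox_pts = bbox_inches_to_points(bbox_in)
bbox_pct = bbox_points_to_percent(bbox_pts, page_width_pts=612.0, page_height_pts=792.0)
print(bbox_pts)
print(bbox_pct)
```

Percentages are convenient for drawing highlight overlays, since they stay valid at any rendered page size.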
7) Batch insert into Weaviate using the recommended batch method
Successfully inserted all 25 objects using batch method
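The insert cell itself is not shown, so the following is a rough sketch of what a batch insert with the v4 Python client can look like. It assumes `collection` is a handle obtained from `client.collections.get("CardinalDemo")` and that each prepared object is a dict with a `"properties"` key, as in the first object structure printed in step 6. The helper name `batch_insert` is hypothetical.

```python
def batch_insert(collection, objects):
    """Insert prepared objects and return the client's list of failed objects.

    `collection` is assumed to be a Weaviate v4 collection handle.
    """
    # The dynamic batcher sizes and sends requests automatically.
    with collection.batch.dynamic() as batch:
        for obj in objects:
            batch.add_object(properties=obj["properties"])
    # Failures are collected by the client rather than raised per object.
    return collection.batch.failed_objects


# Example usage (requires a live connection):
# failed = batch_insert(client.collections.get("CardinalDemo"), all_objects)
# print(f"{len(failed)} objects failed to import")
```

Checking the returned list after the `with` block is worthwhile, because a batch can complete without raising even when individual objects were rejected.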
8) Test query
=== Testing Query ===
Total documents in collection: 25

--- Fetching sample documents to verify structure ---

Sample 1:
Text: DRINKS Coffee $3.49 Orange Juice $3.49 Apple Juice $3.49 Milk $3.49...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Type: paragraph
Source URL: s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf
BBox: left=59.9%, top=89.3%, width=10.7%, height=5.7%

Sample 2:
Text: # MENU...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Type: paragraph
Source URL: s3://public-cardinal-bucket/menus/Butterflake Croissant Sandwiches.pdf
BBox: left=68.8%, top=42.2%, width=11.2%, height=3.3%

--- Hybrid Search Results ---
Found 3 results for 'Croissant sandwich':

--- Result 1 ---
Text: CROISSANT SANDWICHES all served on a toasted Croissant THE STANDARD $9.99 Scrambled Egg | American Cheese | Frank's Redhot Add Bacon $ Add Sausage $ SWEET HAMMY $10.99 Scrambled Egg | Smoked Ham | Swi...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.8672

--- Result 2 ---
Text: BUTTER FLAKE CROISSANT SANDWICHES...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.7295

--- Result 3 ---
Text: BREAD: Croissant, 2.5oz DAIRY: American Cheese, Swiss, Feta Crumbles, Milk PROTEIN: Eggs, Sausage Patty, Bacon, Smoked Ham PRODUCE: Iceberg Lettuce, Tomato, Green Pepper, Onion, Spinach CONDIMENTS/SPI...
Filename: Butterflake Croissant Sandwiches.pdf
Page: 1
Score: 0.6735