Multimodal Demo May30

Tags: image-search, vector-database, semantic-search, milvus, Workshops, embeddings, unstructured-data, question-answering, LLM, milvus-bootcamp, deep-learning, image-recognition, image-classification, audio-search, Python, bootcamp, rag, multimodal, NLP

Download data

The data is roughly 25K .jpg images drawn from two existing datasets.

  • images.csv metadata from Unsplash, sorted and converted to CSV.
  • images/ in 250x250 resolution by kaggle/@jettchentt.
  • images.fbin is a binary file with UForm image embeddings.
  • images.usearch is a binary file with a serialized USearch index.

The original images.tsv from Unsplash has been filtered to avoid missing images.

👉🏼 Download images.zip file directly from:
https://huggingface.co/datasets/unum-cloud/ann-unsplash-25k/tree/main

[1]
(24292, 31)
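
The tuple above looks like a (rows, columns) shape for the metadata table. As a minimal sketch, assuming `images.csv` has a single header row and sits in the working directory, the same kind of shape can be computed with the standard library alone. The helper name `table_shape` is hypothetical and not taken from the notebook.

```python
import csv


def table_shape(csv_path):
    """Return (n_rows, n_columns) for a CSV file with one header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first record holds the column names
        n_rows = sum(1 for _ in reader)  # every remaining record is a data row
    return n_rows, len(header)
```

Calling `table_shape("images.csv")` returns a tuple that can be compared with the shape printed above.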

Create a Milvus Collection

[2]
Pymilvus: 2.4.3
Milvus server: v2.4.1
[3]
Successfully dropped collection: `Demo_multimodal`
Successfully created collection: `Demo_multimodal`
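
The drop-then-create pattern behind the two messages above can be sketched as follows. This is an illustration under assumptions, not the notebook's original cell: the server URI, the `COSINE` metric, the `auto_id` setting, and the helper names `connect` and `recreate_collection` are all hypothetical, and the vector dimension is left as a parameter because the source does not state it. The client calls used (`has_collection`, `drop_collection`, `create_collection`) are the ones exposed by pymilvus `MilvusClient`.

```python
def connect(uri="http://localhost:19530"):
    """Open a client connection (the URI is an assumed local default)."""
    # Imported lazily so recreate_collection can be exercised without pymilvus.
    from pymilvus import MilvusClient
    return MilvusClient(uri=uri)


def recreate_collection(client, name, dim, metric_type="COSINE"):
    """Drop `name` if it exists, then create it with a `dim`-wide vector field."""
    messages = []
    if client.has_collection(name):
        client.drop_collection(name)
        messages.append(f"Successfully dropped collection: `{name}`")
    # Quick-setup creation: Milvus adds an id field and one vector field.
    client.create_collection(
        collection_name=name,
        dimension=dim,
        metric_type=metric_type,  # assumption: the notebook's metric is not shown
        auto_id=True,             # assumption: ids are generated by Milvus
    )
    messages.append(f"Successfully created collection: `{name}`")
    return messages
```

With a running server, `recreate_collection(connect(), "Demo_multimodal", dim)` would return the messages to print, where `dim` must match the embedding size of the chosen encoder.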

Run inference with the embedding model

This demo uses Unum's UForm Pocket-Sized Multimodal Encoders.

UForm supports text-to-image queries in 21 languages, including English, German, Spanish, French, Italian, Russian, Japanese, Korean, Turkish, Chinese, and Polish.

[4]
[5]
2024-05-28 20:22:01.881705 [W:onnxruntime:, helper.cc:67 IsInputSupported] CoreML does not support input dim > 16384. Input:word_embeddings.weight_quantized, shape: {250037,384}
2024-05-28 20:22:01.882543 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 74 number of nodes in the graph: 714 number of nodes supported by CoreML: 483
2024-05-28 20:22:03.051915 [W:onnxruntime:, coreml_execution_provider.cc:81 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 100 number of nodes in the graph: 1056 number of nodes supported by CoreML: 727
[6]
Context leak detected, msgtracer returned -1
Context leak detected, msgtracer returned -1
Context leak detected, msgtracer returned -1
Context leak detected, msgtracer returned -1
Embedding time for batch size 10: 5.01 seconds
Embedding time for batch size 10: 4.25 seconds
Embedding time for batch size 10: 5.03 seconds
Embedding time for batch size 10: 4.65 seconds
Embedding time for batch size 10: 4.88 seconds
Embedding time for batch size 10: 5.41 seconds
Embedding time for batch size 10: 5.39 seconds
Embedding time for batch size 10: 4.47 seconds
Embedding time for batch size 10: 2.64 seconds
Image error: ./images/_bQFVR3DF68.jpg
Embedding time for batch size 9: 4.45 seconds
Embedding time for batch size 10: 5.61 seconds
Embedding time for batch size 10: 5.26 seconds
Embedding time for batch size 10: 5.29 seconds
Embedding time for batch size 10: 4.98 seconds
Image error: ./images/_GZBJppR7Hk.jpg
Embedding time for batch size 9: 3.66 seconds
Embedding time for batch size 10: 4.44 seconds
Embedding time for batch size 10: 4.2 seconds
Embedding time for batch size 10: 3.33 seconds
Embedding time for batch size 10: 4.53 seconds
Embedding time for batch size 10: 4.35 seconds
Embedding time for batch size 10: 3.41 seconds
Embedding time for batch size 10: 4.11 seconds
Embedding time for batch size 10: 4.47 seconds
Embedding time for batch size 10: 3.81 seconds
Embedding time for batch size 10: 5.43 seconds
Embedding time for batch size 10: 5.0 seconds
Embedding time for batch size 10: 4.51 seconds
Embedding time for batch size 10: 5.38 seconds
Embedding time for batch size 10: 4.19 seconds
Embedding time for batch size 10: 2.47 seconds
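
The log above shows 30 batches, two of which lost one image to a load error, so 28 × 10 + 2 × 9 = 298 images were embedded. A loop that produces this kind of log might look like the sketch below. It is not the notebook's code: `embed_in_batches`, `load_image`, and `encode_batch` are hypothetical names, the encoder is passed in as a plain callable rather than tied to a specific UForm call, and timing only the encoding step (not image loading) is an assumption.

```python
import time


def embed_in_batches(paths, load_image, encode_batch, batch_size=10):
    """Embed images batch by batch, skipping files that fail to load."""
    kept_paths, vectors = [], []
    for start in range(0, len(paths), batch_size):
        batch_paths, images = [], []
        for path in paths[start:start + batch_size]:
            try:
                images.append(load_image(path))
                batch_paths.append(path)
            except Exception:
                # A bad file shrinks the batch instead of stopping the run.
                print(f"Image error: {path}")
        if not images:
            continue
        t0 = time.time()
        batch_vectors = encode_batch(images)  # one vector per loaded image
        elapsed = round(time.time() - t0, 2)
        print(f"Embedding time for batch size {len(images)}: {elapsed} seconds")
        kept_paths.extend(batch_paths)
        vectors.extend(batch_vectors)
    return kept_paths, vectors
```

Returning the surviving paths alongside the vectors keeps the two lists aligned, which matters when a skipped image would otherwise shift every later row.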
[7]
output fields: ['id', 'chunk', 'image_filepath']
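
The field names printed above suggest that each entity stores a caption (`chunk`) and a file path (`image_filepath`) next to its vector, with `id` generated by Milvus. A hedged sketch of assembling and inserting such rows follows. The helper names are hypothetical, and the vector field name `vector` is an assumption (it is the default of the quick-setup `create_collection` call in pymilvus, but the notebook's actual schema is not shown).

```python
def build_rows(captions, filepaths, vectors, vector_field="vector"):
    """Pair each caption and file path with its embedding as one row dict."""
    if not (len(captions) == len(filepaths) == len(vectors)):
        raise ValueError("captions, filepaths and vectors must have the same length")
    return [
        {
            "chunk": caption,
            "image_filepath": filepath,
            vector_field: [float(x) for x in vector],  # plain floats for the client
        }
        for caption, filepath, vector in zip(captions, filepaths, vectors)
    ]


def insert_rows(client, collection_name, rows):
    """Insert row dicts through a MilvusClient-style `insert` call."""
    return client.insert(collection_name=collection_name, data=rows)
```

The length check guards against the misalignment that skipped images can cause between metadata and embeddings.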
[8]
Sample images and their captions (images not shown):

  • black and white nike athletic shoe
  • green leafed plant
  • red rose flowers
  • a white dog sitting in the snow looking at the camera
  • cloudy sky during golden hour
  • selective focus photography of waterfalls during daytime
  • a close up of a cat with an open mouth
  • bird's-eye photography of pine trees covered by snow
  • grey mountains during sunset
  • woman with wings willow tree figurine

Now the fun part, search!

[9]
Count rows: 298
timing: 0.0066 seconds
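
A timed vector search of the kind reported in this section can be sketched as follows. This is an illustration, not the notebook's cell: `timed_search` is a hypothetical helper, `top_k=3` is an arbitrary choice, and the query vector is assumed to come from the same encoder (text or image) that produced the stored embeddings. The `search` call and its result layout (one list of hits per query, each hit carrying `distance` and `entity`) follow pymilvus `MilvusClient`.

```python
import time


def timed_search(client, collection_name, query_vector, top_k=3,
                 output_fields=("chunk", "image_filepath")):
    """Run one vector search and report how long the call took."""
    start = time.time()
    results = client.search(
        collection_name=collection_name,
        data=[query_vector],               # a single query vector
        limit=top_k,                       # number of nearest neighbours to return
        output_fields=list(output_fields),
    )
    elapsed = time.time() - start
    print(f"Milvus search time: {elapsed} seconds")
    # results[0] holds the hits for the first (and only) query.
    hits = [(hit["distance"], hit["entity"]) for hit in results[0]]
    return hits, elapsed
```

Each returned pair gives the similarity score and the stored fields, so the matching image can be opened from `image_filepath` and shown next to its caption.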

[10]
[11]
a close up of a cat with an open mouth
[image output]
[17]
Milvus search time: 0.0044231414794921875 seconds
[image output]
[16]
Milvus search time: 0.0057070255279541016 seconds
[image output]
[19]
silhouette of person sitting on rock formation during golden hour
[image output]
[21]
Milvus search time: 0.004724025726318359 seconds
[image output]
[15]
Author: Christy Bergman

Python implementation: CPython
Python version       : 3.11.8
IPython version      : 8.22.2

torch   : 2.3.0
pymilvus: 2.4.3
uform   : 3.0.2

conda environment: py311-unum