UMAP on LanceDB Cloud
UMAP Visualization
If you haven't signed up for LanceDB Cloud yet, click here to get started!
What is UMAP and Why Should You Care?
Imagine you have a messy closet with hundreds of items scattered around. It's hard to see patterns or find what you need. Now imagine organizing everything onto a simple table where similar items are grouped together - suddenly you can see relationships and make sense of your belongings.
UMAP (Uniform Manifold Approximation and Projection) does something similar for data. It takes complex datasets with hundreds or thousands of dimensions (think of these as hundreds of different characteristics) and creates a simple 2D or 3D map that you can actually look at and understand.
Why is this useful?
When working with AI and machine learning, we often deal with vector embeddings - these are just numerical representations of things like words, images, or documents. But these vectors typically have hundreds of dimensions, making them impossible to visualize directly.
UMAP helps by:
- Making the invisible visible: Transforming complex data into something you can see
- Preserving relationships: Items that were similar in the original data stay close together on the map
- Revealing patterns: Clusters, outliers, and trends become obvious once visualized
What You'll Learn
In this notebook, we'll walk you through the process of taking high-dimensional vector embeddings and creating beautiful, informative 2D and 3D visualizations that reveal hidden patterns in your data.
Step 1: Install Required Libraries
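A minimal sketch of this step, assuming the notebook relies on `lancedb`, `umap-learn`, and `matplotlib`. The package list and the helper names are assumptions for illustration, so adjust them to your environment:

```python
import subprocess
import sys

# Assumed package list for this walkthrough; adjust to match your environment.
REQUIRED_PACKAGES = ["lancedb", "umap-learn", "matplotlib"]


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


def install_required(packages=REQUIRED_PACKAGES):
    """Run pip in a subprocess; raises CalledProcessError if the install fails."""
    subprocess.check_call(pip_install_command(packages))
```

Calling `install_required()` once in a fresh environment should be enough. Using `sys.executable` keeps the install tied to the same Python interpreter that the notebook kernel uses.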
Step 2: Obtain the API key from the dashboard
- Get the db uri
The db uri starts with db:// and can be obtained from the project page on the dashboard. In the following example, the db uri is db://test-sfifxz.
- Get the API key
Obtain a LanceDB Cloud API key by clicking GENERATE API KEY on the table page.
Tip: Copy the code block for connecting to LanceDB Cloud that is shown at the last step of API key generation.
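The connection step can be sketched as follows. This is a hedged example, not the exact snippet the dashboard generates: the helper names, the `LANCEDB_API_KEY` environment variable, and the `us-east-1` region are assumptions, so prefer the values shown in your own dashboard.

```python
import os


def cloud_connect_kwargs(uri, api_key, region="us-east-1"):
    """Validate the inputs and return keyword arguments for lancedb.connect.

    The default region is an assumption; use the one shown in your dashboard.
    """
    if not uri.startswith("db://"):
        raise ValueError("LanceDB Cloud URIs start with 'db://', got: " + uri)
    if not api_key:
        raise ValueError("An API key is required to connect to LanceDB Cloud")
    return {"uri": uri, "api_key": api_key, "region": region}


def connect_to_cloud(uri, api_key=None, region="us-east-1"):
    """Open a connection to a LanceDB Cloud project."""
    import lancedb  # imported lazily so the helper above works without it

    # Fall back to an environment variable (hypothetical name) for the key.
    api_key = api_key or os.environ.get("LANCEDB_API_KEY", "")
    return lancedb.connect(**cloud_connect_kwargs(uri, api_key, region))
```

For the example project above, the call would look like `db = connect_to_cloud("db://test-sfifxz")`, with the key supplied through the environment rather than hard-coded in the notebook.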
You can adjust the UMAP parameters. Check out https://umap-learn.readthedocs.io/en/latest/parameters.html#metric for more details.
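As an illustration of which parameters are typically tuned, here is a minimal sketch using the `umap-learn` package. The specific values, the `UMAP_PARAMS` dictionary, and the helper names are assumptions chosen for the example, not the notebook's own settings:

```python
# Illustrative starting values (assumed, not taken from the notebook).
UMAP_PARAMS = {
    "n_neighbors": 15,    # larger values emphasize global structure
    "min_dist": 0.1,      # smaller values pack similar points more tightly
    "n_components": 2,    # 2 for a flat map, 3 for a 3D view
    "metric": "cosine",   # distance used to compare the input vectors
    "random_state": 42,   # fixed seed for a reproducible layout
}


def umap_params(**overrides):
    """Return a copy of the defaults with any overrides applied."""
    return {**UMAP_PARAMS, **overrides}


def project_embeddings(vectors, **overrides):
    """Reduce array-like vectors of shape (n_samples, n_features) with UMAP."""
    import umap  # provided by the umap-learn package; imported lazily

    reducer = umap.UMAP(**umap_params(**overrides))
    return reducer.fit_transform(vectors)
```

For a 3D projection, `project_embeddings(vectors, n_components=3)` overrides only the output dimensionality and leaves the other settings unchanged.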
Step 3: Copy the dataset
We prepared a people dataset for demo purposes, but you can also use your own dataset.