

Convert any image dataset to Lance

This notebook demonstrates how to transform any image dataset into the Lance format, giving diverse image datasets a single, standardized representation.


Imports


Set the variable for your image dataset

Assign the path to your image dataset to the variable image_dataset. The dataset should contain your images organized into training, testing, and validation folders; these are the images that will be converted to Lance format.
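Before converting anything, it can help to confirm that the folder layout is what the later steps expect. The sketch below assumes a root/<split>/<category>/<image> layout with split folders named train, test, and val; those folder names, the placeholder path, and the helper check_dataset_layout are all hypothetical, so rename them to match your dataset.

```python
from pathlib import Path

# Hypothetical placeholder path; replace it with the location of your dataset.
image_dataset = "path/to/image_dataset"

# Assumed split folder names; change these if your dataset uses e.g. "training"/"testing".
SPLITS = ("train", "test", "val")


def check_dataset_layout(root: str, splits=SPLITS) -> dict:
    """Return {split: [category, ...]} for a root/<split>/<category>/ layout.

    Raises FileNotFoundError if an expected split folder is missing.
    """
    root_path = Path(root)
    layout = {}
    for split in splits:
        split_dir = root_path / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Every subdirectory of a split is treated as one category (class label).
        layout[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    return layout
```

Calling check_dataset_layout(image_dataset) returns the categories found in each split, which makes a mistyped path or a missing split folder visible before any images are processed.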


Processing the Images

The process_images function is the central component of this notebook, responsible for turning the images in the training, testing, and validation folders into records that can be written in Lance format. Each record has four fields: image, filename, category, and data_type.

Specifically, image represents the actual image data, filename denotes the name of the file, category indicates the category to which the image belongs, and data_type specifies whether the image is from the training, testing, or validation set.


Creating a Lance Dataset

The write_to_lance function converts PyArrow data into a Lance dataset. It begins by defining the schema for the Lance dataset, with the fields image, filename, category, and data_type. Make sure this schema is the same as the one defined in the process_images function.

Once the schema is established, the function builds the path where the Lance dataset will be saved from the current working directory and the image_dataset variable. It then initializes a RecordBatchReader from the defined schema and the data produced by the process_images function, and this reader is what gets written out as the Lance dataset.


Load a Lance Dataset and Visualize It in a Pandas DataFrame

The loading_into_pandas function loads a Lance dataset into a pandas DataFrame so you can inspect its contents.

The function takes the path to the Lance dataset as an argument and returns a pandas DataFrame. Make sure the schema matches the one used when the Lance dataset was generated (see the process_images function), and make sure the path to the Lance dataset is correct.
