
PromptTools : LLM Output Evaluation using LanceDB


Find more on the Prompttools page.

Installations

[ ]
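The install cell's contents were not preserved in this export. A plausible set of dependencies for this example (package names are an assumption, not shown in the original cell):

```shell
pip install prompttools lancedb sentence-transformers
```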

Run an experiment

One common use case is to compare two different embedding functions and see how the choice affects document retrieval. We can define the embedding functions we'd like to test here.

Note: If you haven't previously downloaded these embedding models, this step may kick off downloads.

[3]
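The cell's code is not preserved in this export. Below is a minimal sketch of how two embedding functions could be defined for comparison, assuming the sentence-transformers package; the model names are taken from the results shown later in this notebook.

```python
# Sketch: build one embedding function per candidate model.
# sentence-transformers downloads each model on first use.
def make_embed_fn(model_name):
    def embed(texts):
        # Lazy import/load so the functions can be defined
        # before any model download happens.
        from sentence_transformers import SentenceTransformer
        model = SentenceTransformer(model_name)
        return model.encode(texts).tolist()
    return embed

embedding_fns = {
    "paraphrase-MiniLM-L3-v2": make_embed_fn("paraphrase-MiniLM-L3-v2"),
    "all-MiniLM-L6-v2": make_embed_fn("all-MiniLM-L6-v2"),
}
```

In practice you would cache the loaded model rather than reloading it on every call; the lazy load here just keeps the sketch self-contained.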

Load data

[4]
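The load cell is likewise empty in this export. A hypothetical toy corpus consistent with the document ids and query strings that appear in the results below:

```python
# Hypothetical toy corpus; ids match those shown in the retrieval results.
documents = [
    {"id": "id1", "text": "This is a document"},
    {"id": "id2", "text": "This is another document"},
    {"id": "id3", "text": "This is a query document"},
]

# The second query string is quoted verbatim later in this notebook.
queries = ["This is a query", "This is a another query document"]
```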

We can then run the experiment to get results.

[5]
WARNING: rate limit only support up to 3.10, proceeding without rate limiter

We can inspect the results. In this case, the ranking for the second query, "This is a another query document", differs between the two models:

paraphrase-MiniLM-L3-v2: [id2, id3, id1]

default (all-MiniLM-L6-v2) : [id2, id1, id3]
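In miniature, what the experiment computes for each embedding function looks like the sketch below: embed the corpus, then rank documents for a query by cosine similarity. A toy letter-frequency "embedding" stands in for the real models so the sketch is self-contained.

```python
import math

def toy_embed(text):
    # Toy stand-in for a real embedding model: letter-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs, embed):
    # Rank document ids by cosine similarity to the query, best first.
    qv = embed(query)
    scored = [(cosine(qv, embed(d["text"])), d["id"]) for d in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [
    {"id": "id1", "text": "This is a document"},
    {"id": "id2", "text": "This is another document"},
    {"id": "id3", "text": "This is a query document"},
]
print(rank("This is a another query document", docs, toy_embed))
```

Swapping `toy_embed` for a real model's encode function is exactly the comparison the experiment runs for each candidate embedding.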

Let's visualize the outputs.

[6]

Evaluate the model response

To evaluate the results, we'll define an evaluation function. Sometimes you know what order the most relevant documents should appear in for a given query, and you can compute the correlation between the expected ranking and the actual ranking.

[7]
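One concrete choice, sketched here under the assumption that both rankings cover the same set of ids, is the Spearman rank correlation between the expected and actual ordering. The function name is illustrative, not part of the prompttools API.

```python
def ranking_correlation(expected, actual):
    """Spearman rank correlation between two rankings of the same ids.

    Returns 1.0 for identical rankings, -1.0 for fully reversed ones.
    Requires at least two ids (the formula divides by n * (n**2 - 1)).
    """
    n = len(expected)
    # Position of each id in the actual ranking.
    pos = {doc_id: i for i, doc_id in enumerate(actual)}
    # Sum of squared rank differences.
    d2 = sum((i - pos[doc_id]) ** 2 for i, doc_id in enumerate(expected))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

print(ranking_correlation(["id2", "id1", "id3"], ["id2", "id1", "id3"]))  # 1.0
print(ranking_correlation(["id2", "id1", "id3"], ["id2", "id3", "id1"]))  # 0.5
```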

Finally, we can evaluate and visualize the results.

[8]
[9]

You can also use auto-evaluation; we will add an example of this in the near future.