Model Explorer Streaming
Streaming model explorer for Haystack
Notebook by Tilde Thurium (Mastodon, Twitter, LinkedIn)
Problem: there are so many LLMs these days! Which model is the best for my use case?
This notebook uses Haystack to compare the results of sending the same prompt to several different models.
This is a very basic demo where you can only compare a few models that support streaming responses. I'd like to support more models in the future, so watch this space for updates.
Models
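Haystack's OpenAIGenerator and CohereGenerator (from the cohere-haystack integration) support streaming out of the box.
The remaining models are served through the Hugging Face API with HuggingFaceAPIGenerator, which also accepts a streaming callback.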
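A minimal sketch of how the three kinds of generators can be created with a streaming callback each. Only Mistral-7B-v0.1 is named in this notebook; the OpenAI and Cohere model names below are hypothetical placeholders, and build_generators and make_callback are illustrative names, not part of Haystack. The Haystack imports happen inside the function so the rest of the code loads even where Haystack is not installed.

```python
from typing import Callable, Dict

# Hypothetical model choices: only Mistral-7B-v0.1 is named in this notebook.
OPENAI_MODEL = "gpt-3.5-turbo"
COHERE_MODEL = "command"
HF_MODEL = "mistralai/Mistral-7B-v0.1"


def build_generators(make_callback: Callable[[str], Callable]) -> Dict[str, object]:
    """Create one streaming generator per model, keyed by model name.

    make_callback(model_name) returns the function each generator calls with
    every StreamingChunk it receives. API keys are read from the environment
    variables each component checks by default (OPENAI_API_KEY,
    COHERE_API_KEY, HF_API_TOKEN).
    """
    # Imported here so this code loads even where Haystack is not installed.
    from haystack.components.generators import HuggingFaceAPIGenerator, OpenAIGenerator
    from haystack_integrations.components.generators.cohere import CohereGenerator

    return {
        OPENAI_MODEL: OpenAIGenerator(
            model=OPENAI_MODEL,
            streaming_callback=make_callback(OPENAI_MODEL),
        ),
        COHERE_MODEL: CohereGenerator(
            model=COHERE_MODEL,
            streaming_callback=make_callback(COHERE_MODEL),
        ),
        # Mistral-7B-v0.1 via the serverless Hugging Face Inference API
        # (whether the model is served there is an assumption).
        HF_MODEL: HuggingFaceAPIGenerator(
            api_type="serverless_inference_api",
            api_params={"model": HF_MODEL},
            streaming_callback=make_callback(HF_MODEL),
        ),
    }
```

A callback only has to accept a single StreamingChunk, so even make_callback = lambda name: (lambda chunk: print(chunk.content, end="")) is enough to see the three streams.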
Prerequisites
- You need Hugging Face, Cohere, and OpenAI API keys. Save them as secrets in your Colab: click the key icon in the left menu.
- To use Mistral-7B-v0.1, you also need to accept Mistral's conditions on the model page: https://huggingface.co/mistralai/Mistral-7B-v0.1
userdata.get only works if these keys are saved as secrets in your Colab.
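A minimal sketch of copying the keys from Colab secrets into environment variables. The variable names are the defaults Haystack's OpenAIGenerator, CohereGenerator, and HuggingFaceAPIGenerator read; it is an assumption that your Colab secrets use the same names. load_api_keys is a hypothetical helper, and its get_secret parameter exists only so the function can run outside Colab.

```python
import os
from typing import Callable, List, Optional

# Environment variables the Haystack components read by default.
# Assumption: the Colab secrets are saved under these same names.
KEY_NAMES = ["OPENAI_API_KEY", "COHERE_API_KEY", "HF_API_TOKEN"]


def load_api_keys(get_secret: Optional[Callable[[str], Optional[str]]] = None) -> List[str]:
    """Copy API keys from secrets into environment variables.

    get_secret defaults to google.colab.userdata.get, which is only available
    inside Colab and raises if a secret is missing or access is not granted.
    Returns the names of the variables that were set.
    """
    if get_secret is None:
        from google.colab import userdata  # only importable inside Colab

        get_secret = userdata.get
    loaded = []
    for name in KEY_NAMES:
        value = get_secret(name)
        if value:  # skip secrets a non-Colab getter could not find
            os.environ[name] = value
            loaded.append(name)
    return loaded
```

In Colab, calling load_api_keys() with no arguments uses userdata.get directly.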
The AppendToken dataclass formats the output: it prints the model name, followed by the streamed text in chunks of 5 tokens.
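The notebook's own AppendToken code is not shown here, so the following is a hypothetical reconstruction of the behavior described above, not the original. It is used as a streaming callback: the first call writes the model name, then streamed text is buffered and written out every 5 chunks. It treats each StreamingChunk as one token, which is an approximation, and the write parameter is an assumption that stands in for wherever the text goes (plain printing by default).

```python
from dataclasses import dataclass, field
from typing import Callable, List


def _print_inline(text: str) -> None:
    # Default sink: print without a trailing newline so batches join up.
    print(text, end="", flush=True)


@dataclass
class AppendToken:
    """Streaming callback: writes the model name, then text in batches."""

    model_name: str
    write: Callable[[str], None] = _print_inline
    batch_size: int = 5
    _buffer: List[str] = field(default_factory=list, init=False, repr=False)
    _started: bool = field(default=False, init=False, repr=False)

    def __call__(self, chunk) -> None:
        # Haystack passes a StreamingChunk; its text is in `.content`.
        if not self._started:
            self.write(f"{self.model_name}:\n")
            self._started = True
        self._buffer.append(chunk.content)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write buffered text; call once more after the stream ends."""
        if self._buffer:
            self.write("".join(self._buffer))
            self._buffer.clear()
```

Because the last few chunks of a reply may not fill a batch, flush() should be called once the generator's run call returns. In a notebook, write could be an ipywidgets Output pane's append_stdout method.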
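In the notebook, each model's stream is displayed in its own bordered ipywidgets Output pane, and the panes sit side by side in an HBox. A minimal sketch of that layout and of sending the same prompt to every generator at once follows; compare, side_by_side, and SupportsRun are hypothetical names, and running each call in a thread pool is an assumed design, not necessarily what the notebook does. Haystack generators return a dict with a "replies" list from run(prompt=...), which is all compare relies on.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Protocol


class SupportsRun(Protocol):
    """Anything with a Haystack-style run(prompt=...) -> {"replies": [...]} method."""

    def run(self, prompt: str) -> dict: ...


def side_by_side(model_names: Iterable[str]) -> dict:
    """Display one bordered Output pane per model in an HBox (Jupyter/Colab only)."""
    import ipywidgets as widgets
    from IPython.display import display

    panes = {
        name: widgets.Output(layout=widgets.Layout(border="1px solid black"))
        for name in model_names
    }
    display(widgets.HBox(list(panes.values())))
    return panes


def compare(prompt: str, generators: Dict[str, SupportsRun]) -> Dict[str, str]:
    """Send the same prompt to every generator at once; return each first reply.

    Each call runs in its own thread, so the streaming callbacks fire in
    parallel instead of one model finishing before the next one starts.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(generators))) as pool:
        futures = {
            name: pool.submit(gen.run, prompt=prompt)
            for name, gen in generators.items()
        }
        return {name: future.result()["replies"][0] for name, future in futures.items()}
```

One possible order in Colab: create the panes first, point each model's streaming callback at its pane (ipywidgets Output has an append_stdout method), build the generators, and then call compare with your prompt.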