Ollama


Ollama + OpenAI + Python

1. Specify the model name

If you pulled a different model than "phi3:mini", change the value in the cell below. That variable is used in code throughout the notebook.

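The original cell is not included in this export; it most likely just defines the model variable (the name `MODEL_NAME` is an assumption):

```python
# Name of the model pulled into Ollama.
# Change this if you pulled a different model, e.g. "phi3:medium".
MODEL_NAME = "phi3:mini"
```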

2. Set up the OpenAI client

Typically the OpenAI client is used with OpenAI.com or Azure OpenAI to interact with large language models. However, it can also be used with Ollama, since Ollama provides an OpenAI-compatible endpoint at "http://localhost:11434/v1".


3. Generate a chat completion

Now we can use the OpenAI SDK to generate a response for a conversation. This request should generate a haiku about cats:


4. Prompt engineering

The first message sent to the language model is called the "system message" or "system prompt", and it sets the overall instructions for the model. You can provide your own custom system prompt to guide a language model to generate output in a different way. Modify the SYSTEM_MESSAGE below to answer like your favorite famous movie/TV character, or get inspiration for other system prompts from Awesome ChatGPT Prompts.

Once you've customized the system message, provide the first user question in the USER_MESSAGE.


5. Few-shot examples

Another way to guide a language model is to provide "few shots": a sequence of example questions and answers that demonstrate how it should respond.

The example below tries to get a language model to act like a teaching assistant by providing a few examples of questions and answers that a TA might give, and then prompts the model with a question that a student might ask.

Try it first, and then modify the SYSTEM_MESSAGE, EXAMPLES, and USER_MESSAGE for a new scenario.

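The empty cell likely assembles the message list from those three pieces. A sketch with hypothetical TA-style values (the notebook's actual SYSTEM_MESSAGE, EXAMPLES, and USER_MESSAGE are not shown in this export); each example is replayed as a user/assistant turn before the real question:

```python
# Hypothetical stand-ins for the notebook's actual values.
SYSTEM_MESSAGE = (
    "You are a friendly teaching assistant for an intro programming course. "
    "Answer questions encouragingly, in one or two short paragraphs."
)
EXAMPLES = [
    ("What is a variable?",
     "Great question! A variable is a name that refers to a value, "
     "so you can reuse that value later in your program."),
    ("What does a function do?",
     "A function bundles up a few steps under one name, so you can "
     "run those steps whenever you call it."),
]
USER_MESSAGE = "How do I loop over a list?"

# Few-shot prompting: replay each example as a user/assistant exchange,
# then append the real question last.
messages = [{"role": "system", "content": SYSTEM_MESSAGE}]
for question, answer in EXAMPLES:
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": USER_MESSAGE})

# Pass `messages` to client.chat.completions.create(...) exactly as in step 3.
```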

6. Retrieval Augmented Generation

RAG (Retrieval Augmented Generation) is a technique to get a language model to answer questions accurately for a particular domain, by first retrieving relevant information from a knowledge source and then generating a response based on that information.

We have provided a local CSV file with data about hybrid cars. The code below reads the CSV file, searches it for matches to the user question, and then generates a response based on the information found. Note that this takes longer than any of the previous examples, since it sends more data to the model. If the answer is still not grounded in the data, you can try more prompt engineering or other models. Generally, RAG is more effective with either larger models or fine-tuned versions of SLMs.

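The notebook's CSV file (its name, columns, and values) is not shown in this export, so the sketch below uses a small inline stand-in with made-up numbers to illustrate the retrieve-then-generate flow: find matching rows, paste them into the system prompt, and then ask the model as in step 3.

```python
import csv
import io

# Stand-in for the provided hybrid-car CSV; the real file's name, columns,
# and values are not shown in this export -- swap in the actual file.
SAMPLE_CSV = """vehicle,year,msrp
Toyota Prius,2022,28545
Honda Insight,2021,25760
Hyundai Ioniq,2022,24695
"""

USER_MESSAGE = "How much does the Prius cost?"

# Naive keyword retrieval: keep rows that share at least one word with the question.
rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
header, *data = rows
question_words = set(USER_MESSAGE.lower().rstrip("?").split())
matches = [row for row in data if question_words & set(" ".join(row).lower().split())]

# Ground the model by pasting the matching rows into the system prompt.
sources = "\n".join(",".join(row) for row in [header] + matches)
SYSTEM_MESSAGE = (
    "You are a helpful assistant that answers questions about hybrid cars.\n"
    "Answer ONLY using the sources below; if the answer is not there, say so.\n\n"
    f"Sources:\n{sources}"
)
messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": USER_MESSAGE},
]
# Pass `messages` to client.chat.completions.create(...) exactly as in step 3.
```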