
Developer RAG Chatbot

In this notebook, we build a basic developer chatbot. The Developer RAG Chatbot provides an example RAG workflow for developers. This example uses the RAPIDS cuDF source code and API documentation as a representative dataset of a developer's codebase. We use this dataset to create a code chatbot/assistant that can answer questions about cuDF and provide examples of using its API. Note that the example is intended to help developers explore and come up to speed on a codebase, not to fully generate code for them.

To build this application, we'll use Llama3 70B, hosted on the NVIDIA AI Foundation Endpoints, as the LLM, alongside the E5-Large embedding model. We'll store the embeddings in a FAISS vector database and use LangChain to build the logic tying the pieces together. Finally, we'll use Gradio as the interface for accessing the chatbot.


Prerequisites

  1. Set up your NVIDIA NGC account and generate an API key: https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/#setup
  2. An NVIDIA GPU with at least 4 GB of memory is required to run the embedding model and create the necessary vector stores.

Step 1: Pull cuDF Dataset

First, we pull the cuDF 24.04 release from GitHub.

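A minimal sketch of this step, assuming a shallow clone of the branch-24.04 release branch from the rapidsai/cudf repository (the local directory name is a hypothetical choice):

```python
import os
import subprocess

REPO_URL = "https://github.com/rapidsai/cudf.git"
CHECKOUT_DIR = "cudf"  # hypothetical local directory name

if not os.path.exists(CHECKOUT_DIR):
    # A shallow clone of just the release branch keeps the download small.
    subprocess.run(
        ["git", "clone", "--depth", "1", "--branch", "branch-24.04",
         REPO_URL, CHECKOUT_DIR],
        check=True,
    )
```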

Step 2: Parse Source Code and Documentation

Next, we parse the relevant Python source code and related documentation.

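One way this could look, assuming the repository layout from the clone above and that we index the cuDF Python package plus everything under docs/; exactly which files the original notebook parses is an assumption here:

```python
from pathlib import Path

repo = Path("cudf")
# The cuDF Python package lives under python/cudf/cudf in the repo;
# documentation (.md and .rst) lives under docs/.
py_files = sorted((repo / "python" / "cudf" / "cudf").rglob("*.py"))
doc_files = sorted((repo / "docs").rglob("*.md")) + sorted((repo / "docs").rglob("*.rst"))

def read_files(paths):
    records = []
    for path in paths:
        try:
            records.append({"source": str(path), "text": path.read_text(encoding="utf-8")})
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or oddly encoded files
    return records

code_docs = read_files(py_files)
doc_docs = read_files(doc_files)
print(f"Parsed {len(code_docs)} source files and {len(doc_docs)} documentation files")
```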

Step 3: Split Data to Prepare for Embedding

In this step, we split our data into smaller chunks for the embedding process.

Note: It may take several minutes for the e5-large-v2 model to download.

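A sketch of the splitting step using LangChain's language-aware splitter for the Python sources and a generic recursive splitter for the documentation; the chunk sizes are illustrative assumptions, not the notebook's exact settings. Instantiating the embedder here is what triggers the e5-large-v2 download mentioned in the note:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Language-aware splitting keeps Python functions and classes largely intact;
# chunk sizes below are illustrative assumptions.
code_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=1000, chunk_overlap=100
)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

def to_documents(records):
    return [Document(page_content=r["text"], metadata={"source": r["source"]})
            for r in records]

code_chunks = code_splitter.split_documents(to_documents(code_docs))
doc_chunks = text_splitter.split_documents(to_documents(doc_docs))

# Loading the embedding model triggers the e5-large-v2 download noted above.
embedder = HuggingFaceEmbeddings(
    model_name="intfloat/e5-large-v2",
    model_kwargs={"device": "cuda"},  # needs the GPU listed in the prerequisites
)
```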

Step 4: Generate Embeddings and Store Embeddings in the Vector Store

Next, we generate embeddings from our dataset and store them in the appropriate vector stores. This process generally takes several minutes, depending on your hardware. A cached version of each vector store is saved locally for use in future notebook runs.

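A sketch of this step, reusing the embedder from the previous step; the cache directory names are hypothetical:

```python
import os

from langchain_community.vectorstores import FAISS

def build_or_load(chunks, path):
    # Reuse a cached index when one exists so reruns skip the embedding pass.
    if os.path.isdir(path):
        return FAISS.load_local(path, embedder, allow_dangerous_deserialization=True)
    store = FAISS.from_documents(chunks, embedder)
    store.save_local(path)
    return store

code_store = build_or_load(code_chunks, "vectorstore_code")
doc_store = build_or_load(doc_chunks, "vectorstore_docs")
```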

Step 5: Test Embeddings

Here we pass in a simple test query to ensure we are pulling relevant chunks from our code vector store. Notice that the retrieved context should include the 'size' function definition from the frame.py script.

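For example (the query string is illustrative; any question whose answer lives in frame.py should surface that file among the hits):

```python
hits = code_store.similarity_search(
    "How do I get the total number of elements in a cuDF Frame?", k=4
)
for doc in hits:
    print(doc.metadata["source"])
    print(doc.page_content[:300])
    print("---")
```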

Step 6: Connect to LLM

Here we create the connection to the Llama3 70B model via the NVIDIA AI Foundation Endpoints.

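A sketch using the langchain-nvidia-ai-endpoints integration; the sampling settings are illustrative assumptions. This assumes the NGC API key from the prerequisites is available as the NVIDIA_API_KEY environment variable:

```python
import os

from langchain_nvidia_ai_endpoints import ChatNVIDIA

assert os.environ.get("NVIDIA_API_KEY"), "Set NVIDIA_API_KEY first (see prerequisites)"

llm = ChatNVIDIA(model="meta/llama3-70b-instruct", temperature=0.2, max_tokens=1024)

# Quick smoke test of the connection.
print(llm.invoke("Say hello in one short sentence.").content)
```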

Step 7: Create Prompt Pipeline

Next, we create the prompt for our chatbot. We've broken it into several pieces to make the individual portions easier to understand, and we bring them together using a pipeline prompt at the end of the section.

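A sketch using LangChain's PipelinePromptTemplate; the prompt wording is illustrative, not the notebook's exact text:

```python
from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Individual pieces of the prompt, kept separate for readability.
instructions_prompt = PromptTemplate.from_template(
    "You are a helpful cuDF coding assistant. Use the context below to answer.\n"
)
context_prompt = PromptTemplate.from_template("Context:\n{context}\n")
question_prompt = PromptTemplate.from_template("Question: {question}\nAnswer:")

# The final template stitches the named pieces together.
final_prompt = PromptTemplate.from_template("{instructions}{context}{question}")

prompt = PipelinePromptTemplate(
    final_prompt=final_prompt,
    pipeline_prompts=[
        ("instructions", instructions_prompt),
        ("context", context_prompt),
        ("question", question_prompt),
    ],
)

print(prompt.format(context="<retrieved chunks>", question="What does Frame.size return?"))
```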

Step 8: Create Retrievers

In this section, we create retrievers to access the data in our vector stores. We add additional parameters and filtering to ensure only the most relevant documents are returned.

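For example, using similarity-score-threshold retrieval; k and the threshold are illustrative settings, not the notebook's exact values:

```python
# Only return chunks whose similarity clears the threshold, so irrelevant
# material is filtered out before it reaches the prompt.
code_retriever = code_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 4, "score_threshold": 0.3},
)
doc_retriever = doc_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 4, "score_threshold": 0.3},
)
```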

Step 9: Implement Chatbot Logic

In this section, we implement the main logic for our chatbot. This includes the chatbot response function, logic to bound the size of the chat history, and code to append sources to the response.

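A simplified sketch of this logic, written to match the Gradio ChatInterface signature used in the next step; MAX_HISTORY_TURNS, the tuple-style history format, and the source formatting are assumptions:

```python
MAX_HISTORY_TURNS = 5  # assumed limit on retained chat turns

def respond(message, history, use_code=True, use_docs=True):
    # Gather context from whichever knowledge bases are enabled.
    docs = []
    if use_code:
        docs += code_retriever.invoke(message)
    if use_docs:
        docs += doc_retriever.invoke(message)
    context = "\n\n".join(d.page_content for d in docs) or "No context retrieved."

    # Naive history management: keep only the most recent turns and fold
    # them into the question (assumes Gradio's (user, assistant) tuples).
    history = history[-MAX_HISTORY_TURNS:]
    convo = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in history)
    question = f"{convo}\nUser: {message}" if convo else message

    answer = llm.invoke(prompt.format(context=context, question=question)).content

    # Append the distinct source files that supplied the context.
    sources = sorted({d.metadata.get("source", "unknown") for d in docs})
    if sources:
        answer += "\n\nSources:\n" + "\n".join(f"- {s}" for s in sources)
    return answer
```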

Step 10: Start Chatbot

We're finally ready to start our chatbot. Run the cell below to create the Gradio interface and begin interacting with your chatbot! Note the differences in the responses when enabling the various knowledge bases, and compare them to responses generated without any knowledge base.

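A sketch of the interface, wiring knowledge-base toggles to the respond() function above; the checkbox labels and title are illustrative:

```python
import gradio as gr

demo = gr.ChatInterface(
    fn=respond,
    additional_inputs=[
        gr.Checkbox(value=True, label="Use code knowledge base"),
        gr.Checkbox(value=True, label="Use documentation knowledge base"),
    ],
    title="cuDF Developer RAG Chatbot",
)
demo.launch()
```

Unchecking both boxes makes the model answer from its own weights alone, which is an easy way to see how much the retrieved context changes the responses.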