RAPIDS Notebook
Developer RAG Chatbot
In this notebook, we build a basic developer chatbot. The Developer RAG Chatbot provides an example RAG workflow for developers. This example uses the RAPIDS cuDF source code and API documentation as a representative stand-in for a developer's codebase. From this dataset we create a code chatbot/assistant that can answer questions about cuDF and provide examples of using its API. Note that the example is intended to help developers explore and come up to speed on a codebase, not to fully generate code for them.
To build this application, we'll use Llama3 70B hosted on NVIDIA AI Foundation Endpoints as the LLM and e5-large-v2 as the embedding model. We'll store the embeddings in a FAISS vector database and use LangChain to build the logic tying the pieces together. Finally, we'll use Gradio as the interface for interacting with the chatbot.

Prerequisites
- Setup your NVIDIA NGC account and generate an API Key: https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/#setup
- An NVIDIA GPU with at least 4 GB of memory is required to run the embedding model and create the necessary vector stores
Step 1: Pull cuDF Dataset
First, we pull the cuDF 24.04 release from GitHub.
Step 2: Parse Source Code and Documentation
Next, we parse the relevant Python source code and its related documentation.
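The gist of this step can be sketched as a walk over the cloned tree that keeps `.py` sources and `.rst`/`.md` docs while skipping test directories. The function name and the demonstration files below are illustrative, not the notebook's actual helpers:

```python
import tempfile
from pathlib import Path

def collect_source_files(root, extensions=(".py", ".rst", ".md")):
    """Recursively gather Python sources and docs, skipping test directories."""
    return [
        p for p in Path(root).rglob("*")
        if p.suffix in extensions and "tests" not in p.parts
    ]

# Quick demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "frame.py").write_text("def size(self): ...")
    (Path(tmp) / "tests").mkdir()
    (Path(tmp) / "tests" / "test_frame.py").write_text("def test_size(): ...")
    found = collect_source_files(tmp)
    print([p.name for p in found])  # ['frame.py']
```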
Step 3: Split Data to Prepare for Embedding
In this step, we split our data into smaller chunks for the embedding process.
Note: It may take several minutes for the e5-large-v2 model to download.
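As a rough illustration of what the chunking does (a LangChain splitter such as `RecursiveCharacterTextSplitter` would typically be used here, which additionally prefers to break on paragraph and line boundaries), here is a minimal fixed-size splitter with overlap:

```python
def split_text(text, chunk_size=500, chunk_overlap=50):
    """Cut text into overlapping windows; the overlap preserves context
    that would otherwise be lost at chunk boundaries."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("x" * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks))  # 3
```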
Step 4: Generate Embeddings and Store Embeddings in the Vector Store
Next, we generate our embeddings from our dataset, and store them in the appropriate vector stores. This process will generally take several minutes, depending on your hardware. A cached version of each vector store will be saved locally for use in future notebook runs.
Step 5: Test Embeddings
Here we pass a simple test query to confirm that we are pulling relevant chunks from our code vector store. The retrieved context should include the 'size' function definition from the frame.py script.
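To make the retrieval idea concrete without the real model, here is a toy version of the same check, with a bag-of-words stand-in for e5-large-v2; the corpus keys and snippets are illustrative:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' -- stands in for e5-large-v2 here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "frame.py::size": "def size(self): return the number of elements in the frame",
    "groupby.py::agg": "def agg(self, func): apply aggregation functions to groups",
}
query = "number of elements in a frame"
best = max(corpus, key=lambda k: cosine(embed(query), embed(corpus[k])))
print(best)  # frame.py::size
```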
Step 6: Connect to LLM
Here we create the connection to the Llama3 70B model via the NVIDIA AI Foundation Endpoint.
Step 7: Create Prompt Pipeline
Next, we create the prompt for our chatbot. We've broken it into several pieces to make the individual portions easier to understand, and we bring those pieces together using a pipeline prompt at the end of the section.
Step 8: Create Retrievers
In this section, we create retrievers to access the data in our vector stores. We add additional parameters and filtering to ensure only the most relevant documents are returned.
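With a LangChain FAISS store this is typically a call like `vectorstore.as_retriever(search_type="similarity_score_threshold", search_kwargs={"k": 4, "score_threshold": 0.5})`. The filtering behaviour itself can be sketched in plain Python; the scores, names, and threshold below are illustrative:

```python
def filter_by_threshold(scored_docs, k=4, score_threshold=0.5):
    """Keep at most k documents whose similarity score clears the threshold,
    ordered from most to least similar."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked if score >= score_threshold][:k]

hits = [("frame.py chunk", 0.82), ("io.py chunk", 0.41), ("groupby.py chunk", 0.77)]
print(filter_by_threshold(hits))  # ['frame.py chunk', 'groupby.py chunk']
```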
Step 9: Implement Chatbot Logic
In this section, we implement the main logic for our chatbot. This includes the chatbot response function, managing the size of the chat history, and adding sources to the response.
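Two of those pieces, trimming the history and attaching sources, can be sketched as below; the function names and the sample data are illustrative:

```python
def trim_history(history, max_turns=5):
    """Keep only the most recent turns so the prompt stays within the
    model's context window."""
    return history[-max_turns:]

def add_sources(answer, docs):
    """Append the deduplicated source files of the retrieved chunks."""
    sources = sorted({d["source"] for d in docs})
    if sources:
        answer += "\n\nSources: " + ", ".join(sources)
    return answer

history = [("q%d" % i, "a%d" % i) for i in range(8)]
reply = add_sources("cuDF's Frame.size returns the element count.",
                    [{"source": "frame.py"}, {"source": "frame.py"}])
print(len(trim_history(history)), reply.endswith("Sources: frame.py"))  # 5 True
```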
Step 10: Start Chatbot
We're finally ready to start our chatbot. Run the cell below to create the Gradio interface and begin interacting with your chatbot! Note how the responses differ when the various knowledge bases are enabled, and compare them with responses generated without any knowledge base.