RAG Chain Server API Client


Press Release Chat Bot

As part of this generative AI workflow, we create an NVIDIA PR chatbot that answers questions about NVIDIA news and blog posts from 2022 and 2023. For this, we have created a REST FastAPI server that wraps llama-index. The API server exposes two methods, upload_document and generate. The upload_document method takes a document from the user's computer and uploads it to a Milvus vector database after splitting the document into chunks and embedding them. The generate method produces an answer to the provided prompt, optionally sourcing information from the vector database.

Step 1: Load the PDF files from the dataset folder.

You can upload the PDF files containing the NVIDIA blogs to the :8081/uploadDocument API endpoint to make them available for querying.
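As a sketch, the upload could be scripted from Python using only the standard library. The endpoint path comes from this notebook, but the multipart field name ("file") is an assumption, so verify it against the server's generated FastAPI docs before relying on it:

```python
import os
import uuid
import urllib.request

def upload_url(host: str, port: int = 8081) -> str:
    """Endpoint used in this notebook: :8081/uploadDocument."""
    return f"http://{host}:{port}/uploadDocument"

def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "application/pdf"):
    """Encode one file as a multipart/form-data body (stdlib only)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"

def upload_pdf(host: str, path: str, field: str = "file") -> int:
    # NOTE: the form-field name "file" is an assumption, not confirmed by the API.
    with open(path, "rb") as f:
        body, ctype = build_multipart(field, os.path.basename(path), f.read())
    req = urllib.request.Request(upload_url(host), data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

To load the whole dataset folder, loop over its contents and call `upload_pdf(host, path)` for each file ending in `.pdf`.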

Step 2: Ask a question without referring to the knowledge base

Ask the TensorRT-LLM Llama 2 13B model a question about "the NVIDIA Grace superchip" without consulting the vector database/knowledge base by setting use_knowledge_base to false.
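A minimal client for this call might look as follows; the /generate path and the request-body field names (question, use_knowledge_base) are assumptions based on the method names above, so check them against the server's OpenAPI schema:

```python
import json
import urllib.request

def build_query(question: str, use_knowledge_base: bool) -> bytes:
    # Field names are assumptions; confirm against the FastAPI /docs page.
    return json.dumps({
        "question": question,
        "use_knowledge_base": use_knowledge_base,
    }).encode()

def generate(host: str, question: str, use_knowledge_base: bool = False,
             port: int = 8081) -> str:
    """POST a prompt to the chain server's generate method; return the raw reply."""
    req = urllib.request.Request(
        f"http://{host}:{port}/generate",  # assumed path for the generate method
        data=build_query(question, use_knowledge_base),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```

Calling `generate("localhost", "What is the NVIDIA Grace superchip?")` queries the bare model; passing `use_knowledge_base=True` asks the same question with retrieval from the Milvus knowledge base.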

Now ask it the same question by setting use_knowledge_base to true.

Next steps

We have set up a playground UI where you can upload files and get answers. The UI is available at the same IP address as the notebooks: host_ip:8090/converse