Pinecone Groq Llama 3 Rag

Groq Llama 3 Rag

vector-databasegroqsemantic-searchAIintegrationsLLMPythonjupyter-notebookpinecone-examples

alph-notebooks/pinecone-examples / groq-llama-3-rag.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

RAG with Groq and Llama 3

To begin, we setup our prerequisite libraries.

[1]

Data Preparation

We start by downloading a dataset that we will encode and store. The dataset jamescalam/ai-arxiv2-semantic-chunks contains scraped data from many popular ArXiv papers centred around LLMs and GenAI.

[2]

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(

Dataset({
,    features: ['id', 'title', 'content', 'prechunk_id', 'postchunk_id', 'arxiv_id', 'references'],
,    num_rows: 10000
,})

We have 200K chunks, where each chunk is roughly the length of 1-2 paragraphs in length. Here is an example of a single record:

[3]

{'id': '2401.04088#0',
, 'title': 'Mixtral of Experts',
, 'content': '4 2 0 2 n a J 8 ] G L . s c [ 1 v 8 8 0 4 0 . 1 0 4 2 : v i X r a # Mixtral of Experts Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, LÃ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, ThÃ©ophile Gervet, Thibaut Lavril, Thomas Wang, TimothÃ©e Lacroix, William El Sayed Abstract We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. Even though each token only sees two experts, the selected experts can be different at each timestep. As a result, each token has access to 47B parameters, but only uses 13B active parameters during inference. Mixtral was trained with a context size of 32k tokens and it outperforms or matches Llama 2 70B and GPT-3.5 across all evaluated benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks. We also provide a model fine- tuned to follow instructions, Mixtral 8x7B â Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B â chat model on human bench- marks. Both the base and instruct models are released under the Apache 2.0 license.',
, 'prechunk_id': '',
, 'postchunk_id': '2401.04088#1',
, 'arxiv_id': '2401.04088',
, 'references': ['1905.07830']}

Format the data into the format we need, this will contain id, text (which we will embed), and metadata.

[4]

Dataset({
,    features: ['id', 'metadata'],
,    num_rows: 10000
,})

We need to define an embedding model to create our embedding vectors for retrieval, for that we will be using a variation of the e5-base model with a longer context length of 4k tokens. Ideally we should be running this on GPU for optimal runtimes.

[5]

We can check whether our encoder will use cpu or a cuda GPU (where available).

[6]

'cuda'

We can create embeddings now like so:

[7]

We can view the dimensionality of our returned embeddings, which we'll need soon when initializing our vector index:

[8]

Now we create our vector DB to store our vectors. For this we need to get a free Pinecone API key — the API key can be found in the "API Keys" button found in the left navbar of the Pinecone dashboard.

[9]

Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all available providers and regions here.

[10]

Creating an index, we set dimension equal to the dimensionality of our encoder (384), and use a metric also compatible with the model (this can be cosine). We also pass our spec to index initialization.

[11]

{'dimension': 768,
, 'index_fullness': 0.0,
, 'namespaces': {},
, 'total_vector_count': 0}

We can see the index is currently empty with a total_vector_count of 0. We can begin populating it with our embeddings.

[12]

  0%|          | 0/79 [00:00<?, ?it/s]

[13]

[{'content': '4 . So the answer is 9Ï 4 . Are there variables in the solution? the form of "1. variable is defined as...". If so, please list the definition of variable in The underlined parts are the type of question, the question itself and the steps in its solution, respectively. The output from the LLM is: Yes. There are variables in the solution. x + yi, where xxx and yyy are real numbers. x + yi 1. zzz is defined as a complex number of the form x + yi',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'The bold part is then saved to form a part of the input in the regeneration stage. Target extraction To get a brief and clear target of the current step, the input to the LLM is: The following is a part of the solution to the problem: Let S be the set of complex numbers z such that the real part of 1 6 . This set forms a curve. Find the area of the 12 region inside the curve. (Step 0) Let z = x + yi be a complex number, where x and y are real numbers. (Step 5) Completing the square, we obtain (a â 3)? +y= Q. 3)? +y= Q. 2 . . . : 2 What specific action does the step "Completing the square, we obtain (a â 3) take? Please give a brief answer using a single sentence and do not copy the steps. 2 2 +y= 4 ." The underlined parts are the question and reasoning steps before the current one, including the current one. The output of the LLM is: The step completes the square to rewrite the equation in standard form of a circle. The whole sentence is saved and forms the most important part of the input in the regeneration stage. Information Collection To get sentences in the question and previous steps in the solution that are directly related to the current step, the input to the LLM is: This is a math question: Question: Let S be the set of complex numbers z such that the real part of 1 6 . This set forms a curve. Find the area of the region inside the curve.',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'The following is information extracted from the question: Information 0: Let S be the set of complex numbers z such that the real part of 1 1 6 . Information 1: This set forms a curve. Information 2: Find the area of the region inside the curve. The following is the first a few steps in a solution to the problem: Step 0: Let z = x + yi be a complex number, where x and y are real numbers. ... Step 4: Cross-multiplying and rearranging, we get the equation 6x = x2 + y2. Which previous steps or information does the next step "Completing the square, we obtain (x â 3) _ p= on" directly follow from? _ p= 4 ." directly follow from? 2 The underlined information (contents after Information X:) is simply a list of sentences in the question. The underlined steps are reasoning steps before (not including) the current one in the solution. The output from the LLM is: The next step "Completing the square, we obtain (x â 3)? +y? = 2." directly follows from Step 4. Then we use regular expression to extract the information and step id from LLMâ s output. Some steps may only rely on information from the question while others may only use results from previous steps. Step regeneration In this stage, we instruct the LLM to achieve the extracted target with only collected information with the following prompt: We are in a process of solving a math problem.',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'Variables are defined as: 1. z is defined as a complex number of the form x + yi, where x and y are real numbers. The following are some previous steps: Step 0: Cross-multiplying and rearranging, we get the equation 6x = x2 + y2. 13 The target for next step is: The step completes the square to rewrite the equation in standard form of a circle. Please try to achieve the target with the information from previous steps. The underlined parts are the type of question, variable definition (if any), collected information from the question (if any), collected previous steps (if any) and the extracted target. Please notice that the current step does not directly rely on any information from the question, we removed the corresponding block from the input. The output from the LLM is:',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'To rewrite the equation in standard form of a circle, we need to complete the square for both xxx and yyy. 6x = x2 + y2 Starting from the given equation: 6x = x2 + y2 6x = x2 + y2. (6/2)2 = 9 We add and subtract (6/2)2 = 9 (6/2)2 = 9 on the right-hand side: 6x + 9 â 9 = x2 + y2 + 9 â 9 6x + 9 â 9 = x2 + y2 + 9 â 9. 6x + 9 â 9 = x2 + y2 + 9 â 9 (x â 3)2 + y2 = 9 Simplifying: (x â 3)2 + y2 = 9 (x â 3)2 + y2 = 9. This is now in the standard form of a circle with (3, 0) and radius 333. (3, 0) center (3, 0)',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'Then we can directly compare the regenerated step with the original one in the next stage. Result comparison The input to the LLM is: The following are 2 solutions to a math problem: Solution 1: To rewrite the equation in standard form of a circle, we need to complete the square for both x and y. Starting from the given equation: 6x = x2 + y2. We add and subtract (6/2)2 = 9 on the right-hand side: 6x + 9 â 9 = x2 + y2 + 9 â 9. Simplifying: (x â 3)2 + y2 = 9. This is now in the standard form of a circle with center (3, 0) and radius 3. Solution 2: Completing the square, we obtain (2 â 2 a)? +y= 8. Compare the key points from both solutions step by step and then check whether Solution 1 "supports", "contradicts" or "is not directly related to" the conclusion in Solution 2. Pay special attention to difference in numbers. The underlined parts are simply the problem type, the regenerated step and the original step. The output is: In both solutions, the goal is to rewrite the given equation in standard form of a circle. However, in Solution 1, the completion of the square results in the center of the circle being (3, 0) and a radius of 3. In Solution 2, the completion of the square results in the center of the circle being (3/2, 0) and a radius of 3/2. Therefore, Solution 1 contradicts the conclusion in Solution 2 as the centers and radii of the circles are different.',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'If the word â supportâ or â contradictâ appears in the last line of the output, SelfCheck outputs a step score 1 or -1. Otherwise, SelfCheck outputs 0 meaning that two steps are not related. # APPENDIX B EXAMPLAR FOR ONE-SHOT ERROR CHECKING The following example from Ling et al. (2023) is used to perform one-shot error checking. Here is some information: â 1. Alice gets 2 apples. 2. Alice gets twice as many oranges as bananas. 3. Original price per apple: $2',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': '14 Discount: 50% Price per apple after discount: $2 * (1 - 0.50) = $2 * 0.50 = $1" Based on the given information, here is a reasoning process: "Calculate Aliceâ s cost of the 5 apples after a 50% discount. Price per apple after discount: $1 Apples be eaten: 3 Oranges be eaten: 6 Total apples: 5 Total oranges: x Total bananas: 2x Aliceâ s total number of fruits: 5 + x + 2x = 5 + 4x Aliceâ s total cost of 5 apples: $1 * (5 - 3) = $2â Double-check the reasoning process, letâ s analyze its correctness, and end with "yes" or "no". Answer: Letâ s think step by step.',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'Grounding check Purpose: Find references for numbers in the reasoning process sequentially Rule: Any ungrounded number makes the check fails "x" is a variable, we do not care about variables. "y" is a variable, we do not care about variables. "5" is a quantity from the given information that represents the number of apples. "50%" is a quantity from the given information that represents the discount. "$1" is a quantity from the given information that represents the price per apple after the discount. "3" is a quantity not from the given information, it is ungrounded. "6" is a quantity not from the given information, it is ungrounded. "$2" is a quantity from the given information that represents the original price per apple.',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': 'Result: Based on the check, there are ungrounded numbers, so the grounding check fails. Reasoning check: Purpose: Check whether the reasoning logic correctly answers the question Rule: Any logic error makes the check fails To calculate the total cost of apples after a 50% discount, we should multiply the number of apples by the discounted price. But to answer the question, we do not need to calculate the total number of fruit Alice gets. Result: Based on the check, there are some logic mistakes, so the reasoning check fails. Calculation check: Purpose: Check whether the calculation process is consistent Rule: Any inconsistent calculation makes the check fails calculation1: equation: $1 * (5 - 3), answer: $2 (5 - 3) = 2 $1 * 2 = $2 is consistent with the answer, so the calculation is correct. calculation2: equation: 5 + x + 2x, answer: 5 + 4x x + 2x = 3x 5 + 3x is inconsistent with the answer, so the calculation is incorrect. Result: Based on the check, the calculation process is inconsistent, so the calculation check fails. Check results: Ground check fails, Reasoning check fails, Calculation check fails. Rule: Any failed check makes the reasoning incorrect. So the answer is "no".',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': '15',
, 'title': 'SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning'},
, {'content': '3 2 0 2 v o N 6 ] I A . s c [ 5 v 2 5 3 0 0 . 8 0 3 2 : v i X r a Preprint # METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK Sirui Hong1â , Mingchen Zhuge2â , Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4, Ceyao Zhang4, Jinlin Wang1, Zili Wang, Steven Ka Shing Yau5, Zijuan Lin4, Liyang Zhou6, Chenyu Ran1, Lingfeng Xiao1,7, Chenglin Wu1â , J Â¨urgen Schmidhuber2,8 1DeepWisdom, 2AI Initiative, King Abdullah University of Science and Technology, 3Xiamen University, 5Nanjing University, 7University of California, Berkeley, # ABSTRACT Remarkable progress has been made on automated problem solving through so- cieties of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT en- codes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On col- laborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT',
, 'title': 'MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework'},
, {'content': '1 # INTRODUCTION Autonomous agents utilizing Large Language Models (LLMs) offer promising opportunities to en- hance and replicate human workflows. In real-world applications, however, existing systems (Park et al., 2023; Zhuge et al., 2023; Cai et al., 2023; Wang et al., 2023c; Li et al., 2023; Du et al., 2023; Liang et al., 2023; Hao et al., 2023) tend to oversimplify the complexities. They struggle to achieve effective, coherent, and accurate problem-solving processes, particularly when there is a need for meaningful collaborative interaction (Zhang et al., 2023; Dong et al., 2023; Zhou et al., 2023; Qian et al., 2023). Through extensive collaborative practice, humans have developed widely accepted Standardized Operating Procedures (SOPs) across various domains (Belbin, 2012; Manifesto, 2001; DeMarco & Lister, 2013). These SOPs play a critical role in supporting task decomposition and effective coor- dination. Furthermore, SOPs outline the responsibilities of each team member, while establishing standards for intermediate outputs. Well-defined SOPs improve the consistent and accurate exe- cution of tasks that align with defined roles and quality standards (Belbin, 2012; Manifesto, 2001; DeMarco & Lister, 2013; Wooldridge & Jennings, 1998). For instance, in a software company, Product Managers analyze competition and user needs to create Product Requirements Documents (PRDs) using a standardized structure, to guide the developmental process. Inspired by such ideas, we design a promising GPT-based Meta-Programming framework called MetaGPT that significantly benefits from SOPs. Unlike other works (Li et al., 2023; Qian et al., 2023), MetaGPT requires agents to generate structured outputs, such as high-quality requirements',
, 'title': 'MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework'},
, {'content': 'â These authors contributed equally to this work. â Chenglin Wu (alexanderwu@fuzhi.ai) is the corresponding author, affiliated with DeepWisdom. 1 Preprint MetaGPT Agents Collaboration with Developing SOP 1/5 | | One-line requirement Define Write a classic and simple Flappy Bird game. 2/5 : Design Boss makes acceptance: check and payment Planning Requirement Analysis Architectural Design Meta Programming System Design 1 3/5 % | Plan&Code Pretty good ! I can %, 1 directly use the %, * a| Â¢ interface and %, . 1 â _ keyboard to play oe, 7 4/5 Flappy Bird. % â Testin; 7 Test ms bd QA Engineer W â N am Â°R . a4 cn 4 5/5] Figure 1:',
, 'title': 'MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework'},
, {'content': 'The software development SOPs between MetaGPT and real-world human teams. In software engineering, SOPs promote collaboration among various roles. MetaGPT showcases its ability to decompose complex tasks into specific actionable procedures assigned to various roles (e.g., Product Manager, Architect, Engineer, etc.). documents, design artifacts, flowcharts, and interface specifications. The use of intermediate struc- tured outputs significantly increases the success rate of target code generation. More graphically, in a company simulated by MetaGPT, all employees follow a strict and streamlined workflow, and all their handovers must comply with certain established standards. This reduces the risk of hallucina- tions caused by idle chatter between LLMs, particularly in role-playing frameworks, like: â',
, 'title': 'MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework'},
, {'content': 'Hi, hello and how are you?â â Alice (Product Manager); â Great! Have you had lunch?â â Bob (Architect). Benefiting from SOPs, MetaGPT offers a promising approach to meta-programming. In this context, we adopt meta-programming1 as â programming to programâ , in contrast to the broader fields of meta learning and â learning to learnâ (Schmidhuber, 1987; 1993a; Hochreiter et al., 2001; Schmidhuber, 2006; Finn et al., 2017). This notion of meta-programming also encompasses earlier efforts like CodeBERT (Feng et al., 2020) and recent projects such as CodeLlama (Rozi`ere et al., 2023) and WizardCoder (Luo et al., 2023). However, MetaGPT stands out as a unique solution that allows for efficient meta- programming through a well-organized group of specialized agents. Each agent has a specific role and expertise, following some established standards. This allows for automatic requirement analysis, system design, code generation, modification, execution, and debugging during runtime, highlight- ing how agent-based techniques can enhance meta-programming. To validate the design of MetaGPT, we use publicly available HumanEval (Chen et al., 2021a) and MBPP (Austin et al., 2021) for evaluations. Notably, in code generation benchmarks, MetaGPT achieves a new state-of-the-art (SoTA) with 85.9% and 87.7% in Pass@1. When compared to other popular frameworks for creating complex software projects, such as AutoGPT (Torantulino et al., 2023), LangChain (Chase, 2022), AgentVerse (Chen et al., 2023), and ChatDev (Qian et al., 2023). MetaGPT also stands out in handling higher levels of software complexity and offering extensive functionality. Remarkably, in our experimental evaluations, MetaGPT achieves a 100% task com- pletion rate, demonstrating the robustness and efficiency (time and token costs) of our design. We summarize our contributions as follows:',
, 'title': 'MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework'}]

Now let's test retrieval!

[14]

[15]

Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023. William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, and Jan Leike. Self-critiquing models for assisting human evaluators. CoRR, abs/2206.05802, 2022. doi: 10.48550/arXiv.2206.05802. URL https://doi.org/10.48550/arXiv.2206.05802.
---
Chinese-oriented Models (Continue Pretraining) F F T T F F T T Llama1 BLOOMZ-7B1-mt Llama1 Llama1 Llama2 Llama2 Llama2 Llama2 7B, 13B 7B 7B, 13B, 33B 13B 7B 7B 7B 13B 2k 1k 8k 2k 4k 4k 4k 4k 5w 200w 200w, 300w, 430w 110w 1000w â 120w 100w English-oriented Models Llama2-chat (Touvron et al. 2023) Vicuna-V1.3 (Zheng et al. 2023) Vicuna-V1.5 (Zheng et al. 2023) WizardLM (Xu et al. 2023b) LongChat-V1 (Li* et al. 2023) LongChat-V1.5 (Li* et al. 2023) OpenChat-V3.2 (Wang et al. 2023a) GPT-3.5-turbo GPT-4 Llama2 Llama1 Llama2 Llama1 Llama1 Llama2 Llama2 - - 7B, 13B, 70B 7B, 13B, 33B 7B, 13B 13B 7B, 13B 7B 13B - - N/A N/A N/A N/A N/A N/A N/A N/A N/A 4k 2k 16k 2k 16k 32k 4k 16k 16k 10w 12w 12w 25w 8w, 2w â 0.6w â â
---
MMLU: MMLU includes 57 multiple-choice tasks covering subjects spanning STEM to social science [17]. The tasks differ significantly in complexity, with many STEM-oriented questions demanding domain-specific professional knowledge and intricate reasoning to be solved. TruthfulQA: TruthfulQA contains 817 factual questions to detect model falsehoods caused by naively mimicking human language patterns [27]. The solutions to these questions are closely associated with English Wikipedia sources. The task probes a modelâ s factual knowledge and resistance to popular misconceptions. Table 3: Performance of FLM-101B and baselines including LLAMA series and GLM-130B. In order to visually compare the performance and cost, we estimate the floating-point opera- tions (zetta = 1021) of the training process. Model Cost (zettaFLOPs) Average ARC HellaSwag MMLU TruthfulQA LLAMA-2 (13B) LLAMA-2 (7B) LLAMA (13B) LLAMA (7B) GLM-130B 201.37 106.60 94.81 49.54 210.80 (Â±28.77) (Â±15.23) (Â±13.54) (Â±7.08) 58.66 54.32 56.08 49.72 48.11 28.22 43.94 59.39 53.07 56.23 51.02 42.15 39.76 82.13 78.59 80.93 77.82 67.91 66.23 55.77 46.87 47.67 35.71 42.59 28.30â 37.38 38.76 39.48 34.33 39.80 41.47 # FLM-101B â 44.50 for a knowledge-enhanced eFLM-16B (Section 2.2, 4.2). Table 3 details the performance of FLM-101B and strong baselines, including LLAMA series and GLM-130B. Because GPT-3 is closed-source, we could not get the probability values for a fair comparison. As a result, we cannot list GPT-3 here.
---
LLaMA LLaMA (Touvron et al., 2023a) is an auto-regressive, decoder-only large language model based on the Transformer architecture. The model is characterized by its billions of param- eters, pre-trained on a vast amount of web data. Being uni-directional means that the modelâ s at- tention mechanism only considers the preceding elements in the input sequence when making pre- dictions. Specifically, given an input sequence x = [t1, t2, ..., tnâ 1], the model computes the prob- ability of the next token tn based solely on the preceding tokens. The prediction process can be mathematically represented as P (tn|t1, ..., tnâ 1), where P denotes the probability and tn represents the next element in the sequence.
---
One notable development in this area is the emergence of open-source LLMs, specifically LLaMA (Touvron et al., 2023a) and LLAMA 2 (Touvron et al., 2023b), which have been recognized as the most powerful open-source language models ever created. This has led to a surge of activity in the open-source community (Wolf et al., 2019), with a series of large language models being developed collaboratively to build upon this progress (Mosaic ML, 2023; Almazrouei et al., 2023; ChatGLM2 Team, 2023; Yang et al., 2023; InternLM Team, 2023).

Our retrieval component works, now let's try feeding this into a Llama 3 70B model hosted by Groq to produce an answer.

[16]

Now we can generate responses using Llama 3, we'll wrap this logic into a help function called generate:

[17]

[18]

The LLaMA LLMs refer to a series of large language models developed by researchers, specifically LLaMA and LLAMA 2. These models are notable for being open-source, which has led to a surge of activity in the open-source community.

LLaMA is an auto-regressive, decoder-only large language model based on the Transformer architecture. It's characterized by its billions of parameters and pre-training on a vast amount of web data. Being uni-directional, the model's attention mechanism only considers the preceding elements in the input sequence when making predictions.

From the comparison tables, we can see that LLaMA and LLAMA 2 come in different sizes, such as 7B, 13B, and 33B, indicating the number of parameters in each model. These models have been tested on various tasks, including MMLU, TruthfulQA, ARC, and HellaSwag, and have shown competitive performance compared to other strong baselines.

The emergence of open-source LLMs like LLaMA and LLAMA 2 has enabled the open-source community to build upon this progress, leading to the development of new large language models through collaborative efforts.

Don't forget to delete your index when you're done to save resources!

[19]