Amazon Web Services Text Generation Open Llama

alph-notebooks/amazon-sagemaker-examples / text-generation-open-llama.ipynb

Introduction to SageMaker Built-In Algorithms - Text Generation

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.

In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy the Open-LLaMA model for text generation. Open-LLaMA is a permissively licensed (Apache-2.0) open-source reproduction of Meta AI’s LLaMA 7B. It is trained on the RedPajama dataset, itself a reproduction of the LLaMA training dataset, which contains over 1.2 trillion tokens.
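
A minimal sketch of the deployment step, assuming the SageMaker Python SDK's `JumpStartModel` class. The model ID `huggingface-textgeneration-open-llama` and the helper name `deploy_open_llama` are assumptions that do not appear in this text, so confirm the ID against the JumpStart model catalog before relying on it. Calling the function needs AWS credentials and an execution role, and it creates a billable endpoint.

```python
def deploy_open_llama(model_id="huggingface-textgeneration-open-llama"):
    """Deploy a JumpStart text-generation model and return its predictor.

    The default model_id is an assumption; verify it before use.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # deploy() provisions a real-time endpoint with the model's default
    # settings and returns a predictor bound to that endpoint.
    return model.deploy()
```

The returned predictor is what the invocation and cleanup steps later in the notebook operate on.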

Supported parameters

This model supports many parameters while performing inference. They include:

max_length: Model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
max_new_tokens: Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
num_beams: Number of beams used in beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
no_repeat_ngram_size: Model ensures that no sequence of no_repeat_ngram_size consecutive words is repeated in the output sequence. If specified, it must be a positive integer greater than 1.
temperature: Controls the randomness of the output. A higher temperature makes low-probability words more likely to appear in the output sequence; a lower temperature favors high-probability words. As temperature approaches 0, generation becomes greedy decoding. If specified, it must be a positive float.
early_stopping: If True, text generation finishes when all beam hypotheses reach the end-of-sentence token. If specified, it must be a boolean.
do_sample: If True, sample the next word according to its likelihood. If specified, it must be a boolean.
top_k: In each step of text generation, sample from only the top_k most likely words. If specified, it must be a positive integer.
top_p: In each step of text generation, sample from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
return_full_text: If True, the input text will be part of the generated output text. If specified, it must be a boolean. The default value is False.
stop: If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.
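
To make the interaction of temperature, top_k, and top_p concrete, here is a simplified sketch of how they reshape the next-word distribution before sampling. It is illustrative only and is not the code the deployed model runs; the function name `filter_next_token_probs` and the toy scores are hypothetical.

```python
import math


def _renormalize(ranked):
    """Rescale (token, probability) pairs so the probabilities sum to 1."""
    mass = sum(p for _, p in ranked)
    return [(tok, p / mass) for tok, p in ranked]


def filter_next_token_probs(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the distribution that sampling would draw the next word from.

    `logits` maps each candidate word to its raw score.
    """
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Softmax, shifted by the maximum score for numerical stability.
    peak = max(scaled.values())
    weights = [(tok, math.exp(s - peak)) for tok, s in scaled.items()]
    ranked = _renormalize(sorted(weights, key=lambda item: item[1], reverse=True))
    # top_k: keep only the k most likely words.
    if top_k is not None:
        ranked = _renormalize(ranked[:top_k])
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = _renormalize(kept)
    return dict(ranked)


toy_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.0}
print(filter_next_token_probs(toy_logits, temperature=0.5, top_k=2))
```

With do_sample set to True, the next word would be drawn at random from the returned distribution; with top_k=1 only the single most likely word survives, which matches greedy decoding.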

We may specify any subset of the parameters above when invoking an endpoint. Next, we show an example of how to invoke the endpoint with these arguments.
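
A hedged sketch of the invocation, assuming the endpoint accepts a JSON payload of the form `{"inputs": ..., "parameters": {...}}` and returns a list of dictionaries with a `generated_text` key. Neither format is stated in this text, so treat both as assumptions. `build_payload` and `generate` are hypothetical helper names, and `predictor` stands for the object returned when the model was deployed.

```python
import json

# Parameter names taken from the list above.
SUPPORTED_PARAMETERS = {
    "max_length", "max_new_tokens", "num_beams", "no_repeat_ngram_size",
    "temperature", "early_stopping", "do_sample", "top_k", "top_p",
    "return_full_text", "stop",
}


def build_payload(prompt, **parameters):
    """Build a request body, rejecting parameter names that are not listed."""
    unknown = set(parameters) - SUPPORTED_PARAMETERS
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    # Assumed payload layout: the prompt under "inputs", options under "parameters".
    return {"inputs": prompt, "parameters": parameters}


def generate(predictor, prompt, **parameters):
    """Send the prompt to the endpoint and return the generated text."""
    response = predictor.predict(build_payload(prompt, **parameters))
    # Assumed response layout: a list with one dict per returned sequence.
    return response[0]["generated_text"]


example = build_payload(
    "Building a website can be done in 10 simple steps:",
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    stop=["\n\n"],
)
print(json.dumps(example, indent=2))
```

Checking parameter names locally surfaces typos such as `max_tokens` before a request is sent to the endpoint.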

Clean up the endpoint
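
A minimal sketch of the cleanup step, assuming `predictor` is the SageMaker SDK predictor returned at deployment; `clean_up` is a hypothetical helper name. Deleting the model and the endpoint stops further charges for the hosted resources.

```python
def clean_up(predictor):
    """Delete the model and the endpoint behind a SageMaker predictor."""
    # Remove the model resource first, then the endpoint that served it.
    predictor.delete_model()
    predictor.delete_endpoint()
```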

Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.