Falcon 7B DeepSpeed
Serve Falcon 7B model with Amazon SageMaker Hosting
This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
In this example we walk through how to deploy and perform inference on the Falcon 7B model using the Large Model Inference (LMI) container provided by AWS, which uses DJL Serving and DeepSpeed. Falcon 7B is a causal decoder-only model, similar to the larger Falcon 40B model. We will deploy to an ml.g5.2xlarge instance for efficiency.
Setup
Installs the dependencies required to package the model and run inference with Amazon SageMaker, including updated versions of the SageMaker Python SDK and boto3.
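As a minimal sketch of that install cell (version pins omitted; the notebook's tested versions may differ):

```python
# Upgrade the SDKs used throughout this notebook
%pip install -U sagemaker boto3 --quiet
```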
Imports and variables
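The imports and session variables typically look like the following sketch; the S3 prefix is a placeholder:

```python
import boto3
import sagemaker

# SageMaker session, execution role, and region used by all later calls
sess = sagemaker.session.Session()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

# S3 destination for the packaged model artifacts (prefix is illustrative)
bucket = sess.default_bucket()
s3_code_prefix = "large-model-lmi/falcon-7b/code"
```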
1. Create SageMaker-compatible model artifacts
To prepare our model for hosting on a SageMaker endpoint, we need to assemble a few artifacts for SageMaker and our container. We will place these files in a local folder, including a serving.properties file that defines parameters for the LMI container and a requirements.txt that lists the dependencies to install.
In the serving.properties file, define the engine to use and the model to host. Note the tensor_parallel_degree parameter, which is set to 1 in this scenario: since the entire model fits on a single GPU, we do not have to split it into multiple partitions. Here we will use an ml.g5.2xlarge instance, which provides one GPU. Be careful not to specify a tensor parallel degree larger than the number of GPUs the instance provides, or your deployment will fail.
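A representative serving.properties for this single-GPU setup might look like the cell below; the local folder name code_falcon7b is a placeholder, and the option values are illustrative:

```python
%%writefile code_falcon7b/serving.properties
engine=DeepSpeed
option.model_id=tiiuae/falcon-7b
option.tensor_parallel_degree=1
option.dtype=fp16
```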
2. Create a model.py with custom inference code
SageMaker allows you to bring your own script for inference. Here we create our model.py file with the appropriate code for the Falcon 7B model.
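A sketch of such a handler is below, assuming DJL Serving's Python API (djl_python's Input/Output) and a JSON payload with text and parameters fields; the DeepSpeed options are a starting point, since kernel-injection support varies by architecture:

```python
%%writefile code_falcon7b/model.py
import deepspeed
import torch
from djl_python import Input, Output
from transformers import pipeline

predictor = None


def get_pipeline(properties):
    # Read settings passed through serving.properties
    model_name = properties.get("model_id", "tiiuae/falcon-7b")
    tp_degree = int(properties.get("tensor_parallel_degree", 1))

    pipe = pipeline(
        "text-generation",
        model=model_name,
        torch_dtype=torch.float16,
        trust_remote_code=True,
        device=0,  # single GPU, matching tensor_parallel_degree=1
    )
    # Wrap the underlying model with the DeepSpeed inference engine
    pipe.model = deepspeed.init_inference(
        pipe.model, mp_size=tp_degree, dtype=torch.float16
    )
    return pipe


def handle(inputs: Input) -> Output:
    global predictor
    if predictor is None:
        predictor = get_pipeline(inputs.get_properties())
    if inputs.is_empty():
        # DJL Serving sends an empty request at startup to trigger model loading
        return None
    data = inputs.get_as_json()
    result = predictor(data["text"], **data.get("parameters", {}))
    return Output().add_as_json(result)
```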
3. Create the tarball and upload it to an S3 location
Next, we package our artifacts as a *.tar.gz archive and upload it to S3 for SageMaker to use during deployment.
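Continuing the sketch, packaging and uploading might look like this (folder and prefix names carried over from above):

```python
import tarfile

# Bundle serving.properties, model.py, and requirements.txt into model.tar.gz
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("code_falcon7b", arcname=".")

# Upload the archive for SageMaker to reference at deployment time
s3_code_artifact = sess.upload_data("model.tar.gz", bucket, s3_code_prefix)
print(f"Model artifact uploaded to: {s3_code_artifact}")
```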
4. Define a serving container, SageMaker Model and SageMaker endpoint
Now that we have uploaded the model artifacts to S3, we can create a SageMaker endpoint.
Define the serving container
Here we define the container image used to serve the model for inference. We will use SageMaker's Large Model Inference (LMI) container with DeepSpeed.
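One way to resolve the image URI is through the SageMaker SDK's image_uris helper; the framework name and version below are illustrative, so check the SDK for the values current in your region:

```python
from sagemaker import image_uris

# Look up the LMI (DJL Serving + DeepSpeed) container image for this region
inference_image_uri = image_uris.retrieve(
    framework="djl-deepspeed", region=region, version="0.23.0"
)
print(inference_image_uri)
```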
Create the SageMaker model, endpoint configuration, and endpoint
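A sketch of these three calls with boto3 (resource names are illustrative):

```python
from sagemaker.utils import name_from_base

sm_client = boto3.client("sagemaker")
model_name = name_from_base("falcon-7b-djl-ds")

# Model: the serving image plus the S3 location of the packaged artifacts
sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": inference_image_uri, "ModelDataUrl": s3_code_artifact},
)

# Endpoint configuration: a single ml.g5.2xlarge instance (one GPU)
endpoint_config_name = f"{model_name}-config"
sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
            # Large models can take a while to download and load
            "ContainerStartupHealthCheckTimeoutInSeconds": 600,
        }
    ],
)

# Endpoint: create_endpoint returns immediately; wait until it is in service
endpoint_name = f"{model_name}-endpoint"
sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```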
Run Inference
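Once the endpoint is in service, it can be invoked through the SageMaker runtime; the payload schema below matches the model.py sketch above:

```python
import json

smr_client = boto3.client("sagemaker-runtime")

response = smr_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(
        {"text": "What is Amazon SageMaker?", "parameters": {"max_new_tokens": 64}}
    ),
)
print(response["Body"].read().decode("utf-8"))
```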
Clean Up
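To avoid ongoing charges, delete the endpoint and its associated resources, for example:

```python
# Remove the endpoint, its configuration, and the model registration
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=model_name)
```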
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.