Deploy a CLIP model on SageMaker
This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
In this notebook, we will deploy a CLIP model to a SageMaker endpoint using the DJLServing container image.
Setup
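A minimal setup cell might look like the following sketch. It assumes the notebook runs with a SageMaker execution role and that the SageMaker Python SDK is available; the default bucket is used only as a convenient place for artifacts.

```python
# Install or upgrade the libraries used in this notebook (uncomment on first run)
# %pip install -U sagemaker boto3 huggingface_hub

import sagemaker

# SageMaker session, execution role, region, and a default S3 bucket for artifacts
sess = sagemaker.session.Session()
role = sagemaker.get_execution_role()
region = sess.boto_region_name
bucket = sess.default_bucket()
print(f"Region: {region}\nBucket: {bucket}")
```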
Prepare inference script and container image
In this notebook, we will store the model artifacts on S3 and load the model directly from S3.
The Large Model Inference (LMI) container uses s5cmd to download data from S3, which significantly speeds up model loading during deployment. Therefore, we recommend loading the model from S3 by following the section below to download the model from Hugging Face and upload the model artifacts to S3. Note that if you choose to load the model directly from Hugging Face during model deployment, you can prepare the model tarball file and upload it to S3 without downloading the model locally.
Download the model from Hugging Face and upload the model artifacts on Amazon S3
If you intend to download your own copy of the model and upload it to an S3 location in your AWS account, please follow the steps below; otherwise, you can skip to the next step.
Please make sure the model is downloaded correctly by checking that the files exist in the newly created folder blip2-model/models--Salesforce--<model-name>/snapshots/... before running the cell below.
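A sketch of the download-and-upload step is shown below. The model id, the local cache folder name, and the S3 prefix are assumptions; substitute the checkpoint you actually want to deploy and keep the local folder consistent with the path you check in the previous step. It assumes huggingface_hub is installed and reuses `bucket` from the setup cell.

```python
from pathlib import Path

from huggingface_hub import snapshot_download
from sagemaker.s3 import S3Uploader

# Hypothetical model id -- substitute the checkpoint you actually want to deploy
model_id = "openai/clip-vit-base-patch32"
local_cache_dir = Path("clip-model")  # local Hugging Face cache folder (assumed name)

# Download the full model repository; the returned path points at the snapshot folder
snapshot_path = snapshot_download(repo_id=model_id, cache_dir=local_cache_dir)
print(f"Model downloaded to: {snapshot_path}")

# Upload the downloaded snapshot to S3 so the LMI container can fetch it with s5cmd
# (`bucket` comes from the setup cell)
pretrained_model_s3_uri = S3Uploader.upload(
    local_path=snapshot_path,
    desired_s3_uri=f"s3://{bucket}/clip/model",
)
print(f"Model artifacts uploaded to: {pretrained_model_s3_uri}")
```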
SageMaker Large Model Inference containers can be used to host models without providing your own inference code. This is extremely useful when there is no custom pre-processing of the input data or post-processing of the model's predictions.
However, in this notebook, we demonstrate how to deploy a model with custom inference code.
SageMaker needs the model artifacts to be in a tarball format. In this example, we provide the following files: serving.properties, model.py, and requirements.txt.
- serving.properties is the configuration file that tells DJL Serving which model parallelization and inference optimization libraries you would like to use. Depending on your needs, you can set the appropriate configuration. For more details on the configuration options and an exhaustive list, you can refer to the documentation.
- model.py is the script that handles any requests for serving.
- requirements.txt is the text file containing any additional pip packages that need to be installed.
If you want to download the model from huggingface.co, you can set option.model_id to the model ID of a pretrained model hosted in a model repository on huggingface.co (https://huggingface.co/models). The container uses this model ID to download the corresponding model repository from huggingface.co. If you set model_id to an S3 URL, DJL Serving will download the model artifacts from S3 and swap the model_id to the actual location of the model artifacts. In your script, you can point to this value to load the pretrained model.
option.tensor_parallel_degree: Set this to the number of GPU devices over which the model needs to be partitioned. This parameter also controls the number of workers per model that will be started when DJL Serving runs. For example, if we have an 8-GPU machine and we create 8 partitions, then we will have 1 worker per model to serve requests.
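As an illustration, a serving.properties along these lines could be written from the notebook. The engine, tensor parallel degree, and code folder name are assumptions; option.model_id points at the S3 prefix from the upload step (or at a Hugging Face model ID if you skipped the local download).

```python
import os

# Local folder that will hold the inference code and configuration before packaging
code_dir = "clip_src"
os.makedirs(code_dir, exist_ok=True)

# Example serving.properties -- the engine, tensor parallel degree, and model location
# are assumptions; adjust them to your instance type and model location
serving_properties = (
    "engine=Python\n"
    f"option.model_id={pretrained_model_s3_uri}\n"
    "option.tensor_parallel_degree=1\n"
)
with open(os.path.join(code_dir, "serving.properties"), "w") as f:
    f.write(serving_properties)

print(serving_properties)
```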
Prepare the model tarball file and upload to S3
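One way to package and upload the code tarball is sketched below. It assumes model.py and requirements.txt are already present in the code folder created above; the tarball name and S3 prefix are assumptions.

```python
import os
import tarfile

from sagemaker.s3 import S3Uploader

# Package serving.properties, model.py, and requirements.txt into a tarball
# (model.py and requirements.txt are assumed to already be in code_dir)
tarball_name = "model.tar.gz"
with tarfile.open(tarball_name, "w:gz") as tar:
    for fname in ("serving.properties", "model.py", "requirements.txt"):
        tar.add(os.path.join(code_dir, fname), arcname=fname)

# Upload the code tarball to S3; this URI becomes model_data at deployment time
code_artifact_s3_uri = S3Uploader.upload(
    local_path=tarball_name,
    desired_s3_uri=f"s3://{bucket}/clip/code",
)
print(f"Code tarball uploaded to: {code_artifact_s3_uri}")
```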
Deploy model
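A deployment sketch is shown below. The DJL framework/version pair passed to image_uris.retrieve, the instance type, and the naming scheme are assumptions; pick an LMI release and GPU instance that match your region and model size.

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.utils import name_from_base

# Resolve a DJL LMI container image for this region; the framework/version pair is an
# assumption -- pick a release that is available in your region
image_uri = sagemaker.image_uris.retrieve(
    framework="djl-deepspeed", region=region, version="0.23.0"
)

model_name = name_from_base("clip-djl")
endpoint_name = model_name

# model_data holds only the inference code tarball; the weights are pulled from
# option.model_id inside serving.properties
model = Model(
    image_uri=image_uri,
    model_data=code_artifact_s3_uri,
    role=role,
    name=model_name,
    sagemaker_session=sess,
)

# Deploy to a GPU instance; the instance type is an assumption, size it to your model
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    endpoint_name=endpoint_name,
)
```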
Test Inference Endpoint
Image captioning
Let's ask our model to classify the objects appearing in the above image
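A minimal invocation sketch follows. The JSON payload (a base64-encoded image plus a list of candidate labels) and the local image file name are assumptions; they must match the contract implemented by your custom model.py handler.

```python
import base64
import json

import boto3

# Invoke the endpoint through the SageMaker runtime client; the payload schema below
# is an assumed contract that must match what the custom model.py handler expects
runtime = boto3.client("sagemaker-runtime", region_name=region)

with open("cats.jpg", "rb") as f:  # placeholder file name for the image shown above
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "image": image_b64,
    "candidate_labels": ["cats", "dogs", "birds"],
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode("utf-8"))
```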
['cats']
Clean up
Uncomment the cell below to delete the endpoint and the model when you finish the experiment.
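A cleanup sketch is shown below. It assumes the endpoint configuration shares the endpoint's name, which is the SageMaker SDK default when deploying with an explicit endpoint_name.

```python
import boto3

sm_client = boto3.client("sagemaker", region_name=region)

# Uncomment to tear down the resources created above; the endpoint config name is
# assumed to match the endpoint name
# sm_client.delete_endpoint(EndpointName=endpoint_name)
# sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)
# sm_client.delete_model(ModelName=model_name)
```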
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.