Deploy Stable Diffusion on a SageMaker GPU Multi-Model Endpoint with Triton
This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
In this notebook we will deploy multiple variations of Stable Diffusion on a SageMaker Multi-Model GPU Endpoint (MME GPU) powered by NVIDIA Triton Inference Server.
⚠ Warning: This notebook requires a minimum of an ml.m5.large instance to build the conda environment required for hosting the Stable Diffusion models.
The models directory contains the inference code and the Triton configuration file for each of the Stable Diffusion models. In addition to these, we also need to download the pretrained model weights and save them to their respective subdirectories within the models directory. Once downloaded, we can package the inference code and the model weights into a tarball and upload it to S3.
When using the Triton Python backend (which our Stable Diffusion models will run on), you can include your own environment and dependencies. The recommended way to do this is to use conda-pack to generate a conda environment archive in tar.gz format, and point to it in the config.pbtxt file of each model that should use it by adding the snippet:
parameters: {
key: "EXECUTION_ENV_PATH",
value: {string_value: "path_to_your_env.tar.gz"}
}
You can use a different environment per model, or the same one for all models (read more on this here). Since all of the models we'll be deploying share the same environment requirements, we will create a single conda environment and use a Python backend model to copy that environment into a location where it can be accessed by all models.
⚠ Warning: The approach for creating a shared conda environment highlighted here is limited to single-instance deployments only. In the event of auto-scaling, there is no guarantee that a new instance will have the conda environment configured. Since the conda environment for hosting Stable Diffusion models is quite large, the recommended approach for production deployments is to create the shared environment by extending the Triton Inference Server image.
Let's start by creating the conda environment with the necessary dependencies; running these cells will output a sd_env.tar.gz file.
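As a sketch, the environment spec can be written out from Python so the notebook stays self-contained. The package list and versions below are assumptions for illustration; pin them to whatever your models actually require:

```python
# Write a conda environment spec for the Stable Diffusion models.
# NOTE: the dependency list is illustrative -- adjust packages and pins
# to your models' actual requirements. conda-pack must be included so
# the environment can later be archived into sd_env.tar.gz.
env_yaml = """\
name: sd_env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - conda-pack
  - pip:
    - torch
    - diffusers
    - transformers
    - accelerate
"""

with open("environment.yml", "w") as f:
    f.write(env_yaml)
```

With the spec in place, `conda env create -f environment.yml` followed by `conda pack -n sd_env -o sd_env.tar.gz` (run from a shell cell) produces the archive referenced by EXECUTION_ENV_PATH.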
Now we can create the environment using the above environment YAML spec.
🛈 It can take up to 5 minutes to create the conda environment. Make sure you are running this notebook on an ml.m5.large instance or larger.
Now, we get the correct URI for the SageMaker Triton container image. Check out all the available Deep Learning Container images that AWS maintains here.
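The URI follows the standard Deep Learning Containers naming pattern. A minimal sketch is below; the account ID map covers only two regions and the image tag is an assumption, so consult the maintained DLC list for the authoritative, current values:

```python
# Construct the SageMaker Triton container image URI for a given region.
# NOTE: the account IDs and the tag below are illustrative assumptions;
# see the AWS Deep Learning Containers list for the full set.
account_id_map = {
    "us-east-1": "785573368785",
    "us-west-2": "301217895009",
}

def triton_image_uri(region, tag="23.02-py3"):
    if region not in account_id_map:
        raise ValueError(f"No known Triton account ID for region {region}")
    # China regions use a different DNS suffix.
    base = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return (
        f"{account_id_map[region]}.dkr.ecr.{region}.{base}"
        f"/sagemaker-tritonserver:{tag}"
    )
```
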
The next step is to package the model subdirectories and weights into individual tarballs and upload them to S3. This process can take about 10 to 15 minutes.
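The packaging step can be sketched as follows; the model directory name and bucket/prefix are placeholders, and the S3 upload is shown commented out since it requires AWS credentials:

```python
import os
import tarfile

def package_model(model_dir, output_path):
    """Package a model subdirectory into a gzipped tarball laid out the
    way Triton expects (the model directory name is the archive root)."""
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(model_dir, arcname=os.path.basename(model_dir))
    return output_path

# Upload with boto3 (requires AWS credentials, so it is not run here):
# import boto3
# s3 = boto3.client("s3")
# s3.upload_file("sd_base.tar.gz", bucket, f"{prefix}/sd_base.tar.gz")
```

On the endpoint, each tarball is loaded on demand by name via the TargetModel parameter of invoke_endpoint.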
We are now ready to configure and deploy the multi-model endpoint.
Create a SageMaker endpoint configuration.
Create the endpoint and wait for it to transition to the InService state.
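A minimal sketch of the deployment steps with boto3 is shown below. The names, instance type, and model data URL are placeholder assumptions, and the AWS calls are wrapped in a function so they run only when you have credentials configured:

```python
def build_endpoint_config(model_name, instance_type="ml.g5.xlarge"):
    """Build the ProductionVariants payload for create_endpoint_config.
    NOTE: the instance type is an assumption; use any MME-on-GPU-capable
    type available in your region."""
    return {
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }
        ]
    }

def deploy_mme(endpoint_name, model_name, container_image,
               model_data_url, role_arn, region="us-west-2"):
    """Create a multi-model SageMaker model, endpoint config, and
    endpoint, then block until the endpoint is InService."""
    import boto3  # imported here so the sketch is inspectable offline
    sm = boto3.client("sagemaker", region_name=region)

    # Mode="MultiModel" tells SageMaker to lazily load model tarballs
    # from the S3 prefix in ModelDataUrl on demand.
    sm.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={
            "Image": container_image,
            "ModelDataUrl": model_data_url,
            "Mode": "MultiModel",
        },
    )
    sm.create_endpoint_config(
        EndpointConfigName=f"{endpoint_name}-config",
        **build_endpoint_config(model_name),
    )
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=f"{endpoint_name}-config",
    )
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```
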
Prior to invoking any of the Stable Diffusion models, we first invoke the setup_conda model, which copies the conda environment into a directory shared with all the other models. Refer to the model.py file in the models/setup_conda/1 directory for more details on the implementation.
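The invocation can be sketched like this. The request body uses the plain-JSON form of the KServe v2 inference protocol that Triton accepts; the input name, datatype, and the setup_conda.tar.gz tarball name are assumptions based on the model directory layout, and the actual invoke_endpoint call is shown commented out since it requires a live endpoint:

```python
import json

def build_triton_request(input_name, shape, datatype, data):
    """Build a KServe-v2-style JSON inference request for Triton."""
    return json.dumps({
        "inputs": [
            {
                "name": input_name,
                "shape": shape,
                "datatype": datatype,
                "data": data,
            }
        ]
    })

# The call itself needs AWS credentials and a live endpoint:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# runtime.invoke_endpoint(
#     EndpointName=endpoint_name,
#     ContentType="application/octet-stream",
#     TargetModel="setup_conda.tar.gz",  # selects the model within the MME
#     Body=build_triton_request("INPUT0", [1], "BYTES", ["dummy"]),
# )
```
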
Let's take the output from the Standard Model and modify it using the depth model.
We can use the same model to change the style of the original image into an oil painting, or change the setting from New York City's Central Park to Yellowstone National Park, while preserving the orientation of the original image.
For the final example, we will downsize our original output image from 512x512 to 128x128. We will then use the upscaling model to upscale the image back to its original 512x512 resolution.
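The local downsizing step can be sketched with a simple nearest-neighbor resize in NumPy (the upscaling back to 512x512 is performed by the deployed upscaler model, not locally; Pillow's Image.resize would work equally well here):

```python
import numpy as np

def downscale_nearest(image, factor):
    """Nearest-neighbor downscale of an HxWxC image by an integer factor,
    keeping every factor-th pixel along both spatial axes."""
    return image[::factor, ::factor]

# 512x512 RGB image down to 128x128 (factor of 4).
image = np.zeros((512, 512, 3), dtype=np.uint8)
small = downscale_nearest(image, 4)
```
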
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2, which is shown at the top of the notebook.