Kohya SS Fine-Tuning
Stable Diffusion XL Fine-Tuning with Kohya SS
This solution creates all the components you need to get started quickly with fine-tuning Stable Diffusion XL (SDXL) on a custom dataset. Stable Diffusion generates images from text prompts. Here, a custom training container uses Kohya SS to perform the fine-tuning, and the training is coordinated by an Amazon SageMaker pipeline that runs an Amazon SageMaker Training job. The solution automates many of the tedious tasks involved in setting up the infrastructure for training. You will use this notebook to set up the solution. For a general overview of the solution components, see the README file.
This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook.
Step One - Create the necessary resources through AWS CloudFormation
This solution is automated with an AWS CloudFormation template, located in this project directory. You can run it either through the AWS console or with the CLI command below. To use a specific version of Kohya SS, update the "KOHYA_SS_VERSION" environment variable in the template.yml file; otherwise, v22.6.2 is used.
Option 1: To run the AWS CloudFormation template via the AWS console, follow the steps below:
Note: The user logged in to the AWS console must have the permissions required to create the stack.
- Navigate to the AWS CloudFormation console and click "Create Stack", then select "With new resources (standard)".
- Select "Upload a template file" and click "Choose file". Select the template.yml file located in this project directory and click "Next".
- Enter a stack name. Modify the resource name parameters if required, or leave the defaults. Click "Next". On the next page, click "Next" again.
- Scroll to the bottom of the page. In the "Capabilities and transforms" section, acknowledge the three checkbox items to confirm potential IAM updates.
- Click "Submit" to create the stack.
Option 2: To run the AWS CloudFormation template with all defaults via the AWS CLI v2, run the next command:
Note: Your SageMaker execution role must have permissions to run AWS CloudFormation commands, as well as the required IAM and S3 permissions, among others. If you use the CLI and run into permission errors, update your SageMaker execution role and then continue the process.
Wait for the AWS CloudFormation stack to finish creating before moving on to the next step. You may check the status of the stack creation in the AWS CloudFormation console. This step takes about 2 minutes to complete.
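As a hedged sketch of Option 2, the following Python builds and runs the AWS CLI v2 calls that create the stack with all template defaults and then wait for creation to finish. It assumes the stack name "kohya-ss-fine-tuning-stack" (the name used in the clean-up section), that template.yml is in the current directory, and that the three acknowledgement checkboxes from Option 1 correspond to the CAPABILITY_IAM, CAPABILITY_NAMED_IAM, and CAPABILITY_AUTO_EXPAND capabilities.

```python
import subprocess
from typing import List

# Assumption: the stack name used in the clean-up section of this notebook.
STACK_NAME = "kohya-ss-fine-tuning-stack"
TEMPLATE_PATH = "template.yml"

# Assumption: these are the capabilities acknowledged by the three console checkboxes.
CAPABILITIES = ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]


def create_stack_cmd(stack_name: str, template_path: str) -> List[str]:
    """Build the `aws cloudformation create-stack` arguments, using all template defaults."""
    return [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        "--capabilities", *CAPABILITIES,
    ]


def wait_for_stack_cmd(stack_name: str) -> List[str]:
    """Build the command that blocks until stack creation completes (or fails)."""
    return [
        "aws", "cloudformation", "wait", "stack-create-complete",
        "--stack-name", stack_name,
    ]


def deploy(stack_name: str = STACK_NAME, template_path: str = TEMPLATE_PATH) -> None:
    # check=True raises CalledProcessError if either CLI call exits with an error.
    subprocess.run(create_stack_cmd(stack_name, template_path), check=True)
    subprocess.run(wait_for_stack_cmd(stack_name), check=True)
```

Calling deploy() replaces the manual status check in the console: the wait command returns once the stack reaches CREATE_COMPLETE and exits with an error if creation fails.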
Step Two - Upload the fine-tuning configuration file and your custom images to the S3 bucket
The next step is to upload the following to the S3 bucket created in Step One:
- Kohya SS SDXL configuration file: this .toml file defines the fine-tuning configuration parameters (instead of using the Kohya GUI).
- Custom image assets: the set of images (and optional caption files) used for the fine-tuning process.
The structure of the S3 Bucket is intended to be the following:
bucket/0001-dataset/kohya-sdxl-config.toml
bucket/0001-dataset/<asset-folder-name>/ (images and caption files go here)
bucket/0002-dataset/kohya-sdxl-config.toml
bucket/0002-dataset/<asset-folder-name>/ (images and caption files go here)
...
The "asset-folder-name" must follow a specific naming convention for the fine-tuning to succeed; this format is described in the asset upload instructions below. Each "xxxx-dataset" prefix can hold a separate dataset with its own config file contents. Do not change the "kohya-sdxl-config.toml" file name; if you do, you must also change the file name in the "train" file. The SageMaker Training job downloads the config file and asset folder during the training process.
Keep in mind that the "xxxx-dataset" prefix you choose is the same value you specify (as part of the InputS3DatasetLocation parameter) when launching the SageMaker pipeline, so that the pipeline knows which files to pull.
To upload the config file to the S3 Bucket, run the next command after you confirm the bucket name is correct:
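A minimal sketch of that upload, assuming the AWS CLI is available and the config file is in the current directory. The bucket name follows the sagemaker-kohya-ss-fine-tuning-<aws-account-id> pattern shown in the pipeline parameters; the account ID 123456789012 and the dataset prefix are placeholders to replace with your own values.

```python
import subprocess
from typing import List

CONFIG_FILE = "kohya-sdxl-config.toml"  # keep this name; the "train" file expects it

# Placeholder values: replace the account ID and dataset prefix with your own.
BUCKET = "sagemaker-kohya-ss-fine-tuning-123456789012"
DATASET_PREFIX = "0001-dataset"


def config_s3_uri(bucket: str, dataset_prefix: str) -> str:
    """Return s3://<bucket>/<dataset-prefix>/kohya-sdxl-config.toml."""
    prefix = dataset_prefix.strip("/")
    if not prefix:
        raise ValueError("dataset_prefix must not be empty")
    return f"s3://{bucket}/{prefix}/{CONFIG_FILE}"


def upload_config_cmd(local_path: str, bucket: str, dataset_prefix: str) -> List[str]:
    """Build the `aws s3 cp` arguments that copy the local config file to S3."""
    return ["aws", "s3", "cp", local_path, config_s3_uri(bucket, dataset_prefix)]


def upload_config(local_path: str = CONFIG_FILE) -> None:
    # Confirm BUCKET is correct before running; check=True surfaces CLI errors.
    subprocess.run(upload_config_cmd(local_path, BUCKET, DATASET_PREFIX), check=True)
```

Keeping the dataset prefix in one variable makes it easy to reuse the same value later for the InputS3DatasetLocation pipeline parameter.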
Now, upload your custom image assets to the same S3 bucket. Create an asset folder, and upload your images and caption files under that prefix in S3.
The "asset-folder-name" must be named properly, according to the Kohya SS guidelines. This naming convention is what defines the number of repetitions and the trigger word for the prompt.
For example, a folder name of "60_dwjz" specifies 60 repetitions, with "dwjz" as the trigger word for the prompt. Choose the number of repetitions and the trigger word for your use case, name this prefix in Amazon S3 accordingly, and manually upload your images to it. It's a good idea to also change the "output_name" parameter in the kohya-sdxl-config.toml file to contain your trigger word. When your upload is complete, your S3 structure should look like this:
bucket/0001-dataset/kohya-sdxl-config.toml
bucket/0001-dataset/60_dwjz/
bucket/0001-dataset/60_dwjz/1.jpg
bucket/0001-dataset/60_dwjz/1.caption
bucket/0001-dataset/60_dwjz/2.jpg
bucket/0001-dataset/60_dwjz/2.caption
...
The *.jpg files are your image assets. The *.caption files contain captions that describe each image, which help the model understand your prompts. For example, the 1.caption file contains a prompt that describes the image in 1.jpg, such as "dwjz wearing a vest and sunglasses, serious facial expression, headshot view".
You must upload your assets before continuing with the next steps. Caption files are optional, but encouraged.
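Before uploading, it can help to check the asset folder locally. The sketch below uses hypothetical helpers (not part of this solution) that parse the "<repetitions>_<trigger word>" folder name and list which images have a matching .caption file. The set of image extensions is an assumption; adjust it to the formats in your dataset.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple, Union

# Folder convention used in this notebook: "<repetitions>_<trigger word>", e.g. "60_dwjz".
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")

# Assumption: common image extensions; adjust to the formats in your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def parse_asset_folder(name: str) -> Tuple[int, str]:
    """Split a folder name like "60_dwjz" into (60, "dwjz")."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow '<repetitions>_<trigger word>'")
    return int(match.group(1)), match.group(2)


def check_assets(folder_name: str, filenames: Iterable[str]) -> Dict[str, Union[int, str, List[str]]]:
    """Summarize an asset folder: repetitions, trigger word, and caption coverage."""
    repetitions, trigger = parse_asset_folder(folder_name)
    names = set(filenames)
    images = sorted(n for n in names if PurePosixPath(n).suffix.lower() in IMAGE_EXTENSIONS)
    # A caption pairs with an image by sharing its stem, e.g. 1.jpg <-> 1.caption.
    captioned = [n for n in images if str(PurePosixPath(n).with_suffix(".caption")) in names]
    uncaptioned = [n for n in images if n not in captioned]
    return {
        "repetitions": repetitions,
        "trigger": trigger,
        "images": len(images),
        "with_caption": captioned,
        "without_caption": uncaptioned,
    }
```

For a local folder, call check_assets(folder.name, os.listdir(folder)). Because captions are optional, images listed under "without_caption" are not an error, only a prompt to decide whether to add them.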
To become more familiar with Kohya SS fine-tuning, see the Kohya SS repository: https://github.com/bmaltais/kohya_ss. Fine-tuning involves many variables, and there is currently no single accepted pattern for generating great results. For good results, make sure the training runs for enough steps, and provide enough images at a good resolution.
Step Three - Upload the necessary code to the AWS CodeCommit repository
The code required for this solution is in the "code" directory of this project. In the next step, you upload these files to the AWS CodeCommit repository created by the AWS CloudFormation template. This repository contains the code required to build the custom training container. Any update to the code in this repository triggers a build of the container image and a push to Amazon ECR (through an Amazon EventBridge rule). Running the next commands therefore starts the process that creates the training container image. This step takes about 15 minutes to complete.
- The "buildspec.yml" file creates the container image by leveraging the GitHub repository for Kohya SS, and pushes the training image to ECR
- The "Dockerfile" file is used to override the Dockerfile in the Kohya SS project, enabling it for use with SageMaker Training
- The "train" file calls the Kohya SS program to do the fine-tuning, and is invoked when the SageMaker Training job kicks off
To copy these files to the AWS CodeCommit repository, run the next commands:
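A hedged Python sketch of those commands: it clones the CodeCommit repository over HTTPS, copies the three files from the "code" directory, then commits and pushes them. The repository name, the region, and the "main" branch are assumptions; check template.yml for the actual repository name and for the branch the EventBridge rule watches. HTTPS access to CodeCommit also assumes a configured Git credential helper, and git commit needs user.name and user.email to be set.

```python
import shutil
import subprocess
from pathlib import Path

# The three files described above, located in this project's "code" directory.
CODE_FILES = ["buildspec.yml", "Dockerfile", "train"]


def codecommit_https_url(region: str, repository: str) -> str:
    """Return the HTTPS clone URL for a CodeCommit repository."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repository}"


def push_code(repo_url: str, code_dir: Path, clone_dir: Path, branch: str = "main") -> None:
    """Clone the repository, copy the solution files in, then commit and push them."""
    subprocess.run(["git", "clone", repo_url, str(clone_dir)], check=True)
    for name in CODE_FILES:
        shutil.copy2(code_dir / name, clone_dir / name)
    git = ["git", "-C", str(clone_dir)]
    subprocess.run(git + ["add", *CODE_FILES], check=True)
    subprocess.run(git + ["commit", "-m", "Add Kohya SS training container code"], check=True)
    # Pushing triggers the EventBridge rule that builds and pushes the training image.
    subprocess.run(git + ["push", "origin", f"HEAD:{branch}"], check=True)
```

For example, push_code(codecommit_https_url("us-west-2", "<repository-name>"), Path("code"), Path("kohya-repo")). On a re-run with unchanged files, git commit exits with an error because there is nothing to commit.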
Step Four - Initiate the Amazon SageMaker Pipeline to start training
Note: If you are running through this notebook for the first time, make sure the previous step has finished pushing the container image to ECR before you continue. Every time you change the code in the AWS CodeCommit repository, wait until the CodeBuild job completes and pushes the newest container image to ECR before running this step.
To run a SageMaker pipeline, navigate to SageMaker Studio and follow the steps below:
- In the left navigation pane, click the Home button, and click "Pipelines".
- Navigate to the pipeline named "kohya-ss-fine-tuning-pipeline" and click it.
- Click "Create execution". Then enter a name for the execution.
- Update the parameter values if necessary, and click "Start" to execute the pipeline.
- As the pipeline is running, you may view the logs by clicking the step in the pipeline, and clicking the "Logs" button. You may also view related details in the SageMaker Training job console.
- Wait for the pipeline to complete.
Parameters:
- InputS3DatasetLocation: the S3 prefix containing the training resources (e.g. sagemaker-kohya-ss-fine-tuning-<aws-account-id>/0001-dataset)
- OutputS3ModelLocation: where the resulting model will be output (e.g. sagemaker-kohya-ss-fine-tuning-<aws-account-id>/model-outputs)
- TrainingDockerImage: the latest ECR image tag
- TrainingInstanceType: the instance type to run the training on
- TrainingVolumeSizeInGB: the volume size of the training instance, in GB
- MaxTrainingRuntimeInSeconds: the maximum time the training is allowed to run, in seconds
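Instead of the Studio UI, an execution can also be started with the AWS SDK for Python (boto3) through the SageMaker start_pipeline_execution API. The sketch below assumes boto3 is installed and credentials are configured; parameters you omit keep the defaults defined in the pipeline. The helper only accepts the six parameter names listed above.

```python
from typing import Dict, List, Mapping, Optional, Union

PIPELINE_NAME = "kohya-ss-fine-tuning-pipeline"

# The pipeline parameters listed above.
KNOWN_PARAMETERS = {
    "InputS3DatasetLocation",
    "OutputS3ModelLocation",
    "TrainingDockerImage",
    "TrainingInstanceType",
    "TrainingVolumeSizeInGB",
    "MaxTrainingRuntimeInSeconds",
}


def pipeline_parameters(overrides: Mapping[str, Union[str, int]]) -> List[Dict[str, str]]:
    """Convert overrides to the Name/Value list expected by start_pipeline_execution."""
    unknown = set(overrides) - KNOWN_PARAMETERS
    if unknown:
        raise ValueError(f"unknown pipeline parameters: {sorted(unknown)}")
    # The API expects every value as a string.
    return [{"Name": name, "Value": str(value)} for name, value in overrides.items()]


def start_execution(display_name: str, overrides: Mapping[str, Union[str, int]],
                    region_name: Optional[str] = None) -> str:
    """Start the pipeline and return the execution ARN."""
    import boto3  # imported here so the helper above works without the AWS SDK

    sagemaker = boto3.client("sagemaker", region_name=region_name)
    response = sagemaker.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineExecutionDisplayName=display_name,
        PipelineParameters=pipeline_parameters(overrides),
    )
    return response["PipelineExecutionArn"]
```

For example, start_execution("dwjz-run-1", {"InputS3DatasetLocation": "sagemaker-kohya-ss-fine-tuning-123456789012/0001-dataset"}), where 123456789012 is a placeholder account ID. Use the same value format as the parameter defaults shown in Studio.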
For training that requires many epochs or steps, also consider increasing MaxTrainingRuntimeInSeconds (currently set to 24 hours). The total number of steps depends on the number of repetitions (i.e. the number in the asset folder name), the number of images, the number of epochs, and the batch size. The more steps, the longer the training. You might also consider different instance types and volume sizes if your use case requires it.
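As a rough, hedged estimate of how these factors combine: each epoch visits every image once per repetition, so steps per epoch are about the number of images times the repetitions, divided by the batch size and rounded up; the total multiplies that by the number of epochs. The sketch below ignores gradient accumulation and any step cap set in the config, and seconds_per_step is a figure you would read from your own training logs, not a known constant.

```python
import math

# MaxTrainingRuntimeInSeconds for 24 hours.
MAX_RUNTIME_24H = 24 * 60 * 60  # 86400 seconds


def estimate_total_steps(num_images: int, repetitions: int, epochs: int, batch_size: int = 1) -> int:
    """Approximate Kohya SS training steps (ignores gradient accumulation and step caps)."""
    if min(num_images, repetitions, epochs, batch_size) < 1:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(num_images * repetitions / batch_size)
    return steps_per_epoch * epochs


def fits_runtime(total_steps: int, seconds_per_step: float, max_runtime: int = MAX_RUNTIME_24H) -> bool:
    """True if the estimated training time stays within MaxTrainingRuntimeInSeconds."""
    return total_steps * seconds_per_step <= max_runtime
```

For example, 30 images in a "60_dwjz" folder trained for one epoch at batch size 1 gives 30 x 60 = 1,800 steps. The runtime check leaves out container startup and model download time, so keep some margin.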
Step Five - Inference
Once training is complete, the status of the SageMaker pipeline execution shows green. You can also view job details in the SageMaker Training job console. Clicking the pipeline step labeled "TrainNewFineTunedModel" shows its input/output details as well as logs; the Output tab shows the S3 location the output model was uploaded to.
In future iterations of this solution, a custom inference container may be created to run inference with this fine-tuned model. For now, you can use other tools to run inference. The Automatic1111 Stable Diffusion Web UI is a GUI that lets you run inference on your models locally.
- Create an Amazon EC2 Windows instance and connect to it using the instructions here. I chose a Windows Server 2022 Base Amazon Machine Image, a g5.8xlarge instance type, and 100 GiB of storage. Alternatively, you may use your local machine.
- Install NVIDIA drivers using this guide to enable the GPU.
- Install Automatic1111 Stable Diffusion Web UI using the instructions here. This solution has been tested with version 1.9.3. The last step of installation will ask you to run webui-user.bat, which will launch the Stable Diffusion Web UI in a web browser.
- Download the Stable Diffusion XL 1.0 Base model from Hugging Face. Move the downloaded file sd_xl_base_1.0.safetensors to the directory ../stable-diffusion-webui/models/Stable-diffusion/. Scroll to the bottom of the page and click Reload UI. Select sd_xl_base_1.0.safetensors from the Stable Diffusion checkpoint dropdown at the top of the page.
- Adjust the default Width and Height to 1024 x 1024. Experiment with the remaining parameters to achieve your desired result. Specifically, try adjusting the Sampling method, Sampling steps, CFG Scale, and Seed.
- The input prompt is extremely important to achieve great results. Extensions may be added to assist with your creative workflow. This style selector extension is great at supplementing prompts. To install, navigate to the Extensions tab, select Install from URL, enter the style selector extension URL, and click Install. Reload the UI for changes to take effect. You will notice a new section SDXL Styles which you may select from to add to your prompts.
- Download the fine-tuned model created by the Amazon SageMaker pipeline training step. The model is stored in Amazon S3 as model.tar.gz, under the location set by the OutputS3ModelLocation pipeline parameter.
- Extract the contents of the model.tar.gz file, and copy the custom_lora_model.safetensors LoRA model file to the directory ../stable-diffusion-webui/models/Lora. Click the Refresh icon on the Lora tab to verify that your custom_lora_model is available.
- Click custom_lora_model, and it will populate the prompt input box with the text <lora:custom_lora_model:1>. Append a prompt to the text (see examples below). Note that you may decrease/increase the multiplier of your LoRA model by changing the "1" value. This adjusts the influence of your LoRA model accordingly. Click Generate to run inference against your fine-tuned LoRA model.
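If you prefer to script the extraction step, this standard-library sketch pulls custom_lora_model.safetensors out of a downloaded model.tar.gz and writes it into the Web UI's Lora directory. The Web UI location is an assumption; point lora_dir at your own stable-diffusion-webui/models/Lora folder. It matches the file by name wherever it sits in the archive and writes only that file.

```python
import shutil
import tarfile
from pathlib import Path

LORA_FILENAME = "custom_lora_model.safetensors"


def install_lora(archive: Path, lora_dir: Path, filename: str = LORA_FILENAME) -> Path:
    """Copy `filename` from a model.tar.gz archive into the Web UI Lora directory."""
    lora_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Match on the base name so "./custom_lora_model.safetensors" is also found.
        member = next(
            (m for m in tar.getmembers() if m.isfile() and Path(m.name).name == filename),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{filename} not found in {archive}")
        source = tar.extractfile(member)
        destination = lora_dir / filename
        # Write under a fixed name rather than the archive path to avoid path traversal.
        with open(destination, "wb") as out:
            shutil.copyfileobj(source, out)
    return destination
```

For example, install_lora(Path("model.tar.gz"), Path("stable-diffusion-webui/models/Lora")), followed by the Refresh icon on the Lora tab.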
Note: This example demonstrates LoRA fine-tuning. We trained a LoRA model in the previous steps by specifying the LoRA network type in the configuration file.
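The <lora:name:multiplier> tag that the Web UI inserts can also be assembled programmatically, for example when preparing several prompt variations with different multipliers. These small hypothetical helpers (not part of the Web UI) follow the "style prefix, LoRA tag, subject" layout of the example prompts in this notebook; the :g format drops trailing zeros, so a multiplier of 1.0 renders as 1.

```python
from typing import Optional

DEFAULT_LORA = "custom_lora_model"


def lora_tag(name: str = DEFAULT_LORA, multiplier: float = 1.0) -> str:
    """Return the Automatic1111 LoRA tag, e.g. <lora:custom_lora_model:0.8>."""
    return f"<lora:{name}:{multiplier:g}>"


def build_prompt(subject: str, style_prefix: Optional[str] = None,
                 name: str = DEFAULT_LORA, multiplier: float = 1.0) -> str:
    """Join an optional style prefix, the LoRA tag, and the subject text."""
    parts = [style_prefix, lora_tag(name, multiplier), subject]
    return " ".join(part for part in parts if part)
```

For example, build_prompt("aallzz eating a burger", "cinematic film still") returns "cinematic film still <lora:custom_lora_model:1> aallzz eating a burger"; lowering multiplier reduces the LoRA model's influence.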
Example Results
Results generated from a model trained on ~30 high-resolution images of myself.
Prompt: concept art <lora:custom_lora_model:1.0> aallzz professional headshot, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, 8k, cinemascope, moody, epic, gorgeous, digital artwork, illustrative, painterly, matte painting
Negative Prompt: photo, photorealistic, realism, anime, abstract, glitch
Sampler: DPM2
Sampling Steps: 90
CFG Scale: 8.5
Width/Height: 1024x1024
Prompt: cinematic film still <lora:custom_lora_model:1> aallzz eating a burger, cinematic, bokeh, dramatic lighting, shallow depth of field, vignette, highly detailed, high budget, cinemascope, moody, epic, gorgeous, film grain, grainy
Negative Prompt: anime, cartoon, graphic, painting, graphite, abstract, glitch, mutated, disfigured
Sampler: DPM2
Sampling Steps: 70
CFG Scale: 8
Width/Height: 1024x1024
Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, character, mountain background, sun backlight, digital artwork, illustrative, painterly, matte painting, highly detailed
Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 9
Width/Height: 1024x1024
Prompt: concept art <lora:custom_lora_model:1> aallzz 3D profile picture avatar, vector icon, vector illustration, vector art, realistic cartoon character, professional attire, digital artwork, illustrative, painterly, matte painting, highly detailed
Negative Prompt: photo, photorealistic, realism, glitch, mutated, disfigured, glasses, hat
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 10
Width/Height: 1024x1024
Prompt: cinematic photo <lora:custom_lora_model:1> aallzz portrait, sitting, magical elephant with large tusks, wearing safari clothing, majestic scenery in the background, river, natural lighting, 50mm, highly detailed, photograph, film, bokeh, professional, 4k, highly detailed
Negative Prompt: drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, glitch, mutated, disfigured, glasses, hat
Sampler: DPM2
Sampling Steps: 100
CFG Scale: 9.5
Width/Height: 1024x1024
Cleaning up
To avoid incurring future charges, delete the resources created as part of this solution.
- Delete objects in your Amazon S3 bucket. You must delete objects before deleting the stack.
- Delete your container image in Amazon ECR. You must delete the image before deleting the stack.
- Use the AWS CloudFormation console to delete the stack named "kohya-ss-fine-tuning-stack".
- If you created an Amazon EC2 instance for running inference, stop or terminate the instance.
- Stop or delete your Amazon SageMaker Studio instances, applications, and spaces.
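The first three clean-up steps can be scripted with boto3, as in the hedged sketch below. The bucket and ECR repository names are parameters you supply (check the stack's Resources tab for the actual names), and the clients are passed into each helper so the logic can be exercised without touching AWS. It does not remove old object versions if versioning is enabled on the bucket, and it leaves the EC2 and Studio steps to you.

```python
from typing import Any, Dict, List


def empty_bucket(s3: Any, bucket: str) -> int:
    """Delete every object in the bucket; returns the number of keys deleted."""
    deleted = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:  # a page holds at most 1,000 keys, the delete_objects limit
            s3.delete_objects(Bucket=bucket, Delete={"Objects": objects})
            deleted += len(objects)
    return deleted


def delete_images(ecr: Any, repository: str) -> int:
    """Delete every image in the ECR repository; returns the number of image IDs."""
    image_ids: List[Dict[str, str]] = []
    for page in ecr.get_paginator("list_images").paginate(repositoryName=repository):
        image_ids.extend(page.get("imageIds", []))
    for start in range(0, len(image_ids), 100):  # batch_delete_image accepts up to 100 IDs
        ecr.batch_delete_image(repositoryName=repository, imageIds=image_ids[start:start + 100])
    return len(image_ids)


def delete_stack(cloudformation: Any, stack_name: str) -> None:
    """Delete the stack and block until deletion finishes."""
    cloudformation.delete_stack(StackName=stack_name)
    cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack_name)


def clean_up(bucket: str, repository: str, stack_name: str = "kohya-ss-fine-tuning-stack") -> None:
    import boto3  # imported here so the helpers above can be exercised with stub clients

    # Order matters: the bucket and repository must be empty before the stack is deleted.
    empty_bucket(boto3.client("s3"), bucket)
    delete_images(boto3.client("ecr"), repository)
    delete_stack(boto3.client("cloudformation"), stack_name)
```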
Congratulations!
You have successfully fine-tuned a custom SDXL model and run inference on it!
Appendix
The Kohya configuration .toml file
This file contains the config values that are passed to the Kohya program for training. If you change the config filename, you must also change it in the "train" file. The configuration is not specific to Stable Diffusion XL and can be applied to other pre-trained models; however, if you adapt the config file for other models, also change the entrypoint in the "train" file, which currently points to "sdxl_train_network.py". The configuration in this sample repository is one possible configuration for SDXL, which is why some parameters are commented out: they are either optional for SDXL or don't apply to it. There is currently no consensus on optimal parameter values, so you will need to try different permutations of the configuration and compare the resulting models. For a good starting point on what the parameters mean, see https://github.com/bmaltais/kohya_ss/wiki/LoRA-training-parameters
To give you some initial direction, try modifying these hyperparameters first:
- learning_rate
- text_encoder_lr
- unet_lr
- optimizer_type
- network_dim
- the number of repetitions (set by the asset folder name prefix, e.g. the 60 in 60_dwjz signifies the number of repetitions)
Note that some config parameters depend on the underlying GPU type (e.g. mixed_precision=bf16, xformers). Make sure your training instance has a compatible hardware configuration.
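As an illustration for mixed_precision, the sketch below maps a few SageMaker GPU instance families to their NVIDIA GPUs and flags bf16 on GPUs that predate the Ampere architecture. The table is an assumption limited to four families; extend it for the instance types you use, or check on the instance itself with torch.cuda.is_bf16_supported().

```python
from typing import Dict, Optional, Tuple

# Assumption: a partial map of instance family -> (GPU, native bf16 support).
GPU_BY_FAMILY: Dict[str, Tuple[str, bool]] = {
    "g4dn": ("NVIDIA T4", False),
    "p3": ("NVIDIA V100", False),
    "g5": ("NVIDIA A10G", True),
    "p4d": ("NVIDIA A100", True),
}


def instance_family(instance_type: str) -> str:
    """Extract the family from a SageMaker instance type, e.g. "ml.g5.8xlarge" -> "g5"."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"unexpected instance type: {instance_type!r}")
    return parts[1]


def precision_warning(instance_type: str, mixed_precision: str) -> Optional[str]:
    """Return a warning if the configured mixed_precision may not suit the instance GPU."""
    family = instance_family(instance_type)
    if family not in GPU_BY_FAMILY:
        return f"{family}: GPU not in this table; verify bf16/fp16 support manually"
    gpu, supports_bf16 = GPU_BY_FAMILY[family]
    if mixed_precision == "bf16" and not supports_bf16:
        return f"{instance_type} uses an {gpu}, which lacks native bf16; consider fp16"
    return None
```

Running this against the TrainingInstanceType pipeline parameter and the mixed_precision value in kohya-sdxl-config.toml before starting an execution can catch a mismatch before the job starts.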
Solution enhancements
There are a few enhancements that could be made to the Kohya component to support the following. These are not currently enabled.
- Sampling. Support could be added for sampling, which regularly outputs sample images during the training process. Sampling is controlled by the "sample_*" parameters in the configuration file.
- Regularization. Support could be added for regularization images stored in a specific directory. This directory is specified by the "reg_data_dir" parameter in the configuration file.
- Captions. Support may be added for auto-generating caption files for the images before training. Currently, you must manually add caption files to the S3 directory.
AWS CloudFormation template enhancements
PERMISSIONS: Consider restricting the permissions for the SageMakerServiceRole. Currently, it uses Administrator permissions.
INFERENCE: This solution outputs a model to be used for inference, but does not automate the inference component. Enhancements may be made to build a custom inference container using this fine-tuned model.
SAGEMAKER: Consider restricting internet access and instead using AWS PrivateLink, as well as using SageMaker inside a VPC.
Notebook CI Test Results
This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.