When writing articles, I often need images, so I decided to learn and document this open-source AI painting tool, Stable Diffusion.
What is Stable Diffusion#
Stable Diffusion (SD) is an open-source AIGC painting model that converts text descriptions (prompts) into images. It is characterized by being open source, fast, and frequently updated.
How to Use#
Install GUI#
To make SD easier to use, you should first install a web UI for it.
Installation link: https://github.com/AUTOMATIC1111/stable-diffusion-webui
There are two ways to install it: deploy it on Google Colab (an online runtime environment), or run it locally on your own machine.
Local Installation Steps#
Since my computer is a Mac with an Apple Silicon chip, the following steps apply only to that kind of machine.
Fresh Installation
If you have not installed it before, you can set everything up via Homebrew.
If you don't have Homebrew, install it by running this command in the terminal:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Before installing the WebUI, you need to prepare the running environment. Open the terminal.
1. Install Python 3.10 and the other required tools
```bash
brew install cmake protobuf rust python@3.10 git wget
```
2. Pull the WebUI code from the GitHub repository
Cloning the code lets you update SD at any time and use the latest features.
In any directory, run the following command:
```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
```
After cloning, you will have a `stable-diffusion-webui` directory.
3. Download the Stable Diffusion model
I downloaded the newer model, version 2.1. The common formats for models are `ckpt` and `safetensors`. Download link: https://huggingface.co/stabilityai/stable-diffusion-2-1
Place the downloaded model in the `stable-diffusion-webui/models/Stable-diffusion` directory of the repository you just cloned.
Version 2.1 also requires a configuration file. To download it, see:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#downloading-stable-diffusion-models
Hold down the Option key and click the download link on that page to save the file.
The downloaded file is named `v2-inference-v.yaml`.
Then rename this file to match the name of the downloaded model. My model is `v2-1_768-ema-pruned.ckpt`, so the configuration file needs to be renamed to `v2-1_768-ema-pruned.yaml`.
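For example, assuming the browser saved both files to `~/Downloads` and you cloned the repository into the current directory, the move and rename might look like this:
```bash
# Move the model into the web UI's model directory
mv ~/Downloads/v2-1_768-ema-pruned.ckpt stable-diffusion-webui/models/Stable-diffusion/

# Rename the config to match the model's file name and place it alongside
mv ~/Downloads/v2-inference-v.yaml \
   stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
```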
4. Execute the script to run the web UI
```bash
# Go to the root directory
cd stable-diffusion-webui
# Run the script
./webui.sh
```
While the script runs, the necessary dependencies are downloaded automatically. This may take a while, so please be patient; it usually takes between half an hour and two hours.
Once the access address http://127.0.0.1:7860/ appears in the output, the setup has succeeded. Do not close or stop the terminal; open that address directly in your browser.
From then on, every time you want to open it, just execute the `webui.sh` script. If you want to update, just execute `git pull` in the root directory.
Special Case Handling#
When generating images, you may encounter the following error:
```
NansException: A tensor with all NaNs was produced in Unet. This could be either because there's not enough precision to represent the picture, or because your video card does not support half type. Try setting the "Upcast cross attention layer to float32" option in Settings > Stable Diffusion or using the --no-half commandline argument to fix this. Use --disable-nan-check commandline argument to disable this check.
```
If you encounter the same error, you can resolve it as follows:
- Open the `webui-user.sh` file in the root directory, uncomment the `export COMMANDLINE_ARGS` line, and set it as follows:
```bash
COMMANDLINE_ARGS="--lowvram --precision full --no-half --skip-torch-cuda-test"
```
- Re-execute `./webui.sh`.
- Finally, check the option "Upcast cross attention layer to float32" under Settings > Stable Diffusion; generation should then run normally.
Create Your First Painting#
Generate Images from Text#
Before drawing, first get familiar with what each part of the interface does.
Several important parameters in the interface:
- Sampling Steps: Controls the degree of denoising; it affects both generation time and quality, and is usually set around 30.
- Seed: Seeds the random noise the image iterates from, so it determines the image content; the same seed with the same settings reproduces the same image.
- CFG Scale: Determines how much freedom the model has relative to your prompt.
  - 2 ~ 6: Mostly random generation that barely follows the prompt.
  - 7 ~ 10: The most common setting, providing a good balance.
  - 10 ~ 15: Requires a very specific, well-written prompt; above 10, saturation tends to increase.
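Incidentally, all of these parameters can also be set programmatically: if you launch the web UI with the `--api` flag, it exposes an HTTP endpoint at `/sdapi/v1/txt2img`. A minimal sketch (the prompt here is just an example):
```bash
# Requires launching with: ./webui.sh --api
curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "a girl, long hair, twin ponytails",
        "steps": 30,
        "cfg_scale": 7,
        "seed": -1
      }'
# The response is JSON; the generated picture is base64-encoded in its "images" array
```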
Generate Images from Images#
Generate images based on prompts + images.
Similarly, let's take a look at some settings in this interface. Scroll down to find the settings area.
The most common use of image-to-image is changing the style of an image.
Image Extension#
Generate images based on prompts + a mask + images.
Common scenarios: removing watermarks, changing outfits, extending image boundaries.
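The same inpainting operation is also available over the API via `/sdapi/v1/img2img`; here is a rough sketch, where `IMAGE_B64` and `MASK_B64` are placeholders you would fill with base64-encoded PNGs (white areas of the mask are repainted):
```bash
# Requires launching with: ./webui.sh --api
curl -s -X POST http://127.0.0.1:7860/sdapi/v1/img2img \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "clean wall, no watermark",
        "init_images": ["'"$IMAGE_B64"'"],
        "mask": "'"$MASK_B64"'",
        "denoising_strength": 0.75
      }'
```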
How to Write Prompts#
Keywords#
Separate different features with commas#
a girl, long hair, twin ponytails
Separate similar features with |#
a girl, long black | grey hair, twin ponytails
Adjusting Weights#
If you want to adjust the proportion or weight of a certain feature in the image, you can do so as follows:
(prompt: weight value)
- Value < 1: Weaken weight.
- Value > 1: Strengthen weight.
a girl, long hair, (twin ponytails:0.2)
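A related shorthand in the AUTOMATIC1111 web UI: each pair of plain parentheses multiplies a keyword's attention by 1.1, and each pair of square brackets divides it by 1.1, so the following is roughly equivalent to a weight of 1.21:
a girl, long hair, ((twin ponytails))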
Gradient Effects#
If you want a feature to gradually shift from one keyword to another during sampling, you can do it like this:
[keyword 1:keyword 2:step]
The switch happens at the given step; a value below 1 is treated as a fraction of the total sampling steps. Here the hair transitions from white to black halfway through:
a girl, long [white:black:0.5] hair, (twin ponytails:0.2)
Alternating Fusion#
If you want the image to fuse two subjects or styles, you can alternate keywords; the sampler switches between them on every step:
[keyword 1|keyword 2]
[cat|dog] in the field
Reinforcement Effects#
Add high-quality keywords, such as: best quality, masterpiece.
best quality, ultra-detailed, masterpiece, finely detail, high res, 8K wallpaper, a girl, long hair, (twin ponytails:0.2)
Adding Negative Prompts#
Negative prompts tell the model what to avoid; enter them in the Negative prompt box. Common negative prompts:
- nsfw: pornographic or violent content.
- bad face: poorly drawn faces.
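Combined with a few other widely used community terms (the extras here are my own additions, not an official list), a typical negative prompt might look like this:
nsfw, bad face, lowres, bad anatomy, bad hands, worst quality, low quality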
Reinforcing Shapes#
These keywords control the overall composition of the image, such as whether it is a full-body shot.
- Lighting
- cinematic lighting
- dynamic lighting
- Gaze
- looking at viewer
- looking at another
- looking away
- looking back
- looking up
- Art Style
- sketch
- Perspective
- dynamic angle
- from above
- from below
- wide shot
Photo of real life girl, cinematic lighting, peeking from window, vibrant colors, bokeh, movie poster style
Combining with ChatGPT#
If you can't think of good keywords, you can ask ChatGPT to describe a scene and turn its answer into prompts. For example, asking it for a prairie scene might yield:
Boundless prairies, Verdant meadows, Azure skies and white clouds, Rushing rivers, Wildflowers aplenty, Majestic sunsets
Advanced Play#
Different Models#
- Common model search and download sites: Civitai (https://civitai.com) and Hugging Face (https://huggingface.co).
Commonly used models in the market:
- Anime style model
- Traditional Chinese style model
- GuoFeng3: A model with a gorgeous ancient Chinese style, featuring a 2.5D texture.
- Midjourney style
- Dreamlike Diffusion 1.0: Particularly vibrant colors and a flashy art style.
- Realistic style
You can download these models and place them in the `stable-diffusion-webui/models/Stable-diffusion` directory, then click the refresh button next to the model selector on the page to use them.