When writing articles I often need images, so I decided to learn and document this open-source AI painting tool, Stable Diffusion.
What is Stable Diffusion#
Stable Diffusion (SD) is an open-source AIGC painting model: it converts text into images through prompts (hints/descriptions). It is characterized by being open source, fast, and frequently updated.
How to Use#
Install GUI#
To make it easier to use, you first need to install a web UI for SD.
Installation link: https://github.com/AUTOMATIC1111/stable-diffusion-webui
There are two ways to install it: deploy it on Google Colab (an online runtime environment), or run it locally on your own machine.
Local Installation Steps#
Since my computer is a Mac with an Apple Silicon chip, the following steps apply only to this type of machine.
Fresh Installation
If you have not installed it before, you can set everything up via Homebrew.
If you don't have Homebrew, install it by entering this command in the terminal:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Before installing the WebUI, you need to prepare the running environment. Open the terminal.
1. Install Python version 3.10 or above
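One way to do this is via Homebrew; the project's Apple Silicon wiki page suggests installing the following packages (the exact list may have changed since this post was written):

```bash
# Python 3.10 plus the tools the web UI needs on Apple Silicon
brew install cmake protobuf rust python@3.10 git wget
```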
2. Pull the WebUI code from the GitHub repository
Pulling the code with Git makes it easy to keep SD up to date and use the latest features.
In any directory, run the following command:
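```bash
# clone the web UI repository (the installation link given above)
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
```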
After pulling, you will have a `stable-diffusion-webui` directory.
3. Download the Stable Diffusion model
I downloaded the newer model, version 2.1. The common model formats are `ckpt` and `safetensors`. Download link: https://huggingface.co/stabilityai/stable-diffusion-2-1
Place the downloaded model in the `stable-diffusion-webui/models/Stable-diffusion` directory of the repository you just pulled.
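For example, assuming your browser saved the model to the Downloads folder (the path is an assumption; adjust it to wherever the file actually landed):

```bash
# move the 2.1 model into the web UI's model directory
mv ~/Downloads/v2-1_768-ema-pruned.ckpt stable-diffusion-webui/models/Stable-diffusion/
```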
Since version 2.1 also requires a configuration file, download it as follows:
Download link: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon#downloading-stable-diffusion-models
Hold down the Option key on the keyboard and click the configuration-file link on that page to download it. The downloaded file is named `v2-inference-v.yaml`.
Then we need to rename this file to match the name of the downloaded model. My model is `v2-1_768-ema-pruned.ckpt`, so the configuration file must be renamed to `v2-1_768-ema-pruned.yaml`.
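Assuming the same Downloads location as above, the rename looks like this:

```bash
# rename the config to match the model file and place it alongside the model
mv ~/Downloads/v2-inference-v.yaml \
  stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
```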
4. Execute the script to run the web UI
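Run the `webui.sh` script that ships in the repository root:

```bash
cd stable-diffusion-webui  # the directory pulled in step 2
./webui.sh
```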
During the execution of the script, the necessary dependencies are downloaded automatically. This may take a while, so please be patient; it usually takes between half an hour and two hours.
Once the access address http://127.0.0.1:7860/ appears in the terminal, you have succeeded. After that, do not close or stop the terminal; open the address directly in your browser.
In the future, every time you open it, just execute the `webui.sh` script. If you want to update, just execute `git pull` in the root directory.
Special Case Handling#
When generating images, you may encounter an error; a common one with the 2.1 model is NansException ("A tensor with all NaNs was produced in Unet"). If you encounter this error, you can resolve it as follows:
- Open the `webui-user.sh` file in the root directory.
- Modify the `COMMANDLINE_ARGS` parameter, for example (a commonly suggested fix is to disable half precision):
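```bash
# in webui-user.sh — an illustrative value, not necessarily the only fix:
# --no-half disables half-precision, a common cure for NaN errors on Apple Silicon
export COMMANDLINE_ARGS="--no-half"
```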
- Re-execute `./webui.sh`.
- Finally, check the option "Upcast cross attention layer to float32" under Settings → Stable Diffusion, and image generation will run normally.
Create Your First Painting#
Generate Images from Text#
Before drawing, first get to know what each part of this interface is.
Several important parameters in the interface:
- Sampling Steps: Affects both generation time and image quality; usually set around 30. It mainly controls the degree of denoising.
- Seed: Determines the initial random noise used during image iteration, and therefore the content of the image.
- CFG Scale: Determines how strictly the image follows the prompt, i.e., how much creative freedom the "artist" has.
  - 2 ~ 6: Nearly random generation, basically not following the prompts.
  - 7 ~ 10: The most common setting, providing a good balance.
  - 10 ~ 15: Requires the prompt to be very good and specific; above 10, saturation increases.
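Incidentally, these parameters can also be set programmatically: the web UI exposes an HTTP API when launched with `./webui.sh --api`. Below is a minimal sketch assuming the AUTOMATIC1111 `/sdapi/v1/txt2img` endpoint; the prompt values are illustrative:

```bash
# start the web UI with the API enabled first: ./webui.sh --api
# then request an image with explicit steps / seed / cfg_scale
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "a cat sitting on a windowsill, best quality, masterpiece",
        "negative_prompt": "nsfw, bad face",
        "steps": 30,
        "seed": 42,
        "cfg_scale": 7
      }' \
  | python3 -c "import sys, json, base64; open('txt2img.png', 'wb').write(base64.b64decode(json.load(sys.stdin)['images'][0]))"
```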
Generate Images from Images#
Generate images based on prompts + images.
Similarly, let's take a look at some settings in this interface. Scroll down to find the settings area.
The most common use of image-to-image is changing the style of an image.
Image Extension#
Generate images based on prompts + a mask + an image.
Common scenarios: removing watermarks, changing outfits, extending image boundaries.
How to Write Prompts#
Keywords#
Separate different features with commas#
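For example (an illustrative prompt): `1girl, white shirt, black skirt, long hair`.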
Separate similar features with |#
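For example (illustrative): `silver|golden hair` blends two similar hair colors.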
Adjusting Weights#
If you want to adjust the proportion or weight of a certain feature in the image, you can do so as follows:
`(prompt:weight)`
- Value < 1: Weaken weight.
- Value > 1: Strengthen weight.
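For example, `(blue eyes:1.3)` strengthens the blue eyes, while `(background:0.8)` weakens the background.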
Gradient Effects#
If you want the image to have a gradient, you can do it like this:
`[keyword1:keyword2]`
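For example, in the AUTOMATIC1111 prompt-editing syntax a switch point can be added as a third value: `[white hair:pink hair:0.7]` draws white hair for the first 70% of the sampling steps and pink hair for the rest.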
Alternating Fusion#
If you want one half of the image to have one style and the other half to have another style, you can do it like this:
`[keyword1|keyword2]`
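For example, in the AUTOMATIC1111 syntax `[cow|horse]` alternates between the two keywords on every sampling step, fusing the two subjects into one image.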
Reinforcement Effects#
Add high-quality keywords, such as: best quality, masterpiece.
Adding Negative Prompts#
Common negative prompts:
- `nsfw`: pornographic or violent content.
- `bad face`: a poorly drawn face.
Reinforcing Shapes#
Control the overall framing of the image, such as whether it is a full-body shot. Useful keyword groups:
- Lighting
  - cinematic lighting
  - dynamic lighting
- Gaze
  - looking at viewer
  - looking at another
  - looking away
  - looking back
  - looking up
- Art Style
  - sketch
- Perspective
  - dynamic angle
  - from above
  - from below
  - wide shot
Combining with ChatGPT#
If you find it hard to write detailed prompts, you can ask ChatGPT to generate or expand prompts for you, then paste the result into the prompt box.
Advanced Play#
Different Models#
- Common model search and download sites: e.g., Civitai (https://civitai.com) and Hugging Face (https://huggingface.co).
Commonly used models in the market:
- Anime style models
- Traditional Chinese style models
  - GuoFeng3: A model with a gorgeous ancient Chinese style, featuring a 2.5D texture.
- Midjourney style
  - Dreamlike Diffusion 1.0: Particularly vibrant colors and a flashy art style.
- Realistic style models
You can download these models and place them in the `stable-diffusion-webui/models/Stable-diffusion` directory. Then click the refresh button next to the model selector on the page to use them.