Below is the easiest way to get up and running with Stable Diffusion XL on A1111 without constantly switching weights:
Stability AI has rolled out the SDXL weights for its Base and Refiner models:
Just so you’re caught up on how this works: the Base model generates an image from scratch, and the Refiner then runs a second pass over it to bring out finer detail. The two-stage setup is a new concept for Stable Diffusion, with one model laying down the image and a second, specialized model polishing it. The checkpoints are roughly 7GB and 6GB respectively, and each repo ships two safetensors variants, the original and a new 0.9 VAE version that fixes some artifacts, so you only need to grab one of each.
To use both of these, first clone the repositories locally, you can use the following commands for both in terminal:
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
and then:
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
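A quick note on the clones: each repo holds several multi-gigabyte files, so if you would rather grab only the single checkpoint you plan to use, you can pull it directly with wget instead (the filenames below are the ones listed on the Hugging Face pages at the time of writing; double-check the repos in case they have changed):

wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0_0.9vae.safetensors
wget https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0_0.9vae.safetensors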
These are the repos for the base and refiner models. Next up, you’ll probably ask about A1111 (aka AUTOMATIC1111's web UI). You’ll need to update it to 1.5.1 or higher; if you haven’t already, do a git pull or install a fresh copy.
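If you already have a working install, the update is just a pull from inside the web UI folder (assuming you cloned AUTOMATIC1111's repo in the usual way; the release tag below is only an example):

cd stable-diffusion-webui
git pull
# or pin to a specific release instead of the latest commit, e.g.:
git checkout v1.5.1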
Copy both safetensors files (I chose the 0.9 VAE versions) into your models/Stable-diffusion folder and you’re off to the races. I have heard the original (non-0.9) VAE version can get “noisy”; you can run with --no-half-vae to work around this, or --medvram / --lowvram if your GPU is short on memory.
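For reference, here is roughly what that looks like on Linux or macOS, assuming the two model repos were cloned next to your web UI folder (the paths are just an example, but the launch flags are standard A1111 options):

# put the chosen safetensors where A1111 looks for checkpoints
cp stable-diffusion-xl-base-1.0/sd_xl_base_1.0_0.9vae.safetensors stable-diffusion-webui/models/Stable-diffusion/
cp stable-diffusion-xl-refiner-1.0/sd_xl_refiner_1.0_0.9vae.safetensors stable-diffusion-webui/models/Stable-diffusion/

# launch, adding flags only if you need them
cd stable-diffusion-webui
./webui.sh --no-half-vae
# ./webui.sh --medvram    (or --lowvram if VRAM is really tight)

On Windows, the equivalent is adding the flags to COMMANDLINE_ARGS in webui-user.bat.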
One catch, however: loading each of these checkpoints can take a minute or more. The manual workflow is also unintuitive: you load the base model for txt2img, then switch checkpoints and run the result through the refiner in img2img. Swapping models for every image is exactly what you want to avoid.
A1111 is also adding built-in support for chaining multiple models. In theory, you can try it early with a git checkout hires_checkpoint, but a simpler method is the Refiner extension: https://github.com/wcde/sd-webui-refiner
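If you prefer the terminal to the UI, an extension like this can also be installed by cloning it straight into A1111's extensions folder and restarting the web UI, which is equivalent to adding the URL in the Extensions tab described next:

cd stable-diffusion-webui/extensions
git clone https://github.com/wcde/sd-webui-refiner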
Of all the methods, this is the easiest! All you need to do is go to the Extensions tab, add the URL above, and restart. When you launch the web UI, make sure you’re not passing any extra parameters (unless you need them to reduce VRAM). Here are my settings for getting it working in the UI:
Now you’re up and running: no model swapping required, and each image takes only a few seconds to generate:
Here’s a picture of a knight. The sword does look a bit off, but it’s impressively realistic compared to Stable Diffusion 1.5:
a super futuristic sports car
a female Marvel action hero; the detail here is great
Joe Biden in a boxing outfit; notice it almost got his name right on the belt:
And here’s a potential 2024 election matchup:
Okay, that’s enough fun for one night.
This is only the beginning: the rest of the extensions, ControlNet in particular, will eventually become XL compatible, and we’ll also get new fine-tuned models that are more purpose-driven and even more photorealistic than this base model.
I hope these instructions make it clear how to get set up with Stable Diffusion XL in the most efficient way possible. You won’t have to run any complicated workflows, and images take only a few seconds to generate on a decent GPU. From my initial tryouts, it looks like SDXL is getting pretty close to Midjourney, though I’ll still need to play with it a lot more to really get a sense of what it can do. Until then, I hope to see everyone push the limits of XL and share tips with me and the rest of the community on how to best utilize it!