Key Highlights of Stable Diffusion 3.5 Release

November 6th, 2024

Today marks the release of Stable Diffusion 3.5, which includes multiple model variants: Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, and, as of October 29th, Stable Diffusion 3.5 Medium.

These models are highly customizable, optimized for consumer hardware, and free to use under the permissive Stability AI Community License for both commercial and non-commercial purposes. The Stable Diffusion 3.5 models are available on Hugging Face, and the inference code is available on GitHub.

In June, we introduced Stable Diffusion 3 Medium, the first release in the Stable Diffusion 3 series. It did not fully meet our goals or our community's expectations. Taking this feedback seriously, we have since worked on a more robust version that advances our goal of transforming visual media.

Stable Diffusion 3.5 reflects our commitment to giving creators and developers powerful, accessible tools at the forefront of technology. We encourage everyone building on these models, whether through fine-tuning, LoRA applications, optimization, or artwork creation, to freely distribute and monetize their work across the pipeline.

What’s New in Stable Diffusion 3.5?

Stable Diffusion 3.5 offers a selection of models built to meet the diverse needs of researchers, hobbyists, startups, and enterprises:

Stable Diffusion 3.5 Large: With 8.1 billion parameters, this base model delivers superior quality and prompt adherence, making it ideal for professional use cases at resolutions up to 1 megapixel.
Stable Diffusion 3.5 Large Turbo: A distilled version of Stable Diffusion 3.5 Large that generates high-quality images with strong prompt adherence in just four steps, making it faster than the standard 3.5 Large model.
Stable Diffusion 3.5 Medium: With 2.5 billion parameters and an upgraded MMDiT-X architecture, this model runs smoothly on consumer hardware and offers a balance of quality and customization. It supports image resolutions between 0.25 and 2 megapixels, making it versatile for a wide range of creative tasks.
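To make the resolution ranges above concrete, here is a small, hypothetical helper that converts pixel dimensions to megapixels and picks dimensions near a target size for a given aspect ratio. It assumes the common Stable Diffusion convention that "1 megapixel" means 1024 x 1024 pixels, and the snapping `multiple` of 64 is a hypothetical default chosen for illustration, not a documented requirement of the models.

```python
import math

# Assumption: "1 megapixel" = 1024 x 1024 pixels, the usual convention for SD models.
MEGAPIXEL = 1024 * 1024
MEDIUM_MIN_MP, MEDIUM_MAX_MP = 0.25, 2.0  # range stated for SD 3.5 Medium


def megapixels(width, height):
    """Image area expressed in (1024 x 1024) megapixels."""
    return width * height / MEGAPIXEL


def dims_for_target(mp, aspect_ratio, multiple=64):
    """Width/height close to `mp` megapixels at `aspect_ratio` (width / height).

    Each side is snapped to a multiple of `multiple` (a hypothetical default).
    """
    height = math.sqrt(mp * MEGAPIXEL / aspect_ratio)
    width = height * aspect_ratio

    def snap(v):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, round(v / multiple) * multiple)

    return snap(width), snap(height)


def in_medium_range(width, height):
    """True if the image area falls inside the 0.25-2 MP range stated for Medium."""
    return MEDIUM_MIN_MP <= megapixels(width, height) <= MEDIUM_MAX_MP
```

For example, a 512 x 512 image is 0.25 MP under this convention (the bottom of Medium's range), and `dims_for_target(1.0, 16 / 9)` yields a widescreen size of about 1 MP.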

Development Insights

Customizability was our core focus in developing these models, giving users a flexible foundation to build on. Adding Query-Key Normalization to the transformer blocks helped stabilize the training process and made fine-tuning easier.
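The post does not spell out the exact normalization used. The minimal pure-Python sketch below assumes RMS normalization without a learned gain (real implementations usually add a learnable scale) and shows the basic idea: normalizing queries and keys before the dot product caps every attention logit at sqrt(d) by the Cauchy-Schwarz inequality, so large activations cannot push the softmax into saturation. The vectors are made-up toy values.

```python
import math


def rms_norm(x, eps=1e-6):
    """Rescale a vector to unit root-mean-square (no learned gain, for simplicity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, qk_norm=True):
    """Scaled dot-product attention weights of one query over several keys."""
    d = len(q)
    if qk_norm:
        # Query-Key Normalization: normalize q and each k before the dot product.
        q = rms_norm(q)
        keys = [rms_norm(k) for k in keys]
    logits = [dot(q, k) / math.sqrt(d) for k in keys]
    return logits, softmax(logits)


# Toy large-magnitude activations, as can appear when training drifts.
q = [50.0, -40.0, 30.0, 20.0]
keys = [[60.0, -50.0, 10.0, 0.0], [-5.0, 3.0, 1.0, 2.0]]

raw_logits, raw_weights = attention(q, keys, qk_norm=False)
norm_logits, norm_weights = attention(q, keys, qk_norm=True)
```

Without normalization the first logit is 2650, so the softmax assigns essentially all weight to one key; with normalization both logits stay within [-2, 2] (sqrt(4) for d = 4) and the second key keeps a small but nonzero weight.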

To achieve this flexibility, we made a few calculated trade-offs. Running the same prompt with different seeds may produce more varied outputs; this is intentional, as it helps preserve a rich, diverse knowledge base in the models. As a side effect, prompts with low specificity may yield less predictable outputs, and aesthetic quality may vary.
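In diffusion sampling, the seed fixes the random starting noise, while the prompt conditions every denoising step. The sketch below uses Python's `random` module in place of a framework RNG (in PyTorch-based pipelines that role is typically played by `torch.Generator().manual_seed(seed)`), and the flat `size=8` latent is a hypothetical toy stand-in for a real multi-dimensional latent tensor.

```python
import random


def initial_latent(seed, size=8):
    """Draw the sampler's starting Gaussian noise from a seeded RNG.

    `size` is a hypothetical toy latent length; real latents are 4-D tensors.
    """
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_latent(42)
same_b = initial_latent(42)
other = initial_latent(43)
```

The same seed reproduces the same starting point exactly, so any difference between two runs with the same prompt comes from changing the seed. The trade-off described above is about how far apart the final images land when only that starting noise changes.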

Specific architectural and training adjustments were applied to the Medium model to improve quality, coherence, and the ability to generate images across multiple resolutions.

Where Stable Diffusion 3.5 Shines

Stable Diffusion 3.5 is designed to be one of the most adaptable and accessible image generation models available, while excelling in prompt adherence and image quality:

Customizability: These models are easy to fine-tune for specific creative needs or to support custom workflows and applications.
Efficient Performance: Optimized for consumer hardware, the Stable Diffusion 3.5 models, especially Medium and Large Turbo, run on most consumer GPUs without heavy resource demands.

In our tests, Stable Diffusion 3.5 Medium requires only 9.9 GB of VRAM (excluding text encoders), so it runs on a wide range of consumer-grade GPUs. Stable Diffusion 3.5 Large, meanwhile, leads the market in prompt adherence and holds its own in image quality against much larger models.
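A rough way to see why parameter count drives memory needs is to compute the memory taken by the weights alone. This back-of-the-envelope sketch assumes 16-bit weights (2 bytes per parameter) and decimal gigabytes; it ignores activations, the VAE, and the text encoders, so it is not meant to reproduce the measured 9.9 GB figure.

```python
# Weight memory only: parameters x bytes per parameter.
# Assumption: 16-bit weights (2 bytes each); activations, the VAE, and the
# text encoders are NOT counted, so this will not match the 9.9 GB measurement.
BYTES_PER_GB = 1e9  # decimal gigabytes


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed just to hold the model weights, in GB."""
    return num_params * bytes_per_param / BYTES_PER_GB


medium_gb = weight_memory_gb(2.5e9)  # SD 3.5 Medium: 2.5B parameters
large_gb = weight_memory_gb(8.1e9)   # SD 3.5 Large: 8.1B parameters
```

At 16 bits, Medium's weights come to about 5 GB and Large's to about 16.2 GB, which illustrates why Medium fits on many more consumer GPUs. Storing weights at 1 byte per parameter would halve both figures, though whether that preserves quality is a separate question.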

Stable Diffusion 3.5 Large Turbo also stands out with some of the fastest inference times available for models of its size, providing competitive image quality and prompt adherence, even when compared to larger non-distilled models.

Lastly, Stable Diffusion 3.5 Medium excels among medium-sized models, offering a great balance of prompt adherence and image quality, making it an ideal choice for those who prioritize both efficiency and high-quality results.