Using Stable Diffusion at different resolutions

Stable Diffusion is an AI image generation model that can create highly realistic and diverse images from text prompts. Unlike earlier AI art models, Stable Diffusion generates images through a diffusion process that iteratively denoises random latent noise into a coherent image. This enables it to produce intricate images across a wide range of artistic styles and subjects.

One key consideration when using Stable Diffusion is the impact of image resolution and aspect ratio on the generation results. The native resolution and aspect ratio that a Stable Diffusion model was trained on can significantly influence the quality, speed, and memory usage during image generation. This article provides best practices and expert tips for optimizing your use of Stable Diffusion at different resolutions.

Native Resolution of Stable Diffusion Models

The latest publicly available Stable Diffusion models have the following native resolutions:

  • SD 1.4 and SD 1.5: 512×512 pixels
  • SD 2.0 and SD 2.1: 768×768 pixels
  • SDXL: 1024×1024 pixels

Generating images above a model’s native resolution can cause visual artifacts, degraded image quality, and longer generation times. This is because the model has to extrapolate beyond what it was trained on, resulting in more errors and inconsistencies.

For best results, it’s generally recommended to stay close to the native resolution of whichever Stable Diffusion version you are using. However, there are techniques to mitigate quality loss at higher resolutions, which will be covered later in this article.

Recommended Resolutions and Aspect Ratios

Based on extensive testing and benchmarking, here are the recommended image resolutions and aspect ratios to use with each Stable Diffusion model:

SD 1.4 and SD 1.5:

  • 512×512 (1:1 aspect ratio)
  • 640×512 (5:4 aspect ratio)
  • 512×640 (4:5 aspect ratio)

SD 2.0 and SD 2.1:

  • 768×768 (1:1 aspect ratio)
  • 960×768 (5:4 aspect ratio)
  • 768×960 (4:5 aspect ratio)

SDXL:

  • 1024×1024 (1:1 aspect ratio)
  • 1280×1024 (5:4 aspect ratio)
  • 1024×1280 (4:5 aspect ratio)

These resolutions and aspect ratios strike the best balance between image quality, generation speed, and memory usage for each model.
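To make the recommendations above concrete, here is a small hypothetical helper (not part of any Stable Diffusion tool) that snaps a requested size to the nearest multiple of 8 — required because the model works in a latent space downsampled by a factor of 8 — and flags sizes that exceed a model's native resolution:

```python
# Hypothetical helper for validating image dimensions before generation.
# The native resolutions match the table above; the multiple-of-8 rule
# comes from Stable Diffusion's 8x latent downsampling.

NATIVE_RES = {
    "sd1.5": 512,   # also SD 1.4
    "sd2.1": 768,   # also SD 2.0
    "sdxl": 1024,
}

def snap_to_multiple_of_8(value: int) -> int:
    """Round a dimension to the nearest multiple of 8 (minimum 8)."""
    return max(8, round(value / 8) * 8)

def check_resolution(model: str, width: int, height: int) -> dict:
    """Snap both dimensions and flag sizes above the model's native res."""
    native = NATIVE_RES[model]
    w = snap_to_multiple_of_8(width)
    h = snap_to_multiple_of_8(height)
    return {"width": w, "height": h, "above_native": max(w, h) > native}
```

For example, `check_resolution("sd1.5", 1024, 1024)` reports `above_native: True`, a hint to either drop to 768×768 or use the high-resolution techniques covered below.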

Note that Stable Diffusion does not read image dimensions from the text prompt. Set the width and height as generation parameters in your tool of choice, such as the width and height sliders in Automatic1111's Web UI or the `width` and `height` arguments in the diffusers library. Both values should be multiples of 8, since the model operates in a latent space downsampled by a factor of 8.
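As an illustration of dimensions being passed as parameters rather than prompt text, here is a minimal sketch of a request payload for Automatic1111's txt2img API (available when the Web UI is launched with `--api`). The exact field set shown here is a simplification; the endpoint accepts many more options:

```python
import json

def build_txt2img_payload(prompt: str, width: int = 512, height: int = 512,
                          steps: int = 20) -> str:
    """Build a JSON payload for Automatic1111's /sdapi/v1/txt2img endpoint.

    Width and height travel as top-level fields; the prompt carries only
    the image description.
    """
    return json.dumps({
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,
    })

payload = build_txt2img_payload("A scenic landscape photograph", 512, 640)
# POST this to http://127.0.0.1:7860/sdapi/v1/txt2img, e.g. with requests.post
```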

Tips for Higher Resolutions

Attempting to generate images above Stable Diffusion’s native resolution often produces duplicated subjects and repeating patterns, as if the picture were assembled from independent “cells”. This happens because the model only learned spatial compositions at its training resolution, so on larger canvases it tends to tile familiar structures rather than scale them up.

Here are some tips to improve quality when going above native resolutions:

  • Enable the “Hires. fix” option in Automatic1111’s Web UI. It generates the image near the native resolution first, then upscales and refines the result, which greatly reduces duplication artifacts.
  • Use upscaling to enlarge a natively generated image rather than having Stable Diffusion render higher resolutions directly. Upscaled images maintain better coherence and consistency.

For example, here is a comparison between a 768×768 image directly generated by SD 2.1 vs a 512×512 image upscaled to 768×768:

![Upscaled vs native high res]

The upscaled image on the right has fewer artifacts and better detail in some regions. Upscaling can be done as a post-process using AI image enhancers.
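As a lightweight sketch of the upscaling workflow, the snippet below enlarges a natively generated image with Pillow's Lanczos resampling. A dedicated AI upscaler (such as Real-ESRGAN) typically gives sharper results; Lanczos is just a simple, dependency-light baseline:

```python
from PIL import Image

def upscale(image: Image.Image, target: int) -> Image.Image:
    """Upscale a square image to target x target with Lanczos resampling."""
    return image.resize((target, target), Image.LANCZOS)

# Stand-in for a 512x512 image generated at SD 1.5's native resolution.
native = Image.new("RGB", (512, 512), "gray")

# Enlarge to 768x768 as a post-process instead of generating at 768 directly.
larger = upscale(native, 768)
```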

Optimizing for Speed and Memory Usage

Higher resolution generation places more demand on GPU memory (VRAM) and typically takes longer per image. Here are some settings to help optimize speed and memory usage:

  • Enable a memory-efficient attention backend in Automatic1111, for example via the `--xformers` or `--opt-sdp-attention` launch flags. This can cut VRAM usage by roughly 25-50% with minimal quality loss.
  • If a prompt is slow or hits out-of-memory errors at your target resolution, step down to the next lower resolution and upscale afterwards. Scaling down from 1024×1024 to 768×768 cuts the pixel count by nearly half.
  • Compare generation times per prompt at different resolutions – tune this based on your priorities for quality vs speed.
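A simple way to compare generation times across resolutions is a small timing harness like the sketch below. The `generate` callable is a placeholder for whatever drives your pipeline; here a dummy workload stands in so the sketch runs without a GPU:

```python
import time

def time_generation(generate, resolutions, repeats=1):
    """Average wall-clock time of a generation callable per resolution.

    `generate` must accept (width, height); swap in a real call to your
    Stable Diffusion backend to benchmark actual generation times.
    """
    timings = {}
    for width, height in resolutions:
        start = time.perf_counter()
        for _ in range(repeats):
            generate(width, height)
        timings[(width, height)] = (time.perf_counter() - start) / repeats
    return timings

# Dummy workload standing in for a real pipeline call.
def fake_generate(width, height):
    sum(range(width * height // 1000))

results = time_generation(fake_generate, [(512, 512), (768, 768)])
```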

As a benchmark, here are approximate generation times on an Nvidia RTX 3090:

| Resolution | Time per image |
|-|-|
| 512×512 | ~15 sec |
| 768×768 | ~25 sec |
| 1024×1024 | ~40 sec |

Conclusion

The key takeaways when leveraging Stable Diffusion at higher resolutions:

  • Stay close to the native resolution that your SD version was trained on
  • Enable the “highres fix” and upscale images post-generation if going above native res
  • Use memory-efficient attention to optimize VRAM usage
  • Benchmark prompts at different resolutions to find the optimal balance of quality and speed

By following these best practices, you can configure Stable Diffusion to produce detailed, coherent images tailored to your specific needs. Paying attention to resolution, aspect ratio, and memory usage unlocks the model’s full potential for high-fidelity AI art.