Resolution guide for image and video generation

Caution

This is very much a WIP. Not all models or resolutions are listed yet, and placeholder "TODO"s are scattered throughout the document. It will be updated as soon as possible.

Introduction

This document serves as a reference cheat sheet for choosing the right resolution for the most common image and video generation AI models.

While most visual generation models accept any resolution in theory, their training procedure restricts high-quality generation to certain values. These are the resolutions the models were trained on and support natively, so generation quality is best at those values.

For each model, we give a list of constraints that generation resolutions must follow and/or a table listing all known optimal resolutions, alongside other information such as aspect ratio or video length where relevant. Sources are provided for each model.

Note

The tables are as exhaustive as possible but might miss some values, and you might need a model that is not listed yet, so feel free to request a change (please provide appropriate sources).

Notation

In the rest of this document, resolutions are given in ComfyUI tensor order, meaning: height, width, frame count.

We denote the width, the height and the frame count by the letters w, h and f respectively. Resolutions are thus written h×w×f (e.g. 1280×720×129). If only two values are given (for images, or for videos when the frame count is omitted), they indicate the spatial resolution h×w.
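As an illustration of this notation, here is a minimal sketch of a parser that turns an h×w or h×w×f string into named fields. The names `Resolution` and `parse_resolution` are hypothetical helpers introduced for this example only; they are not part of ComfyUI or any model's code.

```python
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    height: int
    width: int
    frames: Optional[int] = None  # None when only the spatial size is given


def parse_resolution(text: str) -> Resolution:
    """Parse 'h×w' or 'h×w×f'; a plain 'x' is also accepted as separator."""
    parts = [p.strip() for p in text.replace("x", "×").split("×")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected h×w or h×w×f, got {text!r}")
    # Height always comes first, then width, then the optional frame count.
    return Resolution(*(int(p) for p in parts))
```

For example, `parse_resolution("1280×720×129")` yields a height of 1280, a width of 720 and 129 frames.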

Image generation

Stable Diffusion 1.5 (SD1.5)

TODO

Stable Diffusion XL (SDXL)

Height	Width	Aspect ratio (h:w)
512	2048	1:4
512	1984	1:3.88
512	1920	1:3.75
512	1856	1:3.62
576	1792	1:3.11
576	1728	1:3
576	1664	1:2.89
640	1600	2:5
640	1536	5:12
704	1472	11:23
704	1408	1:2
704	1344	11:21
768	1344	4:7
768	1280	3:5
832	1216	13:19 (2:3)
832	1152	13:18
896	1152	7:9 (3:4)
896	1088	14:17
960	1088	15:17
960	1024	15:16
1024	1024	1:1
1024	960	16:15
1024	576	16:9
1088	960	17:15
1088	896	17:14
1152	896	9:7 (4:3)
1152	832	18:13
1216	832	19:13 (3:2)
1280	768	5:3
1344	768	7:4
1408	704	2:1
1472	704	23:11
1536	640	12:5
1600	640	5:2
1664	576	2.89:1
1728	576	3:1
1792	576	3.11:1
1856	512	3.62:1
1920	512	3.75:1
1984	512	3.88:1
2048	512	4:1
576	1024	9:16

These are the training resolutions, natively supported by the model; they are visualized in the plot below. The most used resolutions for inference are in bold. Note that finetuned checkpoints may have different recommended resolutions.
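When the desired output size is not one of the training resolutions, one common approach is to pick the bucket with the closest aspect ratio. The sketch below illustrates that idea; it is not taken from the SDXL code base. `SDXL_BUCKETS` and `nearest_bucket` are hypothetical names, and the list holds only a handful of rows copied from the table above.

```python
import math

# A few (height, width) rows copied from the table above; extend as needed.
SDXL_BUCKETS = [
    (640, 1536),
    (768, 1344),
    (832, 1216),
    (896, 1152),
    (1024, 1024),
    (1152, 896),
    (1216, 832),
    (1344, 768),
    (1536, 640),
]


def nearest_bucket(height, width, buckets=SDXL_BUCKETS):
    """Return the (h, w) bucket whose h/w ratio is closest to height/width.

    Ratios are compared in log space so that 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    target = math.log(height / width)
    return min(buckets, key=lambda hw: abs(math.log(hw[0] / hw[1]) - target))
```

With this subset, a 1080×1920 (h×w) request maps to the 768×1344 bucket.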

Sources:

https://arxiv.org/pdf/2307.01952 (Appendix I.)

Stable Diffusion 3 (SD3)

TODO

Stable Diffusion 3.5 (SD3.5)

TODO

Flux.1

TODO

HiDream-I1

Height	Width	Aspect ratio (w:h)
768	1360	85:48 (16:9)
832	1248	3:2
800	1168	73:50 (4:3)
1024	1024	1:1
1168	800	50:73 (3:4)
1248	832	2:3
1360	768	48:85 (9:16)
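The exact ratios in this table (e.g. 85:48) are the width and height divided by their greatest common divisor; the value in parentheses is the nearest familiar ratio. A minimal sketch of that reduction, using a hypothetical helper named `aspect_ratio`:

```python
from math import gcd


def aspect_ratio(height: int, width: int) -> str:
    """Return the exact w:h ratio in lowest terms, e.g. '85:48'."""
    d = gcd(height, width)
    return f"{width // d}:{height // d}"
```

For instance, `aspect_ratio(768, 1360)` gives `"85:48"`, which is close to 16:9.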

Sources:

https://github.com/HiDream-ai/HiDream-I1/blob/main/inference.py#L41

Video generation

LTX-Video

General resolution constraints:

spatial: height and width must be multiples of 32, max 720×1280
temporal: frame count must be a multiple of 8 plus 1 (i.e. of the form 8k+1, such as 97 or 257), max 257

The default resolution for the v0.9.6 dev model is 704×1216.
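The constraints above can be checked programmatically. The following is a rough sketch and is not taken from the LTX-Video repository: `is_valid_ltx` and `snap_frames` are hypothetical helpers. It assumes that "max 720×1280" means height ≤ 720 and width ≤ 1280, and that a single frame (8·0+1) counts as valid.

```python
# Limits quoted in the constraints above (h×w order).
MAX_HEIGHT, MAX_WIDTH, MAX_FRAMES = 720, 1280, 257


def is_valid_ltx(height: int, width: int, frames: int) -> bool:
    """Check the listed constraints: h and w multiples of 32, f = 8k + 1."""
    spatial_ok = (
        height % 32 == 0
        and width % 32 == 0
        and 0 < height <= MAX_HEIGHT  # assumption: per-dimension limit
        and 0 < width <= MAX_WIDTH
    )
    temporal_ok = frames % 8 == 1 and 0 < frames <= MAX_FRAMES
    return spatial_ok and temporal_ok


def snap_frames(frames: int) -> int:
    """Round down to the nearest 8k + 1 frame count, clamped to [1, 257]."""
    frames = max(1, min(frames, MAX_FRAMES))
    return (frames - 1) // 8 * 8 + 1
```

For example, 704×1216 with 97 frames passes the check, while a request for 100 frames would be snapped down to 97.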

Sources:

https://github.com/Lightricks/LTX-Video

Hunyuan video

TODO

Mochi 1

TODO

Wan2.1

480p model

There is no definitive list for the 480p model, but the model enforces the following constraints:

The total number of pixels (height × width) must be under 480×834
Height and width must both be divisible by 32
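A candidate resolution can be tested against these two constraints with a few lines of code. This is only a sketch of the rules as stated here, not code from the Wan2.1 repository; `fits_wan_480p` is a hypothetical name, and "under" is read as a strict inequality.

```python
MAX_PIXELS_480P = 480 * 834  # pixel budget quoted above


def fits_wan_480p(height: int, width: int) -> bool:
    """Check both constraints: pixel budget and divisibility by 32."""
    under_budget = height * width < MAX_PIXELS_480P  # assumption: strict "<"
    divisible = height % 32 == 0 and width % 32 == 0
    return under_budget and divisible
```

For example, `fits_wan_480p(480, 832)` returns `True`, while `fits_wan_480p(720, 1280)` returns `False` because it exceeds the pixel budget.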

The following table seems reasonable.

Height	Width	Aspect ratio (w:h)
480	848	53:30 (16:9)
480	640	4:3
480	480	1:1
832	480	15:26
480	832	26:15

Sources:

720p model

Height	Width	Aspect ratio (w:h)
720	1280	16:9
960	960	1:1
1280	720	9:16
832	1088	17:13
1088	832	13:17

Sources:

https://huggingface.co/spaces/Wan-AI/Wan2.1

Cosmos

TODO

brayevalerien/resolutions.md
