@brayevalerien
Last active April 25, 2025 09:45
Reference resolutions for most image and video generation AI models.

Resolution guide for image and video generation

Caution

This is very much a WIP. Not all models or resolutions are listed yet, and placeholder "TODO" entries are scattered throughout the document. It will be updated as soon as possible.

Introduction

This document serves as a reference cheatsheet for choosing the right resolution for the most common image and video generation AI models.

While most visual generation models can, in theory, generate at any resolution, their training procedure restricts good results to certain values. These are the resolutions the models were trained on and natively support, so generation quality is best at them.

For each model, we give a list of constraints that the generation resolution must follow and/or a table listing all known optimal resolutions, alongside other information such as aspect ratio or video length where relevant. Sources are provided for each model.

Note

The tables are as exhaustive as possible, but they might miss some values, or you might need a new model added to the list. Feel free to request a change (please provide appropriate sources).

Notation

In the rest of this document, resolutions are given in ComfyUI tensor order, meaning: height, width, frame count.

We denote the width, the height and the frame count with the letters w, h and f respectively. Resolutions are thus usually written h×w×f (e.g. 1280×720×129). If only two values are provided (for images or for videos), they indicate the spatial resolution: h×w.
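To make the ordering concrete, here is a minimal sketch of a parser for this notation. The names `Resolution` and `parse_resolution` are hypothetical helpers introduced for illustration; they are not part of ComfyUI or of any model's tooling.

```python
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    """A resolution in the h×w×f order used in this document."""
    height: int
    width: int
    frames: Optional[int] = None  # None when only h×w is given


def parse_resolution(text: str) -> Resolution:
    """Parse 'h×w' or 'h×w×f'; a plain 'x' is accepted as separator too."""
    parts = [int(p) for p in text.lower().replace("×", "x").split("x")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected h×w or h×w×f, got {text!r}")
    return Resolution(*parts)


res = parse_resolution("1280×720×129")
print(res.height, res.width, res.frames)
```

Note that in this order the example 1280×720×129 is a portrait video: 1280 pixels high and 720 pixels wide.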

Image generation

Stable Diffusion 1.5 (SD1.5)

TODO

Stable Diffusion XL (SDXL)

Height Width Aspect ratio (h:w)
512 2048 1:4
512 1984 1:3.88
512 1920 1:3.75
512 1856 1:3.62
576 1792 1:3.11
576 1728 1:3
576 1664 1:2.89
640 1600 2:5
640 1536 2:4.8
704 1472 3:6.3
704 1408 1:2
704 1344 4:7
768 1344 4:7
768 1280 3:5
832 1216 2:3
832 1152 2:2.77
896 1152 4:5
896 1088 4:5
960 1088 6:7
960 1024 15:16
1024 1024 1:1
1024 960 16:15
1024 576 16:9
1088 960 9:8
1088 896 17:14
1152 896 4:3
1152 832 11:8
1216 832 3:2
1280 768 5:3
1344 768 7:4
1408 704 2:1
1472 704 21:10
1536 640 12:5
1600 640 5:2
1664 576 2.89:1
1728 576 3:1
1792 576 3.11:1
1856 512 3.62:1
1920 512 3.75:1
1984 512 3.88:1
2048 512 4:1
576 1024 9:16

These are the training resolutions natively supported by the model; they are visualized in the plot below. The most commonly used resolutions for inference are in bold. Note that fine-tuned checkpoints may have different recommended resolutions.

[Figure: plot of the SDXL training resolutions]
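As a rough illustration of how the table can be used, the sketch below picks the training resolution whose aspect ratio is closest to an arbitrary target size. `BUCKETS` is only a subset of the rows above, and both `nearest_bucket` and the log-ratio distance are assumptions made for this example, not something prescribed by SDXL.

```python
import math

# Subset of the SDXL training resolutions listed above, as (height, width).
BUCKETS = [
    (768, 1344), (832, 1216), (896, 1152), (1024, 1024),
    (1152, 896), (1216, 832), (1344, 768),
]


def nearest_bucket(height: int, width: int) -> tuple[int, int]:
    """Return the bucket whose h/w ratio is closest to the target's.

    Distance is measured on log(h/w) so that portrait and landscape
    deviations are treated symmetrically (a choice made for this sketch).
    """
    target = math.log(height / width)
    return min(BUCKETS, key=lambda b: abs(math.log(b[0] / b[1]) - target))


print(nearest_bucket(1080, 1920))  # landscape Full HD target
print(nearest_bucket(1920, 1080))  # portrait Full HD target
```

Extending `BUCKETS` with the remaining rows of the table makes the extreme aspect ratios selectable as well.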

Sources:

Stable Diffusion 3 (SD3)

TODO

Stable Diffusion 3.5 (SD3.5)

TODO

Flux.1

TODO

HiDream-I1

Height Width Aspect ratio (w:h)
768 1360 85:48 (16:9)
832 1248 3:2
800 1168 73:50 (4:3)
1024 1024 1:1
1168 800 50:73 (3:4)
1248 832 2:3
1360 768 48:85 (9:16)
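The exact ratios in the table, such as 85:48, come from dividing width and height by their greatest common divisor; the value in parentheses is the nearest familiar ratio. A small sketch of that arithmetic, using a hypothetical helper `aspect_ratio`:

```python
from math import gcd


def aspect_ratio(height: int, width: int) -> tuple[int, int]:
    """Return the exact width:height ratio reduced to lowest terms."""
    d = gcd(height, width)
    return width // d, height // d


for h, w in [(768, 1360), (800, 1168), (1024, 1024)]:
    rw, rh = aspect_ratio(h, w)
    # Print the decimal value too, for comparison with common ratios.
    print(f"{h}x{w}: {rw}:{rh} = {w / h:.3f}")

print(f"16:9 = {16 / 9:.3f}, 4:3 = {4 / 3:.3f}")
```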

Sources:

Video generation

LTX-Video

General resolution constraints:

  • spatial: height and width must be multiples of 32, max 720×1280
  • temporal: frame count must be a multiple of 8 plus 1 frame, max 257

The default resolution for the v0.9.6 dev model is 704×1216.
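The constraints above can be sketched as a small validator. This assumes that "max 720×1280" means separate limits on height and width, which is one reading of the constraint; the function names are hypothetical and not part of LTX-Video.

```python
# Limits as listed above; treating the spatial maximum as per-dimension
# limits (height <= 720, width <= 1280) is an assumption of this sketch.
MAX_HEIGHT, MAX_WIDTH, MAX_FRAMES = 720, 1280, 257


def is_valid_ltx_resolution(height: int, width: int, frames: int) -> bool:
    """Check the spatial and temporal constraints listed above."""
    spatial_ok = (
        height % 32 == 0 and width % 32 == 0
        and 0 < height <= MAX_HEIGHT and 0 < width <= MAX_WIDTH
    )
    temporal_ok = frames % 8 == 1 and 0 < frames <= MAX_FRAMES
    return spatial_ok and temporal_ok


def snap_frame_count(frames: int) -> int:
    """Round down to the nearest 8k + 1, clamped to [1, MAX_FRAMES]."""
    frames = max(1, min(frames, MAX_FRAMES))
    return (frames - 1) // 8 * 8 + 1


print(is_valid_ltx_resolution(704, 1216, 97))
print(snap_frame_count(100))
```

Under this reading, 720 itself is not a multiple of 32, so the largest height that passes the check is 704, which matches the default resolution.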

Sources:

Hunyuan video

TODO

Mochi 1

TODO

Wan2.1

480p model

There is no definitive list for the 480p model, but the model enforces the following constraints:

  • The total number of pixels must be under 480×834
  • Height and width must both be divisible by 32
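A minimal sketch of these two checks, taking the numbers exactly as stated above. The name `is_valid_wan_480p` is hypothetical and this is not the model's own validation code.

```python
# Pixel budget as stated above; "under" is read as a strict inequality here.
MAX_PIXELS = 480 * 834


def is_valid_wan_480p(height: int, width: int) -> bool:
    """Check divisibility by 32 and the total pixel budget."""
    divisible = height % 32 == 0 and width % 32 == 0
    return divisible and height * width < MAX_PIXELS


for h, w in [(480, 832), (832, 480), (480, 640), (480, 480)]:
    print(f"{h}x{w}: {is_valid_wan_480p(h, w)}")
```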

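The following table seems reasonable, although the 480×848 entry does not satisfy the constraints as stated: 848 is not divisible by 32, and 480×848 exceeds 480×834 pixels.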

Height Width Aspect ratio (w:h)
480 848 53:30 (16:9)
480 640 4:3
480 480 1:1
832 480 15:26
480 832 26:15

Sources:

720p model

Height Width Aspect ratio (w:h)
720 1280 16:9
960 960 1:1
1280 720 9:16
832 1088 17:13
1088 832 13:17

Sources:

Cosmos

TODO
