Caution
This is very much a WIP. Not all models or resolutions are listed yet, and placeholder "TODO" markers are scattered throughout the document. It will be updated as soon as possible.
This document serves as a reference cheatsheet for choosing the right resolution for the most common image and video generation AI models.
In theory, most visual generation models accept any resolution, but their training procedure means they only generate well at certain resolutions: the ones they were trained on. These resolutions are natively supported, and generation quality is best when using them.
For each model, we give the constraints the generation resolution must satisfy and/or a table of all known optimal resolutions, along with other information such as aspect ratio or video length where relevant. Sources are provided for each model.
Note
The tables are as exhaustive as possible but may still miss some values. If a value is missing or you would like a new model added to the list, feel free to request a change (please provide appropriate sources).
In the rest of this document, resolutions are given in ComfyUI tensor order: height, width, frame count.
We denote the height, the width and the frame count by the letters h, w and f respectively. Resolutions are thus written h×w×f (e.g. 1280×720×129). When only two values are given (for images or videos), they refer to the spatial resolution h×w.
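A minimal sketch of this convention in Python: the hypothetical helper `parse_resolution` splits a string such as `1280×720×129` into named `h`, `w` and `f` fields, so the height-first order is explicit. Accepting a plain `x` as separator is an assumption made for convenience.

```python
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    """A resolution in this document's h×w×f order (f is None for images)."""
    h: int
    w: int
    f: Optional[int] = None


def parse_resolution(text: str) -> Resolution:
    """Parse 'h×w' or 'h×w×f'; a plain 'x' is also accepted as separator."""
    parts = [int(p) for p in text.replace("x", "×").split("×")]
    if len(parts) == 2:
        return Resolution(parts[0], parts[1])
    if len(parts) == 3:
        return Resolution(parts[0], parts[1], parts[2])
    raise ValueError(f"expected 2 or 3 values, got {len(parts)}: {text!r}")


print(parse_resolution("1280×720×129"))  # Resolution(h=1280, w=720, f=129)
print(parse_resolution("704x1216"))      # Resolution(h=704, w=1216, f=None)
```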
TODO
Height | Width | Aspect Ratio (w:h) |
---|---|---|
512 | 2048 | 4:1 |
512 | 1984 | 31:8 |
512 | 1920 | 15:4 |
512 | 1856 | 29:8 |
576 | 1792 | 28:9 |
576 | 1728 | 3:1 |
576 | 1664 | 26:9 |
576 | 1024 | 16:9 |
640 | 1600 | 5:2 |
640 | 1536 | 12:5 |
704 | 1472 | 23:11 |
704 | 1408 | 2:1 |
704 | 1344 | 21:11 |
768 | 1344 | 7:4 |
768 | 1280 | 5:3 |
832 | 1216 | 19:13 (3:2) |
832 | 1152 | 18:13 |
896 | 1152 | 9:7 |
896 | 1088 | 17:14 |
960 | 1088 | 17:15 |
960 | 1024 | 16:15 |
1024 | 1024 | 1:1 |
1024 | 960 | 15:16 |
1024 | 576 | 9:16 |
1088 | 960 | 15:17 |
1088 | 896 | 14:17 |
1152 | 896 | 7:9 |
1152 | 832 | 13:18 |
1216 | 832 | 13:19 (2:3) |
1280 | 768 | 3:5 |
1344 | 768 | 4:7 |
1408 | 704 | 1:2 |
1472 | 704 | 11:23 |
1536 | 640 | 5:12 |
1600 | 640 | 2:5 |
1664 | 576 | 9:26 |
1728 | 576 | 1:3 |
1792 | 576 | 9:28 |
1856 | 512 | 8:29 |
1920 | 512 | 4:15 |
1984 | 512 | 8:31 |
2048 | 512 | 1:4 |
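These are the training resolutions natively supported by the model; they are visualized in the plot below. The resolutions most used for inference are in bold. The 576×1024 and 1024×576 rows are an exception: at about 0.59 megapixels, they are not among the roughly 1-megapixel training resolutions listed in the paper. Note that fine-tuned checkpoints may have different recommended resolutions.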
Sources:
- https://arxiv.org/pdf/2307.01952 (Appendix I)
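Since only the training resolutions give optimal results, one way to handle an arbitrary target size is to snap it to the training resolution with the closest aspect ratio. The sketch below does this for SDXL. `nearest_bucket` is a hypothetical helper, the bucket list is the roughly 1-megapixel rows of the table above (the 576×1024 and 1024×576 rows are left out), and comparing ratios in log space is a design choice of this sketch, not something prescribed by the paper.

```python
import math

# SDXL training resolutions as (height, width) pairs, taken from the table above.
SDXL_BUCKETS = [
    (512, 2048), (512, 1984), (512, 1920), (512, 1856),
    (576, 1792), (576, 1728), (576, 1664),
    (640, 1600), (640, 1536),
    (704, 1472), (704, 1408), (704, 1344),
    (768, 1344), (768, 1280),
    (832, 1216), (832, 1152),
    (896, 1152), (896, 1088),
    (960, 1088), (960, 1024),
    (1024, 1024), (1024, 960),
    (1088, 960), (1088, 896),
    (1152, 896), (1152, 832),
    (1216, 832), (1280, 768), (1344, 768),
    (1408, 704), (1472, 704),
    (1536, 640), (1600, 640),
    (1664, 576), (1728, 576), (1792, 576),
    (1856, 512), (1920, 512), (1984, 512), (2048, 512),
]


def nearest_bucket(target_h: int, target_w: int) -> tuple[int, int]:
    """Return the (height, width) bucket whose w/h ratio is closest to the target's.

    Ratios are compared in log space, so 2:1 and 1:2 are equally far from 1:1.
    """
    target = math.log(target_w / target_h)
    return min(SDXL_BUCKETS, key=lambda hw: abs(math.log(hw[1] / hw[0]) - target))


print(nearest_bucket(1080, 1920))  # (768, 1344)
```

A 1080×1920 (16:9) request maps to 768×1344, whose 7:4 ratio is the closest available one.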
TODO
TODO
TODO
Height | Width | Aspect ratio (w:h) |
---|---|---|
768 | 1360 | 85:48 (16:9) |
832 | 1248 | 3:2 |
880 | 1168 | 73:55 (4:3) |
1024 | 1024 | 1:1 |
1168 | 880 | 55:73 (3:4) |
1248 | 832 | 2:3 |
1360 | 768 | 48:85 (9:16) |
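The exact ratios in this table can be recomputed by dividing the width and the height by their greatest common divisor. A minimal sketch using Python's `math.gcd`; the helper name `aspect_ratio` is hypothetical, and it outputs width:height, the convention in which 16:9 is landscape.

```python
from math import gcd


def aspect_ratio(h: int, w: int) -> str:
    """Reduce a (height, width) resolution to an exact width:height ratio."""
    d = gcd(w, h)
    return f"{w // d}:{h // d}"


print(aspect_ratio(768, 1360))  # 85:48, close to 16:9
print(aspect_ratio(832, 1248))  # 3:2
```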
Sources:
General resolution constraints:
- spatial: height and width must be multiples of 32, max 720×1280
- temporal: frame count must be a multiple of 8 plus 1 (i.e. f = 8k + 1), max 257
By default, the v0.9.6 dev model generates at 704×1216.
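These constraints are easy to check before launching a generation. The sketch below uses two hypothetical helpers: `check_ltx_resolution` lists the violated constraints, and `snap_frames` rounds a frame count down to the nearest valid 8k + 1 value. Reading "max 720×1280" as a pixel budget that applies in either orientation is an assumption of this sketch.

```python
# Constraints from the list above. Treating "max 720×1280" as a pixel budget
# valid in either orientation is an assumption of this sketch.
MAX_PIXELS = 720 * 1280
MAX_FRAMES = 257


def check_ltx_resolution(h: int, w: int, f: int) -> list[str]:
    """Return the constraints that h×w×f violates (an empty list means valid)."""
    problems = []
    if h % 32 or w % 32:
        problems.append("height and width must be multiples of 32")
    if h * w > MAX_PIXELS:
        problems.append("spatial resolution exceeds 720×1280")
    if f % 8 != 1:
        problems.append("frame count must be 8k + 1")
    if f > MAX_FRAMES:
        problems.append("frame count exceeds 257")
    return problems


def snap_frames(f: int) -> int:
    """Round a frame count down to the nearest 8k + 1 value, capped at 257."""
    f = min(f, MAX_FRAMES)
    return max(1, (f - 1) // 8 * 8 + 1)


print(check_ltx_resolution(704, 1216, 121))  # []
print(check_ltx_resolution(704, 1280, 100))  # ['frame count must be 8k + 1']
print(snap_frames(100))                      # 97
```

Note that 720 is not itself a multiple of 32 (720 = 22 × 32 + 16), so 720×1280 acts as an upper bound rather than a valid size; the 704×1216 default passes every check.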
Sources:
TODO
TODO
There is no definitive list of resolutions for the 480p model, but the model enforces the following constraints:
- The total number of pixels must not exceed 480×832
- Height and width must both be divisible by 16
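When no listed resolution matches the desired aspect ratio, a size satisfying these constraints can be derived from the pixel budget. The sketch below is an assumption about how one might do this, not the model's official procedure: the hypothetical helper `fit_to_budget` computes the real-valued height and width with the requested w:h ratio and an area equal to the budget, then rounds each side down to the required multiple. The budget and the divisor are parameters, so they can be set to the values listed above.

```python
import math


def fit_to_budget(ratio_w: int, ratio_h: int, max_area: int, multiple: int) -> tuple[int, int]:
    """Return (height, width) close to the ratio_w:ratio_h aspect ratio, with
    height * width <= max_area and both sides divisible by `multiple`."""
    # Real-valued sides satisfying h * w == max_area and w / h == ratio_w / ratio_h.
    h = math.sqrt(max_area * ratio_h / ratio_w)
    w = math.sqrt(max_area * ratio_w / ratio_h)
    # Rounding each side *down* to a multiple keeps the product within the budget.
    return int(h) // multiple * multiple, int(w) // multiple * multiple


budget = 480 * 832  # assumed pixel budget of the 480p model
print(fit_to_budget(16, 9, budget, 32))  # (448, 832)
print(fit_to_budget(16, 9, budget, 16))  # (464, 832)
```

Rounding down rather than to the nearest multiple guarantees the pixel count never exceeds the budget, at the cost of a slightly smaller frame; as the two calls show, the choice of divisor changes the result.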
The following table seems reasonable.
Height | Width | Aspect ratio (w:h) |
---|---|---|
480 | 848 | 53:30 (16:9) |
480 | 640 | 4:3 |
480 | 480 | 1:1 |
832 | 480 | 15:26 |
480 | 832 | 26:15 |
Sources:
- https://github.com/Wan-Video/Wan2.1
- https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
- https://www.alibabacloud.com/help/en/model-studio/image-to-video-api-reference
Height | Width | Aspect Ratio (w:h) |
---|---|---|
720 | 1280 | 16:9 |
960 | 960 | 1:1 |
1280 | 720 | 9:16 |
832 | 1088 | 17:13 (4:3) |
1088 | 832 | 13:17 (3:4) |
Sources:
TODO