Masked Generative Image Transformers (MaskGIT) have gained popularity for their fast and efficient image generation capabilities. However, the sampling strategy used to progressively "unmask" tokens in these models plays a crucial role in determining image quality and diversity. Our new research paper introduces the Halton Scheduler, a novel approach that significantly enhances MaskGIT's image generation performance.
Traditional MaskGIT uses a Confidence scheduler, which selects tokens based on their logit distribution but tends to cluster token selection, reducing image diversity. The Halton Scheduler addresses this by leveraging a low-discrepancy sequence, the Halton sequence, to spread token selection more uniformly across the image.
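To illustrate the low-discrepancy idea, the sketch below orders the positions of a token grid by a 2D Halton sequence (bases 2 and 3), so that consecutively unmasked tokens land far apart instead of clustering. This is a minimal illustration of the principle, not the paper's implementation; the grid size and the `halton_token_order` helper are assumptions for the example.

```python
def halton(index: int, base: int) -> float:
    """Return the `index`-th element of the 1D Halton sequence in [0, 1)."""
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

def halton_token_order(grid: int = 16):
    """Order all grid*grid token positions by a 2D Halton sequence.

    Illustrative sketch (not the paper's code): consecutive positions
    are far apart, so each unmasking step spreads new tokens
    near-uniformly over the image instead of clustering.
    """
    order, seen = [], set()
    i = 1
    while len(order) < grid * grid:
        x = int(halton(i, 2) * grid)  # column from the base-2 sequence
        y = int(halton(i, 3) * grid)  # row from the base-3 sequence
        if (x, y) not in seen:       # keep the first visit to each cell
            seen.add((x, y))
            order.append((x, y))
        i += 1
    return order

order = halton_token_order()
print(order[:4])
```

At each sampling step, MaskGIT would then unmask the next slice of `order`, which by construction covers the image evenly rather than concentrating on one region.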
Figure 1: MaskGIT using our Halton scheduler on ImageNet 256.
Figure 2: MaskGIT using our Halton scheduler for text-to-image.
Figure 3: MaskGIT using the Confidence scheduler for text-to-image.
On benchmark datasets such as ImageNet (256×256) and COCO, the Halton Scheduler outperforms the baseline Confidence scheduler in both image quality and diversity.
@inproceedings{besnier2025iclr,
  title     = {Halton Scheduler for Masked Generative Image Transformer},
  author    = {Besnier, Victor and Chen, Mickael and Hurych, David and Valle, Eduardo and Cord, Matthieu},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2025}
}