To synthesise realistic megapixel images, learn a high-level discrete representation with a conditional GAN, then train a transformer on top. Beautiful synergy between adversarial and likelihood-based learning! 🧵 (1/8) https://twitter.com/ak92501/status/1339754735658799106
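To make the two-stage recipe concrete, here's a minimal sketch of stage one (my own toy code, not the authors' implementation; `ToyVQEncoder` and all hyperparameters are invented for illustration): a trained VQGAN encoder maps an image to a small grid of discrete codebook indices, and that grid becomes the token sequence the transformer is trained on.

```python
# Hypothetical stand-in for a trained VQGAN encoder: image -> discrete codes.
import torch
import torch.nn as nn

class ToyVQEncoder(nn.Module):
    def __init__(self, codebook_size=1024, dim=256, factor=16):
        super().__init__()
        # a real VQGAN uses a deep conv net; one strided conv stands in here
        self.conv = nn.Conv2d(3, dim, kernel_size=factor, stride=factor)
        self.codebook = nn.Embedding(codebook_size, dim)

    @torch.no_grad()
    def encode(self, images):                 # (B, 3, H, W)
        z = self.conv(images)                 # (B, dim, H/f, W/f)
        flat = z.flatten(2).transpose(1, 2)   # (B, (H/f)*(W/f), dim)
        # quantise: index of the nearest codebook entry per spatial position
        dists = torch.cdist(flat, self.codebook.weight.unsqueeze(0))
        return dists.argmin(-1)               # (B, (H/f)*(W/f)) integer codes

encoder = ToyVQEncoder()
codes = encoder.encode(torch.randn(2, 3, 256, 256))
print(codes.shape)  # torch.Size([2, 256]) -- the transformer's input sequence
```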
GANs tend to focus on realism rather than diversity. This is a great trade-off for capturing realistic texture, as shown before e.g. in HiFiC ( https://hific.github.io/ ): compare their 3rd example (the red barn door) to the original to see what I mean. (2/8)
Likelihood-based models like transformers capture diversity better than GANs do, but tend to get lost in the details. Likelihood is mode-covering, whereas adversarial losses are mode-seeking. (3/8)
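A quick numerical illustration of that distinction (my own toy example, not from the paper): fit a single Gaussian to a two-mode target and compare the forward KL, which is what maximum likelihood minimises, for a broad "covering" fit versus a narrow fit that sits on one mode.

```python
# Toy mode-covering demo: forward KL(p || q) heavily penalises a q
# that misses a mode of p, so maximum likelihood prefers broad fits.
import numpy as np

x = np.linspace(-6.0, 6.0, 1201)
dx = x[1] - x[0]

def normalise(u):
    return u / (u.sum() * dx)

def gauss(mu, sigma):
    return normalise(np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

p = normalise(gauss(-2.0, 0.5) + gauss(2.0, 0.5))   # two-mode target

def forward_kl(p, q):                               # KL(p || q)
    return np.sum(p * np.log((p + 1e-12) / (q + 1e-12))) * dx

print(forward_kl(p, gauss(0.0, 2.2)))  # broad, mode-covering fit: small KL
print(forward_kl(p, gauss(2.0, 0.5)))  # narrow fit on one mode: huge KL
```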
By measuring the likelihood in a space where texture details have been abstracted away, the transformer is forced to capture larger-scale structure, and we get great compositions as a result. (4/8)
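In code, that second stage is just next-token maximum likelihood over the code indices. A hedged sketch (illustrative shapes, with a single off-the-shelf PyTorch layer standing in for the full GPT-style model):

```python
# Autoregressive likelihood over VQGAN codes: plain next-token
# cross-entropy, i.e. the NLL measured in the abstracted latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, seq_len, d_model = 1024, 256, 512
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
head = nn.Linear(d_model, vocab)

codes = torch.randint(0, vocab, (2, seq_len))    # stand-in for encoder output
inputs, targets = codes[:, :-1], codes[:, 1:]    # shifted for prediction
causal = torch.triu(                             # mask out future positions
    torch.full((seq_len - 1, seq_len - 1), float("-inf")), diagonal=1)
h = layer(embed(inputs), src_mask=causal)        # causal self-attention
logits = head(h)                                 # (B, L-1, vocab)
nll = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
print(nll)  # minimising this = maximising code-sequence likelihood
```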
I've previously discussed the importance of measuring likelihoods in the right space in a blog post ( https://benanne.github.io/2020/09/01/typicality.html#right-level ) and on Twitter (e.g. https://twitter.com/sedielem/status/1336733174961999878). (5/8)
While the combination of convolutional architectural priors and powerful transformers is a valuable contribution of this work, personally I think the synergy between adversarial and likelihood-based losses is even more important! (6/8)
This work is also a natural evolution of VQ-VAE2 ( https://arxiv.org/abs/1906.00446 ), which already showed how powerful representation learning can be in the context of autoregressive generative modelling. Replacing the VQ-VAE with a VQ-GAN enables more aggressive downsampling. (7/8)
This also makes the approach very democratic: the transformer prior operates on short sequences of length 256 and can be trained on a single GPU.

Very exciting, congratulations @pess_r, @robrombach and Björn Ommer! (8/8)
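To see why those sequences are so short, here's the back-of-the-envelope arithmetic (assuming a downsampling factor of f=16, which I believe is a typical VQGAN configuration; the numbers are illustrative):

```python
# Sequence length of the transformer prior for a given crop size and
# downsampling factor f (f = 16 assumed here for illustration).
f = 16
h = w = 256                    # pixels per crop
tokens = (h // f) * (w // f)   # one discrete code per latent position
print(tokens)                  # 256 -- short enough for a single GPU
```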