Longformer was an important paper because transformer models like BERT use an attention mechanism with O(N^2) time complexity, where N is the sequence length.

Longformer's attention has O(N) time complexity, so it can process much longer input sequences. https://medium.com/dair-ai/longformer-what-bert-should-have-been-78f4cd595be9
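
Longformer actually combines sliding-window (local) attention with a few globally-attending tokens; the toy NumPy sketch below only illustrates the sliding-window part and how it drops the cost from O(N^2) to O(N*w) for a fixed window size w. All names and sizes here are made up for illustration, not the official implementation:

```python
import numpy as np

def full_attention(q, k, v):
    # q, k, v: (N, d) -- every token attends to every other token: O(N^2)
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (N, N) score matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ v                                   # (N, d)

def sliding_window_attention(q, k, v, w=2):
    # Each token attends only to tokens within +/- w positions: O(N * w)
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)        # local window
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)          # (window,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
print(full_attention(q, k, v).shape)             # (8, 4)
print(sliding_window_attention(q, k, v).shape)   # (8, 4)
```

The point of the sketch: the inner score computation per token touches only ~2w+1 keys instead of all N, which is why the overall cost grows linearly with sequence length.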
"Shortformer", uses two methods to improve speed and performance:

1- Staged training: start training with short input sequences, then switch to longer sequences in later iterations (a toy schedule is sketched further below).

2- Position-Infused Attention: apply position embeddings just before computing attention, i.e. add them to the queries and keys rather than to the token embeddings at the input (see the sketch after this list).
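
The sketch below is my minimal, single-head reconstruction of the position-infused-attention idea, assuming sinusoidal position embeddings and made-up weight matrices; it is not the paper's actual code. Positions are infused into the queries and keys right before the attention scores are computed, while the values (and thus the layer output) stay position-free:

```python
import numpy as np

def sinusoidal_positions(n, d):
    # Standard sinusoidal position embeddings, shape (n, d)
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def position_infused_attention(x, w_q, w_k, w_v):
    # x: token representations WITHOUT position information, shape (n, d)
    n, d = x.shape
    p = sinusoidal_positions(n, d)
    q = (x + p) @ w_q        # positions infused into queries ...
    k = (x + p) @ w_k        # ... and keys, just before attention
    v = x @ w_v              # values remain position-free
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v       # output carries no position information either

n, d = 6, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(position_infused_attention(x, w_q, w_k, w_v).shape)  # (6, 8)
```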
"Staged training" reminds me of "Progressive GANs", which trains the GAN model starting with low quality images (going easy on the model), and going towards high quality images.

https://machinelearningmastery.com/how-to-train-a-progressive-growing-gan-in-keras-for-synthesizing-faces/
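
As a toy illustration of what such a length curriculum could look like, here is a small sketch; the stage lengths and step counts are invented for illustration and are not the schedule from the Shortformer paper:

```python
# Each stage: (sequence_length, number_of_updates) -- illustrative values only
stages = [
    (128, 1000),
    (512, 1000),
    (2048, 1000),
]

def training_schedule(stages):
    """Yield the sequence length to use at every training step."""
    for seq_len, num_steps in stages:
        for _ in range(num_steps):
            yield seq_len

# The model first sees length-128 batches, then 512, then 2048.
schedule = training_schedule(stages)
print(next(schedule))   # 128
```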