Researchers from @allen_ai and @facebookai have published the paper "Shortformer: Better Language Modeling using Shorter Inputs".
The name refers to the earlier paper "Longformer", which does language modelling on longer sequences.
Paper and code here: https://www.reddit.com/r/MachineLearning/comments/knawu8/r_shortformer_better_language_modeling_using/
Longformer was an important paper because transformer models like BERT use an attention mechanism with O(N^2) time complexity, "N" being the sequence length.
Longformer's attention has O(N) time complexity, so it can process longer input sequences: https://medium.com/dair-ai/longformer-what-bert-should-have-been-78f4cd595be9
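To see where the O(N^2) cost comes from, here is a minimal numpy sketch of vanilla self-attention (identity projections instead of learned weight matrices, for brevity): the score matrix has shape (N, N), so its time and memory both grow quadratically with sequence length.

```python
import numpy as np

def self_attention(x):
    """Vanilla self-attention over a sequence of N vectors of size d.
    The score matrix is (N, N), which is the O(N^2) bottleneck."""
    N, d = x.shape
    q, k, v = x, x, x  # identity projections, for simplicity
    scores = q @ k.T / np.sqrt(d)  # shape (N, N): the quadratic part
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # shape (N, d)

x = np.random.randn(128, 64)
out = self_attention(x)
print(out.shape)  # (128, 64)
```

Doubling N quadruples the size of `scores`, which is exactly what Longformer's sparse attention patterns avoid.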
"Shortformer", uses two methods to improve speed and performance:
1- Staged training: Start training on shorter sequences, then switch to longer sequences in the later iterations.
2- Position-Infused Attention: Add position embeddings to the queries and keys just before calculating attention, rather than to the token embeddings at the input.
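The second idea can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation (the paper uses learned projections and sinusoidal position embeddings; here the projections are identities and `pos` is random): positions are infused into the queries and keys only, so the values stay position-free.

```python
import numpy as np

def pia_attention(x, pos):
    """Position-Infused Attention (sketch): position embeddings are
    added to queries and keys right before the attention computation,
    but NOT to the values, so the value representations carry no
    absolute-position information."""
    N, d = x.shape
    q = x + pos  # positions infused into queries...
    k = x + pos  # ...and keys
    v = x        # values stay position-free
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

N, d = 16, 32
x = np.random.randn(N, d)
pos = np.random.randn(N, d)  # sinusoidal in the paper; random here
print(pia_attention(x, pos).shape)  # (16, 32)
```

Because the values (and hence the layer outputs) contain no baked-in positions, cached token representations can be attended to again at a different offset, which is what makes overlapping/continued inference cheap in the paper.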
"Staged training" reminds me of "Progressive GANs", which trains the GAN model starting with low quality images (going easy on the model), and going towards high quality images.
https://machinelearningmastery.com/how-to-train-a-progressive-growing-gan-in-keras-for-synthesizing-faces/
The progressive-growing method was also used in the StyleGAN paper, which has perhaps become the most popular GAN model of recent years.
Example of StyleGAN outputs:
https://thispersondoesnotexist.com/
How StyleGAN works: https://towardsdatascience.com/explained-a-style-based-generator-architecture-for-gans-generating-and-tuning-realistic-6cb2be0f431