If you (like me) see a 600B model and shriek, let me try to give you some consolation. Why should we care about ultra-large models? 1/n https://twitter.com/lepikhin/status/1278125364787605504
LM performance improves if we scale up model size and data simultaneously, but it enters a regime of diminishing returns if either one is held fixed. That is, adding more data while keeping the model small eventually stops helping (a rough sketch of the fitted scaling law follows below). 2/n https://api.semanticscholar.org/CorpusID:210861095
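The paper linked above (Kaplan et al., 2020) fits a joint loss law L(N, D) over parameter count N and token count D. Here is a minimal Python sketch of that form; the exponents and critical constants below are my recollection of the paper's reported fit, not numbers from this thread, so treat them as illustrative assumptions.

# Sketch of the joint scaling law L(N, D) from Kaplan et al. (2020).
# Constants are approximate values as I recall them -- assumptions for illustration.
ALPHA_N = 0.076   # model-size exponent
ALPHA_D = 0.095   # data-size exponent
N_C = 8.8e13      # critical (non-embedding) parameter count
D_C = 5.4e13      # critical token count

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted LM loss: L(N, D) = [(N_c/N)^(alpha_N/alpha_D) + D_c/D]^alpha_D."""
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

# Fix a small model and keep adding data: the predicted loss flattens out,
# i.e. more data stops helping once model size is the bottleneck.
small_model = 1e8  # 100M parameters
for tokens in [1e9, 1e10, 1e11, 1e12]:
    print(f"N=1e8,   D={tokens:.0e}: loss ~ {loss(small_model, tokens):.3f}")

# Grow model and data together: the predicted loss keeps improving.
for n, d in [(1e8, 1e9), (1e9, 1e10), (1e10, 1e11), (1e11, 1e12)]:
    print(f"N={n:.0e}, D={d:.0e}: loss ~ {loss(n, d):.3f}")

With the small model, the predicted loss plateaus somewhere around 2.8 no matter how much data you add, while scaling N and D together keeps pushing it down; that is the diminishing-returns regime the tweet is pointing at.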
Since large LMs converge faster, training a large model but stopping early actually saves *training* compute. And don't be bummed about *inference* time, because large models are more compressible! 3/n https://twitter.com/Eric_Wallace_/status/1235616760595791872