DYK, before the "Attention Is All You Need" paper, Vaswani was already seeking better neural representations for language models? #nlproc

That was also when some MT folks considered #neuralmt a possibility, back when NPLM could be plugged into
@MosesSMT

https://nlg.isi.edu/software/nplm/vaswani-emnlp13.pdf https://twitter.com/alvations/status/1287769212312645632
Also something to note is how we "mystify" algorithm names when training the parameters of our log-linear models in the non-neural statistical MT era.

MERT = error minimization by coordinate descent (a line search on each log-linear weight in turn)
PRO = pairwise ranking reduced to binary classification, trained with gradient descent

I'm oversimplifying though ;P
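
To make the analogy concrete, here's a minimal toy sketch (not the actual Moses implementations) contrasting the two update styles. The feature matrix and the squared-error loss are made-up placeholders: real MERT line-searches corpus BLEU over n-best lists (Och, 2003), and real PRO fits a binary classifier on pairwise rank differences of hypotheses (Hopkins & May, 2011).

```python
import numpy as np

def loss(w, X, y):
    """Stand-in squared error over log-linear scores s(x) = w . x.
    (Real MERT optimizes corpus BLEU, which is not differentiable.)"""
    return np.mean((X @ w - y) ** 2)

def mert_style(X, y, sweeps=20, grid=np.linspace(-5, 5, 201)):
    """MERT flavour: coordinate descent -- exhaustively line-search one
    weight at a time while holding the others fixed."""
    w = np.zeros(X.shape[1])
    for _ in range(sweeps):
        for k in range(len(w)):
            trials = [loss(np.concatenate([w[:k], [v], w[k + 1:]]), X, y)
                      for v in grid]
            w[k] = grid[int(np.argmin(trials))]
    return w

def pro_style(X, y, steps=500, lr=0.05):
    """PRO flavour: gradient descent on a differentiable surrogate loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # d/dw of the squared error
        w -= lr * grad
    return w

# Hypothetical feature scores (think LM, TM, distortion, word penalty).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, 1.0, -0.3, 0.2]) + rng.normal(scale=0.1, size=200)

print("coordinate descent:", mert_style(X, y))
print("gradient descent:  ", pro_style(X, y))
```

The point of the contrast: coordinate descent needs no gradient at all, which is exactly why MERT can optimize non-differentiable BLEU directly, while PRO's reduction to classification lets standard gradient-based learners do the work.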
BTW, I'm super thankful to Adam Lopez (now at https://rasa.com/ ), who walked me through it on the Moses mailing list and even explained it in person at @EdinburghNLP. I never could have understood MERT without his help https://www.mail-archive.com/[email protected]/msg13545.html