DYK, before the "Attention Is All You Need" paper, Vaswani was already seeking better neural representations for language models? #nlproc
That was also when some MT folks considered #neuralempty a possibility, once NPLM was used in @MosesSMT
https://nlg.isi.edu/software/nplm/vaswani-emnlp13.pdf https://twitter.com/alvations/status/1287769212312645632
Here are some breadcrumbs that trace back ~7 years https://github.com/moses-smt/nplm and here's an updated tutorial paper from Bicici (2019) that integrates NPLM with the SOTA non-#neuralempty pipeline https://www.aclweb.org/anthology/W19-5306v1.pdf =)
Also worth noting is how we "mystified" algorithm names when tuning the parameters of our log-linear models in the non-neural, statistical MT era.
MERT (Minimum Error Rate Training) = a linear model tuned with coordinate descent
PRO (Pairwise Ranking Optimization) = a linear model tuned with gradient descent on a pairwise ranking loss
I'm oversimplifying though ;P
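To make that oversimplification concrete, here's a toy Python sketch (my own illustration, not Moses' actual tuner code): both tuners adjust the weights of the same linear model over a fixed n-best list and differ only in how they search. The n-best data, the step grid, the learning rate, and the names mert_ish/pro_ish are all invented for the example.

import math
import random

random.seed(0)

# Hypothetical toy data (made up for this sketch): for each source
# sentence, an n-best list of (feature_vector, error) pairs, where
# "error" stands in for 1 - BLEU of that hypothesis.
NBEST = [
    [([1.0, 0.2], 0.9), ([0.4, 1.0], 0.1), ([0.6, 0.5], 0.5)],
    [([0.9, 0.1], 0.8), ([0.2, 0.9], 0.2), ([0.5, 0.6], 0.4)],
]

def score(w, feats):
    # The log-linear model is just a dot product of weights and features.
    return sum(wi * fi for wi, fi in zip(w, feats))

def corpus_error(w):
    # Error of the highest-scoring hypothesis under w, summed over sentences.
    return sum(max(nb, key=lambda h: score(w, h[0]))[1] for nb in NBEST)

def mert_ish(w, rounds=5):
    # MERT-flavoured coordinate descent: optimize one weight at a time,
    # holding the rest fixed. Real MERT does an exact line search over
    # the piecewise-constant error surface; this sketch just grid-searches.
    for _ in range(rounds):
        for i in range(len(w)):
            candidates = [w[i] + step for step in (-0.5, -0.1, 0.0, 0.1, 0.5)]
            w[i] = min(candidates,
                       key=lambda v: corpus_error(w[:i] + [v] + w[i + 1:]))
    return w

def pro_ish(w, epochs=100, lr=0.1):
    # PRO-flavoured gradient descent: sample hypothesis pairs and run SGD
    # on a logistic ranking loss so the lower-error hypothesis scores higher.
    for _ in range(epochs):
        for nb in NBEST:
            a, b = random.sample(nb, 2)
            if a[1] == b[1]:
                continue
            if a[1] > b[1]:  # ensure `a` is the better (lower-error) one
                a, b = b, a
            margin = score(w, a[0]) - score(w, b[0])
            g = 1.0 / (1.0 + math.exp(margin))  # d/dw of log(1 + e^-margin)
            for i in range(len(w)):
                w[i] += lr * g * (a[0][i] - b[0][i])
    return w

w1 = mert_ish([0.0, 0.0])
w2 = pro_ish([0.0, 0.0])
print("MERT-ish weights:", w1, "-> 1-best error:", corpus_error(w1))
print("PRO-ish  weights:", w2, "-> 1-best error:", corpus_error(w2))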
BTW, I'm super thankful to Adam Lopez (now at https://rasa.com/ ), who walked me through MERT on the Moses mailing list and even explained it in person at @EdinburghNLP. I could never have understood MERT without his help https://www.mail-archive.com/[email protected]/msg13545.html