DYK, before the "Attention Is All You Need" paper, Vaswani was already seeking better neural representations for language models? #nlproc
That was also when some MT folks considered #neuralempty a possibility, once NPLM was used in @MosesSMT
https://nlg.isi.edu/software/nplm/vaswani-emnlp13.pdf https://twitter.com/alvations/status/1287769212312645632
Here are some breadcrumbs that trace back ~7 years https://github.com/moses-smt/nplm and here's an updated tutorial paper from Bicici (2019) that integrates NPLM with the SOTA non-#neuralempty pipeline https://www.aclweb.org/anthology/W19-5306v1.pdf =)
Also worth noting is how we "mystified" algorithm names when tuning the parameters of our log-linear models in the non-neural, statistical MT era.
MERT (Minimum Error Rate Training) = a linear model tuned with coordinate descent
PRO (Pairwise Ranking Optimization) = a linear model tuned with gradient descent on a pairwise ranking loss
I'm oversimplifying though ;P
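To make that oversimplification concrete, here's a toy Python sketch (my own illustration, not Moses' actual tuner code): both tuners adjust the weights of the same linear model over a fixed n-best list and differ only in how they search. The n-best data, the step grid, the learning rate, and the names mert_ish/pro_ish are all invented for the example.

import math
import random

random.seed(0)

# Hypothetical toy data (made up for this sketch): for each source
# sentence, an n-best list of (feature_vector, error) pairs, where
# "error" stands in for 1 - BLEU of that hypothesis.
NBEST = [
    [([1.0, 0.2], 0.9), ([0.4, 1.0], 0.1), ([0.6, 0.5], 0.5)],
    [([0.9, 0.1], 0.8), ([0.2, 0.9], 0.2), ([0.5, 0.6], 0.4)],
]

def score(w, feats):
    # The log-linear model is just a dot product of weights and features.
    return sum(wi * fi for wi, fi in zip(w, feats))

def corpus_error(w):
    # Error of the highest-scoring hypothesis under w, summed over sentences.
    return sum(max(nb, key=lambda h: score(w, h[0]))[1] for nb in NBEST)

def mert_ish(w, rounds=5):
    # MERT-flavoured coordinate descent: optimize one weight at a time,
    # holding the rest fixed. Real MERT does an exact line search over
    # the piecewise-constant error surface; this sketch just grid-searches.
    for _ in range(rounds):
        for i in range(len(w)):
            candidates = [w[i] + step for step in (-0.5, -0.1, 0.0, 0.1, 0.5)]
            w[i] = min(candidates,
                       key=lambda v: corpus_error(w[:i] + [v] + w[i + 1:]))
    return w

def pro_ish(w, epochs=100, lr=0.1):
    # PRO-flavoured gradient descent: sample hypothesis pairs and run SGD
    # on a logistic ranking loss so the lower-error hypothesis scores higher.
    for _ in range(epochs):
        for nb in NBEST:
            a, b = random.sample(nb, 2)
            if a[1] == b[1]:
                continue
            if a[1] > b[1]:  # ensure `a` is the better (lower-error) one
                a, b = b, a
            margin = score(w, a[0]) - score(w, b[0])
            g = 1.0 / (1.0 + math.exp(margin))  # d/dw of log(1 + e^-margin)
            for i in range(len(w)):
                w[i] += lr * g * (a[0][i] - b[0][i])
    return w

w1 = mert_ish([0.0, 0.0])
w2 = pro_ish([0.0, 0.0])
print("MERT-ish weights:", w1, "-> 1-best error:", corpus_error(w1))
print("PRO-ish  weights:", w2, "-> 1-best error:", corpus_error(w2))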
BTW, I'm super thankful to Adam Lopez (now at https://rasa.com/ ), who walked me through MERT on the Moses mailing list and even explained it in person at @EdinburghNLP. I could never have understood MERT without his help https://www.mail-archive.com/[email protected]/msg13545.html