Do SotA natural language understanding models care about word order?

Nope 🙃. For BERT-based models, predictions stay the same 75% to 90% of the time when word order is shuffled, on many GLUE tasks (tasks where these models outperform humans).

"marijuana cause cancer" == "cancer cause marijuana" Ouch...

https://arxiv.org/abs/2012.15180  1/4
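
The check behind that number is easy to reproduce in spirit. Below is a minimal sketch (not the paper's exact protocol): shuffle the words of one input in a sequence-pair task and see whether a fine-tuned BERT-style classifier changes its mind. The checkpoint path is a placeholder; substitute any model fine-tuned on QQP, RTE, etc.

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "path/to/bert-finetuned-on-qqp"  # placeholder checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def predict(q1: str, q2: str) -> int:
    """Predicted class for a question pair."""
    inputs = tokenizer(q1, q2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

def shuffle_words(sentence: str) -> str:
    """Randomly permute the words of one sentence."""
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

q1 = "does marijuana cause cancer"
q2 = "can smoking cause lung cancer"

original = predict(q1, q2)
shuffled = predict(shuffle_words(q1), q2)
print("prediction unchanged after shuffling:", original == shuffled)
```

Run this over a whole dev set and you get the fraction of examples whose prediction survives shuffling, which is the kind of statistic the 75% to 90% figure refers to.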
How? Interestingly, across *different* sequence-pair tasks, RoBERTa-based models tend to use *the same* set of self-attention heads to match up similar words that appear in both sequences.

This lets model predictions stay constant even when all the words in one question are shuffled. 2/4
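
You can poke at this yourself by pulling the attention maps and scoring, per head, how much attention flows between positions that hold the same word in the two inputs. The sketch below is an illustration, not the paper's analysis code; it uses plain `roberta-base` just to show the mechanics, whereas the finding is about fine-tuned RoBERTa-based models, so swap in a fine-tuned checkpoint to reproduce it properly.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base", output_attentions=True)
model.eval()

q1 = "does marijuana cause cancer"
q2 = "can cancer cause marijuana use"
inputs = tokenizer(q1, q2, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# outputs.attentions: one (batch, heads, seq, seq) tensor per layer
attn = torch.stack(outputs.attentions).squeeze(1)   # (layers, heads, seq, seq)

def clean(tok: str) -> str:
    """Strip RoBERTa's leading-space marker so repeated words compare equal."""
    return tok.lstrip("Ġ").lower()

# Mark position pairs holding the same (non-special) word at different positions.
same_word = torch.tensor([
    [i != j and clean(ti) == clean(tj) and ti not in tokenizer.all_special_tokens
     for j, tj in enumerate(tokens)]
    for i, ti in enumerate(tokens)
], dtype=torch.float)

# Heads with a high score spend much of their attention linking repeated words.
match_score = (attn * same_word).sum(dim=(-1, -2))   # (layers, heads)
layer, head = divmod(int(match_score.argmax()), match_score.shape[1])
print(f"layer {layer}, head {head} most strongly links words shared by both inputs")
```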
For sentiment analysis (SST-2), the binary label of ~60% of sentences can be predicted from just a single, most-important word. 3/4
This is intriguing given that pre-trained BERT's contextualized embeddings are known to be *contextual*.
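
A quick way to get a feel for this (a rough proxy, not the attribution method used in the paper): score each word of a sentence on its own with an SST-2 classifier and check whether the most confident single word already gives the full-sentence label. The checkpoint path is again a placeholder.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "path/to/bert-finetuned-on-sst2"  # placeholder checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def label_and_confidence(text: str):
    """Return (predicted class, probability of that class) for a text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    return int(probs.argmax()), float(probs.max())

sentence = "a gorgeous, witty, seductive movie"
full_label, _ = label_and_confidence(sentence)

# Score each word alone and keep the one the model is most confident about.
scores = {w: label_and_confidence(w) for w in sentence.split()}
best_word = max(scores, key=lambda w: scores[w][1])
print(best_word, "alone recovers the label:", scores[best_word][0] == full_label)
```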

Work led by the amazing PhD student @pmthangxai, and w/ Trung Bui & Long Mai!

We're grateful for any feedback :)
4/4