I think we've pretty much established that dictionary methods are not good at classifying the sentiment of *short* docs *individually*
But that's not the same as validating dictionary methods on *many* docs together as an *aggregate*, and I'd like to see more work on that
1/
To illustrate how these are not the same, I'd like to briefly discuss the @hedonometer project by @compstorylab
2/
They calculate sentiment as the dictionary-weighted average over many tweets, *not* the average of weighted averages of individual tweets
These are mathematically and conceptually different. I think the differences warrant further investigation for at least 2 reasons
3/
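To make the difference concrete, here's a minimal sketch of the two calculations. The tiny dictionary, its scores, and the function names are all made up for illustration; this is not the hedonometer's actual word list or code.

```python
from collections import Counter

# Hypothetical toy dictionary of word scores (not real hedonometer values).
SCORES = {"love": 8.0, "happy": 8.0, "rain": 5.0, "hate": 2.0}


def doc_average(tokens, scores=SCORES):
    """Mean score of the dictionary words in one document, or None if none match."""
    matched = [scores[t] for t in tokens if t in scores]
    return sum(matched) / len(matched) if matched else None


def pooled_average(docs, scores=SCORES):
    """Frequency-weighted mean over all dictionary words in the pooled corpus."""
    counts = Counter(t for doc in docs for t in doc if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


def mean_of_doc_averages(docs, scores=SCORES):
    """Unweighted mean of per-document averages, skipping documents with no matches."""
    per_doc = [a for a in (doc_average(d, scores) for d in docs) if a is not None]
    return sum(per_doc) / len(per_doc) if per_doc else None


docs = [
    "love".split(),
    "hate hate hate rain".split(),
    "nothing matches here".split(),
]

pooled = pooled_average(docs)            # (8 + 2 + 2 + 2 + 5) / 5 = 3.8
per_doc_mean = mean_of_doc_averages(docs)  # (8.0 + 2.75) / 2 = 5.375
print(pooled, per_doc_mean)
```

The pooled version weights each tweet by how many dictionary words it contains, while the per-tweet version gives every scored tweet an equal vote. On the same toy data they land on opposite sides of the scale midpoint.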
1) In their "Geography of Happiness" paper, they find reasonable correlations between simple dictionary sentiment over the tweets from entire US states and other state-level measures of well-being
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0064417
4/
2) For a decade now, applying this sentiment analysis to 10% of all English tweets has consistently identified reasonable and interpretable spikes and trends in expressed emotion
5/
Please please please do not come into my mentions talking about Twitter as a non-representative sample
That's not my point
My point is that there's enough evidence here to warrant further investigating how well dictionary methods perform over large aggregate corpora of text
6/
So I would like to see less about how well dictionary methods do on *classification*, and more on how well they do as *continuous measures* across many texts/a lot of text
This requires very different annotation and validation tasks and there's a lot of room for innovation
7/7