The Interpretation of Means
=================

by Zigmunt Bot.

When you see statistics for an account that include an "average" value for its tweets-per-day over a period, it is important to understand what these figures do & do not imply.

1/20
When I (or StattoBot) report a mean (average) value, it is always the *arithmetic mean*: we've taken the total number of tweets over a period & divided it by the number of days that elapsed between two dates.

mean tpd = Tweets in n days / n

A crude measure, but useful.

2/20
So, for my Botvolution account, Twitter tells me I've sent 13,275 tweets.

1,028 days have elapsed between my account creation & 21/07/2020.

The mean is 13x/day, but it doesn't mean I tweet exactly 13x every day, of course.
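
As a minimal sketch of that arithmetic (the function name is made up for illustration; only the two figures come from the thread):

```python
def mean_tweets_per_day(total_tweets: int, days_elapsed: int) -> float:
    """Arithmetic mean: total tweets divided by the number of days elapsed."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return total_tweets / days_elapsed

# Figures quoted above for the Botvolution account.
lifetime_mean = mean_tweets_per_day(13_275, 1_028)
print(f"{lifetime_mean:.1f} tweets/day")  # 12.9 tweets/day
```

The result rounds to the 13x/day quoted above, & says nothing about how those tweets were spread across the 1,028 days.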

3/20
We can also calculate the mean for what is usually (but not always) a *different* & more recent period.

And this is where things start to get complicated.

4/20
The Twitter API (the programmatic interface that StattoBot uses) can only harvest up to ~3,200 of an account's most recent tweets.

So, I can harvest 3,195 of my own tweets, covering the past 131 days.

That gives a mean over about 4-and-a-bit months of roughly 24 tweets/day.

5/20
However, consider an account that was created in Nov 2009 & has 196 tweets on file (this is a real account); that's 3,877 days, & a mean of 0.05 tweets/day.

This really tells us nothing on its own (but we also must wonder why an account so old has so few tweets).

6/20
Still, we have all the tweets, don't we (196 is fewer than 3,200), so maybe we can see what a more recent mean looks like?

In this case, we can: it turns out that all those 196 tweets were made in the past 16 days, & so the tweets/day figure is actually ... 5

7/20
But what if, of 196 tweets, *one* of them is in Nov 2009 & 195 are from today?

Then the mean for the lifetime & for the harvest itself will be identical - 0.05/day.

What about 1 tweet per year from 2009 to 2019 inclusive, but 185 in the past hour?

Same!

Aaaargh!
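
A rough sketch of why the two patterns are indistinguishable by the mean alone. The dates are hypothetical (any day in Nov 2009 would do), the function name is made up, & timestamps are simplified to whole days, so "the past hour" becomes "the last day":

```python
from datetime import date, timedelta

def mean_over_span(tweet_dates, end):
    """Mean tweets/day from the earliest tweet up to `end`."""
    days = (end - min(tweet_dates)).days
    return len(tweet_dates) / days

first = date(2009, 11, 15)            # hypothetical first-tweet date
end = first + timedelta(days=3877)    # the 3,877-day span quoted above

# Pattern 1: one tweet in Nov 2009, then 195 on the last day.
burst = [first] + [end] * 195

# Pattern 2: one tweet a year from 2009 to 2019, then 185 on the last day.
trickle = [first] + [date(y, 6, 1) for y in range(2010, 2020)] + [end] * 185

print(round(mean_over_span(burst, end), 2), round(mean_over_span(trickle, end), 2))
# 0.05 0.05
```

Both lists hold 196 tweets over the same span, so the division cannot tell them apart.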
8/20
When it comes to StattoBot, or me doing short threads on accounts, I try to skirt some of these traps by reporting the values for the past (up to) 28 days only, as well as the "lifetime" & harvested numbers.
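
One way a "past (up to) 28 days" figure could be computed, sketched under assumptions: this is not necessarily how StattoBot does it, the function name & the sample account are invented, & "up to" is read as "shrink the window if the harvest covers fewer than 28 days".

```python
from datetime import datetime, timedelta

def recent_mean(tweet_times, now, window_days=28):
    """Mean tweets/day over the most recent (up to) `window_days` days."""
    # "Up to": if the harvest spans fewer days than the window, shrink the window.
    span_days = max(1, (now - min(tweet_times)).days)
    window = min(window_days, span_days)
    cutoff = now - timedelta(days=window)
    recent = [t for t in tweet_times if cutoff <= t <= now]
    return len(recent) / window

# Invented account: one tweet 4,000 days ago, then 4 a day for the last 28 days.
now = datetime(2020, 7, 21, 12, 0)
dormant = [now - timedelta(days=4000)]
burst = [now - timedelta(hours=h) for h in range(0, 28 * 24, 6)]
tweets = dormant + burst

lifetime = len(tweets) / 4000
print(f"lifetime: {lifetime:.3f}/day, last 28 days: {recent_mean(tweets, now):.1f}/day")
# lifetime: 0.028/day, last 28 days: 4.0/day
```

The lifetime figure makes the account look dormant; the 28-day figure shows what it is doing now.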

OK, so is that all we have to worry about?

9/20
Well, no.

Consider an account that's, say, 5 years old & has a mean tweets/day over that period of 50.

That doesn't sound very many, does it?

It's very easy to pump out 50 tweets in a normal waking day of 16 or so hours: that's ~3/hr, a couple of retweets & a lol, innit.

10/20
Now, let's take that mean *absolutely literally*:
Assume that every single day for the past ~1,800 days (5 years), it has tweeted 50x.

(bear with me here, it's a thought-experiment 🤔)

11/20
Therefore, it has tweeted 50x during every illness, family emergency, every working day, every weekend day, every national holiday, at every wedding, funeral, birthday or other anniversary day it has attended, every longhaul flight, every day-long drive ...

12/20
Well, of course it hasn't.

A "normal" person doesn't do that.

They tweet much less, or not at all, on some days, but a lot more on others. Over time, the mean settles to that figure of 50.

12a/20
So, let's say that one day a week, every week, they tweet only 15x.

For the other 6 days, they must send ~56 tweets a day, or the mean would be lower.

Conversely, if one day a week, every week, they tweet 100x, the other days they must send ~42, or the mean would be higher.
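
The figures for the other six days come from holding the weekly total fixed at 7 x 50 = 350 tweets. A small sketch (the function name is made up for illustration):

```python
def other_days_rate(weekly_mean, odd_day_count, days_in_week=7):
    """Tweets/day needed on the remaining days to keep the weekly mean."""
    weekly_total = weekly_mean * days_in_week
    return (weekly_total - odd_day_count) / (days_in_week - 1)

print(round(other_days_rate(50, 15), 1))   # 55.8  (one quiet day of 15)
print(round(other_days_rate(50, 100), 1))  # 41.7  (one busy day of 100)
```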

13/20
But consider the implications of an acct 5 yrs old & a mean of 433/day, with the appearance of an individual: It has an avatar of a person, a bio giving a trade or profession, etc.

Think about what a real live person who works would have to do to maintain that mean.
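
To put a rough number on it, here is the pace each mean implies, assuming the same 16-hour waking day used earlier & tweets spread evenly across it (both are simplifying assumptions, & the function name is invented):

```python
WAKING_HOURS = 16  # the "normal waking day" assumed earlier in the thread

def pace(tweets_per_day, waking_hours=WAKING_HOURS):
    """Return (tweets per waking hour, minutes between tweets)."""
    per_hour = tweets_per_day / waking_hours
    minutes_between = 60 / per_hour
    return per_hour, minutes_between

for rate in (50, 433):
    per_hour, gap = pace(rate)
    print(f"{rate}/day -> {per_hour:.1f}/hr, one every {gap:.1f} min")
# 50/day -> 3.1/hr, one every 19.2 min
# 433/day -> 27.1/hr, one every 2.2 min
```

That second line has to hold every waking hour of every day for 5 years, job included.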

13a/20
Now, real statisticians (i.e. not me) have data about what populations of real people actually do, & can do witchcraft to estimate & illustrate what might be going on & whether or not it looks "normal", & if not, give numbers to show how abnormal something is.

14/20
Even if I could do that, I probably couldn't fit it into two tweets & make it comprehensible to a general enquirer.

So we have to think about how we use such mean values:

And the way to do it is by also being informed by *other* data, both qualitative & quantitative.

15/20
What other data can we look at?

There's plenty: the bio, the content, who the account retweets, what hashtags it uses, what hours it tweets, what volume it's been tweeting in the past 28 days, what kind of language it uses ... & so on.

16/20
So what if our 5-year-old account, with its mean of 50x/day over those 5 years, has actually tweeted 3,200 links to porn sites in the past 6 days, but the bio says it's Vanessa from Akron, who loves Jesus, her children & her dogs, + has a verifiable link to her school board's website?

17/20
Well ... we might raise an eyebrow.

If we paid attention only to the mean tweets/day, we might pass over the account, instead of reporting it as a rather obviously hacked account.

18/20
Bear in mind, though: some real people really do tweet 2 or 3 hundred times a day, or really do just relight their 2009 account in 2020 & start tweeting again, or regularly delete all their old tweets, or delete their old accounts & start over with a new one.

19/20
The question we have to ask is: how does *all* the evidence stack up?

Is Jacqui from Newbury really just an ardent Brexiter who loves her grandkids (but also tweets links to extremist material from 0300-2200, 250x daily)?

I do hope this has been useful. It took me ages.

20/20
FUCKETY FUCK

Yes, 196 tweets in 16 days is 12.25, not 5.

(tweet 7/20)

How very annoying.