My online #researchstudy was recently infiltrated by bots. I haven't shared this story publicly because I felt a bit like it was my fault. I'm putting my pride aside because I think #dataintegrity is and will be a growing issue in survey data and is not discussed enough (1/n)
Bots have been a huge threat to data integrity in recent years, and I can't believe that bot protection is not yet a standard part of the data integrity section of IRB submissions. Gone are the days when "checking the quality of the data every few days" will suffice (2/n)
Adding protections INTO your survey may take time and energy (coding, creating advanced branch logic), but it will save you hundreds of hours (and LOTS of money) if you do it right! (3/n)
A bit of my story: within 12 hrs of going live I had over 350 false respondents in my study. I can tell you this now, but it took hundreds of hours and more than a week's worth of work to identify these bots. I'm lucky enough to have a quant background, which made this easier (4/n)
It took 10 coding schemes to reveal the bots. If I hadn't had open-ended questions, I am confident that I would not have identified the bots in my study. So, here are my lessons learned. Lesson 1: REQUIRE open-ended responses (5/n)
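If you want to triage open-ended answers before hand-coding them, a minimal pandas sketch (column names are placeholders, not from any particular platform):

```python
import pandas as pd

# Hypothetical export from the survey platform; column names are placeholders.
df = pd.read_csv("responses.csv")

# Normalize the open-ended text, then flag exact duplicates: bots often paste
# identical or templated answers across "different" respondents.
norm = df["open_response"].fillna("").str.lower().str.strip()
df["flag_duplicate_text"] = norm.duplicated(keep=False) & (norm != "")

# Very short answers to a question that asks for a sentence or two are another
# tell worth reviewing by hand.
df["flag_too_short"] = norm.str.split().str.len() < 3

print(df[df["flag_duplicate_text"] | df["flag_too_short"]])
```

Duplicate and suspiciously short answers are only a starting point; nothing replaces actually reading the responses.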
Lesson 2: Everyone doing online data collection needs to build ***complex and advanced*** logic/attention checks throughout the first sets of surveys (and do NOT cluster them) (6/n)
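The checks themselves live inside the survey (display/branch logic), but scoring them afterward can be this simple. The item names and answer key below are made up:

```python
import pandas as pd

# Placeholder item names and answer key; yours will differ.
ATTENTION_KEY = {"attn_1": "Strongly agree", "attn_2": "Purple", "attn_3": "7"}

df = pd.read_csv("responses.csv")

# Count how many of the scattered attention checks each respondent missed.
df["attn_failures"] = sum(
    (df[item] != correct).astype(int) for item, correct in ATTENTION_KEY.items()
)
df["flag_attention"] = df["attn_failures"] >= 1
```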
Lesson 3: Add "honeypot" items to your survey. These are fields that are hidden from your average participant but visible to bots. Name them in a fashion identical to your other fields so the bots don't catch on (7/n)
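If your survey builder lets you hide a question (display logic or custom CSS), scoring the honeypot afterward is the easy part. A sketch with a made-up field name:

```python
import pandas as pd

df = pd.read_csv("responses.csv")

# "contact_phone_2" is a hypothetical hidden honeypot field, named to blend in
# with the real items. Humans never see it, so any non-empty value is suspect.
honeypot = df["contact_phone_2"].fillna("").astype(str).str.strip()
df["flag_honeypot"] = honeypot != ""

print(f"{df['flag_honeypot'].sum()} respondents filled the hidden field")
```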
Lesson 4: Captchas are not enough. But add them in anyway
Lesson 5: Screen participants and then email those who passed the screener with a unique survey link. This takes more time, but you have to do it. NO PUBLIC LINKS. EVER. DON'T DO IT.
(8/n)
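Many platforms will generate individualized links for you; if yours won't, the idea is roughly this (URL, emails, and token handling are all illustrative):

```python
import csv
import secrets

# Illustrative only: mint a one-time token per screened participant and append
# it to the survey URL you email them. URL, emails, and filenames are made up.
BASE_URL = "https://survey.example.edu/study?token="

screened = ["p001@example.com", "p002@example.com"]  # passed the screener

with open("invites.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email", "token", "link"])
    for email in screened:
        token = secrets.token_urlsafe(16)  # hard-to-guess, single-use token
        writer.writerow([email, token, BASE_URL + token])

# On the back end, reject any submission whose token is missing, already used,
# or not in invites.csv.
```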
Lesson 6: Flag/prompt participants who are "speeding" through materials
Lesson 7: Ask similar questions at different points in your study to check for inconsistencies (e.g., ask gender twice). A quick sketch covering both checks follows below.
(9/n)
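Both of these are easy to score after the fact. A minimal sketch; the duration field, the repeated items, and the speed cutoff are all placeholders you'd tune to your own survey:

```python
import pandas as pd

df = pd.read_csv("responses.csv")  # column names below are placeholders

# Lesson 6: flag "speeders". duration_seconds is whatever your platform records;
# a cutoff relative to the median is safer than one absolute number, and the
# 0.4 here is only an example.
median_time = df["duration_seconds"].median()
df["flag_speeding"] = df["duration_seconds"] < 0.4 * median_time

# Lesson 7: flag inconsistent answers to the same question asked twice.
df["flag_inconsistent"] = (
    df["gender_t1"].str.lower().str.strip()
    != df["gender_t2"].str.lower().str.strip()
)

print(df[["flag_speeding", "flag_inconsistent"]].sum())
```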
Lessons 8-10: Your study will still have bots.

Check your data. A LOT. Don't blame yourself. Acknowledge bots as a fact of this era of online data collection that affects data integrity, and prepare for them.
That's it for now. Thank you for coming to my #TedTalk.

(10/10)
#AcademicTwitter #DataScience
One last tip - don’t automate participant payment. Thankfully I did not, which allowed me to review my data before paying and avoid compensating bots. Definitely check all of your data integrity markers before compensating anyone, and have an IRB-approved protocol to determine whom to pay
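One way to turn the flags from the earlier sketches into a payment-review step (the two-flag threshold is only an example; the actual rule should be whatever your IRB-approved protocol specifies):

```python
import pandas as pd

# Builds on the flag columns from the earlier sketches; file names are made up.
df = pd.read_csv("responses_with_flags.csv")

flag_cols = ["flag_duplicate_text", "flag_too_short", "flag_attention",
             "flag_honeypot", "flag_speeding", "flag_inconsistent"]

# Example rule only: hold payment for anyone with two or more flags and review
# them manually before compensating.
df["n_flags"] = df[flag_cols].sum(axis=1)
df["hold_for_review"] = df["n_flags"] >= 2

df[df["hold_for_review"]].to_csv("manual_review_before_payment.csv", index=False)
```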
Clarification: this study did not use MTurk
In another update: it seems like Twitter posts about research studies are the main way bot programmers identify studies to spam. I would think twice about posting a study with a large compensation amount without including a phone screen