Thread by @paulnovosad, Like much regulation, I fear making econ journals the data transparency enforcers [...]

Like much regulation, I fear making econ journals the data transparency enforcers has raised the cost of doing research without benefiting anyone.

It's particularly bad for young scholars without RA teams, and for people working with non-secret admin data.

Some anecdotes

Anecdote 1: delay.

Paper is conditionally accepted, as long as it clears the data replication team.

I submit my replication files.

*SIX MONTHS LATER*, the replication report below.

Failure on LINE 1 of the code, b/c the replicator had a space in their path name. 2/N

This was just the beginning.

I've since talked to many researchers who have spent days and days debugging configuration problems on the replicators' machines.

PIs should not be doing tech support for journal replicators. They should be doing more high-value research. 3/N
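
The path failure in anecdote 1 is the kind of configuration problem a build script can guard against. Below is a minimal sketch, assuming a Python driver script rather than whatever the original package used. The names `run_script` and `demo` and the files `build.py` and `data.csv` are hypothetical. It passes the project root as a single list argument, so a directory name with a space in it is not split into two words.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(script: Path, project_root: Path) -> str:
    """Run a Python script, passing the project root as one argument."""
    # Arguments are given as a list and no shell is involved, so a root
    # such as "/home/me/my project" reaches the script as a single argument.
    result = subprocess.run(
        [sys.executable, str(script), str(project_root)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> str:
    """Build a throwaway project whose path deliberately contains a space."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "replication package"  # hypothetical folder name
        root.mkdir()
        (root / "data.csv").write_text("x\n1\n")
        script = root / "build.py"
        # The script takes its root from argv instead of a hard-coded path.
        script.write_text(
            "import sys\n"
            "from pathlib import Path\n"
            "root = Path(sys.argv[1])\n"
            "print((root / 'data.csv').read_text().splitlines()[1])\n"
        )
        return run_script(script, root)
```

Running `demo()` on a directory with a space in its name before submitting is a cheap way to catch this class of failure on the author's side.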

Anecdote 2: you can't regulate being good.

After our roads paper was accepted at AER, we spent 6 months going back and forth with the data editor. At least 2 months of PI time were spent meeting the requirements.

That data/replication packet is a hot mess. 4/N

Specifically, a mess born of authors ticking all the requirements rather than trying to produce a clean open data source. These are not the same thing!

(We did the latter in parallel: check out the SHRUG, a much better starting point for replicating this paper) 5/N

Anecdote 3: secret data artistry

For our roads and school enrollment paper in AEJ, the education data provider asked us not to post it publicly, so we asked for an exclusion.

Here's what the journal data process was like: 6/N

Working with secret data was like wishing for more wishes — by skipping the rebuild/replication process, two months had been added to my life! 7/N

This is not to blame the data editors, who are trying to satisfy impossible rules, and who were extremely helpful and generous with their time every step of the way, and have helped us produce better packages.

The problem is with the rules. 8/N

tbc I am in favor of research transparency! Our team spends a lot of time making our work more transparent, and on producing more and more usable open data sources from all of our work.

However... 9/N

I'm not sure you can litigate effective research transparency.

You raise the costs for everyone, but the people with shady practices can easily adjust their shady practices—most easily by using the proprietary data exclusion. 10/N

I'm conflicted about this. I want to get to the place where research produces open data and is rapidly and easily replicable.

But I wonder if we need to get there with norms rather than with rules. 11/N

It's easy to tell if a scholar cares about reproducibility/open data. Visit one of their project data pages. Is it clear? Are the data clean? Is there documentation?

But this is orthogonal to what journals make you do, because the eds don't know where the skeletons are. 12/N

Full disclosure, our earlier stuff has not met this standard! Here is our latest replication repo, which is getting closer to where we want to be: https://github.com/devdatalab/paper-agn-forests-roads 13/N

Replication code and data for Asher, Garg, Novosad, "The Ecological Impact of Transportation Infrastructure", Economic Journal

Maybe our system is on the right path and it is just a slow road. Our team is redesigning our future builds to anticipate the replication process and to make it less costly. It will get better. But...

14/N

The current system puts a disproportionate cost on empiricists, on young scholars without RA teams, and on people who use public administrative data. And it gives a free pass to secret data artists.

https://johnhcochrane.blogspot.com/2015/12/secret-data.html

15/N

Small steps we can take:
- Jr scholars: you can push back against the data editors! If they ask you to do something that will take months of your time, you can negotiate!
- Data editors: pls respect jr scholars' time, and grant exceptions rather than imposing months of work. 16/N

Some more small things:
- Data eds should abandon the rule that replications need to rebuild directly from all the raw public datasets. This is [...] for large projects.
- We are putting together some wikis on code/repo design templates for surviving this process. 17/N

A bigger step we could take: journals should hire permanent professional staff to work on replication. People with CS skills who can work fast, handle multi-platform work, and minimize costs to authors.

Imagine if replicators could post fixes to authors' repos!

18/N

Perhaps a pipe dream: Take required replication out of the journal process. The process is long and painful enough already.

Instead, use crowd-sourcing and reputation:
19/N

1. Authors post their replication repos
2. Grad students can try to download and replicate
3. If they succeed, the journal puts a *replicated* emoji on the article, credits the grad student.
4. Authors develop reputations for having replicable papers.

20/N

Conclusion: good intentions, unexpected consequences, we can do better.

21/21

Good post from @julianreif on building code in anticipation of this: https://mobile.twitter.com/JulianReif/status/1354918664521338881
