Thread by @EdwardRaffML, An extension of my NeurIPS work accepted to @RealAAAI #AAAI2021 "Research Reproducibility [...]

Edward Raff

EdwardRaffML

An extension of my NeurIPS work accepted to @RealAAAI #AAAI2021 "Research Reproducibility as a Survival Analysis" is now online! Paper: https://arxiv.org/abs/2012.09932 Code: https://github.com/EdwardRaff/Research-Reproducibility-Survival-Analysis The idea? Reproduction isn't just the binary "is" or "is not" that we've been focusing on! https://twitter.com/EdwardRaffML/status/1173770808616980481

So I went back through all the code I had access to and determined a completion date from the git commits. Start date comes from @mendeley_com. This now gives a time to complete (rough, take with a grain of salt) for most of the reproduced papers! You can see a heavy tail to this distribution
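A minimal sketch of the time-to-complete calculation described here, assuming the start date and the completion date are already in hand. The function name and the dates are hypothetical, chosen only for illustration; this is not the code from the linked repository.

```python
from datetime import date


def days_to_reproduce(start: date, completed: date) -> int:
    """Elapsed calendar days between starting a paper and finishing its implementation."""
    if completed < start:
        raise ValueError("completion date precedes start date")
    return (completed - start).days


# Hypothetical dates, purely for illustration.
print(days_to_reproduce(date(2017, 3, 1), date(2017, 4, 12)))
```

Calendar days overstate actual effort when work is intermittent, which is one reason to treat the number as rough.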

Time to reproduce is more objective and informative! This can also be framed as a survival analysis, which lets us consider failed replications as right-censored, meaning more effort/time is needed to reproduce. That lets us finish the study, and gives us an effect size in terms of time spent
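To make right-censoring concrete, here is a small sketch of a Kaplan-Meier estimate in plain Python. It is a standard textbook estimator swapped in for illustration, not the model used in the paper, and the function and argument names are assumptions. Here "survival" at time t means the paper is still not reproduced after t units of effort; a failed attempt contributes to the at-risk count up to the time it was abandoned, but never counts as an event.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one reproduction succeeded.

    durations: time spent on each paper (for example, days)
    observed:  True if the paper was reproduced at that time, False if the
               attempt was abandoned (right-censored: the true time to
               reproduce is at least `duration`)
    """
    events = Counter(t for t, o in zip(durations, observed) if o)
    exits = Counter(durations)  # every paper leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Fraction of papers still being worked on that were reproduced at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: five papers, two of them abandoned (censored).
curve = kaplan_meier([1, 2, 2, 3, 5], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Dropping the censored rows instead would throw away the information that those papers took at least that long.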

Knowing effect size gives us ideas about where to intervene to improve research reproducibility. I want data from more people to make results robust, but we can still get some potential insights / hints. In particular, using an XGBoost Cox survival model + SHAP lets us dig into the data!
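A sketch of how data like this could be prepared for such a model. XGBoost's `survival:cox` objective takes a single label column in which right-censored rows are marked with a negative time, so the helper below (a hypothetical name, not from the linked repository) encodes durations that way. The training and SHAP calls are shown only as comments, under the assumption that `xgboost` and `shap` are installed and that `X` is a feature matrix of paper attributes.

```python
def cox_labels(durations, observed):
    """Signed-time labels: +t if reproduced at time t, -t if censored at time t."""
    labels = []
    for t, o in zip(durations, observed):
        if t <= 0:
            raise ValueError("durations must be positive")
        labels.append(float(t) if o else -float(t))
    return labels


# Hypothetical data: the second attempt was abandoned after 10 days.
y = cox_labels([3, 10, 7], [True, False, True])
print(y)

# Not executed here; assumes xgboost and shap are available and X exists:
#   import xgboost, shap
#   model = xgboost.XGBRegressor(objective="survival:cox").fit(X, y)
#   shap_values = shap.TreeExplainer(model).shap_values(X)
```

The SHAP values then attribute each paper's predicted hazard to individual features, which is what makes per-feature plots possible.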

Previously, proofs looked like they had no value. Now we can see that well-proven work takes less time to reproduce! Color shows the most correlated other feature: more proofs = more equations. But last time I said more eqs = bad. The resolution is in the paper, go read it for deeper insight!

A lot of the value from this analysis is better insights from the most objective (least subjective) features. Another refinement: tables != better. There is a sweet spot, you can have too few or too many tables in your paper!

Another one is the year the paper was published. Newer papers are taking longer to reproduce! Lends credence to the idea that the situation is potentially getting worse. Or are we just building more complex models that naturally take longer?

Most important, and distressing: year attempted has almost no relationship. Apparently I've not gotten any better at implementing papers after so much practice. Now I can quantify just how little I learn. Maybe there is a lower bound to a person's replication time?

Better answers to all of these questions will only come with better data! If you are implementing a paper, or organizing a study, please keep track of the time spent on the effort per paper! It's super valuable information and will help us understand reproduction!
