Thread by @FreeLawProject, A thread about the history of our legal citation extractor and open [...]

Free Law Project ⚖

A thread about the history of our legal citation extractor and open source. 1/?

In 2015 or so, two students at @BerkeleyISchool wrote the first version of it. It was pretty good, and was able to find basic citations in a paragraph, look them up in CourtListener, and then make them into links. Cool, v1 was born.

Later, we wanted it to work on all kinds of citations, and we started building a huge database of reporters, their abbreviations, dates, etc. This became our reporters DB: https://github.com/freelawproject/reporters-db/. A *bunch* of folks have now sunk weeks of their lives into making it really good.

And it works! Using the database of reporter dates and abbreviations, we can find practically all of the citations in a block of text. Awesome. But it was missing a few things:
1. It didn't handle depth of treatment.
2. It was really bad at weird citation formats.
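
To make the reporter-database idea concrete, here is a minimal sketch of how a table of reporter abbreviations can drive citation matching. It is not the actual eyecite or CourtListener code: the `REPORTERS` table, `build_pattern`, and `find_citations` are hypothetical names invented for illustration, and the real reporters DB is far larger and also tracks dates and abbreviation variants.

```python
import re

# Hypothetical mini reporter table (abbreviation -> full name).
# Assumption for illustration only; the real reporters-db is much richer.
REPORTERS = {
    "U.S.": "United States Reports",
    "F.2d": "Federal Reporter, Second Series",
    "F.3d": "Federal Reporter, Third Series",
    "S. Ct.": "Supreme Court Reporter",
}


def build_pattern(reporters):
    """Compile one regex matching '<volume> <reporter> <page>'."""
    # Try longer abbreviations first so they win over shorter prefixes.
    abbrevs = sorted(reporters, key=len, reverse=True)
    alternation = "|".join(re.escape(a) for a in abbrevs)
    return re.compile(
        rf"\b(?P<volume>\d+)\s+(?P<reporter>{alternation})\s+(?P<page>\d+)\b"
    )


CITATION_RE = build_pattern(REPORTERS)


def find_citations(text):
    """Return (volume, reporter, page) tuples for each basic citation found."""
    return [
        (int(m["volume"]), m["reporter"], int(m["page"]))
        for m in CITATION_RE.finditer(text)
    ]


sample = (
    "See Roe v. Wade, 410 U.S. 113 (1973); "
    "cf. Smith v. Jones, 12 F.3d 345 (9th Cir. 1993)."
)
print(find_citations(sample))
```

A sketch like this only catches plain volume-reporter-page citations, which is roughly where the weird formats in the list above start to cause trouble.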

Well, we're in 2020 now, and along comes another volunteer. Out of nowhere, he implements support for Id., supra, etc. Wow. https://free.law/2020/03/05/citation-data-gets-richer/

Citation Data Gets Richer

This is a guest post by Matt Dahl, a Ph.D. student in political science at the University of Notre Dame. Citation data is a keystone of legal research—both for understanding a particular judicial...

Next, we realize it'd be great if all this citation stuff lived outside of CourtListener's code base so that others could use it. A few weeks ago, "eyecite" was born. Now, if you want to pull citations out of text, there's an easy drop-in tool for that: https://pypi.org/project/eyecite/

But, we're not done yet b/c it's 2021 now. The next thing that happens is that Jack Cushman from @harvardlil drops by and makes it 10× faster via some embarrassing performance tweaks. The fruit was hanging low, folks.

Now we're working on making it match a lot more stuff — like statutes — while keeping the performance mostly stable. It's incredible work, and we'll be sharing more about eyecite soon, but it's hard not to rave right now. /fin
