One thing I've never really gotten over about industry is just how much of the economy is about unfucking other parts of the economy.

Not in any grand political sense. I mean, at a practical nuts-and-bolts level, undoing the careless mistakes that others made further upstream.
One of my first projects at a ~data science~ startup was imputing some characteristics of a client's book of business.

Not because it needed sophisticated statistical inference. They could have just looked the relevant info up, if they had the names and addresses.
But they didn't have the names and addresses of their own clients, because someone accidentally deleted a column in the only copy of a spreadsheet. So they decided to try and machine learn them instead. Their own clients' names and addresses.
Some of the smartest people I know, like literal particle physicists, are working on the problem of data extraction from PDFs.

You're probably thinking of "extracting insight" or something, but I literally mean pulling tables that already exist out of PDFs and into CSVs.
The PDF standard, whose ancestor PostScript is Turing complete by the way, apparently does not define a standard mechanism for representing tabular data. Or if it does, no one uses it. And when you get down to it, like what even is a table, man?

Meanwhile, people need their CSVs of tabular data.
The invisible hand's solution to this was to pull a bunch of people away from their postdocs in limning the fundamental nature of the universe, and set them to the task of getting numbers out of PDFs that other people had put into PDFs.
Presumably so that someone else could put them back in another PDF later.
You don't pay these people's salaries by accident, so I have every confidence that this project will eventually pay for itself.

But, when you zoom out, they're implementing one half of the world's most expensive identity function.
A friend of mine jokes that his hiring strategy is to actively try to retard scientific progress.

I think it might be working.
It's good for an academic field to have some competitive pressure from its cognate industry, giving people an escape hatch.

And every decision en route to pdf_unfuckr 1.0 made sense in its immediate context.
The result is that people who were once studying the nature of reality are now just building up mental models of the insides of the PDF spec authors' heads or, worse, their own colleagues'.

And probably finding it more taxing, too.
You can follow @MelancholyYuga.