1/12 This week marks my 7th anniversary as a data scientist. Looking back, the hardest thing to learn was working with real data. It was something I only slowly grasped, and I am probably still learning. A thread. 🧵
2/12 After graduating with a master's in applied statistics, I still had a lot to learn before I could work in the real world. The obvious part was the technical part: databases, servers, good software engineering practices, etc.
3/12 Working with data was something I thought I had already mastered. I knew how to clean, combine and visualise data in R, so I thought I was already there. What more could there be to it than that?
4/12 I treated data as a given. The Latin word "datum" literally means "given", as do the Dutch word "gegevens" and the French word "données", which are both commonly used for data.
5/12 Maybe when your data are gathered specifically for the purpose of analysis, you can treat them as a given (assuming all went well). But when you work with data collected for other reasons, they are never in the shape or form you want.
6/12 It was often unclear what exactly I was looking at, or how the events had ended up in the databases. When I asked around, more often than not, nobody really seemed to know in detail. Looking back, my biggest error was how I responded to these situations.
7/12 I think it is best described as "desperate arrogance". I really wanted to put my hard-won expertise to work in some way and contribute. At the same time, I admit that I would scoff at the company or some of the people.
8/12 "If these people don't know what they are talking about, how can you expect me to do a statistical analysis or build some ML product?" Looking back, this stance was cynical, cocky and unproductive. It was also the easy way out. You don't have to go the extra mile this way.
9/12 Eventually, I came to learn that data is typically not gathered to please the data scientist. It is there first and foremost to run a business. If you want to wield that data for data science purposes you have to work very hard for it, and this is totally part of the job.
10/12 If you are lucky, you work with (data) engineers who help you by doing transformations and joins. But if you are not, you have to swallow the bitter pill. This means asking around, testing a lot of data hypotheses and maybe even trying to read the source code that gathers the data.
11/12 You often don't need impeccable data to build something useful. Don't start throwing around phrases like "Garbage in, garbage out" as soon as things are not 100% the way you desire. Instead, try to deal with it. Princess behavior gets you nowhere.
12/12 Most importantly, be nice to the people you work with. Don't go behind their backs and ridicule them for not having the answers you're looking for. Not just because being a jerk is indecent, but also because being snarky (subconsciously) lowers your own sense of responsibility and grit.
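To make "testing data hypotheses" a bit more concrete, here is a minimal sketch in Python. The `orders` rows, the column names and the three hypotheses are all invented for illustration; in practice they come from what colleagues tell you and from what you suspect about how the data was gathered.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows, standing in for an export from an operational table.
orders = [
    {"order_id": 1, "created_at": "2021-03-01", "shipped_at": "2021-03-02", "amount": 25.0},
    {"order_id": 2, "created_at": "2021-03-01", "shipped_at": None, "amount": 40.0},
    {"order_id": 2, "created_at": "2021-03-03", "shipped_at": "2021-03-02", "amount": -5.0},
]


def parse(day):
    """Parse a YYYY-MM-DD string; missing values stay None."""
    return datetime.strptime(day, "%Y-%m-%d") if day else None


def violations(rows, predicate):
    """Return the rows for which the hypothesis (predicate) does not hold."""
    return [row for row in rows if not predicate(row)]


# How often each id occurs, so uniqueness can be checked per row.
id_counts = Counter(row["order_id"] for row in orders)

# Each hypothesis is a plain statement plus a check that a single row must pass.
hypotheses = {
    "order_id is unique": lambda r: id_counts[r["order_id"]] == 1,
    "amount is never negative": lambda r: r["amount"] >= 0,
    "shipping never precedes creation": lambda r: (
        r["shipped_at"] is None or parse(r["shipped_at"]) >= parse(r["created_at"])
    ),
}

report = {name: violations(orders, check) for name, check in hypotheses.items()}

for name, bad_rows in report.items():
    print(f"{name}: {len(bad_rows)} violating row(s)")
```

The violating rows are usually more valuable than the counts: they are concrete examples you can take to the people who run the system and ask what happened there.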
You can follow @edwin_thoen.