Thread by @data36_com

Data science best practices, common mistakes and basic mindset questions... that I've collected in the last ~8 years -- in one big twitter thread:

#1 The ultimate goal of a data science project is the impact it creates.

Before a project ask:
What will be different when this project is done? What change will it foster?

If the answer is "nothing" or "I don't know", then it's probably a data project no one needs.

#2 The output of a data science project is a better decision.
So setting a hypothesis before the project is important.

E.g.
What's the new vs. returning user % in my app?
- if it's below 20%, I'll do X.
- if it's above 20%, I'll do Y.

(This is just the simplest example.)
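Writing the rule down as code before looking at the data makes the hypothesis hard to fudge afterwards. A minimal sketch, assuming a hypothetical list of session counts per user and assuming "returning" means more than one session. Only the 20% threshold comes from the example above. The function names, the sample data and the choice to compute the returning-user share (rather than the new-user share) are illustrative.

```python
def returning_share(sessions_per_user):
    """Share of users with more than one session (assumed definition of 'returning')."""
    if not sessions_per_user:
        raise ValueError("need at least one user")
    returning = sum(1 for n in sessions_per_user if n > 1)
    return returning / len(sessions_per_user)


def decide(share, threshold=0.20):
    """Map the metric to an action that was agreed on before the analysis."""
    # Below the threshold -> action X, otherwise -> action Y.
    # Treating exactly 20% as Y is an assumption; the example leaves it open.
    return "X" if share < threshold else "Y"


# Hypothetical data: number of sessions for each of ten users.
sessions = [1, 1, 3, 1, 1, 5, 2, 1, 1, 1]
share = returning_share(sessions)
print(f"returning users: {share:.0%} -> do {decide(share)}")
```

The point is not the code itself but that X and Y are fixed before the number is known.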

#3 Coding is a tool.

To build a house, you have to know how to use a hammer. And to do DS, you have to know how to code.
But eventually, people won't care about the hammer. They'll care about the house.
Your code doesn't have to be state of the art. What you build with it should be.

#4 Skills are more important than "hard knowledge."

It's more important to understand how to "talk to a computer" than to know every tiny detail of Python's syntax.

#5 The simpler the better.

Data scientists tend to overcomplicate things by using fancy statistical models and algorithms. But in most projects, simpler models perform better than complex ones.

#6 Every DS project should start with data discovery.

First, simply scroll through the rows, columns and tables. Then do some basic segmentation and a few distribution charts. It can take hours or days and will seem pointless. But it's an investment that pays off later in the project.
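A first discovery pass can be very plain. The sketch below uses only Python's standard library and a handful of made-up rows (the column names and values are hypothetical, standing in for whatever table you actually load). It follows the same order: look at raw rows, segment, then check the distribution.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical rows; in a real project these come from your own data source.
rows = [
    {"user": "a", "country": "US", "order_value": 12.0},
    {"user": "b", "country": "US", "order_value": 15.5},
    {"user": "c", "country": "DE", "order_value": 8.0},
    {"user": "d", "country": "US", "order_value": 230.0},
    {"user": "e", "country": "DE", "order_value": 11.0},
    {"user": "f", "country": "HU", "order_value": 9.5},
    {"user": "g", "country": "HU", "order_value": 14.0},
    {"user": "h", "country": "US", "order_value": 13.0},
]

# Step 1: just look at a few raw rows.
for row in rows[:3]:
    print(row)

# Step 2: basic segmentation, here count / mean / median per country.
by_country = defaultdict(list)
for row in rows:
    by_country[row["country"]].append(row["order_value"])
for country, values in sorted(by_country.items()):
    print(country, len(values), round(mean(values), 2), median(values))


# Step 3: a crude distribution "chart" as a text histogram.
def histogram(values, bucket_size=10):
    """Count values per bucket; keys are the lower bound of each bucket."""
    counts = Counter(int(v // bucket_size) * bucket_size for v in values)
    return dict(sorted(counts.items()))


hist = histogram([row["order_value"] for row in rows])
for bucket, count in hist.items():
    print(f"{bucket:>4}+ {'#' * count}")
```

Even on toy data like this, the histogram makes the single large order stand out, and the per-country mean and median for that segment disagree. That is the kind of thing worth knowing before any modeling starts.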

#7 Choosing the right project is more important than choosing the right model.

Most DS projects fail way before the first line of code is written. This leads back to business impact (#1). You should work on the project that has the greatest potential impact. Most people don't.

#8 Great DS project:
> It has a potentially great positive (business, social, etc.) impact.

Bad DS project:
> Your boss is curious about something.
> You've heard something good at a conference.
> You want to implement an exciting model.
...
Dare to say NO and turn down bad projects.

#9 Ask the right questions!

For that, you have to know your data inside out (see #6 -- data discovery) and keep in mind the decisions you'll help to make (see #2 -- the decision is the output).

#10 You'll need a single source of truth.

If you use more than one tool for data collection, you'll see discrepancies between them. That might paralyze you.
So pick ONE data source you really TRUST and go with that.
(Don't forget to review and maintain it from time to time.)

#11 "All models are wrong, but some are useful" (G. Box)

You'll never produce a 100% accurate prediction/model/analysis. So don't expect that. In fact, expect the opposite.
In data science, there's always a chance that you and your numbers will be wrong. That's part of the game.

#12 The Pareto principle applies to data science, too.

Even the more extreme version of it:
~90-95% of the things you'll find in your analyses will be useless or evident...
But that tiny ~5% (or less) might be the real game changers. So keep digging!

#13 Data can be wrong.

In so many ways:
- human errors (e.g. biases)
- statistical errors (if you work with a 99% confidence level, you'll still be wrong by chance in roughly 1 out of 100 projects... despite your 99% confidence level)
- etc.
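The "1 out of 100" point about statistical errors can be checked with a small simulation. This is a sketch under stated assumptions, not a recipe: every simulated experiment draws from a population with no real effect (true mean 0, known standard deviation 1) and is tested with a two-sided z-test at 99% confidence. The function name, trial count, sample size and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def false_positive_rate(trials=2000, sample_size=50, confidence=0.99, seed=36):
    """Fraction of no-effect experiments that still look 'significant'."""
    rng = random.Random(seed)
    # Two-sided critical value, about 2.58 for 99% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    false_alarms = 0
    for _ in range(trials):
        # Every sample comes from a population whose true mean is 0,
        # so any "significant" deviation is wrong by construction.
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        z = sample_mean * sample_size ** 0.5  # known standard deviation of 1
        if abs(z) > z_crit:
            false_alarms += 1
    return false_alarms / trials


rate = false_positive_rate()
print(f"wrong by chance in {rate:.1%} of experiments")
```

The printed share should land in the neighborhood of 1%, and it never drops to zero no matter how careful the analysis is.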

#14 Never draw conclusions from one analysis!

The data scientist’s job is to compress millions of data points into one chart. It'll always be an abstraction. Of course, it will never show you the full picture.

But the more analyses you create, the closer you'll get to the truth.

#15 Don’t zoom in too much.

Most importantly, don’t draw conclusions from small timeframes. You won't see the patterns on a three-day scale... But you'll see them on a three-month one.
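Here is a small illustration of why the window matters. The metric below is entirely synthetic (an assumed slow upward trend of 0.5 per day plus a strong weekly cycle), and the trend is estimated with a plain least-squares slope. Neither the numbers nor the function names come from a real dataset.

```python
import math


def daily_value(day):
    """Synthetic metric: slow upward trend plus a strong weekly cycle."""
    return 100 + 0.5 * day + 20 * math.sin(2 * math.pi * day / 7)


def slope(days):
    """Least-squares slope of the metric over the given days (units per day)."""
    values = [daily_value(d) for d in days]
    mean_x = sum(days) / len(days)
    mean_y = sum(values) / len(values)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    var = sum((x - mean_x) ** 2 for x in days)
    return cov / var


three_days = slope(range(2, 5))      # days 2, 3 and 4
three_months = slope(range(0, 91))   # 13 full weeks

print(f"3-day slope:   {three_days:+.2f} per day")
print(f"3-month slope: {three_months:+.2f} per day")
```

On the three-day window the weekly cycle dominates and the metric appears to be falling. On the three-month window the cycle mostly averages out and the estimate comes close to the underlying 0.5 per day.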

#16 Stupid people make stupid decisions.

Despite all your efforts (education, workshops, 1-on-1s, etc.) some people just won't understand your data projects and they'll read into them whatever they want.

Make sure decisions won't be made by these people.

#17 Data science is a long-term game.

You can't change the world in one day.
Real impact will be seen after months or even years. Along the way, there'll be a lot of small and big wins but also failures and dead ends. What matters is the cumulative result of these.

#18 A data scientist is a pioneer.

You'll have to question the status quo and sometimes move people out of their comfort zones. Data science can inspire groundbreaking ideas.

For some people, new things can be scary. But for data scientists: if it’s scary, it’s good.

#19 Never stop learning new things.

There's no such thing as a 100% data-informed business. There are always new things to discover, understand and improve.
And it's true on a personal level, too: you can always learn new things, like new languages, tools, approaches, methods, etc.

This is it for now. I might add more later.
Feel free to add yours.
