Thread by @abhi1thakur, If you are starting with machine learning / deep learning and get [...]

abhishek

abhi1thakur

If you are starting with machine learning / deep learning and get a new dataset to work on, whether on Kaggle, in the real world, or just for fun, there are a few things you must always take care of to squeeze the most out of your model and make it awesome:

1/6

Look at the data carefully. Do EDA.

Look at the targets. See how they are distributed and what kind of problem this is.
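As a rough illustration of looking at the targets, here is a minimal sketch that counts how often each label occurs. The `summarize_target` helper and the toy labels are assumptions made up for this example, not something from the thread.

```python
from collections import Counter


def summarize_target(targets):
    """Return per-class counts and per-class shares for a list of labels."""
    counts = Counter(targets)
    total = len(targets)
    shares = {label: count / total for label, count in counts.items()}
    return counts, shares


# Hypothetical toy targets: 8 negatives and 2 positives.
targets = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
counts, shares = summarize_target(targets)

print(counts)  # how many samples per class
print(shares)  # fraction of samples per class
```

Two distinct labels with an 80/20 split suggest an imbalanced binary classification problem, which matters for the choice of metric.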

Choose the right metric to evaluate your models.

2/6
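To see why the metric matters, here is a small sketch comparing accuracy with F1 on imbalanced labels. The two helper functions and the toy data are assumptions for illustration; in practice a library implementation would usually be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced data: 9 negatives, 1 positive.
y_true = [0] * 9 + [1]
# A "model" that always predicts the majority class.
y_pred = [0] * 10

print(accuracy(y_true, y_pred))   # 0.9
print(f1_binary(y_true, y_pred))  # 0.0
```

Accuracy looks great here even though the positive class is never found, so a metric like F1 is probably the better fit for this kind of target.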

Split the data into folds. You can use these for cross-validation or for hold-out-based validation.
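One way to create folds for a classification target is to stratify them, so every fold keeps roughly the same class shares. The sketch below uses only the standard library; `stratified_folds` is a hypothetical helper, and the thread does not say which splitting scheme to use.

```python
import random
from collections import defaultdict


def stratified_folds(targets, n_folds=5, seed=42):
    """Assign a fold index to every sample, spreading each class evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    folds = [None] * len(targets)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples of this class out round-robin.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


# Hypothetical toy targets: 15 negatives, 5 positives.
targets = [0] * 15 + [1] * 5
folds = stratified_folds(targets, n_folds=5)

for fold in range(5):
    valid_idx = [i for i, f in enumerate(folds) if f == fold]
    train_idx = [i for i, f in enumerate(folds) if f != fold]
    positives = sum(targets[i] for i in valid_idx)
    print(fold, len(train_idx), len(valid_idx), positives)
```

For hold-out validation, a single fold can be set aside as the validation set and the rest used for training.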

Build a first basic model. This is going to be your baseline.
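A baseline can be as simple as always predicting the most frequent class. This is only a sketch under that assumption; the thread does not prescribe a particular baseline, and the labels below are made up.

```python
from collections import Counter


def fit_majority_baseline(y_train):
    """Return the most frequent label in the training targets."""
    return Counter(y_train).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training and validation labels from one fold.
y_train = [0, 0, 1, 0, 1, 0]
y_valid = [0, 1, 0, 0]

majority = fit_majority_baseline(y_train)
preds = [majority] * len(y_valid)
baseline_score = accuracy(y_valid, preds)

print(majority)        # 0
print(baseline_score)  # 0.75
```

Any later model or feature set should beat this number on the same folds, otherwise it is not adding anything.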

Now try to improve on the baseline by adding new features.

3/6

To add new features, go back to the data. Look at the EDA. That's why it's quite important.

When you think you have reached a limit with feature engineering, try different models

Keep a log of all the scores, features, and models.

4/6
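A log of experiments does not need special tooling. Here is a minimal sketch that appends one JSON line per run; the file name, field names, model names, and scores are all placeholders invented for this example.

```python
import json
import os
import tempfile


def log_experiment(path, model, features, score):
    """Append one experiment record to a JSON Lines file."""
    record = {"model": model, "features": features, "score": score}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def best_experiment(path):
    """Read all records back and return the one with the highest score."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return max(records, key=lambda r: r["score"])


# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "experiments.jsonl")
    # Placeholder runs with made-up scores.
    log_experiment(log_path, "logistic_regression", ["age", "income"], 0.71)
    log_experiment(log_path, "random_forest", ["age", "income", "age_x_income"], 0.78)
    best = best_experiment(log_path)

print(best["model"])  # random_forest
```

An append-only file like this makes it easy to see later which feature set and model produced which score.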

When you think you have reached a limit with different models, try feature selection
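Feature selection can be done in many ways. As one simple illustration (a filter method chosen for this sketch, not necessarily what the thread has in mind), features can be ranked by the absolute Pearson correlation with the target. The feature names and values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Sort feature names by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features: one informative, one weak, one constant.
features = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
target = [0, 0, 0, 1, 1, 1]

ranked = rank_features(features, target)
print(ranked)  # ['signal', 'noise', 'constant']
```

A ranking like this only captures linear, one-feature-at-a-time relationships, so any drop should still be checked against the validation score on the same folds.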

Done with feature engineering and feature selection? You will realize that by following the above steps, you have also chosen a few best models that work well with your data.

5/6

Now it's time to do hyperparameter optimization and squeeze the last few drops from your best models.
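Random search is one common way to do hyperparameter optimization. The sketch below assumes an `objective` function that returns a score to maximize; `toy_objective` is a stand-in for a real cross-validated score, and the parameter names and search space are made up.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample random parameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Stand-in for a cross-validated score; highest at max_depth=5, learning_rate=0.1."""
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


# Hypothetical search space.
space = {
    "max_depth": [3, 4, 5, 6, 7],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

best_params, best_score = random_search(toy_objective, space, n_trials=200)
print(best_params, best_score)
```

In a real run the objective would train the model on the training folds and return the validation metric chosen earlier.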

Wrap everything in Docker, so that it's reproducible.

Build a simple API endpoint or a fully-fledged web application to serve your model and brag about it ;)

6/6
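A simple endpoint can be sketched with nothing but the standard library. Everything here is an assumption for illustration: the `/predict` path, the JSON shape, and the `predict` function, which is a stand-in for a real trained model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: thresholds the sum of the inputs."""
    return {"prediction": int(sum(features) > 1.0)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(predict(payload["features"])).encode("utf-8")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "invalid request body")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8000):
    """Create the HTTP server without starting it."""
    return HTTPServer((host, port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the server; a POST to `/predict` with a body like `{"features": [0.7, 0.6]}` should then return a JSON prediction. For anything beyond a demo, a proper web framework is probably the better choice.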

