Apropos of nothing, here is how I would structure a data science class:

0) Ethics
In this class, you might learn something that could hurt someone in unseen and unknown ways. It's important that we understand what kinds of things we should ask ourselves before building a model.
1) Point predictions
Assuming the class is about prediction and a little about inference, it makes sense to start with the mean and median. They are the simplest predictions we can make and are extended by regression.
Topics: CLT, sampling variance, confidence intervals
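To make that concrete, here is the kind of demo I would start with: the mean and median as point predictions, plus a CLT-based confidence interval for the mean. A throwaway sketch on simulated data, nothing more.

```python
# Mean and median as point predictions, plus a normal-approximation
# confidence interval for the mean (via the CLT). Simulated data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)  # skewed data

mean, median = x.mean(), np.median(x)

# CLT: the sampling distribution of the mean is approximately normal,
# with standard error s / sqrt(n).
se = x.std(ddof=1) / np.sqrt(len(x))
ci = (mean - 1.96 * se, mean + 1.96 * se)  # ~95% confidence interval

print(f"mean={mean:.3f}, median={median:.3f}, 95% CI for mean={ci}")
```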
2) Regression
We learned that the mean minimizes the mean squared deviation, and the median the mean absolute deviation. Let's extend that to the regression case now.
Topics: MLE, optimization, loss functions
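A quick numerical check of that claim makes a nice in-class exercise: minimize each loss directly and watch the mean and median fall out. Just a sketch using scipy on simulated data.

```python
# Minimizing squared error recovers the mean; minimizing absolute
# error recovers the median. Both losses are convex in c, so a
# scalar minimizer finds the optimum directly.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)

sq_opt = minimize_scalar(lambda c: np.mean((x - c) ** 2)).x
abs_opt = minimize_scalar(lambda c: np.mean(np.abs(x - c))).x

print(f"argmin squared loss:  {sq_opt:.3f}  vs mean:   {x.mean():.3f}")
print(f"argmin absolute loss: {abs_opt:.3f}  vs median: {np.median(x):.3f}")
```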
3) Model validation
So we've built a model, but how are we going to tell if it is any good? Here, we would talk about the difference between training, validation, and test sets.
I would spend lots of time on this. ...
It's important to let students know that whatever choices they make about the model after seeing the data are part of the modelling process. Drop correlated features? You need to validate that (see the sketch after the topic list).
Topics: Cross-validation, the bootstrap, estimation of training error optimism, AIC, other loss functions, simulation. Different ways of measuring how much impact a variable had when added to the model. And that is just off the top of my head.
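And to make the "validate your choices" point concrete, here is a sketch of the right way to do it: put the data-driven step (feature selection here) inside a pipeline so cross-validation refits it on every fold. Simulated data, illustrative only.

```python
# Any data-driven choice belongs inside the cross-validation loop.
# A Pipeline makes sure the selection step is refit on each fold,
# so the validation estimate accounts for it.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))            # mostly noise features
y = X[:, 0] + 0.1 * rng.normal(size=200)  # only one feature matters

# Wrong: select features on all the data, then cross-validate (leaks).
# Right: put selection inside the pipeline, as below.
model = make_pipeline(SelectKBest(f_regression, k=5), LinearRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f}")
```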
4) More regression
Few classification problems are actually classification problems. Here is where we would introduce logistic regression and discuss when classification is and is not a good idea.
Topics: Proper scoring rules, sensitivity/specificity and when they make sense.
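For instance, a minimal sketch of the logistic-regression-plus-proper-scoring-rules idea: fit the model, keep the probabilities, and score them with log loss and the Brier score rather than thresholding everything to 0/1. Simulated data again.

```python
# Score predicted probabilities with proper scoring rules instead of
# forcing a hard classification. Both log loss and the Brier score
# reward well-calibrated probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))  # true probabilities
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

print(f"log loss:    {log_loss(y_te, probs):.3f}")
print(f"Brier score: {brier_score_loss(y_te, probs):.3f}")
```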
I don't think that would be one topic per week. The bootstrap in particular has a few variants, and if I wanted to spend time talking about bootstrap confidence intervals alone, that could be a whole lecture.
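For example, the percentile bootstrap, which is probably the variant I would show first (a sketch on simulated data):

```python
# Percentile bootstrap confidence interval for the median:
# resample with replacement, recompute the statistic each time,
# and take the 2.5th and 97.5th percentiles of the replicates.
import numpy as np

rng = np.random.default_rng(4)
x = rng.lognormal(size=300)

boot_medians = np.array([
    np.median(rng.choice(x, size=len(x), replace=True))
    for _ in range(2000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median={np.median(x):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```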
So I would probably leave out neural nets, SVMs, etc. Maybe they would come right at the end, but once you have cut your teeth on linear models, the other algorithms are just drop-in replacements.
Except if you want to do inference, of course.
Anyway, my point is that if we let linear models be the workhorses and introduce non-linearity via splines or similar methods, we can spend more time on the less sexy but more important stuff.
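Something like this is what I mean by letting linear models do the work: the model below is still plain linear regression, just linear in a spline basis rather than in x. It assumes scikit-learn >= 1.0 for SplineTransformer, and the data is simulated for illustration.

```python
# Non-linearity via a spline basis feeding an ordinary linear model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=300).reshape(-1, 1)
y = np.sin(x.ravel()) + 0.3 * rng.normal(size=300)  # clearly non-linear

# Still a linear model -- linear in the spline features, not in x.
model = make_pipeline(SplineTransformer(degree=3, n_knots=8),
                      LinearRegression())
model.fit(x, y)
print(f"training R^2: {model.score(x, y):.3f}")
```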