This question annoys ALL students (and professors) of ML, but they are afraid to ask. Thanks for raising it in this "no hand waving" forum. Take two causal diagrams:
X-->Y and X<--Y, and ask a neural network to decide which is more probable, after seeing 10 billion samples. 1/n https://twitter.com/yash_sharma25/status/1324561702222929920
The answer will be: No difference; each diagram scores the same fit as the other. Let's be more sophisticated: assign each diagram a prior and run a Bayesian analysis on the samples. Lo and behold, the posteriors will equal the priors no matter how we start. How come? 2/n
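A minimal sketch of why the two diagrams tie on observational data, assuming binary variables and plain NumPy in place of an actual neural net (the probabilities below are made up for illustration): both factorizations, P(X)P(Y|X) for X-->Y and P(Y)P(X|Y) for X<--Y, reach exactly the same maximum likelihood, so the Bayes factor is 1 and the posterior over diagrams stays at the prior.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational samples generated by a true X --> Y mechanism (parameters are arbitrary).
x = rng.random(n) < 0.3                     # P(X=1) = 0.3
y = rng.random(n) < np.where(x, 0.8, 0.1)   # P(Y=1 | X=1) = 0.8, P(Y=1 | X=0) = 0.1

def max_log_likelihood(parent, child):
    """Best-fit log-likelihood of the factorization P(parent) * P(child | parent)."""
    ll = 0.0
    p = parent.mean()                        # MLE of P(parent = 1)
    ll += parent.sum() * np.log(p) + (~parent).sum() * np.log(1 - p)
    for v in (False, True):                  # MLE of P(child = 1 | parent = v)
        c = child[parent == v]
        q = c.mean()
        ll += c.sum() * np.log(q) + (~c).sum() * np.log(1 - q)
    return ll

print(max_log_likelihood(x, y))   # diagram X --> Y
print(max_log_likelihood(y, x))   # diagram X <-- Y: same value up to rounding
```

Both calls print the same number: the best fit of either factorization is just the empirical joint distribution, so the data cannot favor one arrow over the other.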
Isn't a neural network supposed to learn the truth given enough data? Ans. No! Learning only occurs when the learnable offends the data less than its competitors. Our two diagrams never offend any data, so nothing is learnable. Aha! But what if our data involves interventions? 3/
Now we begin to see some learning, and this is precisely the role of experimental data and randomized trials. The causal diagram is nothing but a parsimonious representation of how the environment responds to all possible interventions and their combinations. Learning 4/
the best such representation from a barrage of interventions and observations is an exercise studied under the rubric of "causal discovery". Doing so with neural nets is an ambitious task, considering the size of the search space, but it is not undoable, especially if we leverage 5/
the tools of causal discovery. It is hard to find a needle in a haystack, true, but it helps to know what a needle looks like, and how it differs from the hay around it. That is why causal diagrams should be part of ML education. More on this here: https://ucla.in/32YKcWy
6/6
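A hypothetical continuation of the same toy example, showing what a single randomized trial adds: under do(X=1), the diagram X-->Y predicts P(Y | do(X=1)) = P(Y | X=1), while X<--Y predicts that setting X by intervention leaves Y at its marginal P(Y). Only one of those predictions survives the experiment (numbers again assumed for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Observational world, same X --> Y mechanism as before.
x_obs = rng.random(n) < 0.3
y_obs = rng.random(n) < np.where(x_obs, 0.8, 0.1)

# Randomized trial: we set X = 1 by intervention; Y still follows its own mechanism.
y_trial = rng.random(n) < 0.8

# Each diagram's prediction for P(Y=1 | do(X=1)), computed from observational data only.
pred_x_to_y = y_obs[x_obs].mean()   # X --> Y: interventional = conditional, P(Y=1 | X=1)
pred_y_to_x = y_obs.mean()          # X <-- Y: intervening on the effect leaves Y at P(Y=1)

print(f"X-->Y predicts {pred_x_to_y:.3f}")
print(f"X<--Y predicts {pred_y_to_x:.3f}")
print(f"trial observes {y_trial.mean():.3f}")   # close to 0.8, ruling out X <-- Y
```

Only X-->Y matches the experimental frequency, which is exactly the sense in which interventional data makes the direction of the arrow learnable.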