(1/N)
Since KDD 2019 in Alaska, I have spent a lot of time trying to understand Graph Neural Networks (GNNs) and the graph mining / ML community’s infatuation with them.
This has resulted in two papers on "node prediction," i.e., (semi-supervised) graph clustering...
(2/N)
Both papers say that we should use the ideas behind label prop / spreading / diffusions along with, or instead of, fancy GNNs.
Paper 1 (just out) has the flippant title “Combining Label Propagation and Simple Models Out-performs Graph Neural Networks.”
https://arxiv.org/abs/2010.13993
(3/N)
Paper 2 (KDD 2020) is “Residual Correlation in Graph Neural Network Regression.”
https://www.cs.cornell.edu/~arb/papers/residual-correlation-KDD-2020.pdf
This paper theoretically connects label prop post-processing to correlated error models, namely generalized least squares.
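To give the flavor of that connection in equations (my paraphrase with generic notation, not lifted from the paper): treat the residuals of a base regressor as jointly Gaussian across nodes, with a covariance built from the graph, and estimate the residuals at the unlabeled nodes by Gaussian conditioning, which is the GLS / kriging step:

    r = y - f(X), \quad r \sim \mathcal{N}(0, \Sigma)
    \hat{r}_U = \Sigma_{UL} \Sigma_{LL}^{-1} r_L
    \hat{y}_U = f(X_U) + \hat{r}_U

Here L indexes labeled nodes and U unlabeled ones. If the precision matrix \Sigma^{-1} is graph-structured, e.g. proportional to I - \alpha S for a normalized adjacency S, then computing that conditional mean amounts to running label propagation on the residuals.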
(4/N)
What got me started on this line of research?
Well, one thing really bugged me when trying to learn about GNNs:
Why are the GNN ideas so far from label propagation and other diffusion ideas that work extremely well? The use of labels in GNNs is so implicit.
(5/N)
I didn’t find a good answer to this... and Paper 1 shows empirically that you don’t need GNNs to cleverly learn to combine your neighborhood information. You just need smoothing (coming from label prop ideas). Paper 2 has some theory for why you should do this.
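If it has been a while since you last saw label prop, this is the kind of smoothing I mean. A minimal dense-numpy sketch of Zhou et al.-style label spreading (not the exact pipeline from Paper 1):

    import numpy as np

    def label_spreading(A, Y, alpha=0.9, num_iters=50):
        # A: symmetric n x n adjacency (dense); Y: n x c one-hot labels, zero rows for unlabeled nodes.
        d = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # S = D^{-1/2} A D^{-1/2}
        F = Y.astype(float).copy()
        for _ in range(num_iters):
            F = alpha * (S @ F) + (1 - alpha) * Y          # pull toward neighbors, stay anchored to labels
        return F.argmax(axis=1)                            # predicted class per node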
(6/N) Another thing that bugged me:
GNNs don’t build up features the way deep CNNs do (they’re typically 2 layers deep, not 100), so they need good input features. Why not use the many existing tools for getting good graph features? E.g., see the last 15 years of KDD/WWW/WSDM + the network science community.
(7/N) It turns out that just including a spectral embedding can help a lot (Paper 1).
For graph learning, augmenting features with existing methods is underrated.
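Concretely, "including a spectral embedding" can be as simple as the following sketch, assuming A is a scipy sparse symmetric adjacency and X is a dense node-feature matrix (again, not the exact recipe from Paper 1):

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def spectral_features(A, k=128):
        # Leading k eigenvectors of the normalized adjacency D^{-1/2} A D^{-1/2}.
        d = np.asarray(A.sum(axis=1)).ravel()
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        S = A.multiply(d_inv_sqrt[:, None]).multiply(d_inv_sqrt[None, :])
        _, vecs = eigsh(S, k=k, which='LA')  # 'LA' = largest algebraic eigenvalues
        return vecs

    # X_aug = np.hstack([X, spectral_features(A)])  # feed X_aug to whatever model you like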
(8/N) My guess is that GNNs spend a lot of effort to learn “positional” info that we already know how to get much more cheaply.
As the undergraduate researchers involved like to put it:
GNNs ≈ MLP + label prop.
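To make the "≈" a bit more concrete, here is the rough shape of that recipe. The names (mlp, S, train_idx, Y_onehot) are hypothetical, and this is a sketch of the general "predict with a graph-agnostic model, then propagate corrections over the graph" idea rather than the paper's code:

    import numpy as np

    def propagate(S, H, alpha=0.8, num_iters=50):
        # Label-prop-style smoothing: iterate H <- alpha * S @ H + (1 - alpha) * H0.
        H0, out = H.copy(), H.copy()
        for _ in range(num_iters):
            out = alpha * (S @ out) + (1 - alpha) * H0
        return out

    # P = mlp.predict_proba(X)                          # base predictions (n x c), no graph used
    # E = np.zeros_like(P)
    # E[train_idx] = Y_onehot[train_idx] - P[train_idx] # known errors on the labeled nodes
    # P = P + propagate(S, E)                           # "correct": spread those errors over the graph
    # P = propagate(S, P)                               # "smooth": spread the corrected predictions
    # y_pred = P.argmax(axis=1)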
(9/N) Another thing that really bugs me:
GNNs take freaking forever to train. Seriously, way too long! And it’s so difficult to reproduce results!
https://github.com/dmlc/dgl/issues/2314#issuecomment-720146540
Paper 1 gives a high-accuracy method that is fast to train. No major hyperparameter tuning.
(10/N) Another thing that really bugs me:
There is lots of fun research in graphs. The flooding of KDD/WWW/WSDM with GNN papers has made graph research at these conferences more narrow and a lot less fun.
If I wanted to spend my time tuning GNNs, I wouldn’t be a professor.
(11/11)
And finally, some things that are really great:
The three undergrads @cHHillee, @qhwang3, @abaesingh leading Paper 1 are AWESOME.
@junteng_jia leading Paper 2 is AWESOME.
Cornell has amazing students.
Graphs are fun.
For a tweet thread with much less snark and veiled criticism, please see the thread from @cHHillee (one of the undergrads involved) here: https://twitter.com/cHHillee/status/1323323061370724352