(1/N)
Since KDD 2019 in Alaska, I have spent a lot of time trying to understand Graph Neural Networks (GNNs) and the graph mining / ML community’s infatuation with them.
This has resulted in two papers on "node prediction," i.e., (semi-supervised) graph clustering...
(2/N)
Both papers say that we should use the ideas behind label prop / spreading / diffusions along with, or instead of, fancy GNNs.
Paper 1 (just out) has the flippant title “Combining Label Propagation and Simple Models Out-performs Graph Neural Networks.”
https://arxiv.org/abs/2010.13993
(3/N)
Paper 2 (KDD 2020) is “Residual Correlation in Graph Neural Network Regression.”
https://www.cs.cornell.edu/~arb/papers/residual-correlation-KDD-2020.pdf
This paper theoretically connects label prop post-processing to correlated error models, namely generalized least squares.
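To give the flavor of that connection in equations (my paraphrase with generic notation, not lifted from the paper): treat the residuals of a base regressor as jointly Gaussian across nodes, with a covariance built from the graph, and estimate the residuals at the unlabeled nodes by Gaussian conditioning, which is the GLS / kriging step:

    r = y - f(X), \quad r \sim \mathcal{N}(0, \Sigma)
    \hat{r}_U = \Sigma_{UL} \Sigma_{LL}^{-1} r_L
    \hat{y}_U = f(X_U) + \hat{r}_U

Here L indexes labeled nodes and U unlabeled ones. If the precision matrix \Sigma^{-1} is graph-structured, e.g. proportional to I - \alpha S for a normalized adjacency S, then computing that conditional mean amounts to running label propagation on the residuals.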
(4/N)
What got me started on this line of research?
Well, one thing really bugged me when trying to learn about GNNs:
Why are the GNN ideas so far from label propagation and other diffusion ideas that work extremely well? The use of labels in GNNs is so implicit.
(5/N)
I didn’t find a good answer to this... and Paper 1 shows empirically that you don’t need GNNs to cleverly learn to combine your neighborhood information. You just need smoothing (coming from label prop ideas). Paper 2 has some theory for why you should do this.
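If it has been a while since you last saw label prop, this is the kind of smoothing I mean. A minimal dense-numpy sketch of Zhou et al.-style label spreading (not the exact pipeline from Paper 1):

    import numpy as np

    def label_spreading(A, Y, alpha=0.9, num_iters=50):
        # A: symmetric n x n adjacency (dense); Y: n x c one-hot labels, zero rows for unlabeled nodes.
        d = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # S = D^{-1/2} A D^{-1/2}
        F = Y.astype(float).copy()
        for _ in range(num_iters):
            F = alpha * (S @ F) + (1 - alpha) * Y          # pull toward neighbors, stay anchored to labels
        return F.argmax(axis=1)                            # predicted class per node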
(6/N) Another thing that bugged me:
GNNs don’t build up features the way deep CNNs do (they’re typically 2 layers deep, not 100), so they need good input features. Why not use the many existing tools for getting good graph features? E.g., see the last 15 years of KDD/WWW/WSDM + the network science community.
(7/N) It turns out that just including a spectral embedding can help a lot (Paper 1).
For graph learning, augmenting features with existing methods is underrated.
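Concretely, "including a spectral embedding" can be as simple as the following sketch, assuming A is a scipy sparse symmetric adjacency and X is a dense node-feature matrix (again, not the exact recipe from Paper 1):

    import numpy as np
    from scipy.sparse.linalg import eigsh

    def spectral_features(A, k=128):
        # Leading k eigenvectors of the normalized adjacency D^{-1/2} A D^{-1/2}.
        d = np.asarray(A.sum(axis=1)).ravel()
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        S = A.multiply(d_inv_sqrt[:, None]).multiply(d_inv_sqrt[None, :])
        _, vecs = eigsh(S, k=k, which='LA')  # 'LA' = largest algebraic eigenvalues
        return vecs

    # X_aug = np.hstack([X, spectral_features(A)])  # feed X_aug to whatever model you like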
(8/N) My guess is that GNNs spend a lot of effort to learn “positional” info that we already know how to get much more cheaply.
As the undergraduate researchers involved like to put it:
GNNs ≈ MLP + label prop.
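To make the "≈" a bit more concrete, here is the rough shape of that recipe. The names (mlp, S, train_idx, Y_onehot) are hypothetical, and this is a sketch of the general "predict with a graph-agnostic model, then propagate corrections over the graph" idea rather than the paper's code:

    import numpy as np

    def propagate(S, H, alpha=0.8, num_iters=50):
        # Label-prop-style smoothing: iterate H <- alpha * S @ H + (1 - alpha) * H0.
        H0, out = H.copy(), H.copy()
        for _ in range(num_iters):
            out = alpha * (S @ out) + (1 - alpha) * H0
        return out

    # P = mlp.predict_proba(X)                          # base predictions (n x c), no graph used
    # E = np.zeros_like(P)
    # E[train_idx] = Y_onehot[train_idx] - P[train_idx] # known errors on the labeled nodes
    # P = P + propagate(S, E)                           # "correct": spread those errors over the graph
    # P = propagate(S, P)                               # "smooth": spread the corrected predictions
    # y_pred = P.argmax(axis=1)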
(9/N) Another thing that really bugs me:
GNNs take freaking forever to train. Seriously, way too long! And it’s so difficult to reproduce results!
https://github.com/dmlc/dgl/issues/2314#issuecomment-720146540
Paper 1 gives a high-accuracy method that is fast to train. No major hyperparameter tuning.
(10/N) Another thing that really bugs me:
There is lots of fun research in graphs. The flooding of KDD/WWW/WSDM with GNN papers has made graph research at these conferences more narrow and a lot less fun.
If I wanted to spend my time tuning GNNs, I wouldn’t be a professor.
(11/11)
And finally, some things that are really great:
The three undergrads @cHHillee, @qhwang3, @abaesingh leading Paper 1 are AWESOME.
@junteng_jia leading Paper 2 is AWESOME.
Cornell has amazing students.
Graphs are fun.
For a tweet thread with much less snark and veiled criticism, please see the thread from @cHHillee (one of the undergrads involved) here: https://twitter.com/cHHillee/status/1323323061370724352