Positional encodings are essential for two reasons:
1. They guarantee that Transformers/graph NNs are universal approximators of functions invariant under index permutation. Most real-world graphs have natural symmetries, like the line graph seen by Transformers. https://twitter.com/francoisfleuret/status/1333727738696519682
These symmetries produce, e.g., isomorphic nodes, which introduce ambiguities and reduce the network's ability to disentangle node information. Unique PEs, like the cos/sin PEs used in Transformers, remove these ambiguities and restore universal approximation (see the sketch after this list).
See https://arxiv.org/pdf/1903.02541.pdf, https://arxiv.org/pdf/1907.03199.pdf, https://arxiv.org/pdf/2006.07846.pdf
2. Distance-sensitive PEs, like the cos/sin PEs, provide relevant distance information between nodes: they encode past/future positions in NLP and act as node coordinates for graphs/manifolds.
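
A minimal NumPy sketch of the cos/sin (sinusoidal) PEs illustrating both points: each position gets a unique vector, and the similarity between two encodings depends only on their relative offset. The function name `sinusoidal_pe` and the parameters `max_len`, `d_model` are illustrative choices, not tied to any particular library.

```python
import numpy as np

def sinusoidal_pe(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(max_len)[:, None]                  # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = sinusoidal_pe(max_len=50, d_model=64)

# Point 1: every position gets a unique vector, removing the ambiguity
# between the (isomorphic) nodes of the line graph.
assert len({tuple(np.round(row, 6)) for row in pe}) == 50

# Point 2: the inner product of two encodings depends only on their relative
# offset, not on the absolute positions, so it carries distance information.
print(np.allclose(pe[10] @ pe[12], pe[20] @ pe[22]))   # True (same offset of 2)
```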
For standard convolutions in e.g. CV, the notion of PEs is intrinsic because the nodes of the convolutional filter are ordered (we know where the top, bottom, right, left are). For Transformers/graphs, the order of the nodes is arbitrary, so explicit PEs are required.
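
For graphs, where no canonical node order exists, one common choice in the graph-NN literature is to use the eigenvectors of the graph Laplacian as node coordinates. The sketch below assumes that choice; the helper `laplacian_pe`, the 6-node cycle graph, and the number of eigenvectors `k` are purely illustrative.

```python
import numpy as np

def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return the k smallest non-trivial Laplacian eigenvectors as node PEs."""
    deg = adj.sum(axis=1)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    # Skip the trivial (near-constant) eigenvector with eigenvalue ~0.
    return eigvecs[:, 1:k + 1]

# 6-node cycle graph: all nodes are isomorphic, so without PEs a
# message-passing network cannot tell them apart; the eigenvectors give each
# node a coordinate on the cycle (up to sign/rotation ambiguity).
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

pe = laplacian_pe(adj, k=2)
print(pe)   # roughly cos/sin coordinates around the cycle
```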