Positional encodings are essential for two reasons:

1. They guarantee that Transformers/graph NNs are universal approximators of functions invariant under index permutation. Most real-world graphs have natural symmetries, e.g. the line graph that a sentence forms for Transformers. https://twitter.com/francoisfleuret/status/1333727738696519682
These symmetries produce, for instance, isomorphic nodes, which introduce ambiguities and reduce the network's ability to disentangle node information. Unique PEs, like the cos/sin PEs in Transformers, remove these ambiguities and restore the universal approximation guarantee.
2. Distance-sensitive PEs like cos/sin encodings provide relevant distance information between nodes: they distinguish future from past positions in NLP and act as node coordinates for graphs/manifolds (see the sketches after this list).
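
As a concrete illustration of point 2, here is a minimal sketch of the cos/sin positional encodings from the original Transformer (the base 10000 and the even/odd sin/cos split follow that paper); the `sinusoidal_pe` helper name is mine, not from the thread. Each position receives a unique vector (removing the ambiguities of point 1), and inner products between encodings depend only on the relative distance between positions.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sin/cos positional encodings."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = sinusoidal_pe(seq_len=8, d_model=16)
# sin(wp)sin(wq) + cos(wp)cos(wq) = cos(w(p-q)), so the Gram matrix is a
# function of relative distance only: this is the "distance-sensitive" property.
print(np.round(pe @ pe.T, 2))
```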
For standard convolutions, e.g. in computer vision, the notion of PE is intrinsic because the nodes of the convolutional filter are ordered (we know where the top, bottom, left, and right are). For Transformers/graphs, the order of the nodes is irrelevant, so PEs must be supplied explicitly.
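
For graphs, the thread does not name a specific PE; one common distance-sensitive choice (my assumption here, used only as an illustration) is the low-frequency eigenvectors of the graph Laplacian, which serve as node coordinates in the way point 2 describes. A minimal sketch on a cycle graph:

```python
import numpy as np

def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return (num_nodes, k) PEs from the symmetric normalized Laplacian (illustrative choice)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]               # skip the trivial constant eigenvector

# 6-node cycle graph: the first two non-trivial eigenvectors place the nodes
# on a circle, so distance in PE space reflects distance on the graph.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
print(np.round(laplacian_pe(adj, k=2), 3))
```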