Thread by @RogerGrosse

This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena.
http://papers.nips.cc/paper/9483-a-fourier-perspective-on-model-robustness-in-computer-vision

In the Fourier domain, images have a well-studied 1/|f|^a power spectrum, such that low frequencies have much higher power than high frequencies.
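To make the power law concrete, here is a minimal NumPy sketch that builds a model 1/|f|^a spectrum on an FFT grid and compares a low band with a high band. The exponent a = 2, the grid size, the band edges, and the function names are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def radial_frequency(n):
    """|f| for every bin of an n x n 2-D FFT grid, in cycles per pixel."""
    f1d = np.fft.fftfreq(n)
    return np.sqrt(f1d[None, :] ** 2 + f1d[:, None] ** 2)

def power_law_spectrum(n, a=2.0):
    """Model power 1/|f|^a per bin; the DC bin (|f| = 0) is left at 0."""
    f = radial_frequency(n)
    power = np.zeros_like(f)
    nonzero = f > 0
    power[nonzero] = f[nonzero] ** (-a)
    return power

n = 64                      # assumed image side length
f = radial_frequency(n)
power = power_law_spectrum(n, a=2.0)   # a = 2 is an assumed exponent

# Average model power in a low band and a high band (band edges are arbitrary).
low_band = power[(f > 0) & (f < 0.1)].mean()
high_band = power[f > 0.4].mean()
print(f"low/high power ratio: {low_band / high_band:.1f}")
```

Under this model the low band carries far more power per bin than the high band, which is the asymmetry the rest of the thread leans on.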

This describes the 2nd-order statistics of images, so Fourier analysis is the first step before looking at higher-order statistics.

If images are corrupted with (stochastic) additive noise, you can estimate the original signal using a Wiener filter, which rescales individual frequencies according to their signal-to-noise ratios.
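For reference, the standard form of that per-frequency rescaling, assuming the signal and noise are stationary and uncorrelated. The symbols are my own notation: S(f) and N(f) are the signal and noise power spectra, Y(f) is the noisy observation, and H(f) is the filter gain.

```latex
\hat{X}(f) = H(f)\, Y(f),
\qquad
H(f) = \frac{S(f)}{S(f) + N(f)}
     = \frac{\mathrm{SNR}(f)}{1 + \mathrm{SNR}(f)},
\qquad
\mathrm{SNR}(f) = \frac{S(f)}{N(f)}
```

The gain is close to 1 where the signal dominates and close to 0 where the noise dominates.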

As an intuition pump, consider the simpler problem of denoising images corrupted by white Gaussian noise.

Images are concentrated in low frequencies, while white noise has a flat power spectrum. Therefore, the Wiener filter is a low-pass filter, i.e. a blur kernel.
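A quick numerical check of that claim, as a sketch: plug a 1/|f|^2 signal spectrum and a flat noise spectrum into the textbook Wiener gain S / (S + N). The exponent, the noise level, and the sample frequencies are assumed values chosen only for illustration.

```python
import numpy as np

def wiener_gain(signal_power, noise_power):
    """Per-frequency Wiener gain S / (S + N) for uncorrelated signal and noise."""
    return signal_power / (signal_power + noise_power)

# Sample frequencies in cycles per pixel (assumed values).
freqs = np.array([0.01, 0.05, 0.1, 0.25, 0.5])

signal_power = 1.0 / freqs ** 2             # assumed 1/|f|^2 image spectrum
noise_power = np.full_like(freqs, 25.0)     # white noise: same power at every frequency

gain = wiener_gain(signal_power, noise_power)
for fr, g in zip(freqs, gain):
    print(f"|f| = {fr:.2f}  gain = {g:.3f}")
```

The gain falls monotonically as |f| grows, so the filter passes low frequencies nearly untouched and suppresses high ones, which is what a blur does.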

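Similarly, in the adversarial setting, the attacker is subject to a norm constraint (usually L1 or L2) which caps the total size of the perturbation. That prevents them from adding much power in the low frequencies relative to the large amount of signal power already there.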

Therefore, the attacker can corrupt high frequencies more easily than low frequencies.
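A rough sketch of the budget argument: spend the same L2 budget on a low-frequency and a high-frequency sinusoid, then compare each against a model signal amplitude of |f|^(-a/2) (the square root of a 1/|f|^a power spectrum). The helper names, the budget eps, the exponent a = 2, and the chosen frequencies are all hypothetical, and the ratio is only meaningful up to a constant factor.

```python
import numpy as np

def sinusoid_perturbation(n, k, eps):
    """n x n horizontal cosine with k cycles per image width, rescaled to L2 norm eps."""
    wave = np.cos(2 * np.pi * k * np.arange(n) / n)
    delta = np.tile(wave, (n, 1))
    return eps * delta / np.linalg.norm(delta)

def relative_corruption(k, n, eps, a=2.0):
    """Perturbation size over model signal amplitude |f|^(-a/2), up to a constant."""
    freq = k / n
    return eps * freq ** (a / 2)

n, eps = 64, 1.0                                   # assumed image size and L2 budget
low = sinusoid_perturbation(n, k=1, eps=eps)       # low-frequency attack
high = sinusoid_perturbation(n, k=24, eps=eps)     # high-frequency attack

# Both perturbations cost the attacker exactly the same L2 norm.
print(np.linalg.norm(low), np.linalg.norm(high))

# Relative to the typical image content at that frequency, the high-frequency one is much larger.
rel_low = relative_corruption(1, n, eps)
rel_high = relative_corruption(24, n, eps)
print(rel_low, rel_high)
```

The norm constraint is indifferent to frequency, but the image is not: the same budget is a small fraction of the signal at low frequencies and a large fraction at high frequencies.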

From the defender’s perspective, this means low frequencies are more reliable than high frequencies. So adversarial defenses (e.g. PGD adversarial training) tend to figure out that it’s a good idea to ignore the high frequencies.

This explains why robust models underperform on clean accuracy: they ignore high frequencies, which (this paper shows) contain useful information that helps predictions even on the test set.

I think their comment to the Distill challenge nailed it: https://distill.pub/2019/advex-bugs-discussion/response-1/

They have an appealing theory for why PGD adversarial training gives good visualizations: the image gradients are (sort of) low-pass filtered, making their frequency spectra better match those of images. This is similar to the blurring trick @ch402 used for conv net feature viz.
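To illustrate what low-pass filtering does to a gradient's spectrum, here is a small sketch that blurs a white-noise array in the Fourier domain and measures how much of its power sits at high frequencies before and after. White noise is only a stand-in for a raw input gradient, and the Gaussian filter, cutoff, and threshold are assumptions; this is not the paper's procedure or the exact feature-visualization trick.

```python
import numpy as np

def frequency_grid(shape):
    """|f| in cycles per pixel for each bin of a 2-D FFT of the given shape."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)

def gaussian_low_pass(image, cutoff=0.1):
    """Multiply the spectrum by a Gaussian gain, i.e. blur the image."""
    f = frequency_grid(image.shape)
    gain = np.exp(-(f ** 2) / (2 * cutoff ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * gain))

def high_frequency_fraction(image, threshold=0.25):
    """Fraction of total spectral power at |f| above the threshold."""
    f = frequency_grid(image.shape)
    power = np.abs(np.fft.fft2(image)) ** 2
    return power[f > threshold].sum() / power.sum()

rng = np.random.default_rng(0)
raw_gradient = rng.standard_normal((32, 32))   # stand-in for a noisy input gradient
smoothed = gaussian_low_pass(raw_gradient)

before = high_frequency_fraction(raw_gradient)
after = high_frequency_fraction(smoothed)
print(f"high-frequency power fraction: before {before:.3f}, after {after:.3f}")
```

After filtering, almost all of the remaining power is at low frequencies, which is closer to how image spectra look.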

They’re not claiming low-pass filtering explains *all* of adversarial robustness. But it seems to explain quite a lot, enough so that future papers on robustness for images probably ought to separate out Fourier effects from whatever more interesting things might be happening.

And they argue for considering a wider variety of corruptions: while L1 and L2 norm ball constraints might seem natural/innocuous, in fact they limit the frequency spectra of attacks in ways that aren’t necessarily typical of other image corruptions.
