This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena.
http://papers.nips.cc/paper/9483-a-fourier-perspective-on-model-robustness-in-computer-vision
In the Fourier domain, images have a well-studied 1/|f|^a power spectrum, such that low frequencies have much higher power than high frequencies.

This describes the 2nd-order statistics of images, so Fourier analysis is the first step before looking at higher-order statistics.
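To make the power-law claim concrete, here is a minimal NumPy sketch that measures a radially averaged power spectrum. The function names, the exponent a = 2, and the synthetic random field standing in for a natural image are all assumptions for illustration; none of this comes from the paper.

```python
import numpy as np

def radial_power_spectrum(img):
    """Radially averaged power spectrum of a square grayscale image."""
    n = img.shape[0]
    power = np.abs(np.fft.fft2(img)) ** 2
    # Integer frequency coordinates along each axis: 0, 1, ..., -2, -1.
    f = np.fft.fftfreq(n) * n
    radius = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    bins = np.rint(radius).astype(int)
    # Average the power over all coefficients sharing a rounded radius.
    total = np.bincount(bins.ravel(), weights=power.ravel())
    count = np.bincount(bins.ravel())
    return total / count

def synthetic_image(n=64, a=2.0, seed=0):
    """Hypothetical stand-in for a natural image: power ~ 1/|f|^a."""
    rng = np.random.default_rng(seed)
    f = np.fft.fftfreq(n) * n
    radius = np.sqrt(f[:, None] ** 2 + f[None, :] ** 2)
    radius[0, 0] = 1.0  # avoid dividing by zero at the DC term
    amplitude = radius ** (-a / 2)  # power = amplitude^2 ~ 1/|f|^a
    white = np.fft.fft2(rng.standard_normal((n, n)))
    return np.real(np.fft.ifft2(white * amplitude))

spectrum = radial_power_spectrum(synthetic_image())
```

Here `spectrum[r]` is the mean power at radial frequency r, and it should fall off steeply as r grows. On a real photograph you would pass the grayscale pixel array instead of the synthetic field.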
As an intuition pump, consider the simpler problem of denoising images corrupted by white Gaussian noise.
If images are corrupted with (stochastic) additive noise, you can estimate the original signal using a Wiener filter, which rescales individual frequencies according to their signal-to-noise ratios.

Images are concentrated in low frequencies, while white noise has a flat power spectrum. Therefore, the Wiener filter is a low-pass filter, i.e. a blur kernel.
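A small sketch of why this comes out as a low-pass filter, assuming an idealized signal spectrum of exactly 1/|f|^a and a flat noise spectrum. The exponent, the noise level, and the function name are illustrative choices, not values from the paper.

```python
import numpy as np

def wiener_gain(freq, a=2.0, noise_power=0.01):
    """Per-frequency Wiener gain S / (S + N).

    Assumes signal power S = 1/|f|^a and white noise with constant power N.
    """
    signal_power = 1.0 / np.abs(freq) ** a
    return signal_power / (signal_power + noise_power)

freqs = np.array([1.0, 4.0, 16.0, 64.0])
gains = wiener_gain(freqs)
```

With these assumed values the gain drops from roughly 0.99 at |f| = 1 to roughly 0.02 at |f| = 64, so low frequencies pass almost untouched while high frequencies are suppressed.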
Similarly, in the adversarial setting, the attacker is subject to a norm constraint (usually L∞ or L2) which prevents them from putting lots of power in the low frequencies.

Therefore, the attacker can corrupt high frequencies more easily than low frequencies.
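One rough way to see this, as a toy model rather than the paper's analysis: suppose the attacker spends an entire L2 budget eps on a single frequency, and compare that perturbation power with the image's own power there. The 1/|f|^a spectrum, the budget, and the function name are assumptions for illustration.

```python
def perturbation_to_signal_ratio(freq, eps=0.1, a=2.0):
    """Ratio of perturbation power to image power at one frequency.

    Assumes image power 1/|f|^a and an attacker who puts the whole
    L2 budget eps into this single frequency.
    """
    signal_power = 1.0 / abs(freq) ** a
    return eps ** 2 / signal_power

low = perturbation_to_signal_ratio(1.0)
high = perturbation_to_signal_ratio(32.0)
```

Under these assumptions the same budget is about 1% of the image power at |f| = 1 but about ten times the image power at |f| = 32, so the perturbation is negligible at low frequencies and swamps the signal at high ones.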
From the defender’s perspective, this means low frequencies are more reliable than high frequencies. So adversarial defenses (e.g. PGD adversarial training) tend to figure out that it’s a good idea to ignore the high frequencies.
They have an appealing theory for why PGD adversarial training gives good visualizations: the image gradients are (sort of) low pass filtered, making their frequency spectra better match those of images. This is similar to the blurring trick @ch402 used for conv net feature viz.
They’re not claiming low pass filtering explains *all* of adversarial robustness. But it seems to explain quite a lot, enough so that future papers on robustness for images probably ought to separate out Fourier effects from whatever more interesting things might be happening.
And they argue for considering a wider variety of corruptions: while L∞ and L2 norm ball constraints might seem natural/innocuous, in fact they limit the frequency spectra of attacks in ways that aren’t necessarily typical of other image corruptions.
You can follow @RogerGrosse.