My project from the @LosAlamosNatLab Summer School is finally out! We are happy to announce the...

Absence of Barren Plateaus in Quantum Convolutional Neural Networks đŸ”„

with @MvsCerezo, @samson_wang, T. Volkoff, @sornborg and @ColesQuantum.

https://scirate.com/arxiv/2011.02966

Thread 👇
A quantum convolutional neural network (QCNN) is a type of variational circuit proposed in https://arxiv.org/abs/1810.03787  and designed for classification and regression on quantum data, e.g. determining the magnetic phase of an input state.
2/N
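For readers who think in code: a minimal, purely schematic sketch of the QCNN layout (alternating convolution and pooling, with the qubit count halved at each pooling step). The function and layout below are my illustration, not the exact circuit from the paper.

```python
# Schematic QCNN layout (illustration only): alternate "convolution" layers
# (2-qubit blocks on neighbouring qubits) with "pooling" layers that discard
# half of the qubits, until a single qubit is left for readout.

def qcnn_layout(n_qubits=8):
    qubits = list(range(n_qubits))
    layer = 0
    while len(qubits) > 1:
        layer += 1
        conv_pairs = list(zip(qubits[0::2], qubits[1::2]))  # 2-qubit convolution blocks
        kept = qubits[0::2]                                 # pooling: keep one qubit per pair
        print(f"layer {layer}: conv on {conv_pairs}, pool -> keep {kept}")
        qubits = kept
    print(f"readout on qubit {qubits[0]} after {layer} layers (log2(n) for n a power of 2)")

qcnn_layout(8)
```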
An important general problem with QNNs is the barren plateau (BP) phenomenon, the quantum equivalent of vanishing gradients: when a QNN is randomly initialized, its gradient tends to vanish exponentially with the # of qubits...
3/N
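To make the BP statement concrete, here is a small self-contained NumPy sketch (my toy example, not an experiment from the paper): it estimates the variance of one gradient component of a global cost C = ⟹Z⊗...⊗Z⟩ for randomly initialized layered circuits of RY rotations and CZ gates, and that variance shrinks rapidly as qubits are added.

```python
# Toy barren-plateau demo (illustrative only): the variance of dC/dtheta_0
# over random initializations drops quickly with the number of qubits.
import numpy as np

rng = np.random.default_rng(0)

def apply_ry(state, theta, qubit, n):
    """Apply RY(theta) on `qubit` to a 2^n statevector."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.moveaxis(state.reshape([2] * n), qubit, 0)
    a, b = psi[0].copy(), psi[1].copy()
    psi[0] = c * a - s * b
    psi[1] = s * a + c * b
    return np.moveaxis(psi, 0, qubit).reshape(-1)

def apply_cz(state, q1, q2, n):
    """Apply a CZ gate between qubits q1 and q2."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def cost(params, n, layers):
    """C(theta) = <psi(theta)| Z x Z x ... x Z |psi(theta)>."""
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    k = 0
    for _ in range(layers):
        for q in range(n):
            state = apply_ry(state, params[k], q, n)
            k += 1
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    z_diag = np.array([(-1) ** bin(i).count("1") for i in range(2 ** n)])
    return float(np.sum(z_diag * np.abs(state) ** 2))

def grad_first_param(params, n, layers):
    """Parameter-shift derivative of C w.r.t. the first angle."""
    shift = np.zeros_like(params)
    shift[0] = np.pi / 2
    return 0.5 * (cost(params + shift, n, layers) - cost(params - shift, n, layers))

for n in [2, 4, 6, 8]:
    layers = n  # a reasonably deep circuit for each size
    grads = [grad_first_param(rng.uniform(0, 2 * np.pi, n * layers), n, layers)
             for _ in range(200)]
    print(f"n = {n}: Var[dC/dtheta_0] ~ {np.var(grads):.2e}")
```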
BPs have been shown to occur for many architectures, including the Hardware Efficient Ansatz ( https://arxiv.org/abs/2001.00550 ), Perceptron-based QNNs ( https://arxiv.org/abs/2005.12458 ), Quantum Boltzmann machines ( https://arxiv.org/abs/2010.15968 ), etc.
So, what makes the QCNN different?
4/N
At each layer of the QCNN, half of the qubits are discarded (pooling layer), limiting the number of layers to log(n). And it was shown in ( https://arxiv.org/abs/2001.00550 ) that for a local cost function (as in the QCNN), the Hardware Efficient Ansatz with log(n) layers has no BP
5/N
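Spelled out, the counting behind that log(n) depth (standard arithmetic, not a quote from the paper):

```latex
n \;\to\; \frac{n}{2} \;\to\; \frac{n}{4} \;\to\; \cdots \;\to\; 1
\qquad\Longrightarrow\qquad
L = \log_2 n \ \text{ pooling layers (for } n \text{ a power of 2).}
```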
Can we directly generalize this result to QCNNs? It's not totally obvious, since QCNNs have several key differences from the Hardware Efficient Ansatz. But using a similar proof technique along w/ some novel methods, we managed to show that the QCNN landscape is not so barren!
6/N
More precisely, we showed that the variance of the QCNN gradient is lower bounded by a term vanishing only polynomially in the number of qubits. Therefore, at most a polynomial # of measurements is needed to estimate the QCNN gradient => no BP đŸ„ł
7/N
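Roughly, the standard shot-counting argument behind that last step (my paraphrase, not a quote from the paper): the typical gradient size is about the square root of its variance, and resolving it above shot noise takes about 1/Var measurements, so

```latex
\mathrm{Var}\!\left[\partial_\theta C\right] \;\ge\; \frac{1}{\mathrm{poly}(n)}
\quad\Longrightarrow\quad
N_{\text{shots}} \;\sim\; \frac{1}{\mathrm{Var}\!\left[\partial_\theta C\right]} \;\le\; \mathrm{poly}(n),
```

in contrast with the exponentially many shots a barren plateau would demand.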
What's the main idea behind our proof? To compute the variance of the grad, we have to integrate grad^2 with each block of the QCNN drawn from the Haar measure! We end up with a tensor way too big to integrate analytically! (shown in the attached figure, compared to a R&M character)
8/N
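For context, block-by-block Haar integrals like these rest on the standard second-moment (Weingarten) identity for a Haar-random unitary U of dimension d — a textbook formula, not one specific to the paper — which already hints at why the bookkeeping explodes when many blocks are composed:

```latex
\int d\mu(U)\, \mathrm{Tr}\!\left[U A U^\dagger B\right] \mathrm{Tr}\!\left[U C U^\dagger D\right]
= \frac{\mathrm{Tr}[A]\,\mathrm{Tr}[C]\,\mathrm{Tr}[B]\,\mathrm{Tr}[D] + \mathrm{Tr}[AC]\,\mathrm{Tr}[BD]}{d^2 - 1}
- \frac{\mathrm{Tr}[AC]\,\mathrm{Tr}[B]\,\mathrm{Tr}[D] + \mathrm{Tr}[A]\,\mathrm{Tr}[C]\,\mathrm{Tr}[BD]}{d(d^2 - 1)}
```

Applying it once per block is easy; chaining it across every block of every layer is what makes the direct computation intractable.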
To solve this issue, we introduced a novel method, called GRIM (Graph Recursion Integration Method), where we group sets of repeating blocks in modules, integrate them sequentially, and construct a graph that keeps track of each integral (details in the paper)
9/N
Finally, we verified our results numerically, both when the blocks share the same parameters in each layer (as proposed in the original architecture) and when they are completely uncorrelated
10/N
Spending the summer (virtually) with all the @LosAlamosNatLab folks has been an incredible experience, and I am super thankful to have been given this opportunity! It was a very collaborative project, so shout-out to the whole team for making it possible! â˜ș
11/N, N=11