the work of A. Taylor and coauthors on using SDPs to certify the performance of different optimisation schemes really cannot be recommended highly enough; see e.g. https://www.di.ens.fr/~ataylor/share/Slides_cambridge2020.pdf (pdf).
[ 1 / 9 ]
an exercise which i enjoyed was to go in the other direction: given a template for proving convergence of an algorithm, derive the standard conditions which allow the proof to go through.
e.g. consider minimising a function f by following its gradient flow
[ 2 / 9 ]
so that x follows the dynamics
ẋ = − ∇ f (x),
and we want to show that x → x*, where x* is a minimiser of f.
to this end, you can posit some potential function V(x), and aim to show that it decreases exponentially fast along the trajectory.
[ 3 / 9 ]
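as a sanity check on this setup, here is a minimal numerical sketch; the quadratic f, the step size, and the starting point are all assumptions made purely for illustration.

```python
import numpy as np

# minimal sketch: integrate the gradient flow ẋ = −∇f(x) with explicit
# Euler steps (small step size h as a stand-in for the exact flow),
# for the simple quadratic f(x) = ½ λ |x|², so that f* = 0 and x* = 0.
lam = 2.0                          # curvature of the quadratic (assumption)
f = lambda x: 0.5 * lam * np.dot(x, x)
grad_f = lambda x: lam * x

h, n_steps = 1e-3, 3000
x = np.array([1.0, -2.0])          # arbitrary starting point
for k in range(1, n_steps + 1):
    x = x - h * grad_f(x)
    if k % 1000 == 0:
        print(f"t = {k * h:.0f}:  f(x) - f* = {f(x):.3e}")
# for this f, f(x(t)) - f* decays like exp(-2 λ t)
```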
here, we will show exponential convergence through Grönwall's lemma: namely, that if
y(t) ≥ 0, and
dy/dt ≤ −c y for some constant c > 0,
then y(t) ≤ y(0) exp(-ct).
the game is then to find a potential V and conditions on f which ensure that this holds for y(t) = V(x(t)).
[ 4 / 9 ]
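(for completeness, the proof of this form of the lemma is one line: multiplying through by the integrating factor exp(ct) gives
d/dt [ y(t) exp(ct) ] = ( dy/dt + c y(t) ) exp(ct) ≤ 0,
so that y(t) exp(ct) ≤ y(0), which rearranges to the stated bound.)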
a first attempt: try V(x) = f(x) - f*, where f* is the minimum value of f.
then, with y = V(x(t)), we get
dy/dt = −〈∇ f (x), ∇ f (x)〉,
and so we need that
〈∇ f (x), ∇ f (x)〉 ≥ c ( f(x) − f* ),
which is known as the Polyak-Łojasiewicz condition.
[ 5 / 9 ]
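a quick numerical check of this condition; the sketch below again assumes the quadratic f(x) = ½ λ |x|², for which the PL condition holds with c = 2λ (in fact with equality).

```python
import numpy as np

# sketch: check the PL inequality |∇f(x)|² ≥ c (f(x) − f*) at random
# test points, for f(x) = ½ λ |x|² with f* = 0, where c = 2λ works.
lam = 2.0
f = lambda x: 0.5 * lam * np.dot(x, x)
grad_f = lambda x: lam * x
c = 2.0 * lam

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.normal(size=3)
    lhs = np.dot(grad_f(x), grad_f(x))
    rhs = c * f(x)
    print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, holds: {lhs >= rhs - 1e-12}")
```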
another go: try V(x) = |x - x*|². we then get
dy/dt = -2〈∇ f (x), x - x*〉,
and so we need that
〈∇ f (x), x − x* 〉 ≥ (c/2) |x − x*|²,
which is a bit like "strong convexity at the minimiser".
this condition is sometimes known as one-point strong convexity, or the restricted secant inequality.
[ 6 / 9 ]
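note that this condition does not require convexity. the sketch below estimates the best constant c for one non-convex example; the function f(x) = x² + 3 sin²(x) (non-convex, with unique minimiser x* = 0) and the scanning range are assumptions chosen for illustration.

```python
import numpy as np

# sketch: for f(x) = x² + 3 sin²(x), so f'(x) = 2x + 3 sin(2x), scan
# numerically for the best constant c in  f'(x) · (x − x*) ≥ c |x − x*|²,
# with x* = 0. the constant comes out strictly positive (≈ 0.70),
# even though f is non-convex.
f_prime = lambda x: 2 * x + 3 * np.sin(2 * x)

xs = np.linspace(-20, 20, 400001)
xs = xs[np.abs(xs) > 1e-6]         # avoid dividing by zero at x* = 0
ratios = f_prime(xs) * xs / xs**2
print(f"empirical best c ≈ {ratios.min():.3f}")
```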
a third attempt: consider running a pair of trajectories (x₁, x₂), started from different points.
instead of showing that they converge to a specific point, we can try to show that they move closer to one another.
[ 7 / 9 ]
as such, define V(x₁, x₂) = |x₁ - x₂|².
then, with y = V(x₁(t), x₂(t)), we get
dy/dt = −2〈∇ f (x₁) − ∇ f (x₂), x₁ − x₂ 〉,
which suggests the condition
〈∇ f (x₁) − ∇ f (x₂), x₁ − x₂ 〉 ≥ (c/2) |x₁ − x₂|²,
a standard form for strong convexity.
[ 8 / 9 ]
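a numerical sketch of this contraction, once more assuming a strongly convex quadratic and an Euler discretisation of the flow.

```python
import numpy as np

# sketch: run two Euler-discretised gradient-flow trajectories of the
# λ-strongly convex quadratic f(x) = ½ λ |x|² from different starting
# points, and compare |x₁ − x₂|² with the bound |x₁(0) − x₂(0)|² exp(−2λt).
lam = 2.0
grad_f = lambda x: lam * x

h, n_steps = 1e-3, 3000
x1 = np.array([3.0, 0.0])
x2 = np.array([-1.0, 2.0])
d0 = np.sum((x1 - x2) ** 2)
for k in range(1, n_steps + 1):
    x1 = x1 - h * grad_f(x1)
    x2 = x2 - h * grad_f(x2)
    if k % 1000 == 0:
        t = k * h
        gap = np.sum((x1 - x2) ** 2)
        print(f"t = {t:.0f}:  gap = {gap:.3e}, bound = {d0 * np.exp(-2 * lam * t):.3e}")
```

(here the discretised trajectories contract slightly faster than the continuous-time bound, since 1 − hλ ≤ exp(−hλ).)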
of course, these conditions are meaningful beyond analysis of algorithms. nevertheless, it can be useful for intuition to identify when a condition has this "pre-Grönwall" character. there are other nice examples elsewhere, e.g. Poincaré and log-Sobolev inequalities.
[ 9 / 9 ]
(a note on constants: the factors of 2 appearing in these conditions are not essential, and can always be absorbed by rescaling c; they shouldn't get in the way of the main message.)