using Plots, LaTeXStrings, LinearAlgebra, IterativeSolvers
11 Krylov Subspace Methods
Lecture 11
I encourage you to play around with the Jupyter notebook for this lecture - you can copy the code using the "this notebook" button on the side of this page.
These notes are mainly a record of what we discussed and are not a substitute for attending the lectures and reading books! If anything is unclear/wrong, let me know and I will update the notes.
This lecture is mostly based on Burden, Faires, and Burden (2015) section 7.6.
We saw that, if the condition number is large, this can lead to zig-zagging (which is a big problem in higher dimensions) and leads to slow convergence:
A = [50 0 ; 0 .01]; b = [10; 3]; x0 = [1;20]; ξ = A\b
gif( gradientDescent(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 )
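The `gradientDescent` helper (which produces the animation) was defined in a previous lecture; as a reminder, a minimal non-plotting sketch of the steepest-descent iteration it animates, with exact line search along the residual direction (assuming A is SPD), looks like this:

```julia
using LinearAlgebra

# Steepest descent for SPD A: step along the residual r = b - Ax,
# which is the negative gradient of g(x) = x'Ax/2 - x'b.
function steepest_descent(A, b, x0; Niter=100_000, tol=1e-8)
    x = float.(x0)
    for _ in 1:Niter
        r = b - A*x
        norm(r) < tol && return x
        α = dot(r, r) / dot(r, A*r)   # exact line search along r
        x += α*r
    end
    return x
end

A = [50 0; 0 0.01]; b = [10.0; 3.0]
x = steepest_descent(A, b, [1.0; 20.0])
```

With κ₂(A) = 5000, this takes tens of thousands of zig-zagging steps to reach the tolerance, illustrating the slow convergence above.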
We then considered the conjugate gradient (CG) method: fix x^{(0)}, set v^{(0)} := r^{(0)} = b - Ax^{(0)}, and iterate
\begin{align}
\alpha^{(k)} &= \frac{(v^{(k)}, r^{(k)})}{(v^{(k)}, v^{(k)})_A} \nonumber\\
x^{(k+1)} &= x^{(k)} + \alpha^{(k)} v^{(k)} \nonumber\\
r^{(k+1)} &= b - Ax^{(k+1)} \nonumber\\
\beta^{(k+1)} &= - \frac{(v^{(k)}, r^{(k+1)})_A}{(v^{(k)}, v^{(k)})_A} \nonumber\\
v^{(k+1)} &= r^{(k+1)} + \beta^{(k+1)} v^{(k)}
\end{align}
which we implement here:
function CG(A,b,x0; Niter=50, tol=1e-3, xlims=[-25,25], ylims=[-25,25])
    g(x,y) = dot([x;y], A*[x;y])/2 - dot([x;y], b)
    x_range = range(xlims[1], xlims[2], length=100)
    y_range = range(ylims[1], ylims[2], length=100)
    P = contour(x_range, y_range, g,
                fill=true,
                levels=50,
                title="Conjugate Gradient",
                xlabel="x", ylabel="y", legend=false,
                colorbar=true)
    r = b - A*x0
    v = b - A*x0   # v⁽⁰⁾ := r⁽⁰⁾
    anim = @animate for i ∈ 0:Niter
        if (i==0)
            scatter!(P, [x0[1]], [x0[2]], markersize=5, color=:white)
        else
            if (norm(r)<tol)
                break;
            end
            # step size
            α = dot(v,r)/dot(v,A*v)
            # new point
            x1 = x0 + α*v
            plot!(P, [x0[1],x1[1]], [x0[2],x1[2]], color=:white, lw=2)
            scatter!(P, [x1[1]], [x1[2]], markersize=5, color=:white)
            # residual
            r = b - A*x1
            # new search direction
            β = - dot(v,A*r)/dot(v,A*v)
            v = r + β*v
            x0 = x1
        end
    end
    return anim
end
A = [50 0 ; 0 .01]; b = [10; 3]; x0 = [1;20]; ξ = A\b
display( gif( gradientDescent(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 ) )
gif( CG(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 )
11.1 Convergence
We saw last week that the rate of convergence depends on the conditioning of the matrix:
Theorem 11.1 Suppose A is SPD, \{x^{(k)}\} are the iterates generated by steepest descent, and x is the exact solution to Ax = b. Then,
\begin{align} \| x^{(k)} - x \|_A \leq \left( \frac{\kappa_2(A)-1}{\kappa_2(A)+1} \right)^{k} \| x^{(0)} - x \|_A \end{align}
where \| y \|_A := \sqrt{(y,y)_A} = \sqrt{y^T A y} and \kappa_2(A) = \|A^{-1}\|_2 \|A\|_2 is the condition number of A (with respect to the 2-norm).
Theorem 11.2 Suppose A is SPD, \{x^{(k)}\} are the iterates generated by conjugate gradient, and x is the exact solution to Ax = b. Then,
\begin{align} \| x^{(k)} - x \|_A \leq 2 \left( \frac{\sqrt{\kappa_2(A)} -1}{\sqrt{\kappa_2(A)} + 1} \right)^{k} \| x^{(0)} - x \|_A \end{align}
where \| y \|_A := \sqrt{(y,y)_A} = \sqrt{y^T A y} and \kappa_2(A) = \|A^{-1}\|_2 \|A\|_2 is the condition number of A (with respect to the 2-norm).
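To put concrete numbers on the difference between the two bounds, we can compute the smallest k for which each contraction factor alone drops below 10^{-6} at κ₂(A) = 100 (a quick back-of-the-envelope check, ignoring CG's prefactor of 2):

```julia
κ = 100.0
ρSD = (κ - 1)/(κ + 1)                # steepest-descent rate
ρCG = (sqrt(κ) - 1)/(sqrt(κ) + 1)    # conjugate-gradient rate
# smallest k with ρ^k ≤ 1e-6
kSD = ceil(Int, log(1e-6)/log(ρSD))
kCG = ceil(Int, log(1e-6)/log(ρCG))
(kSD, kCG)   # (691, 69)
```

The square root in the CG rate buys roughly a factor of √κ fewer iterations.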
Proof: An option for a presentation. For CG, you can use polynomial approximation by writing \| x^{(k)} - x \|_A = \| p(A) (x^{(0)} - x)\|_A for some polynomial p of degree less than or equal to k with p(0) = 1. You can conclude by applying results about Chebyshev polynomials….
Notice that the bound for CG is better (and the difference between the methods becomes more stark as we increase the condition number):
P = []
xx = range(1, 10, 2)
for κ ∈ [1.01, 2, 10, 100]
    ρSD = (κ-1)/(κ+1)
    ρCG = (sqrt(κ)-1)/(sqrt(κ)+1)
    p = plot(xx, [ρSD.^xx ρCG.^xx],
             label=["ρSD" "ρCG"], xlabel = "k", ylabel = "error",
             title="κ = $κ", yaxis=:log)
    push!( P, p )
end
plot(P...)
(Note: we are only plotting the upper bounds and have not actually computed the SD/CG iterations in these plots.)
11.2 Krylov subspace methods
Example 11.1 Show that, for CG, we have r^{(k)}, v^{(k)} \in \mathcal K_{k+1}(A,r^{(0)}) where
\begin{align} \mathcal K_{k}(A,y) := \mathrm{span}\left\{ y, Ay, \cdots, A^{k-1} y \right\}. \end{align}
Hint: v^{(0)} := r^{(0)}. Proceed by induction and use r^{(k+1)} = r^{(k)} - \alpha^{(k)} Av^{(k)} and v^{(k+1)} = r^{(k+1)} + \beta^{(k+1)} v^{(k)}.
Notice that, for CG, r^{(k+1)} is orthogonal to every vector in \mathcal K_{k}(A, r^{(0)}) (if this is the case, we simply say r^{(k+1)} is orthogonal to \mathcal K_{k}(A, r^{(0)})).
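We can verify this orthogonality numerically by running the CG recurrence from above (without the plotting) on a small SPD system and storing the residuals:

```julia
using LinearAlgebra

# Run the CG recurrence and collect the residuals r^(0), r^(1), ...
function cg_residuals(A, b, x0, Niter)
    x = float.(x0); r = b - A*x; v = copy(r)
    rs = [copy(r)]
    for _ in 1:Niter
        α = dot(v, r)/dot(v, A*v)
        x += α*v
        r = b - A*x
        β = -dot(v, A*r)/dot(v, A*v)
        v = r + β*v
        push!(rs, copy(r))
    end
    return rs
end

A = [4.0 1 0; 1 3 1; 0 1 2]; b = [1.0; 2.0; 3.0]   # small SPD example
rs = cg_residuals(A, b, zeros(3), 2)
r0 = rs[1]
# r^(1) ⟂ K_1 = span{r0};  r^(2) ⟂ K_2 = span{r0, A r0}
(dot(rs[2], r0), dot(rs[3], r0), dot(rs[3], A*r0))   # all ≈ 0
```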
Definition 11.1 \mathcal K_k(A,r^{(0)}) is known as a Krylov subspace.
A Krylov subspace method is an iterative method where r^{(k)} \in \mathcal K_{k+1}(A, r^{(0)}) and orthogonal to \mathcal K_{k}(A, r^{(0)}).
Theorem 11.3 In exact arithmetic (i.e. not accounting for rounding errors), Krylov subspace methods (including CG) converge in at most N steps where A \in \mathbb R^{N\times N}.
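We can check Theorem 11.3 numerically (up to rounding error) by running exactly N steps of the CG recurrence from above on a random N×N SPD system:

```julia
using LinearAlgebra

# Run Niter steps of the CG recurrence (no plotting) and return the iterate.
function cg_steps(A, b, x0, Niter)
    x = float.(x0); r = b - A*x; v = copy(r)
    for _ in 1:Niter
        α = dot(v, r)/dot(v, A*v)
        x += α*v
        r = b - A*x
        β = -dot(v, A*r)/dot(v, A*v)
        v = r + β*v
    end
    return x
end

N = 5
B = randn(N, N); A = B'B + I     # a random SPD matrix
b = randn(N)
x = cg_steps(A, b, zeros(N), N)  # exactly N iterations
norm(A*x - b)                    # ≈ 0 up to rounding error
```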
Let p_{k-1} be a polynomial of degree k-1. Then, Krylov methods are of the form
\begin{align} x^{(k)} = x^{(0)} + p_{k-1}(A) r^{(0)} \end{align}
- CG - choose p_{k-1} to give optimal search directions,
- GMRES (general minimal residual method) - choose p_{k-1} to minimise the residual
\begin{align} x^{(k)} = \mathrm{arg\,min}_{x \in x^{(0)} + \mathcal K_k(A, r^{(0)})} \| b - Ax \|_{2} \end{align}
(this uses Arnoldi or Lanczos iteration - possible presentation/poster…)
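To give a flavour of GMRES (not an implementation you should use in practice - the hypothetical `gmres_sketch` below is a bare-bones, unrestarted sketch): the Arnoldi process builds an orthonormal basis Q of \mathcal K_k(A, r^{(0)}), reducing the minimisation to a small least-squares problem with a Hessenberg matrix H.

```julia
using LinearAlgebra

# Bare-bones full GMRES: Arnoldi basis Q of K_k(A, r0), then minimise
# ||b - Ax||_2 over x0 + K_k via the least-squares problem min ||β e1 - H y||_2.
function gmres_sketch(A, b; x0=zero(b), tol=1e-10)
    N = length(b)
    r0 = b - A*x0
    β = norm(r0)
    Q = zeros(N, N+1); Q[:,1] = r0/β
    H = zeros(N+1, N)
    x = copy(x0)
    for k in 1:N
        w = A*Q[:,k]
        for j in 1:k                     # Gram-Schmidt against previous basis
            H[j,k] = dot(Q[:,j], w)
            w -= H[j,k]*Q[:,j]
        end
        H[k+1,k] = norm(w)
        e1 = zeros(k+1); e1[1] = β
        y = H[1:k+1, 1:k] \ e1           # small least-squares solve
        x = x0 + Q[:,1:k]*y
        norm(b - A*x) < tol && return x
        H[k+1,k] == 0 && return x        # breakdown: Krylov space is invariant
        Q[:,k+1] = w/H[k+1,k]
    end
    return x
end

A = [4.0 1 0; -1 3 1; 0 -1 2]   # GMRES does not require symmetry
b = [1.0; 2.0; 3.0]
x = gmres_sketch(A, b)
```

(The IterativeSolvers package loaded at the top of this notebook provides a production-quality `gmres`.)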
- Let me know any questions you have for Wednesday (problem class),
- Revise for midterm!
- (Sign-up to office hours if you would like)
Chapter 1 reading:
- Burden, Faires, and Burden (2015) sections 5.1 - 5.6,
- Driscoll and Braun (2018), section on zero stability
Chapter 2 reading: