using Plots, LaTeXStrings, LinearAlgebra, IterativeSolvers
11 Krylov Subspace Methods
Lecture 11
I encourage you to play around with the Jupyter notebook for this lecture - you can copy the code using the "this notebook" button on the side of this page.
These notes are mainly a record of what we discussed and are not a substitute for attending the lectures and reading books! If anything is unclear/wrong, let me know and I will update the notes.
This lecture is mostly based on Burden, Faires, and Burden (2015) section 7.6.
We saw that, if the condition number is large, this can lead to zig-zagging (which is a big problem in higher dimensions) and leads to slow convergence:
A = [50 0 ; 0 .01]; b = [10; 3]; x0 = [1;20]; ξ = A\b
gif( gradientDescent(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 )
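The `gradientDescent` helper (which produces the animation) was defined in a previous lecture; as a reminder, a minimal non-plotting sketch of the steepest-descent iteration it animates, with exact line search along the residual direction (assuming A is SPD), looks like this:

```julia
using LinearAlgebra

# Steepest descent for SPD A: step along the residual r = b - Ax,
# which is the negative gradient of g(x) = x'Ax/2 - x'b.
function steepest_descent(A, b, x0; Niter=100_000, tol=1e-8)
    x = float.(x0)
    for _ in 1:Niter
        r = b - A*x
        norm(r) < tol && return x
        α = dot(r, r) / dot(r, A*r)   # exact line search along r
        x += α*r
    end
    return x
end

A = [50 0; 0 0.01]; b = [10.0; 3.0]
x = steepest_descent(A, b, [1.0; 20.0])
```

With κ₂(A) = 5000, this takes tens of thousands of zig-zagging steps to reach the tolerance, illustrating the slow convergence above.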
We then considered the conjugate gradient (CG) method: fix x^{(0)}, set v^{(0)} := r^{(0)} = b - Ax^{(0)}, and iterate
\begin{align}
\alpha^{(k)} &= \frac{(v^{(k)}, r^{(k)})}{(v^{(k)}, v^{(k)})_A} \nonumber\\
x^{(k+1)} &= x^{(k)} + \alpha^{(k)} v^{(k)} \nonumber\\
r^{(k+1)} &= b - Ax^{(k+1)} \nonumber\\
\beta^{(k+1)} &= - \frac{(v^{(k)}, r^{(k+1)})_A}{(v^{(k)}, v^{(k)})_A} \nonumber\\
v^{(k+1)} &= r^{(k+1)} + \beta^{(k+1)} v^{(k)}
\end{align}
which we implement here:
function CG(A,b,x0; Niter=50, tol=1e-3, xlims=[-25,25], ylims=[-25,25])
    g(x,y) = dot([x;y], A*[x;y])/2 - dot([x;y], b)
    x_range = range(xlims[1], xlims[2], length=100)
    y_range = range(ylims[1], ylims[2], length=100)
    P = contour(x_range, y_range, g,
                fill=true,
                levels=50,
                title="Conjugate Gradient",
                xlabel="x", ylabel="y", legend=false,
                colorbar=true)
    r = b - A*x0
    v = b - A*x0   # v⁽⁰⁾ := r⁽⁰⁾
    anim = @animate for i ∈ 0:Niter
        if (i==0)
            scatter!(P, [x0[1]], [x0[2]], markersize=5, color=:white)
        else
            if (norm(r)<tol)
                break;
            end
            # step size
            α = dot(v,r)/dot(v,A*v)
            # new point
            x1 = x0 + α*v
            plot!(P, [x0[1],x1[1]], [x0[2],x1[2]], color=:white, lw=2)
            scatter!(P, [x1[1]], [x1[2]], markersize=5, color=:white)
            # residual
            r = b - A*x1
            # new search direction
            β = - dot(v,A*r)/dot(v,A*v)
            v = r + β*v
            x0 = x1
        end
    end
    return anim
end
A = [50 0 ; 0 .01]; b = [10; 3]; x0 = [1;20]; ξ = A\b
display( gif( gradientDescent(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 ) )
gif( CG(A,b,x0; ylims=[0,ξ[2]+50]), fps=1 )
11.1 Convergence
We saw last week that the rate of convergence depends on the conditioning of the matrix:
Theorem 11.1 Suppose A is SPD, \{x^{(k)}\} are the iterates generated by steepest descent, and x is the exact solution to Ax = b. Then,
\begin{align} \| x^{(k)} - x \|_A \leq \left( \frac{\kappa_2(A)-1}{\kappa_2(A)+1} \right)^{k} \| x^{(0)} - x \|_A \end{align}
where \| y \|_A := \sqrt{(y,y)_A} = \sqrt{y^T A y} and \kappa_2(A) = \|A^{-1}\|_2 \|A\|_2 is the condition number of A (with respect to the 2-norm).
Theorem 11.2 Suppose A is SPD, \{x^{(k)}\} are the iterates generated by conjugate gradient, and x is the exact solution to Ax = b. Then,
\begin{align} \| x^{(k)} - x \|_A \leq 2 \left( \frac{\sqrt{\kappa_2(A)} -1}{\sqrt{\kappa_2(A)} + 1} \right)^{k} \| x^{(0)} - x \|_A \end{align}
where \| y \|_A := \sqrt{(y,y)_A} = \sqrt{y^T A y} and \kappa_2(A) = \|A^{-1}\|_2 \|A\|_2 is the condition number of A (with respect to the 2-norm).
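To put concrete numbers on the difference between the two bounds, we can compute the smallest k for which each contraction factor alone drops below 10^{-6} at κ₂(A) = 100 (a quick back-of-the-envelope check, ignoring CG's prefactor of 2):

```julia
κ = 100.0
ρSD = (κ - 1)/(κ + 1)                # steepest-descent rate
ρCG = (sqrt(κ) - 1)/(sqrt(κ) + 1)    # conjugate-gradient rate
# smallest k with ρ^k ≤ 1e-6
kSD = ceil(Int, log(1e-6)/log(ρSD))
kCG = ceil(Int, log(1e-6)/log(ρCG))
(kSD, kCG)   # (691, 69)
```

The square root in the CG rate buys roughly a factor of √κ fewer iterations.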
Proof: An option for a presentation. For CG, you can use polynomial approximation by writing \| x^{(k)} - x \|_A = \| p(A) (x^{(0)} - x)\|_A for some polynomial p of degree less than or equal to k with p(0) = 1. You can conclude by applying results about Chebyshev polynomials….
Notice that the bound for CG is better (and the difference between the methods becomes more stark as we increase the condition number):
P = []
xx = range(1, 10, 2)
for κ ∈ [1.01, 2, 10, 100]
    ρSD = (κ-1)/(κ+1)
    ρCG = (sqrt(κ)-1)/(sqrt(κ)+1)
    p = plot(xx, [ρSD.^xx ρCG.^xx],
             label=["ρSD" "ρCG"], xlabel = "k", ylabel = "error",
             title="κ = $κ", yaxis=:log)
    push!( P, p )
end
plot(P...)
(Note: we are only plotting the upper bounds and have not actually computed the SD/CG iterations in these plots.)
11.2 Krylov subspace methods
Example 11.1 Show that, for CG, we have r^{(k)}, v^{(k)} \in \mathcal K_{k+1}(A,r^{(0)}) where
\begin{align} \mathcal K_{k}(A,y) := \mathrm{span}\left\{ y, Ay, \cdots, A^{k-1} y \right\}. \end{align}
Hint: v^{(0)} := r^{(0)}. Proceed by induction and use r^{(k+1)} = r^{(k)} - \alpha^{(k)} Av^{(k)} and v^{(k+1)} = r^{(k+1)} + \beta^{(k+1)} v^{(k)}.
Notice that, for CG, r^{(k+1)} is orthogonal to every vector in \mathcal K_{k}(A, r^{(0)}) (if this is the case, we simply say r^{(k+1)} is orthogonal to \mathcal K_{k}(A, r^{(0)})).
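We can verify this orthogonality numerically by running the CG recurrence from above (without the plotting) on a small SPD system and storing the residuals:

```julia
using LinearAlgebra

# Run the CG recurrence and collect the residuals r^(0), r^(1), ...
function cg_residuals(A, b, x0, Niter)
    x = float.(x0); r = b - A*x; v = copy(r)
    rs = [copy(r)]
    for _ in 1:Niter
        α = dot(v, r)/dot(v, A*v)
        x += α*v
        r = b - A*x
        β = -dot(v, A*r)/dot(v, A*v)
        v = r + β*v
        push!(rs, copy(r))
    end
    return rs
end

A = [4.0 1 0; 1 3 1; 0 1 2]; b = [1.0; 2.0; 3.0]   # small SPD example
rs = cg_residuals(A, b, zeros(3), 2)
r0 = rs[1]
# r^(1) ⟂ K_1 = span{r0};  r^(2) ⟂ K_2 = span{r0, A r0}
(dot(rs[2], r0), dot(rs[3], r0), dot(rs[3], A*r0))   # all ≈ 0
```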
Definition 11.1 \mathcal K_k(A,r^{(0)}) is known as a Krylov subspace.
A Krylov subspace method is an iterative method where r^{(k)} \in \mathcal K_{k+1}(A, r^{(0)}) and orthogonal to \mathcal K_{k}(A, r^{(0)}).
Theorem 11.3 In exact arithmetic (i.e. not accounting for rounding errors), Krylov subspace methods (including CG) converge in at most N steps where A \in \mathbb R^{N\times N}.
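We can check Theorem 11.3 numerically (up to rounding error) by running exactly N steps of the CG recurrence from above on a random N×N SPD system:

```julia
using LinearAlgebra

# Run Niter steps of the CG recurrence (no plotting) and return the iterate.
function cg_steps(A, b, x0, Niter)
    x = float.(x0); r = b - A*x; v = copy(r)
    for _ in 1:Niter
        α = dot(v, r)/dot(v, A*v)
        x += α*v
        r = b - A*x
        β = -dot(v, A*r)/dot(v, A*v)
        v = r + β*v
    end
    return x
end

N = 5
B = randn(N, N); A = B'B + I     # a random SPD matrix
b = randn(N)
x = cg_steps(A, b, zeros(N), N)  # exactly N iterations
norm(A*x - b)                    # ≈ 0 up to rounding error
```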
Let p_{k-1} be a polynomial of degree k-1. Then, Krylov methods are of the form
\begin{align} x^{(k)} = x^{(0)} + p_{k-1}(A) r^{(0)} \end{align}
- CG - choose p_{k-1} to give optimal search directions,
- GMRES (general minimal residual method) - choose p_{k-1} to minimise the residual
\begin{align} x^{(k)} = \mathrm{arg\,min}_{x \in x^{(0)} + \mathcal K_k(A, r^{(0)})} \| b - Ax \|_{2} \end{align}
(this uses Arnoldi or Lanczos iteration - possible presentation/poster…)
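To give a flavour of GMRES (not an implementation you should use in practice - the hypothetical `gmres_sketch` below is a bare-bones, unrestarted sketch): the Arnoldi process builds an orthonormal basis Q of \mathcal K_k(A, r^{(0)}), reducing the minimisation to a small least-squares problem with a Hessenberg matrix H.

```julia
using LinearAlgebra

# Bare-bones full GMRES: Arnoldi basis Q of K_k(A, r0), then minimise
# ||b - Ax||_2 over x0 + K_k via the least-squares problem min ||β e1 - H y||_2.
function gmres_sketch(A, b; x0=zero(b), tol=1e-10)
    N = length(b)
    r0 = b - A*x0
    β = norm(r0)
    Q = zeros(N, N+1); Q[:,1] = r0/β
    H = zeros(N+1, N)
    x = copy(x0)
    for k in 1:N
        w = A*Q[:,k]
        for j in 1:k                     # Gram-Schmidt against previous basis
            H[j,k] = dot(Q[:,j], w)
            w -= H[j,k]*Q[:,j]
        end
        H[k+1,k] = norm(w)
        e1 = zeros(k+1); e1[1] = β
        y = H[1:k+1, 1:k] \ e1           # small least-squares solve
        x = x0 + Q[:,1:k]*y
        norm(b - A*x) < tol && return x
        H[k+1,k] == 0 && return x        # breakdown: Krylov space is invariant
        Q[:,k+1] = w/H[k+1,k]
    end
    return x
end

A = [4.0 1 0; -1 3 1; 0 -1 2]   # GMRES does not require symmetry
b = [1.0; 2.0; 3.0]
x = gmres_sketch(A, b)
```

(The IterativeSolvers package loaded at the top of this notebook provides a production-quality `gmres`.)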
- Let me know any questions you have for Wednesday (problem class),
- Revise for midterm!
- (Sign-up to office hours if you would like)
Chapter 1 reading:
- Burden, Faires, and Burden (2015) sections 5.1 - 5.6,
- Driscoll and Braun (2018), section on zero stability
Chapter 2 reading: