
Problem Class 2: Iterative Methods in Matrix Analysis

Published: March 3, 2026

Abstract
Exercises based on content from Lectures 8-11

Exercise 14.1 Suppose A is symmetric. Show that A is positive definite if and only if A has strictly positive eigenvalues.

Hint: If A is symmetric, there exists orthogonal P (i.e. P^{-1} = P^T) of eigenvectors and a diagonal matrix of corresponding eigenvalues \Lambda such that A = P \Lambda P^T.
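As a numerical illustration of the hint (a sketch, not a substitute for the proof), the change of variables y = P^T x turns x^TAx into \sum_i \lambda_i y_i^2, which we can check on a small symmetric example:

```julia
using LinearAlgebra

# Illustrative check: for symmetric A with A = P Λ Pᵀ,
# xᵀAx = Σᵢ λᵢ yᵢ² where y = Pᵀx, so positive eigenvalues ⟹ xᵀAx > 0.
A = [4.0 1.0; 1.0 3.0]          # symmetric; eigenvalues ≈ 2.38, 4.62
Λ, P = eigen(Symmetric(A))       # Λ = eigenvalues, P = orthogonal eigenvectors

x = randn(2)
y = P' * x                       # change of basis
@show all(Λ .> 0)                # strictly positive eigenvalues
@show x' * A * x ≈ sum(Λ .* y .^ 2)
```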

Exercise 14.2 Suppose A is symmetric and positive definite (SPD). Show that \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}} where \lambda_{\rm min}, \lambda_{\rm max} are the minimum and maximum eigenvalues of A.

Hint: \rho(A) = \|A\|_2 (because A is normal).
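The claimed identity is easy to spot-check numerically (this uses the same matrix as Exercise 14.3, an arbitrary choice for illustration):

```julia
using LinearAlgebra

# For SPD A, the 2-norm condition number should equal λ_max / λ_min.
A = [5.0 0.0; 0.0 1.0]
λ = eigvals(Symmetric(A))
@show cond(A, 2) ≈ maximum(λ) / minimum(λ)
```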

Exercise 14.3 Consider the following iterative method for solving Ax = b:

\begin{align} x^{(k+1)} = (I-A)x^{(k)} + b \tag{*} \end{align}

  • Suppose that x^{(k)} \to x as k\to\infty. Show that x is a solution to Ax=b.
  • Under what conditions on A does x^{(k)} converge?
  • Does the following iteration improve on (*)? Why or why not?

\begin{align} x^{(k+1)} = (I-\alpha A) x^{(k)} + \alpha b \tag{**} \end{align}

  • Show that, if A is SPD, then there exists \alpha > 0 such that (**) converges.
  • This method can be written as a splitting method: what are P and N in this case?
  • Explain how this method can be written as a gradient-based method.
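The iteration (**) takes only a few lines to implement. The sketch below (a hypothetical helper; the `SD` routine used in the plots is not shown here) writes the update in the equivalent residual form x^{(k+1)} = x^{(k)} + \alpha(b - Ax^{(k)}):

```julia
using LinearAlgebra

# Minimal sketch of iteration (**): x⁽ᵏ⁺¹⁾ = (I - αA)x⁽ᵏ⁾ + αb,
# written via the residual r = b - Ax. For SPD A it converges when
# the spectral radius ρ(I - αA) < 1, i.e. for 0 < α < 2/λ_max.
function richardson(A, b, x0; α=0.1, Niter=500, tol=1e-10)
    x = copy(x0)
    for k in 1:Niter
        x = x + α * (b - A * x)      # same update as (**)
        norm(b - A * x) < tol && break
    end
    return x
end

A = [5.0 0.0; 0.0 1.0]; b = [10.0; 3.0]
x = richardson(A, b, [1.0; 4.0]; α=0.3)
@show x ≈ A \ b                      # converged to the solution
```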

Here, we plot the iterates of this method with two different choices of \alpha for a simple 2\times2 system:

A = [5 0 ; 0 1]; b = [10; 3]; x0 = [1;4]; ξ = A\b
display( gif( SD( A, b, x0; α=.005, xlims=[ξ[1]-1,ξ[1]+1], ylims=[ξ[2]-1,ξ[2]+1]) , fps=1 ) )
display( gif( SD( A, b, x0; α=.39, xlims=[ξ[1]-1,ξ[1]+1], ylims=[ξ[2]-1,ξ[2]+1]) , fps=1 ) )
  • For what step size \alpha does the method converge for the example above?
  • Compare this with the method from lectures that used line search: \alpha^{(k)} = \frac{(r^{(k)},r^{(k)})}{(r^{(k)},r^{(k)})_A}:
gif( SD( A, b, x0; xlims=[ξ[1]-1,ξ[1]+1], ylims=[ξ[2]-1,ξ[2]+1]) , fps=1 )
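One way to investigate the step-size question numerically (a sketch; the analytical answer follows from the eigenvalues of I-\alpha A) is to compute the spectral radius of the iteration matrix for each \alpha:

```julia
using LinearAlgebra

# Convergence of (**) is governed by ρ(I - αA): the method converges
# iff ρ(I - αA) < 1, which for this A means 0 < α < 2/λ_max = 2/5.
A = [5.0 0.0; 0.0 1.0]
ρ(α) = maximum(abs, eigvals(I - α * A))

@show ρ(0.005)   # ≈ 0.995: converges, but very slowly
@show ρ(0.39)    # = 0.95: converges
@show ρ(0.41)    # = 1.05: diverges
```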

Exercise 14.4 Recall that (x,y)_A := x^T A y and \|x\|_A := \sqrt{(x,x)_A}.

  • Write down the definition of a vector norm.
  • Show that, if A is SPD, then \|\cdot\|_A defines a vector norm.
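Before proving the norm axioms, it can be reassuring to spot-check them numerically on random vectors (an illustration only; it proves nothing):

```julia
using LinearAlgebra

# Spot-check the norm axioms for ‖·‖_A with an SPD matrix A.
A = [4.0 1.0; 1.0 3.0]
normA(x) = sqrt(dot(x, A * x))

x, y = randn(2), randn(2)
@show normA(x) > 0                                  # positivity (x ≠ 0)
@show normA(2.5x) ≈ 2.5 * normA(x)                  # absolute homogeneity
@show normA(x + y) ≤ normA(x) + normA(y) + 1e-12    # triangle inequality
```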

Exercise 14.5 Here, we prove Theorem 11.1 from lectures. Suppose A is SPD, x is the solution to Ax=b and let x^{(k+1)} = x^{(k)} + \alpha^{(k)} r^{(k)} be the iteration given by gradient descent with line search (i.e. r^{(k)} = b - Ax^{(k)} is the residual and \alpha^{(k)} = (r^{(k)},r^{(k)})/(r^{(k)},r^{(k)})_A is the step size).

Let e^{(k)} := x^{(k)} - x be the error between the iterate and the exact solution, let \|x\| := \sqrt{(x,x)} be the 2-norm, and let \lambda_{\rm min}, \lambda_{\rm max} be the minimum and maximum eigenvalues of A.

  • Expand ( e^{(k+1)}, e^{(k+1)} )_A = (e^{(k)} + \alpha^{(k)} r^{(k)}, e^{(k)} + \alpha^{(k)} r^{(k)})_A and thus show

\begin{align} \| e^{(k+1)} \|_A^2 \leq \| e^{(k)} \|_A^2 \left( 1 - \frac {\| r^{(k)} \|^4} {\|r^{(k)}\|_A^2 \,\, \|r^{(k)}\|_{A^{-1}}^2} \right) \end{align}

Hint: e^{(k)} = x^{(k)} - x = A^{-1}( Ax^{(k)} - Ax ) = - A^{-1}r^{(k)}.

  • Use the so-called Kantorovich inequality

\begin{align} \frac {\|x\|^4} {\|x\|^2_A \,\, \|x\|^2_{A^{-1}}} \geq \frac {4 \lambda_{\rm min} \lambda_{\rm max}} {(\lambda_{\rm min} + \lambda_{\rm max})^2} \end{align}

to prove that

\begin{align} \| e^{(k+1)} \|_A \leq \left( \frac {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} - 1} {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} + 1} \right) \| e^{(k)} \|_A \end{align}

Note: this proves the theorem because \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}}.

We saw in class that

\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_A^2} \end{align}

Since e^{(k)} = x^{(k)} - x = A^{-1} (Ax^{(k)} - Ax) = - A^{-1} r^{(k)}, we have

\begin{align} \| e^{(k)} \|_A^2 &= (e^{(k)},e^{(k)})_A = (A^{-1} r^{(k)}, A^{-1} r^{(k)})_A \nonumber\\ &= (r^{(k)}, r^{(k)})_{A^{-1}} = \|r^{(k)}\|_{A^{-1}}^2. \end{align}

As a result, we have

\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 \left( 1 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_{A^{-1}}^2 \,\, \|r^{(k)}\|_A^2} \right) \end{align}
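The key substitution in this derivation, \|e^{(k)}\|_A^2 = \|r^{(k)}\|_{A^{-1}}^2, can be verified numerically on the running example (an illustration only):

```julia
using LinearAlgebra

# Check ‖e‖_A² = ‖r‖_{A⁻¹}² for an arbitrary iterate of the 2×2 example.
A = [5.0 0.0; 0.0 1.0]; b = [10.0; 3.0]
x  = A \ b                    # exact solution
xk = [1.0; 4.0]               # some iterate
e  = xk - x                   # error
r  = b - A * xk               # residual, r = -A e
@show dot(e, A * e) ≈ dot(r, A \ r)   # both equal ‖e‖_A²
```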

Exercise 14.6 Here, we prove the Kantorovich inequality. Let 0 < \lambda_1 \leq \cdots \leq \lambda_n be the eigenvalues of A.

  • Show that the Kantorovich inequality is equivalent to

\begin{align} \left(\sum_i \lambda_i y_i^2\right) \left(\sum_j \lambda_j^{-1} y_j^2\right) \leq \frac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n} \end{align}

for all y with \|y\|_2 = 1.

  • Explain why \lambda^{-1} \leq \frac{1}{\lambda_1} + \frac{1}{\lambda_n} - \frac{\lambda}{\lambda_1\lambda_n} for all \lambda \in [\lambda_1,\lambda_n].
  • Thus show that

\begin{align} \sum_j \lambda_j^{-1} y_j^2 \leq \frac {\lambda_1 + \lambda_n - \sum_j \lambda_j y_j^2} {\lambda_1\lambda_n}. \end{align}

  • Conclude by finding the maximum value of

\begin{align} \frac {X (\lambda_1 + \lambda_n - X)} {\lambda_1\lambda_n} \end{align}

for X \in [\lambda_1,\lambda_n].
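The inequality itself is easy to test numerically before proving it (a sanity check on random unit vectors, not part of the proof):

```julia
using LinearAlgebra

# Spot-check the Kantorovich inequality on random unit vectors.
A = [4.0 1.0; 1.0 3.0]
λ = eigvals(Symmetric(A))
λ1, λn = minimum(λ), maximum(λ)
bound = 4λ1 * λn / (λ1 + λn)^2

for _ in 1:100
    x = normalize(randn(2))   # ‖x‖₂ = 1
    lhs = norm(x)^4 / (dot(x, A * x) * dot(x, A \ x))
    @assert lhs ≥ bound - 1e-12
end
println("Kantorovich bound holds on all samples")
```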

Exercise 14.7 (Bonus) Adapt this code to include a preconditioner P:

function CG(A,b,x0; P=I, Niter=50, tol=1e-3)

    X = [x0]

    # quadratic objective ½xᵀAx - xᵀb for a 2-vector [x;y] (used for plotting contours)
    g(x,y) = dot([x;y], A*[x;y])/2 - dot([x;y], b)

    r = b - A*x0
    v = r

    for i in 1:Niter

        if (norm(r)<tol)
            break;
        end

        # step size 
        α = dot(v,r)/dot(v,A*v)

        # new point
        x1 = x0 + α*v
        
        # residual
        r = b - A*x1
            
        # new search direction
        β = - dot(v,A*r)/dot(v,A*v)
        v = r + β*v
            
        x0 = x1

        push!(X, x1)
    end

    return X

end;
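Before adding the preconditioner, it is worth confirming the unpreconditioned routine behaves as expected: on an SPD n\times n system, CG reaches the solution in at most n steps (in exact arithmetic). The standalone sketch below condenses the routine above, dropping the iterate history and plotting objective:

```julia
using LinearAlgebra

# Condensed CG (same update formulas as the routine above) to check
# that it solves an SPD n×n system in at most n iterations.
function cg(A, b, x; Niter=size(A, 1), tol=1e-12)
    r = b - A * x
    v = copy(r)                              # initial search direction
    for _ in 1:Niter
        norm(r) < tol && break
        α = dot(v, r) / dot(v, A * v)        # step size
        x += α * v                           # new point
        r = b - A * x                        # residual
        β = -dot(v, A * r) / dot(v, A * v)   # conjugation coefficient
        v = r + β * v                        # new search direction
    end
    return x
end

A = [5.0 0.0; 0.0 1.0]; b = [10.0; 3.0]
@show cg(A, b, [1.0; 4.0]) ≈ A \ b   # exact after at most 2 steps
```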