Problem Class 2: Iterative Methods in Matrix Analysis
Exercise 15.1 Suppose A is symmetric. Show that A is positive definite if and only if A has strictly positive eigenvalues.
Hint: If A is symmetric, there exists orthogonal P (i.e. P^{-1} = P^T) of eigenvectors and a diagonal matrix of corresponding eigenvalues \Lambda such that A = P \Lambda P^T.
Answer.
Since A is symmetric, we can follow the hint to write A = \sum_i \lambda_i v_i v_i^T where \{\lambda_i\} are the eigenvalues of A and \{v_i\} are the corresponding eigenvectors satisfying v_i^T v_j = \delta_{ij}. We have
\begin{align} x^T A x &= \sum_i \lambda_i (x^T v_i) (v_i^T x) \nonumber\\ % &= \sum_i \lambda_i (x, v_i)^2. \end{align}
This quantity is greater than or equal to zero for all x if and only if each \lambda_i \geq 0 (take x = v_i to see that \lambda_i \geq 0 is necessary). Moreover, if some \lambda_i = 0, we have v_i^T A v_i = \lambda_i = 0 with v_i \neq 0, and so A is not positive definite. As a result, if A is positive definite then \lambda_i > 0 for all i.
On the other hand, if \lambda_i > 0 for all i then x^TAx = 0 if and only if (x,v_i) = 0 for all i. Since we can write x = \sum_j \alpha_j v_j for some coefficients \{\alpha_j\}, we have (x, v_i) = \sum_j \alpha_j (v_j,v_i) = \alpha_i. Therefore, we have x^TAx > 0 unless \alpha_i = 0 for all i, i.e. unless x = 0.
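As a quick numerical sanity check of this equivalence (the 2\times2 matrices below are our own illustrative choices, not part of the exercise), we can compare Julia's isposdef with the signs of the eigenvalues:

```julia
using LinearAlgebra

# A symmetric matrix is positive definite iff all eigenvalues are strictly positive.
A_pd  = [2.0 1.0; 1.0 2.0]   # symmetric, eigenvalues 1 and 3
A_ind = [1.0 2.0; 2.0 1.0]   # symmetric, eigenvalues -1 and 3

@show eigvals(Symmetric(A_pd))    # all strictly positive
@show isposdef(A_pd)              # true
@show eigvals(Symmetric(A_ind))   # contains a negative eigenvalue
@show isposdef(A_ind)             # false
```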
Exercise 15.2 Suppose A is symmetric and positive definite (SPD). Show that \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}} where \lambda_{\rm min}, \lambda_{\rm max} are the minimum and maximum eigenvalues of A.
Hint: \rho(A) = \|A\|_2 (because A is normal).
Answer.
Recall that
\begin{align} \kappa(A) &= \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2. \end{align}
Since A is normal, we have \|A\|_2 = \rho(A) = \max_{\lambda \in \sigma(A)} |\lambda| = \max_{\lambda\in\sigma(A)} \lambda = \lambda_{\rm max} (here, we have used the fact that the eigenvalues of A are positive). Moreover, since \sigma(A^{-1}) = \{ \lambda^{-1} : \lambda \in \sigma(A)\}, we have \|A^{-1}\|_2 = \rho(A^{-1}) = \max_{\lambda \in \sigma(A)} |\lambda|^{-1} = \lambda_{\rm min}^{-1}. Combining these gives \kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\lambda_{\rm max}}{\lambda_{\rm min}}.
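We can sketch a numerical check of this identity (the SPD matrix below is our own choice):

```julia
using LinearAlgebra

# For an SPD matrix, κ₂(A) = ‖A‖₂‖A⁻¹‖₂ should equal λ_max/λ_min.
A = [4.0 1.0; 1.0 3.0]            # SPD: symmetric with eigenvalues (7 ± √5)/2 > 0
λ = eigvals(Symmetric(A))         # returned in ascending order
@show cond(A, 2) ≈ maximum(λ) / minimum(λ)   # true
```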
Exercise 15.3 Consider the following iterative method for solving Ax = b:
\begin{align} x^{(k+1)} = (I-A)x^{(k)} + b \tag{*} \end{align}
\begin{align} x^{(k+1)} = (I-\alpha A) x^{(k)} + \alpha b \tag{**} \end{align}
Here, we plot the iterates of this method with two different choices of \alpha for a simple 2\times2 system:
[Animation: iterates of the method for two choices of \alpha]
Answer.
Notice that if x^{(k)} \to x, then x = (I - A) x + b and thus Ax = b.
We saw in lectures that the iteration x^{(k+1)} = Tx^{(k)} + c converges if and only if \rho(T)<1. In this case, we therefore require \rho( I-A ) < 1.
The method depending on \alpha gives a better asymptotic rate of convergence if \rho( I-\alpha A ) < \rho( I-A ).
Notice that the eigenvalues of I-\alpha A are given by 1 - \alpha \lambda where \lambda is an eigenvalue of A (Exercise: why?). Therefore, when A is SPD (so that the eigenvalues of A are positive), we have \rho(I-\alpha A) <1 if and only if
\begin{align} -1 < 1-\alpha \lambda < 1 \end{align}
for all \lambda \in \sigma(A). That is, if 0<\alpha <\frac{2}{\lambda} for all \lambda\in\sigma(A), i.e. if 0 < \alpha < \frac{2}{\lambda_{\rm max}}. For this choice of \alpha, the method converges.
Recall, we defined splitting methods as follows: A = P - N and P x^{(k+1)} = N x^{(k)} + b. Notice that, we can rewrite x^{(k+1)} = (I-\alpha A)x^{(k)} + \alpha b as
\begin{align} \alpha^{-1} x^{(k+1)} = \alpha^{-1} (I-\alpha A)x^{(k)} + b \end{align}
and so P = \alpha^{-1}I and N = \alpha^{-1}I - A.
Recall, we defined g(x) = \frac{1}2 x^T A x - b^T x and considered methods of the form
\begin{align} x^{(k+1)} &= x^{(k)} - \alpha^{(k)} \nabla g(x^{(k)}) \nonumber\\ % &= x^{(k)} + \alpha^{(k)} (b - Ax^{(k)}). \end{align}
We can rewrite our method as
\begin{align} x^{(k+1)} &= (I-\alpha A)x^{(k)} + \alpha b \nonumber\\ % &= x^{(k)} + \alpha(b - Ax^{(k)}). \end{align}
One can therefore view this method as a gradient method with fixed step size \alpha^{(k)} = \alpha.
In the shown example, we have A = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix} which has eigenvalues 1 and 5. For the method to converge we therefore require 0 < \alpha < \frac{2}{5}.
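A small numerical sketch of this threshold (iteration count and starting point are our own choices): with \alpha = 0.3 < 2/5 the iteration converges, while with \alpha = 0.5 > 2/5 it diverges.

```julia
using LinearAlgebra

# Richardson iteration x⁽ᵏ⁺¹⁾ = (I − αA)x⁽ᵏ⁾ + αb for the 2×2 example above.
A = [5.0 0.0; 0.0 1.0]
b = [1.0, 1.0]
x_exact = A \ b

function richardson(A, b, α; niter = 200)
    x = zero(b)
    for _ in 1:niter
        x = (I - α * A) * x + α * b
    end
    return x
end

@show norm(richardson(A, b, 0.3) - x_exact)   # tiny: ρ(I − 0.3A) = 0.7 < 1
@show norm(richardson(A, b, 0.5) - x_exact)   # huge: ρ(I − 0.5A) = 1.5 > 1
```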
Exercise 15.4 Recall that (x,y)_A := x^TA y and \|x\|_A := \sqrt{(x,x)_A}. Show that, when A is SPD, \|\cdot\|_A defines a norm.
Answer.
Since A is SPD, we have x^T A x \geq 0 and x^T A x = 0 if and only if x=0. Moreover, \| \alpha x \|_A = \sqrt{(\alpha x,\alpha x)_A} = |\alpha| \|x\|_A.
The triangle inequality is slightly harder: notice that there exist eigenvalues \lambda_i > 0 and eigenvectors v_i with (v_i,v_j) = \delta_{ij} such that A = \sum_i \lambda_i v_i v_i^T. We can therefore define \sqrt{A} := \sum_i \sqrt{\lambda_i} v_i v_i^T such that
\begin{align} (\sqrt{A}^2)_{ij} &= \sum_k \sqrt{A}_{ik} \sqrt{A}_{kj} \nonumber\\ % &= \sum_{k,m,n} \left( \sqrt{\lambda_m} (v_m)_i (v_m)_k \right) \left( \sqrt{\lambda_n} (v_n)_k (v_n)_j \right) \nonumber\\ % &= \sum_{m,n} ( v_m, v_n) \sqrt{\lambda_m} (v_m)_i \sqrt{\lambda_n} (v_n)_j \nonumber\\ % &= \sum_n \lambda_{n} (v_n)_i (v_n)_j = A_{ij}. \end{align}
Here, we have used the fact that (v_m, v_n) = \delta_{mn}. We can also see that \sqrt{A} is SPD because A is. Using this, we notice that
\begin{align} \|x+y\|_A^2 &= (x+y)^T A (x+y) \nonumber\\ % &= \big(\sqrt{A} (x+y) \big)^T \sqrt{A} (x+y) \nonumber\\ % &= \| \sqrt{A} (x+y) \|_2^{2} \nonumber\\ % &\leq ( \| \sqrt{A} x \|_2 + \| \sqrt{A} y\|_2 )^2 \nonumber\\ % &= ( \| x \|_A + \| y\|_A )^2. \end{align}
Therefore \|\cdot\|_A satisfies the triangle inequality.
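The square-root construction used above is easy to sketch numerically (the example matrix and vectors are our own choices):

```julia
using LinearAlgebra

# Build √A from the eigendecomposition A = PΛPᵀ and check that ‖x‖_A = ‖√A x‖₂.
A = [4.0 1.0; 1.0 3.0]                      # SPD
F = eigen(Symmetric(A))                     # F.vectors orthogonal, F.values > 0
sqrtA = F.vectors * Diagonal(sqrt.(F.values)) * F.vectors'

normA(v) = sqrt(dot(v, A * v))              # ‖v‖_A = √(vᵀAv)
x, y = [1.0, -2.0], [0.5, 1.0]

@show sqrtA * sqrtA ≈ A                     # √A squares back to A
@show normA(x) ≈ norm(sqrtA * x)            # A-norm via the 2-norm
@show normA(x + y) ≤ normA(x) + normA(y)    # triangle inequality
```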
Exercise 15.5 Here, we prove Theorem 11.1 from lectures. Suppose A is SPD, x is the solution to Ax=b and let x^{(k+1)} = x^{(k)} + \alpha^{(k)} r^{(k)} be the iteration given by gradient descent with line search (i.e. r^{(k)} = b - Ax^{(k)} is the residual and \alpha^{(k)} = (r^{(k)},r^{(k)})/(r^{(k)},r^{(k)})_A is the step size).
Let e^{(k)} := x^{(k)} - x be the error between the iteration and the exact solution, let \|x\| := \sqrt{(x,x)} be the 2-norm, and let \lambda_{\rm min}, \lambda_{\rm max} be the minimum and maximum eigenvalues of A.
Show that
\begin{align} \| e^{(k+1)} \|_A^2 % &\leq \| e^{(k)} \|_A^2 \left( 1 - \frac {\| r^{(k)} \|^4} {\|r^{(k)}\|_A^2 \,\, \|r^{(k)}\|_{A^{-1}}^2} \right) \end{align}
Hint: e^{(k)} = x^{(k)} - x = A^{-1}( Ax^{(k)} - Ax ) = - A^{-1}r^{(k)}.
Then use the Kantorovich inequality
\begin{align} \frac {\|x\|^4} {\|x\|^2_A \,\, \|x\|^2_{A^{-1}}} \geq \frac {4 \lambda_{\rm min} \lambda_{\rm max}} {(\lambda_{\rm min} + \lambda_{\rm max})^2} \end{align}
to prove that
\begin{align} \| e^{(k+1)} \|_A % &\leq \left( \frac {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} - 1} {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} + 1} \right) % \| e^{(k)} \|_A \end{align}
Note: this proves the theorem because \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}}.
We saw in class that
\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_A^2} \end{align}
Since e^{(k)} = x^{(k)} - x = A^{-1} (Ax^{(k)} - Ax) = - A^{-1} r^{(k)}, we have
\begin{align} \| e^{(k)} \|_A^2 % &= (e^{(k)},e^{(k)})_A % = (A^{-1} r^{(k)}, A^{-1} r^{(k)})_A \nonumber\\ % &= (r^{(k)}, r^{(k)})_{A^{-1}} % = \|r^{(k)}\|_{A^{-1}}^2. \end{align}
As a result, we have
\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 \left( 1 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_{A^{-1}}^2 \|r^{(k)}\|_A^2} \right) \end{align}
Applying the Kantorovich inequality with x = r^{(k)} then yields
\begin{align} \|e^{(k+1)}\|_A^2 \leq \|e^{(k)}\|_A^2 \left( 1 - \frac{4\lambda_{\rm min}\lambda_{\rm max}}{(\lambda_{\rm min}+\lambda_{\rm max})^2} \right) = \|e^{(k)}\|_A^2 \left( \frac{\lambda_{\rm max}-\lambda_{\rm min}}{\lambda_{\rm max}+\lambda_{\rm min}} \right)^2, \end{align}
and taking square roots gives the claimed bound.
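The resulting bound \|e^{(k+1)}\|_A \leq \frac{\kappa-1}{\kappa+1} \|e^{(k)}\|_A can be checked numerically; the sketch below runs gradient descent with line search on the diagonal example from Exercise 15.3 (the starting point and iteration count are our own choices).

```julia
using LinearAlgebra

# Verify ‖e⁽ᵏ⁺¹⁾‖_A ≤ ((κ−1)/(κ+1)) ‖e⁽ᵏ⁾‖_A along a steepest-descent run.
function descent_bound_holds(A, b, x0; niter = 10)
    x_exact = A \ b
    κ = cond(A, 2)
    factor = (κ - 1) / (κ + 1)
    normA(v) = sqrt(dot(v, A * v))
    x, ok = x0, true
    for _ in 1:niter
        r = b - A * x                       # residual
        α = dot(r, r) / dot(r, A * r)       # exact line-search step size
        e_old = normA(x - x_exact)
        x = x + α * r
        ok &= normA(x - x_exact) ≤ factor * e_old + 1e-12
    end
    return ok
end

@show descent_bound_holds([5.0 0.0; 0.0 1.0], [1.0, 1.0], [3.0, -2.0])  # true
```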
Exercise 15.6 Here, we prove the Kantorovich inequality: let 0 <\lambda_1 \leq\cdots \leq\lambda_n be the eigenvalues of A.
Show that
\begin{align} \Big(\sum_i \lambda_i y_i^2\Big) \Big(\sum_j \lambda_j^{-1} y_j^2\Big) % \leq \frac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n} \end{align}
for all \|y\|_2 = 1.
Hint: first show that
\begin{align} \sum_j \lambda_j^{-1} y_j^2 % \leq \frac {\lambda_1 + \lambda_n - \sum_j \lambda_j y_j^2} {\lambda_1\lambda_n}. \end{align}
Then, writing X := \sum_j \lambda_j y_j^2, maximise
\begin{align} X \, \frac {\lambda_1 + \lambda_n - X} {\lambda_1\lambda_n} \end{align}
for X \in [\lambda_1,\lambda_n].
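Before proving it, one can sketch a quick numerical check of the inequality (the eigenvalue list and number of random trials are our own choices):

```julia
using LinearAlgebra

# Check (Σ λᵢyᵢ²)(Σ λⱼ⁻¹yⱼ²) ≤ (λ₁+λₙ)²/(4λ₁λₙ) for random unit vectors y.
λ = [1.0, 2.0, 5.0, 10.0]                  # 0 < λ₁ ≤ … ≤ λₙ
bound = (λ[1] + λ[end])^2 / (4 * λ[1] * λ[end])

holds = all(1:1000) do _
    y = normalize(randn(length(λ)))        # ‖y‖₂ = 1
    sum(λ .* y .^ 2) * sum(y .^ 2 ./ λ) ≤ bound + 1e-12
end
@show holds   # true
```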
Exercise 15.7 (Bonus) Adapt this code to include a preconditioner P:
function CG(A,b,x0; P=I, Niter=50, tol=1e-3)
    X = [x0]
    # quadratic objective g(x) = ½xᵀAx − bᵀx (not used in the iteration itself)
    g(x,y) = dot([x;y], A*[x;y])/2 - dot([x;y], b)
    r = b - A*x0
    v = r
    for i ∈ 1:Niter
        if norm(r) < tol
            break
        end
        # step size
        α = dot(v,r)/dot(v,A*v)
        # new point
        x1 = x0 + α*v
        # residual
        r = b - A*x1
        # new search direction
        β = - dot(v,A*r)/dot(v,A*v)
        v = r + β*v
        x0 = x1
        push!(X, x1)
    end
    return X
end;
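One possible adaptation (a sketch, not an official solution): build the search directions from the preconditioned residual z = P^{-1} r instead of r, keeping the rest of the structure above. The β update below mirrors the A-conjugacy formula of the original code.

```julia
using LinearAlgebra

function PCG(A, b, x0; P = I, Niter = 50, tol = 1e-3)
    X = [x0]
    r = b - A * x0
    z = P \ r                                # preconditioned residual
    v = z                                    # initial search direction
    for i ∈ 1:Niter
        if norm(r) < tol
            break
        end
        α = dot(v, r) / dot(v, A * v)        # step size (line search along v)
        x1 = x0 + α * v                      # new point
        r = b - A * x1                       # residual
        z = P \ r                            # preconditioned residual
        β = -dot(v, A * z) / dot(v, A * v)   # keep directions A-conjugate
        v = z + β * v                        # new search direction
        x0 = x1
        push!(X, x1)
    end
    return X
end

# With P = I this reduces to the plain CG above; here we try Jacobi preconditioning.
X = PCG([4.0 1.0; 1.0 3.0], [1.0, 2.0], zeros(2); P = Diagonal([4.0, 3.0]))
@show norm([4.0 1.0; 1.0 3.0] * X[end] - [1.0, 2.0])   # below tol
```

For SPD P this is equivalent to running CG on a preconditioned system; choosing P = Diagonal(diag(A)) gives the Jacobi preconditioner used in the test call.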