Problem Class 2: Iterative Methods in Matrix Analysis
Exercise 15.1 Suppose A is symmetric. Show that A is positive definite if and only if A has strictly positive eigenvalues.
Hint: If A is symmetric, there exists orthogonal P (i.e. P^{-1} = P^T) of eigenvectors and a diagonal matrix of corresponding eigenvalues \Lambda such that A = P \Lambda P^T.
Answer.
Since A is symmetric, we can follow the hint to write A = \sum_i \lambda_i v_i v_i^T where \{\lambda_i\} are the eigenvalues of A and \{v_i\} are the corresponding eigenvectors satisfying v_i^T v_j = \delta_{ij}. We have
\begin{align} x^T A x &= \sum_i \lambda_i (x^T v_i) (v_i^T x) \nonumber\\ % &= \sum_i \lambda_i (x, v_i)^2. \end{align}
This quantity is greater than or equal to zero for all x if and only if each \lambda_i \geq 0 (take x = v_i to see that \lambda_i \geq 0 is necessary). Moreover, if some \lambda_i = 0, we have v_i^T A v_i = \lambda_i = 0 with v_i \neq 0, and so A is not positive definite. As a result, if A is positive definite then \lambda_i > 0 for all i.
On the other hand, if \lambda_i > 0 for all i then x^TAx = 0 if and only if (x,v_i) = 0 for all i. Since we can write x = \sum_j \alpha_j v_j for some coefficients \{\alpha_j\}, we have (x, v_i) = \sum_j \alpha_j (v_j,v_i) = \alpha_i. Therefore, we have x^TAx > 0 unless \alpha_i = 0 for all i, i.e. unless x = 0.
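As a quick numerical sanity check of this equivalence (the 2\times2 matrices below are our own illustrative choices, not part of the exercise), we can compare Julia's isposdef with the signs of the eigenvalues:

```julia
using LinearAlgebra

# A symmetric matrix is positive definite iff all eigenvalues are strictly positive.
A_pd  = [2.0 1.0; 1.0 2.0]   # symmetric, eigenvalues 1 and 3
A_ind = [1.0 2.0; 2.0 1.0]   # symmetric, eigenvalues -1 and 3

@show eigvals(Symmetric(A_pd))    # all strictly positive
@show isposdef(A_pd)              # true
@show eigvals(Symmetric(A_ind))   # contains a negative eigenvalue
@show isposdef(A_ind)             # false
```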
Exercise 15.2 Suppose A is symmetric and positive definite (SPD). Show that \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}} where \lambda_{\rm min}, \lambda_{\rm max} are the minimum and maximum eigenvalues of A.
Hint: \rho(A) = \|A\|_2 (because A is normal).
Answer.
Recall that
\begin{align} \kappa(A) &= \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2. \end{align}
Since A is normal, we have \|A\|_2 = \rho(A) = \max_{\lambda \in \sigma(A)} |\lambda| = \max_{\lambda\in\sigma(A)} \lambda = \lambda_{\rm max} (here, we have used the fact that the eigenvalues of A are positive). Moreover, since \sigma(A^{-1}) = \{ \lambda^{-1} : \lambda \in \sigma(A)\}, we have \|A^{-1}\|_2 = \rho(A^{-1}) = \max_{\lambda \in \sigma(A)} |\lambda|^{-1} = \lambda_{\rm min}^{-1}. Combining these gives \kappa(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\lambda_{\rm max}}{\lambda_{\rm min}}.
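We can sketch a numerical check of this identity (the SPD matrix below is our own choice):

```julia
using LinearAlgebra

# For an SPD matrix, κ₂(A) = ‖A‖₂‖A⁻¹‖₂ should equal λ_max/λ_min.
A = [4.0 1.0; 1.0 3.0]            # SPD: symmetric with eigenvalues (7 ± √5)/2 > 0
λ = eigvals(Symmetric(A))         # returned in ascending order
@show cond(A, 2) ≈ maximum(λ) / minimum(λ)   # true
```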
Exercise 15.3 Consider the following iterative method for solving Ax = b:
\begin{align} x^{(k+1)} = (I-A)x^{(k)} + b \tag{*} \end{align}
\begin{align} x^{(k+1)} = (I-\alpha A) x^{(k)} + \alpha b \tag{**} \end{align}
Here, we plot the iterates of this method with two different choices of \alpha for a simple 2\times2 system:
[Animation: iterates of the method for two choices of \alpha]
Answer.
Notice that if x^{(k)} \to x, then x = (I - A) x + b and thus Ax = b.
We saw in lectures that the iteration x^{(k+1)} = Tx^{(k)} + c converges if and only if \rho(T)<1. In this case, we therefore require \rho( I-A ) < 1.
The method depending on \alpha gives a better asymptotic rate of convergence if \rho( I-\alpha A ) < \rho( I-A ).
Notice that the eigenvalues of I-\alpha A are given by 1 - \alpha \lambda where \lambda is an eigenvalue of A (Exercise: why?). Therefore, when A is SPD (so that the eigenvalues of A are positive), we have \rho(I-\alpha A) <1 if and only if
\begin{align} -1 < 1-\alpha \lambda < 1 \end{align}
for all \lambda \in \sigma(A). That is, if 0<\alpha <\frac{2}{\lambda} for all \lambda\in\sigma(A), i.e. if 0 < \alpha < \frac{2}{\lambda_{\rm max}}. For this choice of \alpha, the method converges.
Recall, we defined splitting methods as follows: A = P - N and P x^{(k+1)} = N x^{(k)} + b. Notice that, we can rewrite x^{(k+1)} = (I-\alpha A)x^{(k)} + \alpha b as
\begin{align} \alpha^{-1} x^{(k+1)} = \alpha^{-1} (I-\alpha A)x^{(k)} + b \end{align}
and so P = \alpha^{-1}I and N = \alpha^{-1}I - A.
Recall, we defined g(x) = \frac{1}2 x^T A x - b^T x and considered methods of the form
\begin{align} x^{(k+1)} &= x^{(k)} - \alpha^{(k)} \nabla g(x^{(k)}) \nonumber\\ % &= x^{(k)} + \alpha^{(k)} (b - Ax^{(k)}). \end{align}
We can rewrite our method as
\begin{align} x^{(k+1)} &= (I-\alpha A)x^{(k)} + \alpha b \nonumber\\ % &= x^{(k)} + \alpha(b - Ax^{(k)}). \end{align}
One can therefore view this method as a gradient method with fixed step size \alpha^{(k)} = \alpha.
In the shown example, we have A = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix} which has eigenvalues 1 and 5. For the method to converge we therefore require 0 < \alpha < \frac{2}{5}.
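A small numerical sketch of this threshold (iteration count and starting point are our own choices): with \alpha = 0.3 < 2/5 the iteration converges, while with \alpha = 0.5 > 2/5 it diverges.

```julia
using LinearAlgebra

# Richardson iteration x⁽ᵏ⁺¹⁾ = (I − αA)x⁽ᵏ⁾ + αb for the 2×2 example above.
A = [5.0 0.0; 0.0 1.0]
b = [1.0, 1.0]
x_exact = A \ b

function richardson(A, b, α; niter = 200)
    x = zero(b)
    for _ in 1:niter
        x = (I - α * A) * x + α * b
    end
    return x
end

@show norm(richardson(A, b, 0.3) - x_exact)   # tiny: ρ(I − 0.3A) = 0.7 < 1
@show norm(richardson(A, b, 0.5) - x_exact)   # huge: ρ(I − 0.5A) = 1.5 > 1
```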
Exercise 15.4 Recall that (x,y)_A := x^TA y and \|x\|_A := \sqrt{(x,x)_A}. Show that, when A is SPD, \|\cdot\|_A defines a norm.
Answer.
Since A is SPD, we have x^T A x \geq 0 and x^T A x = 0 if and only if x=0. Moreover, \| \alpha x \|_A = \sqrt{(\alpha x,\alpha x)_A} = |\alpha| \|x\|_A.
The triangle inequality is slightly harder: notice that there exist eigenvalues \lambda_i > 0 and eigenvectors v_i with (v_i,v_j) = \delta_{ij} such that A = \sum_i \lambda_i v_i v_i^T. We can therefore define \sqrt{A} := \sum_i \sqrt{\lambda_i} v_i v_i^T such that
\begin{align} (\sqrt{A}^2)_{ij} &= \sum_k \sqrt{A}_{ik} \sqrt{A}_{kj} \nonumber\\ % &= \sum_{k,m,n} \left( \sqrt{\lambda_m} (v_m)_i (v_m)_k \right) \left( \sqrt{\lambda_n} (v_n)_k (v_n)_j \right) \nonumber\\ % &= \sum_{m,n} ( v_m, v_n) \sqrt{\lambda_m} (v_m)_i \sqrt{\lambda_n} (v_n)_j \nonumber\\ % &= \sum_n \lambda_{n} (v_n)_i (v_n)_j = A_{ij}. \end{align}
Here, we have used the fact that (v_m, v_n) = \delta_{mn}. We can also see that \sqrt{A} is SPD because A is. Using this, we notice that
\begin{align} \|x+y\|_A^2 &= (x+y)^T A (x+y) \nonumber\\ % &= \big(\sqrt{A} (x+y) \big)^T \sqrt{A} (x+y) \nonumber\\ % &= \| \sqrt{A} (x+y) \|_2^{2} \nonumber\\ % &\leq ( \| \sqrt{A} x \|_2 + \| \sqrt{A} y\|_2 )^2 \nonumber\\ % &= ( \| x \|_A + \| y\|_A )^2. \end{align}
Therefore \|\cdot\|_A satisfies the triangle inequality.
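The square-root construction used above is easy to sketch numerically (the example matrix and vectors are our own choices):

```julia
using LinearAlgebra

# Build √A from the eigendecomposition A = PΛPᵀ and check that ‖x‖_A = ‖√A x‖₂.
A = [4.0 1.0; 1.0 3.0]                      # SPD
F = eigen(Symmetric(A))                     # F.vectors orthogonal, F.values > 0
sqrtA = F.vectors * Diagonal(sqrt.(F.values)) * F.vectors'

normA(v) = sqrt(dot(v, A * v))              # ‖v‖_A = √(vᵀAv)
x, y = [1.0, -2.0], [0.5, 1.0]

@show sqrtA * sqrtA ≈ A                     # √A squares back to A
@show normA(x) ≈ norm(sqrtA * x)            # A-norm via the 2-norm
@show normA(x + y) ≤ normA(x) + normA(y)    # triangle inequality
```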
Exercise 15.5 Here, we prove Theorem 11.1 from lectures. Suppose A is SPD, x is the solution to Ax=b and let x^{(k+1)} = x^{(k)} + \alpha^{(k)} r^{(k)} be the iteration given by gradient descent with line search (i.e. r^{(k)} = b - Ax^{(k)} is the residual and \alpha^{(k)} = (r^{(k)},r^{(k)})/(r^{(k)},r^{(k)})_A is the step size).
Let e^{(k)} := x^{(k)} - x be the error between the iteration and the exact solution, let \|x\| := \sqrt{(x,x)} be the 2-norm, and let \lambda_{\rm min}, \lambda_{\rm max} be the minimum and maximum eigenvalues of A.
Show that
\begin{align} \| e^{(k+1)} \|_A^2 % &\leq \| e^{(k)} \|_A^2 \left( 1 - \frac {\| r^{(k)} \|^4} {\|r^{(k)}\|_A^2 \,\, \|r^{(k)}\|_{A^{-1}}^2} \right) \end{align}
Hint: e^{(k)} = x^{(k)} - x = A^{-1}( Ax^{(k)} - Ax ) = - A^{-1}r^{(k)}.
Then use the Kantorovich inequality
\begin{align} \frac {\|x\|^4} {\|x\|^2_A \,\, \|x\|^2_{A^{-1}}} \geq \frac {4 \lambda_{\rm min} \lambda_{\rm max}} {(\lambda_{\rm min} + \lambda_{\rm max})^2} \end{align}
to prove that
\begin{align} \| e^{(k+1)} \|_A % &\leq \left( \frac {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} - 1} {\frac{\lambda_{\rm max}}{\lambda_{\rm min}} + 1} \right) % \| e^{(k)} \|_A \end{align}
Note: this proves the theorem because \kappa(A) = \frac{\lambda_{\rm max}}{\lambda_{\rm min}}.
We saw in class that
\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_A^2} \end{align}
Since e^{(k)} = x^{(k)} - x = A^{-1} (Ax^{(k)} - Ax) = - A^{-1} r^{(k)}, we have
\begin{align} \| e^{(k)} \|_A^2 % &= (e^{(k)},e^{(k)})_A % = (A^{-1} r^{(k)}, A^{-1} r^{(k)})_A \nonumber\\ % &= (r^{(k)}, r^{(k)})_{A^{-1}} % = \|r^{(k)}\|_{A^{-1}}^2. \end{align}
As a result, we have
\begin{align} \|e^{(k+1)}\|_A^2 = \|e^{(k)}\|_A^2 \left( 1 - \frac{\|r^{(k)}\|^4}{\|r^{(k)}\|_{A^{-1}}^2 \|r^{(k)}\|_A^2} \right) \end{align}
Applying the Kantorovich inequality with x = r^{(k)} then yields
\begin{align} \|e^{(k+1)}\|_A^2 \leq \|e^{(k)}\|_A^2 \left( 1 - \frac{4\lambda_{\rm min}\lambda_{\rm max}}{(\lambda_{\rm min}+\lambda_{\rm max})^2} \right) = \|e^{(k)}\|_A^2 \left( \frac{\lambda_{\rm max}-\lambda_{\rm min}}{\lambda_{\rm max}+\lambda_{\rm min}} \right)^2, \end{align}
and taking square roots gives the claimed bound.
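The resulting bound \|e^{(k+1)}\|_A \leq \frac{\kappa-1}{\kappa+1} \|e^{(k)}\|_A can be checked numerically; the sketch below runs gradient descent with line search on the diagonal example from Exercise 15.3 (the starting point and iteration count are our own choices).

```julia
using LinearAlgebra

# Verify ‖e⁽ᵏ⁺¹⁾‖_A ≤ ((κ−1)/(κ+1)) ‖e⁽ᵏ⁾‖_A along a steepest-descent run.
function descent_bound_holds(A, b, x0; niter = 10)
    x_exact = A \ b
    κ = cond(A, 2)
    factor = (κ - 1) / (κ + 1)
    normA(v) = sqrt(dot(v, A * v))
    x, ok = x0, true
    for _ in 1:niter
        r = b - A * x                       # residual
        α = dot(r, r) / dot(r, A * r)       # exact line-search step size
        e_old = normA(x - x_exact)
        x = x + α * r
        ok &= normA(x - x_exact) ≤ factor * e_old + 1e-12
    end
    return ok
end

@show descent_bound_holds([5.0 0.0; 0.0 1.0], [1.0, 1.0], [3.0, -2.0])  # true
```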
Exercise 15.6 Here, we prove the Kantorovich inequality: let 0 <\lambda_1 \leq\cdots \leq\lambda_n be the eigenvalues of A.
Show that
\begin{align} \Big(\sum_i \lambda_i y_i^2\Big) \Big(\sum_j \lambda_j^{-1} y_j^2\Big) % \leq \frac{(\lambda_1 + \lambda_n)^2}{4\lambda_1\lambda_n} \end{align}
for all \|y\|_2 = 1.
Hint: first show that
\begin{align} \sum_j \lambda_j^{-1} y_j^2 % \leq \frac {\lambda_1 + \lambda_n - \sum_j \lambda_j y_j^2} {\lambda_1\lambda_n}. \end{align}
Then, writing X := \sum_j \lambda_j y_j^2, maximise
\begin{align} X \, \frac {\lambda_1 + \lambda_n - X} {\lambda_1\lambda_n} \end{align}
for X \in [\lambda_1,\lambda_n].
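Before proving it, one can sketch a quick numerical check of the inequality (the eigenvalue list and number of random trials are our own choices):

```julia
using LinearAlgebra

# Check (Σ λᵢyᵢ²)(Σ λⱼ⁻¹yⱼ²) ≤ (λ₁+λₙ)²/(4λ₁λₙ) for random unit vectors y.
λ = [1.0, 2.0, 5.0, 10.0]                  # 0 < λ₁ ≤ … ≤ λₙ
bound = (λ[1] + λ[end])^2 / (4 * λ[1] * λ[end])

holds = all(1:1000) do _
    y = normalize(randn(length(λ)))        # ‖y‖₂ = 1
    sum(λ .* y .^ 2) * sum(y .^ 2 ./ λ) ≤ bound + 1e-12
end
@show holds   # true
```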
Exercise 15.7 (Bonus) Adapt this code to include a preconditioner P:
function CG(A,b,x0; P=I, Niter=50, tol=1e-3)
    X = [x0]
    # quadratic objective g(x) = ½xᵀAx − bᵀx (not used in the iteration itself)
    g(x,y) = dot([x;y], A*[x;y])/2 - dot([x;y], b)
    r = b - A*x0
    v = r
    for i ∈ 1:Niter
        if norm(r) < tol
            break
        end
        # step size
        α = dot(v,r)/dot(v,A*v)
        # new point
        x1 = x0 + α*v
        # residual
        r = b - A*x1
        # new search direction
        β = - dot(v,A*r)/dot(v,A*v)
        v = r + β*v
        x0 = x1
        push!(X, x1)
    end
    return X
end;
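One possible adaptation (a sketch, not an official solution): build the search directions from the preconditioned residual z = P^{-1} r instead of r, keeping the rest of the structure above. The β update below mirrors the A-conjugacy formula of the original code.

```julia
using LinearAlgebra

function PCG(A, b, x0; P = I, Niter = 50, tol = 1e-3)
    X = [x0]
    r = b - A * x0
    z = P \ r                                # preconditioned residual
    v = z                                    # initial search direction
    for i ∈ 1:Niter
        if norm(r) < tol
            break
        end
        α = dot(v, r) / dot(v, A * v)        # step size (line search along v)
        x1 = x0 + α * v                      # new point
        r = b - A * x1                       # residual
        z = P \ r                            # preconditioned residual
        β = -dot(v, A * z) / dot(v, A * v)   # keep directions A-conjugate
        v = z + β * v                        # new search direction
        x0 = x1
        push!(X, x1)
    end
    return X
end

# With P = I this reduces to the plain CG above; here we try Jacobi preconditioning.
X = PCG([4.0 1.0; 1.0 3.0], [1.0, 2.0], zeros(2); P = Diagonal([4.0, 3.0]))
@show norm([4.0 1.0; 1.0 3.0] * X[end] - [1.0, 2.0])   # below tol
```

For SPD P this is equivalent to running CG on a preconditioned system; choosing P = Diagonal(diag(A)) gives the Jacobi preconditioner used in the test call.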