using LinearAlgebra    # for normalize, dot, norm

function power( A; Niter=100, tol=1e-8 )
    n, = size(A)
    x = normalize( randn(n) )
    λ = Float64[]
    for k ∈ 1:Niter
        x′ = A*x
        push!( λ, dot(x,x′) )
        x = normalize(x′)
        # stopping criterion: relative residual
        r = norm(A*x - λ[end]*x) / norm(A)
        if r < tol
            println("converged in $k iterations\nλ ≈ $(λ[end]), residual = $r")
            break
        elseif k == Niter
            println("did not converge in $Niter iterations, residual = $r")
        end
    end
    return λ, x
end;

34 Midterm Exam
My solutions (of course, you may have got the correct answers in different/better ways)
Time Allowed: 1 hr 30 mins
Answer SECTION A (40%) and TWO of the three (30% + 30%) optional sections (B, C, D). If you have answered more than the required two optional sections, you will be given credit for your best two optional sections. As a guide, I suggest you spend around 30 minutes on each section.
Please indicate your answers to multiple choice questions clearly with an ╳ to the left of your answers.
Please label any additional paper you use with your name and page number.
Electronic devices are not needed and not permitted in this examination. You may use your own notes, limited to 2 sides of letter sized paper. Additional paper may be requested.
A: Compulsory Section [40 points]
Consider the initial value problem: seek u:[0,T] \to \mathbb R such that
\begin{align} \tag{IVP} \begin{split} u(0) &= u_0 \\ u'(t) &= f\big(t,u(t)\big). \end{split} \end{align}
Recall, u_0 is known as the initial condition and (u_0, f) is known as the data.
For a fixed mesh t_j = j h = \frac{jT}{n} (for 0 \leq j \leq n), we use the notation u_j \approx u(t_j) for approximations coming from the numerical scheme in question.
Please select all the true statements.
There is f for which there exists a solution u : [0,T) \to \mathbb R which blows up at time T.
True! We saw that the solution to u'(t) = u(t)^3 with u(0) = 1 has finite-time blow-up at t=\frac12.
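For completeness, here is the separation-of-variables computation behind this example:

\begin{align}
u' = u^3,\ u(0) = 1
\quad\Longrightarrow\quad
\int_1^{u(t)} \frac{du}{u^3} = t
\quad\Longrightarrow\quad
\frac12 - \frac{1}{2u(t)^2} = t
\quad\Longrightarrow\quad
u(t) = \frac{1}{\sqrt{1-2t}}, \nonumber
\end{align}

which is defined on [0,\tfrac12) and satisfies u(t)\to\infty as t\to\tfrac12^-.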
If f is continuous, then there exists a unique solution to \text{(IVP)}.
False! The theorem we had in class needed f to also be Lipschitz in the second component with constant independent of t. Indeed, f(t,u) := \sqrt{|u|} is continuous on [0,T] \times \mathbb R but there is not a unique solution to u'(t) = \sqrt{|u(t)|} with u(0) = 0.
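Explicitly, with u(0) = 0, both

\begin{align}
u(t) \equiv 0 \qquad\text{and}\qquad u(t) = \frac{t^2}{4} \nonumber
\end{align}

solve u'(t) = \sqrt{|u(t)|}: for the second, u'(t) = \frac{t}{2} = \sqrt{t^2/4} for t \geq 0.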
When f(t,u) = \cos(u), the problem \text{(IVP)} has a unique solution for all time.
True. Since |\cos u - \cos v| \leq |u - v|, the map f(t,u) = \cos(u) is Lipschitz in its second component with constant 1, uniformly in t, so the existence and uniqueness theorem gives a unique solution for all time.
\text{(IVP)} with f(t,u) = \sqrt{u} is well-posed.
False. There are multiple solutions when u(0) = 0.
Euler’s method is consistent
True. It has order of accuracy 1 since
\begin{align}
\tau_{j+1}(h) &= \frac{u(t_j+h)-u(t_j)}{h} - u'(t_j) \nonumber\\
&= \frac{u(t_j) + hu'(t_j) + \frac{h^2}{2} u''(t_j) + O(h^3) - u(t_j)}{h} - u'(t_j) \nonumber\\
&= O(h)
\end{align}
as h\to0.
The order of accuracy of Runge-Kutta methods can be seen numerically by plotting the error against the number of grid points n on a log-log scale
True! The error is O(h^p) = O(n^{-p}), so if we plot y = C n^{-p} on log-log axes we see \log y = -p \log n + \log C; the order p is therefore the negative of the slope of the error curve.
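As a sketch of this check in code (the model problem u' = -u with u(0) = 1 and the use of Euler's method are illustrative assumptions, not part of the question):

```julia
# Estimate the order of Euler's method from the slope of the error on a
# log-log scale, using u' = -u, u(0) = 1, whose exact solution is e^{-t}.
function euler_error(n; T = 1.0)
    h = T / n
    u = 1.0
    for _ in 1:n
        u += h * (-u)        # one Euler step for f(t, u) = -u
    end
    return abs(u - exp(-T))  # error at the final time
end

ns   = [100, 200, 400, 800]
errs = euler_error.(ns)
# log(err) vs log(n) has slope -p, so the estimated order is
p = -(log(errs[end]) - log(errs[1])) / (log(ns[end]) - log(ns[1]))
```

For Euler's method the computed slope is close to 1, matching p = 1.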
If one uses an adaptive time stepping Runge-Kutta method, it is common to notice that the step size is larger in regions where the derivative of the exact solution is large
False. We expect the opposite: smaller steps (more mesh points) where the derivatives are large.
If a multistep method is consistent then it is automatically convergent
False. It also has to be zero-stable.
A linear m-step method can have order of accuracy p = m+1
True. Implicit methods aim to achieve this!
If a method is zero stable, then it produces a sequence of approximations (u_j) which remains bounded as j\to \infty
False. Zero stability is about the limit as h \to 0.
u and v are A-orthogonal if (Au, Av) = 0
False. We defined A-orthogonality to be (u,v)_A := (u, Av) = 0.
2.
Suppose that f is continuous on [0,T]\times \mathbb R and f is Lipschitz in its second component with constant L (independent of t) and that u solves \text{(IVP)}. Suppose that v solves the perturbed problem
\begin{align} \tag{IVP$\delta$} \begin{split} v(0) &= u_0 + \delta \\ v'(t) &= f\big(t,v(t)\big). \end{split} \end{align}
Explain why one cannot improve the general bound \max_{t\in[0,T]} | u(t) - v(t) | \leq |\delta| e^{LT} for all |\delta| sufficiently small.
Hint: It is useful to consider the function f(t,v) = v.
Answer.
If f(t, u) = u then the solution to \text{(IVP)} is u(t) = u_0 e^{t} and the solution to (\text{IVP}_\delta) is v(t) = (u_0 + \delta)e^t and so
\begin{align} |u(t) - v(t)| = |\delta| e^{t}. \end{align}
Since f is Lipschitz with constant L=1 (since |f(t,u)-f(t,v)|=|u-v|), we have \|u-v\|_{L^\infty([0,T])} = |\delta| e^{LT}. In particular, this bound is sharp (it cannot be improved).
3.
Write down the Runge-Kutta method corresponding to the following Butcher tableau:
\begin{array}{c|cc}
0 & & \\
c & a & \\
\hline
& b_1 & b_2
\end{array}
Show that you can choose the constants a, b_1, b_2, c to obtain an order 2 method.
Answer.
The Butcher tableau corresponds to the RK method given by
\begin{align} u_{j+1} = u_j + b_1 h f(t_j,u_j) + b_2 h f\big(t_j + ch, u_j + a h f(t_j,u_j) \big). \end{align}
The local truncation error is
\begin{align}
\tau_{j+1}(h) &= \frac{u(t_{j+1}) - u(t_j)}{h} - b_1 f(t_j,u(t_j)) - b_2 f\big(t_j + ch,\, u(t_j) + a h f(t_j,u(t_j)) \big) \nonumber\\
&= u'(t_j) + \frac{h}2 u''(t_j) + O(h^2) - b_1 u'(t_j) \nonumber\\
&\qquad - b_2 \Big[ f(t_j,u(t_j)) + ch \,\partial_t f(t_j, u(t_j)) + a h f(t_j,u(t_j)) \,\partial_u f(t_j,u(t_j)) + O(h^2) \Big] \nonumber\\
&= (1-b_1-b_2)\, u'(t_j) + \frac{h}2 u''(t_j) - h b_2 \Big[ c \,\partial_t f(t_j, u(t_j)) + a f(t_j,u(t_j)) \,\partial_u f(t_j,u(t_j)) \Big] + O(h^2)
\end{align}
Therefore, if b_1 + b_2 = 1 and a = c = \frac{1}{2b_2}, then b_2 \big[ c \partial_t f(t_j, u(t_j)) + a f(t_j,u(t_j)) \partial_u f(t_j,u(t_j)) \big] = \frac12 \big[ \partial_t f(t_j, u(t_j)) + u'(t_j) \partial_u f(t_j,u(t_j)) \big] = \frac12 u''(t_j) and thus \tau_{j+1}(h) = O(h^2) as h\to0.
Therefore, the RK method has (at least) order 2 convergence for this choice of constants.
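For instance, taking b_2 = 1, b_1 = 0, a = c = \frac12 gives the midpoint method, while b_1 = b_2 = \frac12, a = c = 1 gives Heun's method:

\begin{align}
u_{j+1} &= u_j + h f\big(t_j + \tfrac{h}2,\ u_j + \tfrac{h}2 f(t_j,u_j)\big), \tag{midpoint}\\
u_{j+1} &= u_j + \tfrac{h}2 \Big[ f(t_j,u_j) + f\big(t_{j+1},\ u_j + h f(t_j,u_j)\big) \Big]. \tag{Heun}
\end{align}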
4.
It will be useful in the following to use the Taylor expansion:
\begin{align} f\big(t\pm h, u(t\pm h)\big) &= f\big(t, u(t)\big) \pm h \left[ \frac{\partial f}{\partial t}\big(t,u(t)\big) + \frac{\partial f}{\partial u}\big(t,u(t)\big) u'(t) \right] + O(h^2) \nonumber \end{align}
as h \to 0.
Here is a multistep method for some fixed \alpha \in \mathbb R:
\begin{align}
u_{j+1} &= (1-\alpha) u_j + \alpha u_{j-1} + \frac{h}{2} \big( (\alpha + 3) f( t_j, u_j ) + (\alpha-1) f( t_{j-1}, u_{j-1} ) \big) \tag{$\alpha2$}
\end{align}
- Is this an implicit or explicit method?
Recall that the local truncation error of (\alpha2) is given by
\begin{align}
\tau_{j+1}(h) := \frac{u(t_{j+1}) - (1-\alpha) u(t_j) - \alpha u(t_{j-1})}{h} - \frac{1}2 \Big( (\alpha+3) f\big(t_j,u(t_j)\big) + (\alpha-1) f\big(t_{j-1}, u(t_{j-1})\big) \Big). \nonumber
\end{align}
- Show that (\alpha2) has order of accuracy at least p = 2,
- For what \alpha is (\alpha2) zero stable?
Answer.
1. This is an explicit method (there is no u_{j+1} on the right-hand side).
2. Using the Taylor expansion of u about t_j, we have
\begin{align}
&\frac{1}{h} \big[ u(t_{j+1}) - (1-\alpha) u(t_j) - \alpha u(t_{j-1}) \big] \nonumber\\
&= \frac{1}{h} \Big[ u(t_{j}) + h u'(t_j) + \tfrac{h^2}2 u''(t_j) - (1-\alpha) u(t_j) - \alpha \big[ u(t_{j}) - hu'(t_{j}) + \tfrac{h^2}2 u''(t_j) \big] \Big] + O(h^2) \nonumber\\
&= (1+\alpha) u'(t_j) + \frac{1-\alpha}{2} h u''(t_j) + O(h^2) \nonumber
\end{align}
Moreover,
\begin{align}
&\frac{1}2 \Big( (\alpha+3) f\big(t_j,u(t_j)\big) + (\alpha-1) f\big(t_{j-1}, u(t_{j-1})\big) \Big) \nonumber\\
&= \frac{1}2 \Big( (\alpha+3) u'(t_j) + (\alpha-1) \big[ f\big(t_{j}, u(t_{j})\big) - h \partial_t f\big(t_{j}, u(t_{j})\big) - h \partial_u f\big(t_{j}, u(t_{j})\big) u'(t_j) + O(h^2) \big] \Big) \nonumber\\
&= (\alpha+1) u'(t_j) - \frac{\alpha-1}{2} h u''(t_j) + O(h^2).
\end{align}
Taking the difference of these terms, we get \tau_{j+1}(h) = O(h^2).
Extra: for what \alpha is this method 3rd order accurate?
3. This method is zero stable if it satisfies the root condition: the roots \zeta of the characteristic polynomial \rho(z) = z^2 + (\alpha-1)z - \alpha satisfy |\zeta|\leq1, and any root with |\zeta|=1 is simple. We have \rho(z) = (z-1)(z+\alpha), so the roots are \zeta = 1 and \zeta = -\alpha. These coincide (giving a double root on the unit circle) exactly when \alpha = -1. Therefore, this method is zero stable if and only if |\alpha|\leq 1 and \alpha \not= -1.
As a result, the method (\alpha2) is convergent for |\alpha|\leq 1 and \alpha \not= -1 (we saw this in Example 7.4 in the lecture notes).
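A quick numerical sanity check of the order-2 claim (a sketch: the model problem u' = -u, u(0) = 1, the choice α = 0, and the use of an exact second starting value are illustrative assumptions):

```julia
# Empirical order of (α2) on u' = -u, u(0) = 1 (exact solution e^{-t}).
# With α = 0 the scheme is the two-step Adams-Bashforth method.
function alpha2_error(n; α = 0.0, T = 1.0)
    h = T / n
    f(t, u) = -u
    t = range(0, T; length = n + 1)
    u = zeros(n + 1)
    u[1] = 1.0
    u[2] = exp(-h)           # exact second starting value, so the order is preserved
    for j in 2:n
        u[j+1] = (1 - α) * u[j] + α * u[j-1] +
                 (h / 2) * ((α + 3) * f(t[j], u[j]) + (α - 1) * f(t[j-1], u[j-1]))
    end
    return abs(u[end] - exp(-T))
end

e1, e2 = alpha2_error(200), alpha2_error(400)
p = log2(e1 / e2)            # halving h should divide the error by ≈ 2^2
```

The computed p is close to 2, consistent with the truncation-error calculation above.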
Choose TWO sections from B, C, D (each worth 30 points)
B: Simple Mixing [30 points]
Recall that splitting methods for solving Ax=b take the form
\begin{align} P x^{(k+1)} = N x^{(k)} + b \tag{splitting} \end{align}
where A = P - N. This can also be written as x^{(k+1)} = P^{-1} N x^{(k)} + P^{-1} b.
In this question, we consider the following method: for fixed \alpha > 0, define
\begin{align} x^{(k+1)} = (I-\alpha A)x^{(k)} + \alpha b. \tag{$\star$} \end{align}
It is useful to remember that, when A is symmetric, A is SPD if and only if all eigenvalues of A are strictly positive.
- Suppose that x^{(k)} defined by (\text{splitting}) converges to some x as k\to\infty. Show that x is a solution to Ax=b.
- For (\text{splitting}) to be practical, what is the key property that the preconditioner P must satisfy?
- Which condition leads to convergence x^{(k)} \to x?
- Which choice of preconditioner P gives (\star)?
- Show that, if A is symmetric and positive definite (SPD), then (\star) converges for all 0 < \alpha < \frac{2}{\rho(A)}.
For fixed \beta\in \mathbb R, consider the matrix
\begin{align} A_{\beta} := \begin{pmatrix} 1 & \beta & \beta \\ \beta & 1 & \beta \\ \beta & \beta & 1 \end{pmatrix}\nonumber. \end{align}
which has \sigma(A_{\beta}) = \{1-\beta, 1 + 2\beta\}.
- Hence, show that for -\frac12 < \beta < 1, there exists \alpha sufficiently small such that (\star) converges when applied to A_\beta.
Answer.
1. If x^{(k)} \to x as k \to \infty, then we have Px = Nx + b and so (P-N)x = Ax = b.
2. P must be easy to invert (that is, systems Px = y must be cheap to solve).
3. We require \rho( P^{-1} N ) < 1.
4. P = \alpha^{-1}I gives N = \alpha^{-1}I - A and thus P^{-1} N = \alpha(\alpha^{-1}I - A) = I - \alpha A as required.
5. The eigenvalues of I - \alpha A are 1 - \alpha \lambda for \lambda \in \sigma(A). If A is SPD then all the eigenvalues are strictly positive and so
\begin{align}
\rho(I-\alpha A) < 1 \qquad \iff \qquad &-1 < 1 - \alpha \lambda < 1 &&\text{for all } \lambda \in \sigma(A) \nonumber\\
\iff\qquad &0 < \alpha \lambda < 2 &&\text{for all } \lambda \in \sigma(A) \nonumber\\
\iff\qquad &0 < \alpha < \frac{2}{\lambda} &&\text{for all } \lambda \in \sigma(A) \nonumber\\
\iff\qquad &0 < \alpha < \frac{2}{\rho(A)}.
\end{align}
By question 3, under this condition, (\star) is convergent.
6. A_\beta is symmetric. When 1-\beta > 0 and 1+2\beta > 0 we have that A_\beta is SPD and thus (\star) is convergent when applied to A_\beta for all \alpha sufficiently small. We have 1-\beta > 0 and 1+2\beta > 0 if and only if -\frac12<\beta < 1.
Extra: Show that (\star) does not converge for any \alpha if \beta is outside this range.
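As a sketch of (⋆) in action (the right-hand side b and the particular choices β = 1/2, α = 1/2 below are illustrative assumptions):

```julia
using LinearAlgebra

# Simple mixing x^{(k+1)} = (I - αA) x^{(k)} + αb applied to A_β with β = 1/2,
# where σ(A_β) = {1 - β, 1 + 2β} = {1/2, 2}; any 0 < α < 2/ρ(A_β) = 1 converges.
function mix(A, b, α; iters = 500)
    x = zero(b)
    for _ in 1:iters
        x = (I - α * A) * x + α * b
    end
    return x
end

β  = 0.5
Aβ = [1 β β; β 1 β; β β 1]
b  = [1.0, 2.0, 3.0]
x  = mix(Aβ, b, 0.5)
# the residual norm(Aβ*x - b) is now at round-off level
```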
C: Strictly Diagonally Dominant Matrices [30 points]
Recall that |x|_\infty = \max_{i}|x_i| and \|A\|_{\infty} := \max_{|x|_\infty = 1} |Ax|_{\infty} define a vector norm and the induced matrix norm, respectively. In this question, it will be useful to use the fact that
\begin{align} \rho(A) \leq \|A\|_{\infty} = \max_{1\leq i\leq n} \sum_{j=1}^n |A_{ij}| \tag{hint} \end{align}
(which you can use without proof).
We say a matrix A \in \mathbb R^{n\times n} is strictly diagonally dominant (SDD) if
\begin{align} |A_{ii}| > \sum_{j\not=i} |A_{ij}| \qquad \text{for all } i = 1,\dots,n. \tag{SDD} \end{align}
Recall that Jacobi’s method for solving Ax = b is given by x^{(k+1)} = D^{-1} \left( (L+U)x^{(k)} + b \right) where A = D - L - U, D is diagonal, L is strictly lower triangular, and U is strictly upper triangular. Define T_{\rm J} := D^{-1} (L+U).
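Written componentwise, the Jacobi update is

\begin{align}
x_i^{(k+1)} = \frac{1}{A_{ii}} \Big( b_i - \sum_{j\not=i} A_{ij} x_j^{(k)} \Big), \qquad i = 1,\dots,n. \nonumber
\end{align}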
- Show that, if A is strictly diagonally dominant, then \| T_{\rm J} \|_\infty < 1,
- Hence show that Jacobi’s method converges for strictly diagonally dominant matrices A.
Recall that the Gauss-Seidel method for solving Ax = b is given by x^{(k+1)} = (D-L)^{-1} \left( Ux^{(k)} + b \right) and let \lambda be an eigenvalue of T_{\rm GS} := (D-L)^{-1} U with corresponding eigenvector v.
- Show that \lambda Dv = (\lambda L+U) v.
- Hence show that, if |\lambda| \geq 1, then
\begin{align} |A_{ii}| |v_i| \leq \sum_{j\not=i} |A_{ij}| |v_j| \qquad \text{for all } i=1,\dots,n.\nonumber \end{align}
- By choosing i such that |v_i| \geq |v_j| for all j show that, if |\lambda|\geq1, then |A_{ii}| \leq \sum_{j\not=i} |A_{ij}|.
- Hence conclude that Gauss-Seidel converges for all strictly diagonally dominant matrices.
Answer.
1. We will use the hint, the fact that (L+U)_{ij} = A_{ij} if i\not=j and (L+U)_{ii} = 0, and the fact that D^{-1} is diagonal with [D^{-1}]_{ii} = (D_{ii})^{-1} = (A_{ii})^{-1}:
\begin{align}
\| T_{\rm J} \|_\infty &= \max_i \sum_j \big| [T_{\rm J}]_{ij} \big| \nonumber\\
&= \max_i \sum_j \big| [D^{-1}]_{ii} (L+U)_{ij} \big| \nonumber\\
&= \max_i \sum_{j\not=i} \frac{|A_{ij}|}{|A_{ii}|} \nonumber\\
&< 1.
\end{align}
In the final line, we are using the fact that A is strictly diagonally dominant.
2. We have \rho( T_{\rm J} ) \leq \| T_{\rm J} \|_\infty < 1 and so the iteration is convergent.
3. We have T_{\rm GS} v = (D-L)^{-1} U v = \lambda v and thus Uv = (D-L) \lambda v and thus \lambda Dv = (\lambda L+ U)v.
4. If |\lambda| \geq 1, we have \lambda D_{ii} v_i = \sum_j (\lambda L+ U)_{ij} v_j and D_{ii} = A_{ii} and so
\begin{align}
|\lambda| |A_{ii}| |v_i| &\leq \sum_j \left| (\lambda L+ U)_{ij} \right| \left| v_j \right| \nonumber\\
&\leq |\lambda| \sum_j \left| (L+ U)_{ij}\right| \left| v_j \right| \nonumber\\
&= |\lambda| \sum_{j\not=i} \left|A_{ij}\right| \left| v_j \right|,
\end{align}
as required.
5. If |v_i| \geq |v_j| for all j then |v_j|/|v_i| \leq 1 and so
\begin{align}
|A_{ii}| &\leq \sum_{j\not=i} \left|A_{ij}\right| \frac{\left| v_j \right|}{|v_i|} \nonumber\\
&\leq \sum_{j\not=i} \left|A_{ij}\right|.
\end{align}
6. We have seen that, if |\lambda|\geq1, then A is not strictly diagonally dominant. Therefore, if A is strictly diagonally dominant, then |\lambda|<1 for all \lambda \in \sigma(T_{\rm GS}) (that is, \rho(T_{\rm GS}) < 1). Therefore GS converges for all strictly diagonally dominant matrices A.
Extra: (possible presentation) Show that Jacobi and GS converge for irreducibly diagonally dominant matrices.
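As a numerical illustration of parts 2 and 6 (the strictly diagonally dominant matrix below is an illustrative assumption, not taken from the exam):

```julia
using LinearAlgebra

# Check ρ(T_J) < 1 and ρ(T_GS) < 1 for one strictly diagonally dominant matrix.
A = [4.0 1.0 1.0;
     1.0 5.0 2.0;
     0.0 2.0 6.0]            # each |A_ii| exceeds the off-diagonal row sum
D = Diagonal(A)              # diagonal part of A
L = -tril(A, -1)             # so that A = D - L - U
U = -triu(A,  1)
TJ  = D \ (L + U)            # Jacobi iteration matrix
TGS = (D - L) \ U            # Gauss-Seidel iteration matrix
ρJ  = maximum(abs, eigvals(TJ))
ρGS = maximum(abs, eigvals(TGS))
```

Both spectral radii come out below 1, as the two proofs guarantee.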
D: Power and Inverse Methods [30 points]
Suppose that A \in \mathbb R^{n\times n} has orthonormal eigenvectors v_i with corresponding eigenvalues \lambda_i (that is, Av_i = \lambda_i v_i and (v_i,v_j) = v_i^Tv_j = \delta_{ij}). We have seen in lectures that, in this case, we can write A = \sum_i \lambda_i v_i v_i^T.
We first suppose that \lambda_1 is a dominant eigenvalue: |\lambda_1| > |\lambda_i| for all i\not=1. We call the corresponding eigenvector v_1 a dominant eigenvector.
- Explain why \alpha v_1 is also a dominant eigenvector for all \alpha\not=0.
- Suppose that x^{(0)} = \sum_i c_i v_i for some constants c_i. Show that
\begin{align}
A^{k} x^{(0)} = \sum_{i=1}^n \lambda_i^{k} c_i v_i = \lambda_1^{k} \Big[ c_1 v_1 + \sum_{i\not=1} c_i \big(\tfrac{\lambda_i}{\lambda_1}\big)^{k} v_i \Big].\nonumber
\end{align}
In particular, the sequence (A^k x^{(0)})_{k=1}^{\infty} approaches a constant multiple of the dominant eigenvector v_1. Therefore, on normalising A^k x^{(0)}, we obtain a sequence that converges to v_1 or -v_1 (which are both dominant eigenvectors) as k \to \infty. Using this sequence one is also able to approximate the dominant eigenvalue \lambda_1. We implement this so-called power method here:
- In the stopping criterion, why do we use the norm of the residual \|Ax - \lambda x\|_2 instead of computing either of the errors \|x - v_1\|_2 or |\lambda - \lambda_1|?
Here, we apply the power method to a random 1000 \times 1000 symmetric matrix and plot the errors in the eigenvalue against the number of iterations:
converged in 3461 iterations
λ ≈ -89.03496134424624, residual = 9.988658606610933e-9
- What is the rate of convergence in this plot?
One can show that, if z \not\in \sigma(A), then zI - A is invertible and \sigma\big( (zI-A)^{-1} \big) = \left\{ (z- \lambda)^{-1} : \lambda\in\sigma(A) \right\}. Suppose that, after possibly reordering \lambda_1,\dots,\lambda_n \in \sigma(A), we have
\begin{align} |z - \lambda_1| < |z - \lambda_2| \leq \cdots \leq |z- \lambda_n|\nonumber \end{align}
- Explain how you can apply the power method to B := (zI-A)^{-1} to approximate \lambda_1 (the closest eigenvalue to z).
Here, we apply this idea to the same 1000 \times 1000 matrix as before to find the closest eigenvalue to zero.
converged in 34 iterations
eigenvalue closest to z=0 is λ ≈ 0.06388010931076409,
residual = 7.3399398262554825e-9
Answer.
1. If v_1 is an eigenvector of A with eigenvalue \lambda_1 then for all \alpha \not=0 we have A(\alpha v_1) = \alpha Av_1 = \alpha \lambda_1 v_1 = \lambda_1 (\alpha v_1) and so \alpha v_1 is an eigenvector of A with eigenvalue \lambda_1 (and hence still a dominant eigenvector).
2. If x^{(0)} = \sum_i c_i v_i, then Ax^{(0)} = \sum_i c_i Av_i = \sum_i \lambda_i c_i v_i, therefore, by induction, we have
\begin{align}
A^k x^{(0)} &= A\, A^{k-1} x^{(0)} = \sum_i \lambda_i^{k-1} c_i A v_i \nonumber\\
&= \sum_i \lambda_i^{k} c_i v_i \nonumber\\
&= \lambda_1^{k} c_1 v_1 + \sum_{i\not=1} \lambda_i^{k} c_i v_i \nonumber\\
&= \lambda_1^{k} \Big[ c_1 v_1 + \sum_{i\not=1} \big(\tfrac{\lambda_i}{\lambda_1}\big)^{k} c_i v_i \Big].
\end{align}
3. (\lambda_1, v_1) are unknown! We are trying to approximate these things.
4. This is a straight line on a semi-log plot, so we have exponential decay in the iteration number k: the error behaves like O(\rho^{-k}) for some \rho > 1 (if y = C \rho^{-k} then \log y = -k \log \rho + \log C). In fact, we have \rho = \left|\frac{\lambda_1}{\lambda_2}\right| where \lambda_2 is the second largest eigenvalue in absolute value.
5. The eigenvalues of B are (z-\lambda_i)^{-1} with
\begin{align}
\left|(z-\lambda_1)^{-1}\right| > \left|(z-\lambda_2)^{-1}\right| \geq \cdots \geq \left|(z-\lambda_n)^{-1}\right|. \nonumber
\end{align}
As a result (z-\lambda_1)^{-1} is the dominant eigenvalue of B and applying the power method to B converges to \sigma_1 \approx (z-\lambda_1)^{-1} and we can recover the eigenvalue of A closest to z by rearranging this equation:
\begin{align} \lambda_1 \approx z-\frac{1}{\sigma_1}. \end{align}
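A sketch of this inverse power method (the 3×3 test matrix, the shift z = 0, and the helper name inverse_power are illustrative assumptions; note that we factor zI - A once instead of forming B = (zI-A)^{-1} explicitly):

```julia
using LinearAlgebra

# Power method applied to B = (zI - A)^{-1}: each iteration applies B by
# solving a linear system with the (reused) factorization of zI - A.
function inverse_power(A, z; iters = 200)
    n = size(A, 1)
    F = lu(z * I - A)        # factor once, reuse every iteration
    x = normalize(randn(n))
    σ = 0.0
    for _ in 1:iters
        y = F \ x            # y = B x
        σ = dot(x, y)        # estimate of the dominant eigenvalue (z - λ₁)⁻¹ of B
        x = normalize(y)
    end
    return z - 1 / σ         # recover λ₁ ≈ z - 1/σ
end

A = [2.0 1.0 0.0;
     1.0 3.0 1.0;
     0.0 1.0 4.0]
λ = inverse_power(A, 0.0)    # eigenvalue of A closest to z = 0
```

Here \sigma(A) = \{3-\sqrt3,\, 3,\, 3+\sqrt3\}, so the iteration recovers λ ≈ 3 - √3.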