6  Multistep Methods

Lecture 6

Published

February 11, 2026

Overview
  1. Recap
  2. Adaptive RK methods
  3. Numerical examples
  4. Multistep methods
Note

I encourage you to play around with the Jupyter notebook for this lecture - you can copy the code with the “this notebook” button on the side of this page.

Warning

These notes are mainly a record of what we discussed and are not a substitute for attending the lectures and reading books! If anything is unclear/wrong, let me know and I will update the notes.

Tip

This lecture is mostly based on Burden, Faires, and Burden (2015) section 5.6 and the Driscoll and Braun (2018) section on zero stability.

6.1 Recap

Recall that we are considering IVPs: seek u :[0,T] \to \mathbb R such that

\begin{align} &u(0) = u_0 \nonumber\\ &u'(t) = f\big( t, u(t) \big). \tag{IVP} \end{align}

We will suppose that f is continuous on [0,T] \times \mathbb R and Lipschitz in the second argument and so the problem is well-posed and there exists a unique solution to (\text{IVP}). Recall one-step methods:

\begin{align} u_{j+1} = u_j + h \phi(t_j, u_j; h) \tag{1-step} \end{align}

where t_j = j \frac{T}{n} = j h define the mesh. We saw last week that, if the local truncation error is \tau_{j+1} = O(h^p) as h\to 0 (i.e. if the method is consistent with order of accuracy p) and |\frac{\partial \phi}{\partial u}| \leq L on [0,T] \times \mathbb R, then we have the error bound

\begin{align} |u(t_j) - u_j| \leq \frac{Ch^p}{L} ( e^{Lt_j} - 1). \end{align}
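As a quick sanity check on this bound, the following sketch (written in Python so it is self-contained; the lecture code itself is in Julia) verifies that for Euler's method (p=1) on the test problem u' = u, u(0)=1, the error at t=1 roughly halves each time h is halved:

```python
import math

def euler(u0, f, T, n):
    # forward Euler: u_{j+1} = u_j + h * f(t_j, u_j)
    h, u = T / n, u0
    for j in range(n):
        u += h * f(j * h, u)
    return u

# u' = u, u(0) = 1 has exact solution e^t, so the error at T = 1 is |u_n - e|
errs = [abs(euler(1.0, lambda t, u: u, 1.0, n) - math.e) for n in (100, 200, 400)]
ratios = [errs[i] / errs[i + 1] for i in range(2)]  # each ratio ≈ 2, i.e. O(h^1)
```

Doubling n (halving h) divides the error by about 2, consistent with p = 1; a method of order p would give ratios near 2^p.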

Then, we looked at Runge–Kutta methods:

Definition 6.1 A general s-stage Runge-Kutta method is of the form:

\begin{align} k_1 &= h f(t_j, u_j) \nonumber\\ k_2 &= h f(t_j + c_1 h, u_j + a_{11} k_1) \nonumber \\ k_3 &= h f(t_j + c_2 h, u_j + a_{21} k_1 + a_{22} k_2) \nonumber\\ \vdots& \nonumber\\ k_s &= h f(t_j + c_{s-1} h, u_j + a_{s-1,1} k_1 + \dots + a_{s-1,s-1} k_{s-1}) \nonumber\\ % u_{j+1} &= u_{j} + b_1 k_1 + b_2 k_2 + \dots + b_s k_s. \nonumber \end{align}

This Runge-Kutta method is conveniently written as the Butcher tableau:

\begin{array} {c|cccc} 0\\ c_1 & a_{11} \\ c_2 & a_{21} & a_{22} \\ \vdots & \vdots& \vdots & \ddots& \\ c_{s-1} & a_{s-1,1}& a_{s-1,2} & \cdots & a_{s-1,s-1} \\ \hline & b_1 & b_2 & \dots & b_{s-1} & b_s \end{array}
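For instance, Euler's method u_{j+1} = u_j + h f(t_j, u_j) is the 1-stage case (s=1) with the trivial tableau

\begin{array} {c|c} 0 & \\ \hline & 1 \end{array}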

Example 6.1 Show that all 2 stage, order 2 Runge-Kutta methods have a Butcher tableau of the form

\begin{array} {c|cccc} 0\\ \tfrac{1}{2b} & \tfrac{1}{2b} \\ \hline & 1-b & b \end{array}

for some b \not= 0.

Example 6.2 Show that \sum_{j=1}^s b_j = 1 implies the Runge-Kutta method given by the Butcher tableau above is consistent.

We saw a popular 4 stage order 4 method:

\begin{align}\tag{RK4} \begin{array} {c|cccc} 0\\ \frac12 & \frac12 \\ \frac12 & 0 & \frac12 \\ 1 & 0 & 0 & 1 \\ \hline & \frac16 & \frac13 & \frac13 & \frac16 \end{array} \end{align}

We then implemented a general Runge-Kutta method:

function RK( u0, f, T, n, A, b, c )
    h, t = T/n, 0:T/n:T
    u = fill(float(u0), n+1)
    s, = size(A)
    s = s+1                     # number of stages
    k = fill(float(u0), s)
    for j = 1:n
        k[1] = h * f( t[j], u[j] )
        for i = 1:s-1
            k[i+1] = h * f( t[j] + h * c[i], u[j] + sum( A[i,ℓ] * k[ℓ] for ℓ in 1:i ) )
        end
        u[j+1] = u[j] + sum( b[ℓ] * k[ℓ] for ℓ in 1:s )
    end
    return u
end
RK (generic function with 1 method)

Then, we saw examples of error curves for order 1 (Euler), order 2, and order 4 methods. We plotted the errors as a function of both n and function evaluations.

6.2 Adaptive Runge-Kutta

In adaptive RK methods, we want to choose a smaller step size where the error between the true solution and the numerical approximation is large. However, we do not have access to the true error (because the true solution is unknown) and so one must use error estimators. We consider two methods of order p and p+1 to compute u_{j+1} and \widetilde{u}_{j+1}, respectively. If E_j(h) := |\widetilde{u}_{j+1} - u_{j+1}| is “small” (relative to some tolerance \rm tol), we accept \widetilde{u}_{j+1}; whereas, if E_j is “large”, we choose a new step size qh, recompute u_{j+1} and \widetilde{u}_{j+1} with this new step size, and repeat. At first glance, it looks like the computational cost of doing this is very large: (i) we are computing two RK methods in the first place, and (ii) we are using function evaluations and then discarding the outputs (if the step size is too large).
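The choice of the factor q can be motivated as follows (a heuristic, matching the update used in the implementation below): since u_{j+1} comes from a method of order p, the error estimate scales like E_j(h) \approx C h^{p+1}, so

\begin{align} E_j(qh) \approx C (qh)^{p+1} = q^{p+1} E_j(h), \end{align}

and requiring E_j(qh) \approx {\rm tol} suggests q \approx \big( {\rm tol} / E_j(h) \big)^{1/(p+1)}. In practice, one multiplies by a safety factor \alpha < 1 and caps q so that the step size does not grow too aggressively.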

Using so-called embedded RK methods reduces the (potential) problem in (i) by reusing all the stages of the order p method in the order p+1 method: The following is the Bogacki-Shampine (BS23) method of order 2/3:

\begin{align}\tag{BS23} \begin{array} {c|cccc} 0\\ 1/2 & 1/2 \\ 3/4 & 0 & 3/4 \\ 1 & 2/9 & 1/3 & 4/9 \\ \hline \widetilde{\bm b} := & 2/9 & 1/3 & 4/9 & 0\\ \hline \bm b := & 7/24 & 1/4 & 1/3 & 1/8 . \end{array} \end{align}

What we mean in the final two lines is that

\begin{align} u_{j+1} &= u_j + b_1 k_1 + b_2 k_2 + b_3 k_3 + b_4 k_4 \nonumber\\ \widetilde{u}_{j+1} &= u_j + \widetilde{b}_1 k_1 + \widetilde{b}_2 k_2 + \widetilde{b}_3 k_3 \nonumber \end{align}

(there is no k_4 term in \widetilde{u}_{j+1} since \widetilde{b}_4 = 0).

Exercise 6.1 How many function evaluations do we need to compute u_{j+1} and \widetilde{u}_{j+1} from u_j?

We implement a general adaptive RK method:

function ARK( u0, f, T; 
    A=[1/2 0 0 ; 0 3/4 0 ; 2/9 1/3 4/9], 
    b=[2/9 1/3 4/9 0; 7/24 1/4 1/3 1/8], 
    c=[1/2 3/4 1], tol=1e-5, p=2, α=0.84 )
    
    t, u = [0.], [float(u0)]
    s, = size(A)
    s = s+1 
    k = fill(float(u0), s)
    i, h, evals = 1, 1e-1, 0
    
    while t[i] < T
        if h < 1e-15
            @warn "Step size is too small, t=$(t[i])"
            break  
        end

        k[1] = h * f( t[i], u[i] )
        for j = 1:s-1
            k[j+1] = h * f( t[i] + h * c[j], u[i] + sum( A[j,ℓ] * k[ℓ] for ℓ in 1:j ) )
        end
        evals = evals + s
        next_u_1 = u[i] + sum( b[2,ℓ] * k[ℓ] for ℓ in 1:s )   # order-p approximation
        next_u_2 = u[i] + sum( b[1,ℓ] * k[ℓ] for ℓ in 1:s )   # order-(p+1) approximation
        E = abs(next_u_1 - next_u_2)

        if E < tol
            push!(t, t[i] + h)
            push!(u, next_u_2)
            i += 1
        end

        q = min( α * (tol/E)^(1/(p+1)), 5 )
        h = min(q*h, T-t[i]) 
    end
    return t, u, evals
end
ARK (generic function with 1 method)

In the above code, if A,b,c are not specified, the default is (\text{BS23}).

Example 6.3 (Example 6.5.1 Driscoll and Braun (2018)) Here, we consider

\begin{align} u(0) &= 0 \nonumber\\ u'(t) &= e^{t - u(t) \sin u(t)}. \end{align}

First, we plot the numerical solution found using Tsit5() with a very small tolerance (so that we have something to compare our numerical methods to) together with our adaptive RK method:

using DifferentialEquations, Plots, LaTeXStrings   # packages used in the code below

u0, T = 0., 5.
f(t,u) = (u == Inf) ? Inf : exp( t - u * sin(u) )

ivp = ODEProblem((u,p,t)->f(t,u), u0, (0., T))
sol = solve(ivp, Tsit5(), reltol=1e-15, abstol=1e-15)

t, u, evals = ARK(u0,f,T; tol=1e-2) 

plot(sol, 
    label="Tsit5()", 
    xlabel=L"t",  ylabel=L"u(t)")

plot!(t, u, m=3, label="BS23 (f-evals = $evals)")

Notice that the derivative of the solution becomes very large around t\approx \frac{3\pi}{2} (Exercise: Why?). In these regions, our adaptive method chooses a smaller step size (as expected). Let’s compare this to RK4 with the same number of function evaluations:

n = Int(evals/4)

P = plot(sol, 
    label="Tsit5()", 
    xlabel=L"t",  ylabel=L"u(t)", ylims=(0,maximum(sol.u)))

# RK4
A = [1/2 0 0 ; 0 1/2 0 ; 0 0 1]
b = [1/6 1/3 1/3 1/6]
c = [1/2 1/2 1]

u = RK( u0, f, T, n, A, b, c)
plot!( 0:T/n:T, u, label="RK4 (n = $n)", m=3)

display(P)

Let’s plot the error (when compared to Tsit5() with a very small tolerance) against the number of function evaluations for the different methods that we have seen:

nn = [round(Int, 10^ℓ) for ℓ in 2:0.5:6]
errs = zeros(length(nn), 5)

for (ℓ,n) in enumerate(nn)
    mesh = 0:T/n:T
    u1 = RK( u0, f, T, n, [], [1], [])              # Euler
    u2 = RK( u0, f, T, n, [1/2], [0, 1], [1/2])     # Midpoint
    u3 = RK( u0, f, T, n, [1], [1/2, 1/2], [1])     # Modified Euler 
    u4 = RK( u0, f, T, n, [2/3], [1/4, 3/4], [2/3]) # Heun 
    u5 = RK( u0, f, T, n, A, b, c)                  # RK4
    errs[ℓ,1] = maximum( @. abs( u1 - sol(mesh) ) )
    errs[ℓ,2] = maximum( @. abs( u2 - sol(mesh) ) )
    errs[ℓ,3] = maximum( @. abs( u3 - sol(mesh) ) )
    errs[ℓ,4] = maximum( @. abs( u4 - sol(mesh) ) )
    errs[ℓ,5] = maximum( @. abs( u5 - sol(mesh) ) )
end

Aerrs, evals = [], []                              # BS23
for ℓ in 2:13
    t, u6, feval = ARK(u0,f,T; tol=10.0^(-ℓ)) 
    push!( evals, feval )
    push!( Aerrs, maximum( @. abs( u6 - sol(t) ) ) )
end

plot([nn 2nn 2nn 2nn 4nn], errs[:,1:5];
    m=4, label=["Euler (p=1)" "Midpoint (p=2)" "Modified Euler (p=2)" "Heun (p=2)" "RK4 (p=4)"], 
    xaxis=(:log10, "function evaluations"), yaxis=(:log10, "error"), ylims=(1e-11,1e5), 
    legend=:bottomleft)
plot!(evals, Aerrs, m=3, label="BS23")
plot!( nn, (1/2)*errs[end,1]*(nn/nn[end]).^(-1), primary=false, linestyle=:dash )
plot!( 2*nn, (1/2)*errs[end,2]*(nn/nn[end]).^(-2), primary=false, linestyle=:dash )
plot!( nn, 1e6*(nn).^(-3), primary=false, linestyle=:dash )
plot!( 4*nn, (1/2)*errs[end,5]*(nn/nn[end]).^(-4), primary=false, linestyle=:dash )

Exercise 6.2 Discuss the previous plot with the person next to you. Is the order of accuracy always the best measure of convergence?

6.3 Multistep methods

Burden, Faires, and Burden (2015) section 5.6

All the methods we have seen so far take the form u_{j+1} = u_j + h \phi( t_j, u_j; h). These are known as 1-step methods because u_{j+1} is only given as a function of 1 previously computed value u_j. Multistep methods hope to improve efficiency by using the history of the solution \{u_j, u_{j-1}, \dots, u_{j-m+1}\} for some m \in \mathbb N to compute the next term u_{j+1}. Linear multistep methods produce u_{j+1} as a linear combination of \{ u_{i} \} and \{ f(t_{i}, u_{i}) \}:

Definition 6.2 (Linear multistep method) A linear m-step method is of the form:

\begin{align} u_{j+1} &= \alpha_{m-1} u_{j} + \alpha_{m-2} u_{j-1} + \cdots + \alpha_{0} u_{j-m+1} \nonumber\\ % &+ h \big[ \beta_m f(t_{j+1}, u_{j+1}) + \beta_{m-1} f(t_{j}, u_{j}) + \cdots + \beta_0 f(t_{j-m+1}, u_{j-m+1})\big] \end{align}

for some coefficients \alpha_0,\dots,\alpha_{m-1} and \beta_0, \dots, \beta_{m}.

If \beta_m = 0, the method is explicit (u_{j+1} is given explicitly),

If \beta_m \not= 0, the method is implicit (u_{j+1} is defined implicitly by a nonlinear equation that must be solved).
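To make “must be solved” concrete: taking m=1, \alpha_0 = 1, \beta_1 = 1, \beta_0 = 0 in Definition 6.2 gives the implicit step u_{j+1} = u_j + h f(t_{j+1}, u_{j+1}). One simple way to solve such an equation is fixed-point iteration (a sketch in Python, not the lecture's code; this converges when hL < 1, and Newton's method is the more common choice in practice):

```python
def implicit_step(u, t_next, f, h, iters=50):
    # solve v = u + h * f(t_next, v) by fixed-point iteration,
    # starting from the explicit Euler predictor as the initial guess
    v = u + h * f(t_next, u)
    for _ in range(iters):
        v = u + h * f(t_next, v)
    return v

# on u' = -u the implicit equation is linear: v = u - h*v, i.e. v = u/(1+h),
# so the iteration can be checked against the exact answer
v = implicit_step(1.0, 0.1, lambda t, y: -y, 0.1)
```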

Note

For 1-step methods, we started from u_0 given by the equation.

Now, we need to initialize u_0,\dots,u_{m-1} in order to define u_{m}. This can be done by e.g. using a RK method.

We can store the history f(t_{j-1}, u_{j-1}), \dots, f(t_{j-m+1}, u_{j-m+1}) so that (in the explicit case) to evaluate u_{j+1} we only need to compute one function evaluation f(t_{j}, u_{j}).

Example 6.4 (Adams-Bashforth (AB)) AB methods are explicit (\beta_{m} = 0) with \alpha_{m-1} = 1 and \alpha_{m-2} = \cdots = 0. They are chosen to have order p=m and denoted \text{(AB}p\text{)}:

\begin{align} \tag{AB1} &m=1, \quad \beta_0 = 1 \\ \tag{AB2} &m=2, \quad \beta_1 = \frac32, \quad \beta_0 = -\frac12 \\ \tag{AB3} &m=3, \quad \beta_2 = \frac{23}{12}, \quad \beta_1 = -\frac{16}{12}, \quad \beta_0 = \frac5{12} \end{align}

Exercise 6.3 Notice that \text{(AB1)} is Euler’s method.
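A minimal sketch of \text{(AB2)} (again in Python; the name `ab2` is ours, not from the lecture code) illustrating the efficiency point from the note above: after the start-up step, each step costs only one new function evaluation because f(t_{j-1}, u_{j-1}) is stored.

```python
import math

def ab2(u0, f, T, n):
    # two-step Adams-Bashforth: u_{j+1} = u_j + h*( (3/2) f_j - (1/2) f_{j-1} );
    # the missing starting value u_1 is bootstrapped with one Euler step
    h = T / n
    u = [0.0] * (n + 1)
    u[0] = u0
    f_prev = f(0.0, u[0])
    u[1] = u[0] + h * f_prev          # start-up step (an RK method also works)
    for j in range(1, n):
        f_curr = f(j * h, u[j])       # the ONE new evaluation per step
        u[j + 1] = u[j] + h * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr               # store the history for the next step
    return u

# order check on u' = u, u(0) = 1: halving h divides the error by about 4
errs = [abs(ab2(1.0, lambda t, u: u, 1.0, n)[-1] - math.e) for n in (100, 200)]
```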

We define the local truncation error in an analogous way to 1-step methods:

Definition 6.3 The local truncation error of a linear multistep method with coefficients \alpha_{m-1}, \dots, \alpha_0 and \beta_{m},\dots,\beta_0 is given by

\begin{align} \tau_{j+1}(h) &= \frac1h \Big[ u(t_{j+1}) - \sum_{i=0}^{m-1} \alpha_{m-1-i} u(t_{j-i}) \Big] \nonumber\\ % &\quad - \sum_{i=0}^m \beta_{m-i} f\big( t_{j+1-i}, u(t_{j+1-i}) \big) \end{align}

We say the multistep method is consistent if \max_j |\tau_{j+1}(h)| \to 0 as h\to 0.

We say the multistep method has order of accuracy p if \tau_{j+1}(h) = O(h^p) as h\to 0.

Exercise 6.4 Show that \text{(AB2)} has order of accuracy 2.

Example 6.5 (Adams-Moulton (AM)) AM methods are implicit (\beta_{m} \not= 0) with \alpha_{m-1} = 1 and \alpha_{m-2} = \cdots = 0. Since there is another degree of freedom (\beta_m\not=0), m-step implicit methods can have order of accuracy p = m+1:

\begin{align} \tag{AM1} &m=1, \quad \beta_1 = 1, \quad \beta_0 = 0 \\ \tag{AM2} &m=1, \quad \beta_1 = \frac12, \quad \beta_0 = \frac12 \\ \tag{AM3} &m=2, \quad \beta_2 = \frac{5}{12}, \quad \beta_1 = \frac{8}{12}, \quad \beta_0 = -\frac1{12} \end{align}

Here, we use the naming convention (\text{AM}p) where p is the order of accuracy.

Exercise 6.5 Write down the formulas for (\text{AM}1), (\text{AM}2), (\text{AM}3).

Note

(\text{AM}1) is also called backward Euler because you can derive it from u'(t_{j+1}) = f(t_{j+1}, u_{j+1}) by replacing the derivative with the backward difference \frac{u_{j+1} - u_j}{h}.

(\text{AM}2) is also called the trapezoid rule or Crank-Nicolson.

Tip

TO-DO
  • Assignment 2: Due today 11:59 PM
  • Make sure you are up to date with Burden, Faires, and Burden (2015) sections 5.1 - 5.5
  • Read: Burden, Faires, and Burden (2015) 5.6, Driscoll and Braun (2018) zero stability
  • (Sign-up to office hours if you would like)

Next: (zero) stability and convergence of multistep methods.

Burden, Richard L., Douglas J. Faires, and Annette M. Burden. 2015. Numerical Analysis. 10th ed. CENGAGE Learning.
Driscoll, Tobin A., and Richard J. Braun. 2018. Fundamentals of Numerical Computation. https://fncbook.com/.