using LinearAlgebra, Plots, LaTeXStrings

A4: Solutions
Assignment 4: Discrete Laplacian and Sparse Arrays
Suppose we have a uniform grid \{x_{j}\} on [a,b] so that x_{j} = a + \frac{b-a}{n} j for j=0,\dots,n. We will use h := \frac{b-a}{n} to denote the grid spacing.
Exercise 1. Write down the Taylor expansion of u(x_j \pm h) about x_j and hence show that
\begin{align} u''(x_j) = \frac{u(x_j - h) - 2 u(x_j) + u(x_j+h)}{h^2} + O(h^2), \nonumber \end{align}
as h\to0.
The Taylor expansion of u(x_j\pm h) about x_j is
\begin{align} u(x_j\pm h) = u(x_j) \pm h u'(x_j) + \frac{h^2}{2} u''(x_j) \pm \frac{h^3}{6}u'''(x_j) + O(h^4). \end{align}
Therefore, we have
\begin{align} u(x_j-h) + u(x_j+h)= 2u(x_j) + h^2 u''(x_j) + O(h^4). \end{align}
The result follows by rearranging this.
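The O(h^2) rate can also be checked numerically. The following sketch (our own addition, not part of the assignment) applies the central difference to u(x) = \sin(x), whose error should shrink by a factor of roughly 4 each time h is halved:

```julia
# Verify the second-order convergence of the central difference
# using u(x) = sin(x) at x = 1, where u''(1) = -sin(1).
u(x) = sin(x)
x0 = 1.0
exact = -sin(x0)

for h in (0.1, 0.05, 0.025)
    approx = (u(x0 - h) - 2u(x0) + u(x0 + h)) / h^2
    err = abs(approx - exact)
    println("h = $h, error = $err")  # error should shrink ≈ 4× when h halves
end
```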
Exercise 2. Define u_j := u(x_j). Explain why we can approximate the second derivative of u at the interior points \{x_j\}_{j=1}^{n-1} by the vector
\begin{align} L \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n-2}\\ u_{n-1} \end{pmatrix} \quad \text{where}\quad L := \frac{1}{h^2} \begin{pmatrix} -2 & 1 & & & \\ 1 & -2 & 1 & & \\ & 1 & -2 & \ddots & \\ & & \ddots & \ddots & 1\\ & & & 1 & -2 \end{pmatrix}.\nonumber \end{align}
By the first question, we have
\begin{align} u''(x_j) &\approx \frac{u(x_j - h) - 2 u(x_j) + u(x_j+h)}{h^2} \nonumber\\ &= \frac{u_{j-1} - 2 u_j + u_{j+1}}{h^2}. \end{align}
The Laplacian is \Delta u := \sum_{i} \frac{\partial^2 u}{\partial x_i^2} and reduces to u'' in one spatial dimension. L is a discrete version of this and is known as a discrete Laplacian (with homogeneous Dirichlet boundary conditions).

Suppose we want to solve -u''(x) = f(x) on [a,b] with the boundary conditions u(a) = 0 and u(b) = 0 (this models, e.g., the deformation of a beam under an external force f, …). Defining f_j := f(x_j), we may approximate the solution of -u''(x) = f(x) on the grid \{x_j\} by solving the linear system of equations -L u = f. We did this last semester (in Assignment 8).

Since L is tridiagonal, one can compute the LU decomposition of L using only O(n) operations (we had a presentation on this - see Thomas' Algorithm for a recap). Here is an implementation (for large n):
n = 10000
f(x) = -x^3
x = [j/n for j ∈ 1:n-1]
L_dense = diagm(-1 => ones(n-2), 0 => -2ones(n-1), 1 => ones(n-2))
t_dense = @elapsed u = ( -L_dense \ f.(x) )/n^2
println("Solving the linear system of equations took $t_dense seconds")
plot(x, u; title=L"numerical solution of $-u'' = f$", legend=false)

Solving the linear system of equations took 29.4540944 seconds
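For reference, the O(n) tridiagonal solve mentioned above (Thomas' algorithm) can be sketched as follows. The name `thomas_solve` is our own illustrative choice, not a library function, and in practice Julia's `Tridiagonal` type from LinearAlgebra already does this for you:

```julia
# A minimal sketch of Thomas' algorithm for solving T x = d, where T is
# tridiagonal with subdiagonal a, diagonal b and superdiagonal c.
# (thomas_solve is our own illustrative name, not a library function.)
function thomas_solve(a::Vector, b::Vector, c::Vector, d::Vector)
    n = length(b)
    cp = similar(c, Float64)   # modified superdiagonal
    dp = similar(d, Float64)   # modified right-hand side
    cp[1] = c[1] / b[1]
    dp[1] = d[1] / b[1]
    for i in 2:n               # forward elimination, O(n)
        m = b[i] - a[i-1] * cp[i-1]
        i < n && (cp[i] = c[i] / m)
        dp[i] = (d[i] - a[i-1] * dp[i-1]) / m
    end
    x = similar(dp)
    x[n] = dp[n]
    for i in (n-1):-1:1        # back substitution, O(n)
        x[i] = dp[i] - cp[i] * x[i+1]
    end
    return x
end
```

For the system above, something like `thomas_solve(-ones(n-2), 2ones(n-1), -ones(n-2), f.(x)) / n^2` would reproduce the solve in O(n) operations.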
Exercise 3. Print L_dense in the following cell. How many entries of this matrix are being stored in memory?
L_dense

9999×9999 Matrix{Float64}:
-2.0 1.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0
1.0 -2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 -2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 -2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 -2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 -2.0 … 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋮ ⋱ ⋮
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 -2.0 1.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0 -2.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 1.0 -2.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 -2.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 -2.0 1.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 -2.0
The matrix is 9999\times9999 and each of its 99,980,001 entries is stored. This format stores all the “structural zeros”, which is a bad idea!
We can use Base.summarysize() to find out how much memory this is using:
bytes = Base.summarysize(L_dense)
println("Size: $(round(bytes/1024^2,digits=2)) MB")

Size: 762.79 MB
The problem here is that we are storing lots of zeros (that we don’t need to!). Instead, we should use a “sparse format” where we only store nonzero entries (and where they are). To do this, you need to use the package SparseArrays:
using SparseArrays

Here, we can instead define L to be the discrete Laplacian but as a sparse matrix:
L = spdiagm(-1 => ones(n-2), 0 => -2ones(n-1), 1 => ones(n-2))

9999×9999 SparseMatrixCSC{Float64, Int64} with 29995 stored entries:
⎡⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎤
⎢⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⎥
⎢⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⎥
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⎦
L represents the same matrix as L_dense but requires a lot less memory to represent it:
bytes = Base.summarysize(L)
println("Size: $(round(bytes/1024^2,digits=2)) MB")

Size: 0.53 MB
Exercise 4. As a percentage, how much memory have we saved by storing L in a sparse format?
We have saved
println( round(100*(762.79 - 0.53)/762.79,digits=2), "%" )

99.93%
It turns out that Julia stores L in the following Compressed Sparse Column (CSC) format, where three vectors are stored:

• nzval (nonzero values): the actual numbers in the matrix (its length is the number of nonzero entries),
• rowval (row indices): an integer vector of the row indices corresponding to each entry in nzval (the same length as nzval),
• colptr (column pointers): an integer vector of length n+1 (where the matrix is n\times n) that tells you which entries in nzval and rowval correspond to which column: colptr[1] = 1 and colptr[k+1] = colptr[k] + (the number of nonzero entries in column k).
That is, if L_{ij} = a is a stored (nonzero) entry, then there exists k such that

nzval[k] = a, rowval[k] = i, and colptr[j] ≤ k < colptr[j+1].
Exercise 5. Write down a valid (nzval, rowval, colptr) representation of the matrix
\begin{align} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 2 & 0 & 1 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} \end{align}
We have
nzval = [1, 2, 1, 1, 1, 3]
rowval = [1, 3, 2, 3, 2, 4]
colptr = [1, 3, 4, 5, 7]
A = SparseMatrixCSC(4, 4, colptr, rowval, nzval)

4×4 SparseMatrixCSC{Int64, Int64} with 6 stored entries:
1 ⋅ ⋅ ⋅
⋅ 1 ⋅ 1
2 ⋅ 1 ⋅
⋅ ⋅ ⋅ 3
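As an extra sanity check (not required by the exercise), we can verify this representation by converting the sparse matrix back to a dense one and comparing:

```julia
using SparseArrays

nzval  = [1, 2, 1, 1, 1, 3]
rowval = [1, 3, 2, 3, 2, 4]
colptr = [1, 3, 4, 5, 7]
A = SparseMatrixCSC(4, 4, colptr, rowval, nzval)

# Converting back to dense should recover the original matrix.
Matrix(A) == [1 0 0 0;
              0 1 0 1;
              2 0 1 0;
              0 0 0 3]   # true
```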
Notice that, if A = [u_1 | u_2 | \cdots | u_n] where u_j \in \mathbb R^n is column j of A, then
\begin{align} (Ax)_i = \sum_{j=1}^n A_{ij} x_j = \sum_{j=1}^n (u_{j})_i x_j \end{align}
This leads to an efficient algorithm for matrix-vector multiplication with sparse matrices:
Exercise 6. Use the following code as a starting point to implement matrix-vector multiplication (you only need to add code between the “##########” markers).
function multiply_sparse( A::SparseMatrixCSC, x::Vector )
    evals = 0
    m, n = size(A)
    b = zeros(m)
    nzval, rowval, colptr = A.nzval, A.rowval, A.colptr
    # sum over the columns
    for j ∈ 1:n
        # if x[j] = 0, we don't do anything because (u_j)_i x_j = 0
        if (x[j] != 0)
            for k ∈ colptr[j]:(colptr[j+1] - 1)
                ##### COMPLETE THIS #########
                # get row index i
                # get value v = A[i,j] by only using nzval
                # add v * x[j] to b[i]
                ############################
                i = rowval[k]
                v = nzval[k]
                b[i] += v * x[j]
                evals += 1
            end
        end
    end
    return b, evals
end
# don't change this
y = randn(n-1)
b, evals = multiply_sparse( L, y )
b == L*y ? println("success!") : println("your implementation is incorrect: multiply_sparse(L,y) ≠ Ly")
println("matrix-vector multiplication in $evals multiplications and additions")

success!
matrix-vector multiplication in 29995 multiplications and additions
Exercise 7. How many multiplications and additions are required to perform the matrix-vector multiplication L x if L is stored in a dense format? What is the percentage saving in the number of operations required?
L is 9999 \times 9999 so dense matrix-vector multiplication requires 9999^2 = 99,980,001 multiplications and additions. The percentage saving is
println( round(100*(9999^2-29995)/9999^2,digits=2), "%" )

99.97%
Here, we can see that sparse matrix-vector operations are also much faster:

Time taken:
• dense matrix-vector multiplication: 593.84 ms
• sparse matrix-vector multiplication (using your function): 4.74 ms
• sparse matrix-vector multiplication (built-in function): 0.74 ms

The built-in sparse matrix-vector multiplication is 797× faster than the dense method, and your code above (without any fancy optimisation) is 125× faster than the naive method!
Here, we numerically solve -u''(x) = f(x) using the exact same code as above but now use the sparse format for L:
t = @elapsed u = ( -L \ f.(x) )/n^2
println("Solving the linear system of equations took $t seconds\nThat is $(round(Int, t_dense/t))× faster than before!")
plot(x, u; title=L"numerical solution of $-u'' = f$", legend=false)

Solving the linear system of equations took 0.0584564 seconds
That is 504× faster than before!
Exercise 8. Explain why storing matrices in a sparse format is also convenient for implementing iterative methods.
We have implemented a sparse matrix-vector multiplication method. This is all that is required for the iterative methods we have described in class. Go back through the lecture notes and convince yourself that, in order to implement conjugate gradient (and the other iterative methods we have seen), we only need an efficient way to multiply matrices by vectors.
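To illustrate this point, here is a minimal conjugate gradient sketch (our own, not the implementation from the lecture notes) that touches A only through the product A*p, so a SparseMatrixCSC, or indeed any matrix-free operator supporting *, works unchanged:

```julia
using LinearAlgebra, SparseArrays

# Minimal conjugate gradient for a symmetric positive definite A.
# A is used only via the product A*p, so sparse (or matrix-free)
# operators work unchanged.
function conjugate_gradient(A, b::Vector; tol=1e-10, maxiter=length(b))
    x = zeros(length(b))
    r = copy(b)              # residual b - A*x with x = 0
    p = copy(r)              # search direction
    rs = dot(r, r)
    for _ in 1:maxiter
        Ap = A * p           # the only place A appears
        α = rs / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p .= r .+ (rs_new / rs) .* p
        rs = rs_new
    end
    return x
end
```

Applied to the sparse system above (note that -L is symmetric positive definite), something like `conjugate_gradient(-L, f.(x)) / n^2` would approximate u without ever factorising L.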
It turns out that, when solving -\Delta u = f in higher dimensions, it becomes prohibitively expensive to use direct methods. If we have n+1 grid points per dimension (we use a uniform grid \{(x_{j_1}, \cdots, x_{j_d})\} where x_j = \frac{j}{n} and 0 \leq j \leq n), then the size of the linear system we must solve is (n-1)^d \times (n-1)^d and the computational cost of an LU decomposition is O(n^{3d-2}). Solving the Laplace equation numerically on [0,1]^2 would make a nice presentation/poster/paper - see here for some hints!