7) FD Solutions#
Last time:
Introduction to Numerical Analysis
Accuracy, Consistency, Stability, Convergence
Lax equivalence theorem
Today:
Measuring errors
Stability
Consistency
Second order derivatives
Matrix representations and properties
Second order derivative with Dirichlet boundary conditions
Discrete Green’s functions
Interpolation by Vandermonde matrices
High-order discretization of the Laplacian
Method of manufactured solutions
Second order derivative with Neumann boundary conditions
1. Measuring errors#
Recap:
Last time we looked at the error of the forward difference, the backward difference, and the centered difference
using Plots
using LaTeXStrings
default(linewidth=3)
default(legendfontsize=12)
# These are inline functions: given arrays x and u, they return a (shorter) array of points and the array of difference approximations of u' at those points.
diff1l(x, u) = x[2:end], (u[2:end] - u[1:end-1]) ./ (x[2:end] - x[1:end-1])
diff1r(x, u) = x[1:end-1], (u[2:end] - u[1:end-1]) ./ (x[2:end] - x[1:end-1])
diff1c(x, u) = x[2:end-1], (u[3:end] - u[1:end-2]) ./ (x[3:end] - x[1:end-2])
difflist = [diff1l, diff1r, diff1c]
n = 40
h = 6 / (n - 1) # grid spacing on [-3, 3]
x = LinRange(-3, 3, n)
u = sin.(x)
fig = plot(cos, xlims=(-3, 3), label = L"f'(x)=\cos(x)")
for d in difflist
xx, yy = d(x, u)
plot!(fig, xx, yy, marker=:circle, label=d)
end
fig
It’s time for a worked exercise.
What does this code do?
What property of the difference formulas (methods) are we measuring? (hint: error vs N is ?)
using LinearAlgebra
# We are measuring here the CONVERGENCE of these methods
# How many gridpoints we use: 2^2, 2^3, ..., 2^10
grids = 2 .^ (2:10) # dot operator vectorizes, so ".^" is just vectorized "^" over all entries of a collection/tuple/array
# Nominal grid spacing: 2^(-2), 2^(-3), ..., 2^(-10) (proportional to the true spacing 6/(n-1))
hs = 1 ./ grids
# We define a function to measure the convergence of these methods
# The func takes 3 arguments: `f` and `fprime` are callables (functions)
# and `d` is one of the difference functions (callables) defined above
function refinement_error(f, fprime, d)
error = []
# Loop over the grids as they get more refined
for n in grids
x = LinRange(-3, 3, n) # domain
# Compute numerical derivative for this grid
xx, yy = d(x, f.(x))
# Compute error and add it to an array
push!(error, norm(yy - fprime.(xx), 2)/sqrt(n)) # push! = add to array; uses a normalized 2-norm, as Root Mean Square (RMS)
end
# Return the array of errors
error
end
refinement_error (generic function with 1 method)
fig = plot(xscale=:log10, yscale=:log10)
for d in difflist
error = refinement_error(sin, cos, d)
plot!(fig, hs, error, marker=:circle, label=d)
end
plot!(fig, hs, [hs hs .^ 2], label=["h" "\$h^2\$"])
Note
Different norms can be defined.
The Euclidean norm is also called the quadratic norm, \(L^2\) norm, \(ℓ^2\) norm (for vectors that have an infinite number of components, like sequences), \(2\)-norm, or square norm. This is a special case (\(p=2\)) of the \(L^p\) norm for \(L^p\)-spaces.
What happens if we use a 1-norm, 2-norm, or Inf-norm?
function refinement_error(f, fprime, d)
error = []
for n in grids
x = LinRange(-3, 3, n)
xx, yy = d(x, f.(x))
push!(error, norm(yy - fprime.(xx), 1)) # uses an (unnormalized) 1-norm
end
error
end
refinement_error (generic function with 1 method)
fig = plot(xscale=:log10, yscale=:log10)
for d in difflist
error = refinement_error(sin, cos, d)
plot!(fig, hs, error, marker=:circle, label=d)
end
plot!(fig, hs, [hs hs .^ 2], label=["h" "\$h^2\$"])
The Root Mean Square (RMS)-like error used in the previous example measures the average size of the error per grid point.
It is like asking: “On average, how wrong is my numerical derivative at a typical grid point?”
This question is useful regardless of how coarse/fine the grid is.
In general, the quantity
\[ \lVert \mathbf{f} \rVert_h = \left( h \sum_{i=1}^{n} | f_i |^2 \right)^{1/2} \]
is the \(L^2\) norm of the grid function \(f\), and is a measure of the size (energy) of the solution.
The multiplication by \(h \equiv \Delta x\) is needed so that the norm is not sensitive to grid refinements (the number of points increases as \(h\rightarrow 0\)).
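A quick sketch (not part of the original notebook) showing why the scaling matters: the plain 2-norm of a grid function grows like \(\sqrt{n}\) under refinement, while the RMS-scaled norm settles to a grid-independent value.

```julia
using LinearAlgebra

# Plain 2-norm vs RMS-scaled norm of sin(x) on increasingly fine grids
# over [-3, 3].
norms_plain = Float64[]
norms_rms = Float64[]
for n in (10, 100, 1000)
    x = LinRange(-3, 3, n)
    f = sin.(x)
    push!(norms_plain, norm(f, 2))          # grows like sqrt(n)
    push!(norms_rms, norm(f, 2) / sqrt(n))  # approaches a grid-independent value
end
```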
2. Stability#
Activity
Read What is Numerical Stability? and discuss in small groups
Share insights in class discussion
Source: FNC: backward error
(Forward) Stability#
“nearly the right answer to nearly the right question”
Backward Stability#
“exactly the right answer to nearly the right question”
Note:
Every backward stable algorithm is (\(\implies\)) stable.
Not every stable algorithm is backward stable.
The difference is in the focus: forward analysis is concerned with what the method reaches, while backward analysis looks at the problem being solved (which is why we can speak of ill-conditioned methods and ill-conditioned problems).
In a backward stable algorithm the errors introduced during the algorithm have the same effect as a small perturbation in the data.
If the backward error is the same size as any uncertainty in the data then the algorithm produces as good a result as we can expect.
An algorithm is backward stable if its computed output is the exact output for slightly perturbed input data (small relative backward error).
= “You get exactly the right answer to a nearly correct question”
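A minimal numeric illustration of backward error (not from the lecture, just a sketch): floating-point addition computes the exact sum of slightly perturbed inputs, with the relative perturbation bounded by machine epsilon, so it is backward stable.

```julia
# Sketch of backward error for floating-point addition: the computed sum
# fl(a + b) equals the exact sum of the stored inputs up to a relative
# perturbation of size at most machine epsilon.
a, b = 0.1, 0.2
s = a + b                                  # computed sum (not exactly 0.3)
exact = big(a) + big(b)                    # exact sum of the stored inputs
delta = Float64(abs((s - exact) / exact))  # relative backward perturbation
```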
Question:
Are there “rough” functions for which these formulas estimate \(u'(x_i) = 0\)?
x = LinRange(-1, 1, 9)
f_rough(x) = cos(.1 + 4π*x)
fp_rough(x) = -4π*sin(.1 + 4π*x)
plot(x, f_rough, marker=:circle, label = L"f(x_i) = \cos(0.1 + 4 \pi x_i)") # Sparse sampling of the function
plot!(f_rough, label = L"f(x) = \cos(0.1 + 4 \pi x)")
fig = plot(fp_rough, xlims=(-1, 1), label = L"f'(x) = -4 \pi \sin(0.1 + 4 \pi x)")
for d in difflist
xx, yy = d(x, f_rough.(x))
plot!(fig, xx, yy, label=d, marker=:circle)
end
fig
If we have a solution \(u(x)\), then \(u(x) + f_{\text{rough}}(x)\) is indistinguishable from it to our FD method.
Therefore, given a small input relative error, we can get a large output relative error.
Stability, informally, demands that:
there do not exist “bad” (rough) functions that also satisfy the equation;
the solution does not “blow up” for time-dependent problems.
This definition is intentionally vague, and there are more subtle requirements for problems like incompressible flow.
3. Consistency#
When we apply the differential operator to the exact solution, we get a small residual.
The residual converges under grid refinement.
Hopefully fast as \(h \to 0\)
Recall, one part of Lax equivalence theorem: Consistency + Stability \(\implies\) Convergence
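Consistency can be checked directly (a quick sketch, not part of the original notebook): apply the centered second-difference formula to an exact solution and watch the truncation residual shrink like \(h^2\).

```julia
# Truncation residual (consistency check): apply the centered difference for u''
# to the exact solution u(x) = sin(x) at a fixed point; the residual should
# shrink by a factor of ~4 each time h is halved (second-order consistency).
residuals = Float64[]
for h in (0.1, 0.05, 0.025)
    x = 1.0  # an arbitrary evaluation point
    d2u = (sin(x - h) - 2sin(x) + sin(x + h)) / h^2  # centered approximation of u''
    push!(residuals, abs(d2u - (-sin(x))))           # exact u''(x) = -sin(x)
end
```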
4. Second order derivatives#
We can compute a second derivative by applying first derivatives twice, but which ones?
Example 7.1#
function diff2a(x, u)
# Compute the second-derivative as a centered difference of a centered difference
xx, yy = diff1c(x, u)
diff1c(xx, yy)
end
diff2a (generic function with 1 method)
function diff2b(x, u)
# Compute the second-derivative as a backward (left) difference of a forward (right) difference
xx, yy = diff1l(x, u)
diff1r(xx, yy)
end
diff2b (generic function with 1 method)
diff2list = [diff2a, diff2b]
n = 20
x = LinRange(-3, 3, n)
u = - cos.(x); # f(x)
fig = plot(cos, xlims=(-3, 3), label = L"f''(x)=\cos(x)")
for d2 in diff2list
xx, yy = d2(x, u)
plot!(fig, xx, yy, marker=:circle, label=d2)
end
fig
How fast do these approximations converge?#
grids = 2 .^ (3:10)
hs = 1 ./ grids
function refinement_error2(f, f_xx, d2)
error = []
for n in grids
x = LinRange(-3, 3, n)
xx, yy = d2(x, f.(x))
push!(error, norm(yy - f_xx.(xx), Inf)) # uses the max (infinity) norm this time
end
error
end
refinement_error2 (generic function with 1 method)
fig = plot(xlabel="h", xscale=:log10, ylabel="Error", yscale=:log10)
for d2 in diff2list
error = refinement_error2(x -> -cos(x), cos, d2)
plot!(fig, hs, error, marker=:circle, label=d2)
end
plot!(fig, hs, hs .^ 2, label="\$h^2\$")
Both methods are second order accurate.
The diff2b method is more accurate than diff2a (by a factor of 4).
The diff2a method can’t compute derivatives at points adjacent to the boundary.
We don’t know yet whether either is stable.
5. Matrix representations and properties#
All our diff* functions thus far have been linear in u, therefore they can be represented as differentiation matrices.
\[\begin{split}\frac{u_{i+1} - u_i}{x_{i+1} - x_i} = \begin{bmatrix} -1/h & 1/h \end{bmatrix} \begin{bmatrix} u_i \\ u_{i+1} \end{bmatrix}\end{split}\]
More generally:
\[\begin{split}\begin{bmatrix} u'(x_1) \\ u'(x_2) \\ \vdots \\ u'(x_n) \end{bmatrix} = \begin{bmatrix} D_{11} & D_{12} & \ldots & D_{1n} \\ D_{21} & D_{22} & \ldots & D_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ D_{n1} & D_{n2} & \ldots & D_{nn} \end{bmatrix} \begin{bmatrix} u(x_1) \\ u(x_2) \\ \vdots \\ u(x_n) \end{bmatrix}\quad \text{or} \quad \mathbf{u'} = \mathbf{D} \mathbf{u}\end{split}\]
Example 7.2: First-derivative differentiation matrix on a uniform grid#
Let’s see what the matrix form of the first-derivative operator can look like.
Let us use a second-order accurate centered scheme for the interior points and first-order one-sided differences for the boundary points.
For the interior points, \(x_2, \ldots, x_{n-1}\), we have:
\[ u'(x_i) \approx \frac{u_{i+1} - u_{i-1}}{2h} . \]
So the interior rows of the matrix of coefficients look like:
\[ \frac{1}{2h} \begin{bmatrix} \cdots & -1 & 0 & 1 & \cdots \end{bmatrix} . \]
At the left boundary point, \(x_1\), we cannot use the same centered difference formula
because \( u_{0}\) is outside of the domain.
Let’s use a first-order accurate forward difference,
\[ u'(x_1) \approx \frac{u_2 - u_1}{h} . \]
For convenience, since the interior point coefficients are scaled by \(\frac{1}{2h}\), we also want to scale this similarly, therefore we multiply both the numerator and denominator by a factor of \(2\):
\[ u'(x_1) \approx \frac{-2 u_1 + 2 u_2}{2h} . \]
Similarly, at the right boundary point, we cannot use the same centered difference formula
because \( u_{n+1}\) is outside of the domain.
Let’s use a first-order accurate backward difference,
\[ u'(x_n) \approx \frac{u_n - u_{n-1}}{h} , \]
and multiply this by a factor of \(2\) to match the interior point coefficients that are scaled by \(\frac{1}{2h}\):
\[ u'(x_n) \approx \frac{-2 u_{n-1} + 2 u_n}{2h} . \]
# Differentiation matrix for a first derivative
"""First-derivative differentiation matrix on a uniform grid.
- interior: 2nd-order centered
- boundaries: 1st-order one-sided
"""
function diff1_mat(x)
n = length(x)
D = zeros(n, n)
h = x[2] - x[1]
D[1, 1:2] = [-1/h 1/h] # Use a first-order forward difference at the left boundary
for i in 2:n-1
D[i, i-1:i+1] = [-1/2h 0 1/2h] # In the interior points, use a second-order centered difference
end
D[n, n-1:n] = [-1/h 1/h] # Use a first-order backward difference at the right boundary
D
end
x = LinRange(-1, 1, 5) # with this grid, h = 1/2 => 2 h = 1
diff1_mat(x)
5×5 Matrix{Float64}:
-2.0 2.0 0.0 0.0 0.0
-1.0 0.0 1.0 0.0 0.0
0.0 -1.0 0.0 1.0 0.0
0.0 0.0 -1.0 0.0 1.0
0.0 0.0 0.0 -2.0 2.0
n = 12
x = LinRange(-3, 3, n)
plot(x, diff1_mat(x) * sin.(x), marker=:circle, label = L"D * \sin(x)")
plot!(cos, label = L"f'(x)=\cos(x)")
How accurate is this derivative matrix?#
fig = plot(xscale=:log10, yscale=:log10, legend=:topleft)
ref_error = refinement_error(sin, cos, (x, u) -> (x, diff1_mat(x) * u))
plot!(fig, hs, ref_error, marker=:circle, label = "refinement error")
plot!(fig, hs, hs, label="\$h\$")
plot!(fig, hs, hs .^ 2, label="\$h^2\$")
But I thought we had chosen a second-order centered difference formula for the interior points? Then why do we get a lower order?
Note
In general, the approximation order of a problem is the lowest order used in any part of the problem.
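To illustrate, here is a sketch of a hypothetical variant, `diff1_mat2` (a name not used elsewhere in these notes), that upgrades the boundary rows to second-order one-sided stencils so every row matches the interior order. A second-order formula differentiates quadratics exactly, which gives a quick sanity check.

```julia
# Hypothetical variant of diff1_mat (a sketch): second-order one-sided stencils
# at both boundaries, so the whole matrix is second-order accurate.
function diff1_mat2(x)
    n = length(x)
    D = zeros(n, n)
    h = x[2] - x[1]
    D[1, 1:3] = [-3, 4, -1] / 2h         # 2nd-order forward difference
    for i in 2:n-1
        D[i, i-1:i+1] = [-1, 0, 1] / 2h  # 2nd-order centered difference
    end
    D[n, n-2:n] = [1, -4, 3] / 2h        # 2nd-order backward difference
    D
end

# Sanity check: a second-order formula differentiates quadratics exactly.
x = LinRange(-1, 1, 9)
err = maximum(abs.(diff1_mat2(x) * (x .^ 2) - 2x))
```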
Can we study it as a matrix?#
function my_spy(A)
cmax = norm(vec(A), Inf)
s = max(1, ceil(120 / size(A, 1)))
spy(A, marker=(:square, s), c=:diverging_rainbow_bgymr_45_85_c67_n256, clims=(-cmax, cmax))
end
D = diff1_mat(x)
my_spy(D)
svdvals(D) # Singular values given by the SVD decomposition, in decreasing order
12-element Vector{Float64}:
2.7718000318790152
2.7716788692501084
1.7708640148632901
1.732003875931066
1.5877132402714698
1.4519108222799693
1.2963624321753364
1.0397961559479483
0.9166666666666657
0.5413728320040285
0.4745015826879543
1.7546012240164312e-16
cond(D) # condition number of the matrix
1.5797321886816708e16
Condition number of a matrix#
The condition number of an invertible matrix \(A\) is given by
\[ \kappa(A) = \lVert A \rVert \, \lVert A^{-1} \rVert . \]
Since this expression is symmetric in \(A\) and \(A^{-1}\), multiplying by a matrix is just as ill-conditioned an operation as solving a linear system using that matrix.
SVD decomposition#
We have the following factorization
\[ A = U \Sigma V^* , \]
where \(U\) and \(V\) have orthonormal columns and \(\Sigma\) is diagonal with non-negative real entries.
Singular values of a linear operator (matrix \(A\)) are the square roots of the (necessarily non-negative) eigenvalues of the self-adjoint operator \(A^{*}A\) (where \(A^{*}\) denotes the adjoint of \(A\)).
It may remind you of an eigenvalue decomposition \(X \Lambda X^{-1} = A\), but
the SVD exists for all matrices (including non-square and deficient matrices)
\(U,V\) have orthogonal columns (while \(X\) can be arbitrarily ill-conditioned).
Indeed, if a matrix is symmetric and positive definite (all positive eigenvalues), then we can take \(U = V = X\) and \(\Sigma = \Lambda\).
If we think of orthogonal matrices as reflections/rotations, this says any matrix can be represented by the sequence of operations: reflect/rotate, diagonally scale, and reflect/rotate again.
open("../img/svd_viz.svg") do f
display("image/svg+xml", read(f, String))
end
Condition number via SVD#
Or, in terms of the SVD
\[ A = U \Sigma V^* , \]
where
\[ \Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n) , \]
where \((\sigma_{1}, \sigma_{2}, \ldots)\) are the non-negative real numbers called singular values of \(A\) (usually listed in decreasing order).
We have that the matrix condition number is given by:
\[ \kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}} . \]
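We can check this relation numerically (a small sketch with an arbitrary test matrix): the 2-norm condition number returned by `cond` equals the ratio of the extreme singular values.

```julia
using LinearAlgebra

# Numerical check: cond(A) (2-norm) equals sigma_max / sigma_min.
A = [2.0 1.0; 1.0 3.0]
s = svdvals(A)          # singular values, in decreasing order
ratio = s[1] / s[end]
kappa = cond(A)         # 2-norm condition number
```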
6. Second-derivative with Dirichlet boundary conditions#
Recall the Poisson equation with non-homogeneous Dirichlet boundary conditions:
\[ -u''(x) = f(x), \quad -1 < x < 1, \qquad u(-1) = a, \quad u(1) = b . \]
Turn this into a linear system by replacing
\(x \to [x_1, x_2, \ldots, x_n] = \mathbf{x}\),
\(u(x) \to [u_1, u_2, \ldots, u_n] = \mathbf{u}\),
\(f(x) \to [f_1, f_2, \ldots, f_n] = \mathbf{f}\),
\(\frac{d^2}{dx^2} \to \mathbf{D}^2\).
\[ \mathbf{D}^2\mathbf{u} = \mathbf{f}\]How to encode left boundary condition? Discuss.
The left endpoint in our example boundary value problem has a Dirichlet boundary condition, \(u(-1) = a\): we encode it by replacing the first row of the matrix with the identity row and putting \(a\) in the first entry of the right-hand side.
This matrix is now not symmetric even if the interior part \(A_{2:n-1} ( = D^2)\) is.
function laplacian_dirichlet(x)
n = length(x)
D = zeros(n, n)
h = x[2] - x[1]
D[1, 1] = 1
for i in 2:n-1
D[i, i-1:i+1] = (1/h^2) * [-1, 2, -1]
end
D[n, n] = 1
D
end
x = LinRange(-1, 1, 5) # with this grid, h = 1/2 => h^2 = 1/4 => 1/h^2 = 4
laplacian_dirichlet(x)[2:end-1,:] # show only the interior rows
3×5 Matrix{Float64}:
-4.0 8.0 -4.0 0.0 0.0
0.0 -4.0 8.0 -4.0 0.0
0.0 0.0 -4.0 8.0 -4.0
Laplacian operator as a matrix#
L = laplacian_dirichlet(x)
my_spy(L)
svdvals(L)
5-element Vector{Float64}:
13.960925187257008
8.955557775047327
3.7026128519505397
0.8933000267487782
0.6190524891990026
cond(L)
22.55208634298712
Solutions#
For now, the right boundary condition is also Dirichlet (we will consider Neumann later), \(u(1) = b\).
L = laplacian_dirichlet(x)
f = one.(x) # constant forcing
f[1] = 0 # enforcing the left BC
f[end] = 0; # enforcing the right BC
plot(x, f, label = L"f(x)")
u = L \ f # This is syntax for the direct solution of the system Lu = f (similar to MATLAB syntax)
plot(x, u, label = L"u = L^{-1} f ")
7. Discrete “Green’s functions”#
Review: Matrix inverses#
Before diving into discrete “Green’s functions”, let’s review matrix inverses from Linear Algebra.
The inverse \(A^{−1}\) of a matrix \(A\) is such that \(A A^{-1} = I\). Let’s think about what this means. Each column \(j\) of \(A^{−1}\) has a special meaning: applying both sides of \(A A^{-1} = I\) to the unit vector \(\mathbf{e}_j\) gives
\[ A \left( A^{-1} \mathbf{e}_j \right) = \mathbf{e}_j , \]
where \(\mathbf{e}_j\) is the unit vector \((0,0,\ldots,0,1,0,\ldots, 0)^T\) which has a \(1\) in the \(j\)-th row.
Since \(A^{-1} \mathbf{e}_j\) is exactly the \(j\)-th column of \(A^{-1}\), we obtain
\[ \left( A^{-1} \right)_{:,j} = A^{-1} \mathbf{e}_j , \]
which looks like the solution to \(A\mathbf{u} = \mathbf{f}\) with \(\mathbf{e}_j\) on the right-hand side.
If we are solving \(A\mathbf{u} = \mathbf{f}\), the statement \(\mathbf{u} = A^{-1}\mathbf{f}\) can be written as a superposition of columns of \(A^{-1}\):
\[ \mathbf{u} = A^{-1} \mathbf{f} = A^{-1} \sum_j f_j \mathbf{e}_j = \sum_j f_j \left( A^{-1} \right)_{:,j} . \]
Since \(\sum_j f_j \mathbf{e}_j\) is just writing \( \mathbf{f}\) in the canonical basis of the \(\mathbf{e}_j\)’s, we can interpret \(A^{-1} \mathbf{f}\) as writing \(\mathbf{f}\) in the canonical basis of the \(\mathbf{e}_j\) unit vectors and then summing (superimposing) the solutions for each \(\mathbf{e}_j\).
Going one step further, we can write, entrywise,
\[ u_i = \sum_j \left( A^{-1} \right)_{i,j} f_j . \]
We can now have a simple “physical” interpretation of the matrix elements of \(A^{-1} \):
\(\left( A^{-1} \right)_{i,j}\) is the “effect” (solution) at \(i\) of a “source” at \(j\) (a RHS \(\mathbf{e}_j\)) [and column \(j\) of \( A^{-1} \) is the effect (solution) everywhere from the source at \(j\)].
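This interpretation is easy to verify numerically (a small sketch with an arbitrary tridiagonal matrix): column \(j\) of \(A^{-1}\) is exactly the solution of \(A\mathbf{u} = \mathbf{e}_j\).

```julia
using LinearAlgebra

# Column j of A^{-1} is the response everywhere to a unit source at node j,
# i.e., the solution of A u = e_j.
A = [4.0 -1.0 0.0; -1.0 4.0 -1.0; 0.0 -1.0 4.0]
j = 2
e = zeros(3); e[j] = 1.0
u = A \ e           # solve with a point source at node j
col = inv(A)[:, j]  # j-th column of the inverse
```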
Laplacian matrix#
Recall that one interpretation of Poisson’s problem
\[ -\frac{d^2 u}{d x^2} = f \]
is that \(\mathbf{u}\) is a displacement of a stretched string (in 1D) or membrane (in 2D) and that \(f\) (or \(-f\), if we followed the physics notation) is a force density (pressure). Then we can see:
Green’s functions as impulse response
If we write the LHS of our equation as a linear differential operator \(L\) acting on \(u\),
\[ L u = f , \]
then the Green’s function \(G(x, s)\) (\(s\) is for “source”) is defined by
\[ L G(x, s) = \delta(x - s) . \]
Motivation: once we have \(G(x, s)\), we can “build up” solutions from it for different \(f\), by superposition.
Integrating the definition of the Green’s function against \(f\), we get
\[ u(x) = \int G(x, s) f(s) \, ds . \]
Due to \(L\) only acting on \(x\) and being linear, we can bring \(L\) out:
\[ L u(x) = \int L G(x, s) f(s) \, ds = \int \delta(x - s) f(s) \, ds = f(x) , \]
meaning
\[ L u = f , \]
so the superposition above indeed solves the equation.
Position of the source, \(s\), can only be one of the \(x_i\) in the linear system.
In matrix form:
\[ \mathbf{u} = G \mathbf{f} , \qquad G_{ij} = G(x_i, x_j) . \]
So \(G = L^{-1}\), and the columns of \(L^{-1}\) are the discrete “Green’s functions” for varying \(s\).
plot(x, inv(L)[:, 2], label = L"L^{-1}_{:,2}") # second column of L^{-1}
We can see that the columns of \(L^{-1}\) in 1D are very much like what you might intuitively expect if you pulled on a string at “one point.”
Discrete eigenfunctions#
x = LinRange(-1, 1, 10)
L = laplacian_dirichlet(x)
Lambda, V = eigen(L) # returns an eigen factorization (eigenvalues, matrix of eigenvectors)
plot(Lambda, marker=:circle, label = L"\lambda")
plot(x, V[:, 1:4]) # first 4 eigenvectors
Outlook on our method#
Pros#
Consistent
Stable
Second order accurate (we hope)
Cons#
Only second order accurate (at best)
Worse than second order on non-uniform grids
Worse than second order at Neumann boundaries
Boundary conditions break symmetry
8. Interpolation by Vandermonde matrices#
Let’s say we solved our BVP with a FD method. Now we have \(\mathbf{u} = [u(x_1), u(x_2), \ldots, u(x_n) ] := [ u_1, u_2, \ldots, u_n]\)
How do we get \(u(x)\) from the vector \(\mathbf{u}\)?
Try fitting a polynomial that goes through the “data” \(u_1\), \(u_2\), \(\ldots\), of the form
\[ p(x) = c_1 + c_2 x + c_3 x^2 + \ldots + c_n x^{n-1} , \]
that takes the function values \(p(x_i) = u_i\).
Questions#
What degree does the polynomial need to be (how many \(c_i\)-s) and why?
What constraints can we write down for the \(c_i\)-s?
What linear system can we write the constraints as?
Answers#
Polynomial needs to be degree \(n - 1\): it has \(n\) coefficients, one per data point.
Constraints:
\[ p(x_i) = c_1 + c_2 x_i + \ldots + c_n x_i^{n-1} = u_i , \quad i = 1, \ldots, n . \]
We can write them as a linear system whose matrix is called a Vandermonde matrix:
\[\begin{split} \begin{bmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \ldots & x_n^{n-1} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} . \end{split}\]
function vander(x, k=nothing)
if k === nothing
k = length(x)
end
V = ones(length(x), k)
for j = 2:k
V[:, j] = V[:, j-1] .* x
end
V
end
vander (generic function with 2 methods)
vander(LinRange(-1, 1, 5))
5×5 Matrix{Float64}:
1.0 -1.0 1.0 -1.0 1.0
1.0 -0.5 0.25 -0.125 0.0625
1.0 0.0 0.0 0.0 0.0
1.0 0.5 0.25 0.125 0.0625
1.0 1.0 1.0 1.0 1.0
cond(vander(LinRange(-1, 1, 5))) # condition number
23.530908731657536
The condition number of the Vandermonde matrix grows rapidly with \(n\); as a rule of thumb we lose about \(\log_{10} \kappa\) digits, so by the time \(\kappa \approx 10^{16} \approx 1/\varepsilon_M\) essentially all digits are lost
Why?
Fitting a polynomial#
k = 4
x = LinRange(-2, 2, k)
u = sin.(x)
V = vander(x)
c = V \ u # c = V^{-1}u
scatter(x, u, label="\$u_i = sin (x_i)\$", legend=:topleft)
plot!(x -> (vander(x, k) * c)[1,1], label="\$p(x)\$")
plot!(sin, label=L"\sin(x)")
Differentiating#
We’re given the coefficients \(c = V^{-1} u\) of the polynomial
\[ p(x) = c_1 + c_2 x + \ldots + c_n x^{n-1} . \]
What is
\[ p'(0) = c_2 = \left( V^{-1} u \right)_2 \, ? \]
The second row of \(V^{-1}\) therefore holds the stencil coefficients for the first derivative at the target point.
function fdstencil1(source, target)
"first derivative stencil from source to target"
x = source .- target
V = vander(x)
inv(V)[2, :]' # coefficients for the first derivative, as a row vector of V^{-1} (its second row)
end
plot([z -> fdstencil1(x, z) * u, cos], label=["fdstencil1" L"\cos(x)"], xlims=(-3,3))
scatter!(x, 0*x, label="grid points")
Arbitrary order#
function fdstencil(source, target, k)
"kth derivative stencil from source to target"
x = source .- target
V = vander(x)
rhs = zero(x)'
rhs[k+1] = factorial(k)
rhs / V # rhs * inv(V), without explicitly calling inv
end
fdstencil(x, 0.5, 2)
1×4 adjoint(::Vector{Float64}) with eltype Float64:
0.0703125 0.351563 -0.914062 0.492187
plot([z -> fdstencil(x, z, 2) * u,
z -> -sin(z)], label=["fdstencil" L"-\sin(x)"], xlim=(-3, 3))
scatter!(x, 0*x, label="grid points")
We didn’t call inv(V); what’s up? The expression rhs / V solves the row system \(\mathbf{w} V = \mathbf{rhs}\) for \(\mathbf{w} = \mathbf{rhs}\, V^{-1}\) through a factorization of \(V\), without ever forming \(V^{-1}\) explicitly (cheaper and more accurate).
Recall that we have
\[ \mathbf{c} = V^{-1} \mathbf{u} . \]
Then
\[ p(0) = c_1 = \mathbf{e}_1^T \mathbf{c} = \left( \mathbf{e}_1^T V^{-1} \right) \mathbf{u} . \]
This means:
Evaluating the interpolating polynomial at \(0\) is just a weighted sum of the data values. And those weights are the first row of \(V^{-1}\).
In summary:
| Row of \(V^{-1}\) | Meaning |
|---|---|
| Row 1 | Interpolation weights for evaluating at \(0\) |
| Row 2 | Weights for the first derivative at \(0\) |
| Row 3 | Weights for the second derivative at \(0\) (after scaling by \(2!\)) |
| Row \(k+1\) | Weights for the \(k\)th derivative at \(0\) (after scaling by \(k!\)) |
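A quick sketch verifying this table on a quadratic (nodes already shifted so the target is 0; the data and nodes are arbitrary choices for illustration):

```julia
using LinearAlgebra

# Rows of V^{-1}: row 1 recovers p(0), row 2 recovers p'(0),
# and 2! times row 3 recovers p''(0).
x = [-1.0, -0.5, 0.5, 1.0]                 # shifted nodes (target 0 is not a node)
V = [x[i]^(j - 1) for i in 1:4, j in 1:4]  # Vandermonde matrix, as built by vander
W = inv(V)
u = 3 .+ 2 .* x .+ 5 .* x .^ 2             # samples of p(x) = 3 + 2x + 5x^2
p0   = W[1, :]' * u                        # interpolation weights: p(0)  = 3
dp0  = W[2, :]' * u                        # 1st-derivative weights: p'(0) = 2
d2p0 = 2 * W[3, :]' * u                    # 2nd-derivative weights: p''(0) = 10
```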
Exercise 7.3: Convergence order#
Looking at this convergence graph, deduce the order of convergence for our stencil (for differentiating twice).
hs = 2 .^ -LinRange(-4, 10, 10)
function diff_error(u, du, h; n, k, z=0)
x = LinRange(-h, h, n) .+ .5
fdstencil(x, z, k) * u.(x) - du.(z)
end
errors = [diff_error(sin, t -> -sin(t), h, n=5, k=2, z=.5+0.1*h)
for h in hs]
plot(hs, abs.(errors), marker=:circle)
plot!(h -> h^3, label="\$h^?\$", xscale=:log10, yscale=:log10)
8.2 Getting derivatives of the FD solution (more in-depth explanation)#
Let’s summarize the process we have just seen. Say we have a target point \(\bar{x} \notin \{ x_1, x_2, \ldots, x_n \}\) (not one of the nodes we used in FD).
We’d like an estimate of \(u^{(k)}(\bar{x})\) from the \(u(x_i)\)
Taylor series:
\[ u(x_i) = u(\bar{x}) + (x_i - \bar{x})u'(\bar{x}) + \ldots + \frac{1}{k!}(x_i - \bar{x})^k u^{(k)}(\bar{x}) + \ldots \quad \text{for } i = 1, 2, \ldots n.\]Linearly combine the \(u(x_i)\) to get best estimate for \(u^{(k)}(\bar{x})\):
\[ c_1 u(x_1) + c_2 u(x_2) + \ldots + c_n u(x_n) = u^{(k)}(\bar{x}) + \mathcal{O}(h^p) \](here, \(h\) is some average nodal distance, and \(p\) should be as large as possible)What are the \(c_i\)? Use Taylor series to rewrite linear combination:
\[ c_1 \left( u(\bar{x}) + (x_1 - \bar{x})u'(\bar{x}) + \ldots + \frac{1}{k!}(x_1 - \bar{x})^k u^{(k)}(\bar{x})\right) + \ldots + c_n\left(\ldots\right) = u^{(k)}(\bar{x}) + \mathcal{O}(h^p)\]Collect coefficients of \(u^{(k)}\) to get the constraint
\[\begin{split} \frac{1}{(i-1)!}\sum_{j = 1}^n c_j (x_j - \bar{x})^{(i-1)} = \begin{cases} 1 \quad \text{if } i-1 = k, \\ 0 \quad \text{otherwise.}\end{cases} \end{split}\]Can express as a Vandermonde system:
\[\begin{split} \begin{bmatrix} 1 & 1 & \ldots & 1 \\ (x_1 - \bar{x}) & (x_2 - \bar{x}) & \ldots & (x_n - \bar{x}) \\ \vdots & \vdots & & \vdots \\ (x_1 - \bar{x})^{n-1} & (x_2 - \bar{x})^{n-1} & \ldots & (x_n - \bar{x})^{n-1} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = k! \, \mathbf{e}_{k+1} \end{split}\]
(place \(k!\) in the \((k+1)\)th row on the RHS for the \(k\)th derivative)
Observations#
When using \(n=3\) points, we fit a polynomial of degree 2 and have error \(O(h^3)\) for interpolation \(p(0)\).
Each derivative gives up one order of accuracy in general.
For a finite-difference stencil built by exactness on polynomials up to degree \(n-1\), the error for the \(k\)-th derivative behaves like \(O(h^{n-k})\).
In our Exercise 7.3, \(n=5\) (five points) and \(k=2\) (second derivative), so \(n-k = 3\), i.e., third-order convergence.
Centered diff on uniform grids can have extra cancellation (superconvergence).
The Vandermonde matrix is notoriously ill-conditioned with many points \(n\). The recommendation is to use a stable algorithm from Fornberg.
The real trouble arises when the condition number is \(\kappa \approx 1/\varepsilon_M\) (see this recent preprint).
Question: what could we have changed about the interpolation method? Hint: there are two main degrees of freedom.
Answers:
We could have changed the representation: \(p(x) = c_0 + c_1 x + c_2 x^2 + \ldots\) uses the monomial basis \(1, x, x^2, \ldots\), but we could have used a different basis set of functions. This in turn changes what the unknowns \(c_0, c_1, \ldots\) represent.
We also need to be careful in picking the nodes \(x_1, x_2, \ldots\), on which the entries of the Vandermonde matrix explicitly depend.
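To see why the nodes matter, here is a sketch (using a local helper `vandermonde`, equivalent to `vander` above) showing how quickly the monomial-basis Vandermonde matrix on equispaced nodes becomes ill-conditioned as \(n\) grows:

```julia
using LinearAlgebra

# Condition number of the monomial-basis Vandermonde matrix on equispaced
# nodes in [-1, 1], for growing n: it increases rapidly.
vandermonde(x) = [xi^(j - 1) for xi in x, j in 1:length(x)]
conds = [cond(vandermonde(collect(LinRange(-1, 1, n)))) for n in (5, 10, 15)]
```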
9. High-order discretization of the Laplacian#
Now that we have developed an algorithm for the arbitrary \(k\)-th derivative stencil from a given source to target, we can use it for any high-order FD discretization of the Laplacian.
For the Poisson problem \(-u_{xx} = f\):
"""
Poisson solver with arbitrary boundary conditions. In the interior, it uses fdstencil(..., 2) for the second-derivative.
The BCs are specified as:
left = (k, g)
and
right = (m, g)
where k (resp. m) is the derivative order of the condition and g is a function of x giving the boundary value.
Examples:
left = (0, zero) → enforce u(x_1) = 0
left = (1, zero) → enforce u'(x_1) = 0
left = (2, zero) → enforce u''(x_1) = 0, and so on
"""
function poisson(x, spoints, forcing; left=(0, zero), right=(0, zero)) # keyword arguments (optional); default: homogeneous Dirichlet BCs
n = length(x)
L = zeros(n, n)
rhs = forcing.(x) # forcing is an input function
for i in 2:n-1
jleft = min(max(1, i-spoints÷2), n-spoints+1)
js = jleft : jleft + spoints - 1
L[i, js] = -fdstencil(x[js], x[i], 2) # Second derivative via the stencil we derived
end
L[1,1:spoints] = fdstencil(x[1:spoints], x[1], left[1]) # Exercise: what do these two lines mean? fdstencil returns a row vector of coeff [c_1, c_2, ..., c_spoints] s.t. c_1 u_1 + c_2 u_2 + ... = u^(k) (x_1)
L[n,n-spoints+1:n] = fdstencil(x[n-spoints+1:n], x[n], right[1]) # x[n-spoints+1:n] are the last spoints grid points
rhs[1] = left[2](x[1]) # left boundary condition value
rhs[n] = right[2](x[n]) # right boundary condition value
L, rhs
end
poisson (generic function with 1 method)
L, b = poisson(LinRange(-1, 1, 6), 3, zero, left=(1, zero)) # Nodes are 6 equispaced points on [-1, 1]
L
6×6 Matrix{Float64}:
-3.75 5.0 -1.25 0.0 0.0 0.0
-6.25 12.5 -6.25 0.0 0.0 0.0
0.0 -6.25 12.5 -6.25 0.0 0.0
0.0 0.0 -6.25 12.5 -6.25 0.0
0.0 0.0 0.0 -6.25 12.5 -6.25
0.0 0.0 0.0 0.0 -0.0 1.0
Question: how do we test the convergence of this PDE solver (and the correctness of the code)?
Good practice (test-driven design):
Think of what a function will do (inputs, outputs, its objective)
Write its docstring / documentation
Think of a test for its functionality
Write a unit test
Then, and only then, write the function.
10. Method of manufactured solutions#
= A way to validate that a PDE problem is being solved correctly, i.e., to verify our numerics.
Problem: analytic solutions to PDEs are hard to find#
Let’s choose a smooth function with rich derivatives,
\[ u(x) = \tanh(x) . \]
Then \(u'(x) = \cosh^{-2}(x)\) and the manufactured forcing is
\[ f(x) = -u''(x) = 2 \tanh(x) \cosh^{-2}(x) . \]
This approach works for nonlinear problems too.
x = LinRange(-2, 2, 21)
L, rhs = poisson(x, 5,
x -> 2 * tanh(x) / cosh(x)^2,
left=(0, tanh), # left Dirichlet BC: u(-2) = tanh(-2)
right=(1, x -> cosh(x)^-2)) # right Neumann BC: u'(2) = cosh(2)^{-2}
u = L \ rhs
plot(x, u, marker=:circle, legend=:topleft, label = "discrete")
plot!(tanh, label = L"\tanh(x)")
Convergence rate#
ns = 2 .^ (4:10)
hs = 1 ./ ns
function poisson_error(n; spoints=3)
x = LinRange(-2, 2, n)
L, rhs = poisson(x, spoints, x -> 2 * tanh(x) / cosh(x)^2,
left = (0, tanh),
right = (1, x -> cosh(x)^-2))
u = L \ rhs
norm(u - tanh.(x), Inf)
end
poisson_error (generic function with 1 method)
plot(hs, [poisson_error(n, spoints=9) for n in ns], marker=:circle, label = "error")
plot!(h -> h^8, label="\$h^8\$", xscale=:log10, yscale=:log10)
# Question: what's going wrong at small h?
Observations:#
We saw above that, with \(n\) points, with this general Vandermonde-based stencil generator, the error for the \(k\)-th derivative behaves like \(O(h^{n-k})\). Here we are approximating a Poisson (Laplacian) operator, so second-derivative \(u''\), with \(9\) points, hence \(O(h^7)\).
But: for symmetric centered stencils on a uniform grid, odd powers cancel out, hence, the leading error term becomes an even power. So instead of \(O(h^7)\), we get superconvergence \(O(h^8)\). (This is a symmetry effect).
Why doesn’t the treatment of the boundary conditions “destroy” or “pollute” the order of convergence globally now?
Because the boundary conditions are built with the same number of points (spoints), they are also high-order accurate; hence, the global system retains the high-order accuracy.
Question: What’s going on for small values of \(h\)?
Answer: Round-off errors start to dominate for very small values of \(h\). Hence, in this case the error is dominated by round-off errors that accumulate rather than by the truncation error.
11. Second order derivative with Neumann boundary conditions#
We have seen how to implement Dirichlet boundary conditions. Let’s revisit that from the point of view of symmetry of the resulting differentiation matrix.
Symmetry in boundary conditions: Dirichlet#
We have implemented Dirichlet conditions by modifying the first row of the matrix,
\[\begin{split} \begin{bmatrix} 1 & 0 \\ A_{2:n,1} & A_{2:n,2:n} \end{bmatrix} \begin{bmatrix} u_1 \\ u_{2:n} \end{bmatrix} = \begin{bmatrix} f_1 \\ f_{2:n} \end{bmatrix} . \end{split}\]
This matrix is not symmetric even if \(A_{2:n,:}\) is.
We can eliminate \(u_1\) and create a reduced system for \(u_{2:n}\).
Generalize: consider a \(2\times 2\) block system
\[\begin{split} \begin{bmatrix} I & 0 \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} .\end{split}\]
We can rearrange as
\[ A_{22} u_2 = f_2 - A_{21} f_1 , \]
using \(u_1 = f_1\) from the first block row.
This is called “lifting” and is often done implicitly in the mathematics literature. It is convenient for linear solvers and eigenvalue solvers, but inconvenient for I/O and postprocessing, as well as some nonlinear problems.
Convenient alternative: write
\[\begin{split} \begin{bmatrix} I & 0 \\ 0 & A_{22} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 - A_{21} f_1 \end{bmatrix}, \end{split}\]which is symmetric and decouples the degrees of freedom associated with the boundary. This method applies cleanly to nonlinear problems.Optionally scale the identity by some scalar related to the norm of \(A_{22}\).
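The symmetric alternative above can be sketched concretely (small arbitrary tridiagonal example, not from the notes): fold the known boundary value into the right-hand side, zero the coupling block, and check that the solution is unchanged while the matrix becomes symmetric.

```julia
using LinearAlgebra

# "Lifting" a Dirichlet row: start from a non-symmetric system whose first
# row is the identity, move the boundary value to the RHS, decouple the
# boundary unknown, and recover a symmetric system with the same solution.
A = [ 1.0  0.0  0.0;
     -1.0  2.0 -1.0;
      0.0 -1.0  2.0]
f = [0.5, 1.0, 1.0]
u = A \ f                        # original (non-symmetric) system

B = copy(A)
g = copy(f)
g[2:end] -= B[2:end, 1] * f[1]   # fold the known boundary value into the RHS
B[2:end, 1] .= 0.0               # decouple; B is now symmetric
v = B \ g
```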
Symmetry in boundary conditions: Neumann#
Let’s revisit the Poisson equation, now with mixed Dirichlet and Neumann boundary conditions:
\[ -u''(x) = f(x), \quad -1 < x < 1, \qquad u(-1) = a, \quad u'(1) = b . \]
Consider a FD discretization of the Neumann boundary condition on the right boundary:
We could use a one-sided backward difference formula as in
\[ \frac{u_n - u_{n-1}}{h} = b . \]
This gives us an extra discretization choice and may reduce the order of accuracy compared to the interior discretization; we also lose symmetry.
Temporarily introduce a ghost point value \(u_{n+1} = u(x_{n+1} = 1 + h)\) (possibly more) and define it to be a reflection of the values from inside the domain. In the case of a homogeneous Neumann boundary condition, \(b=0\), this reflection is \(u_{n+1} = u_{n-1}\). More generally,
\[ u_{n+1} = u_{n-1} + 2 h b . \]
With this definition of ghost values, we can apply the interior discretization at the boundary. For our reference equation, we would write
\[ \frac{-u_{n-1} + 2 u_n - u_{n+1}}{h^2} = f_n , \]
which simplifies to
\[ \frac{2 u_n - 2 u_{n-1}}{h^2} = f_n + \frac{2 b}{h} . \]
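A sketch of the ghost-point approach, assembled as a small solver (the name `laplacian_neumann_right` and the quadratic manufactured test are illustrative choices, not from the notes). Since the interior stencil, the Dirichlet row, and the ghost-eliminated Neumann row are all exact for quadratics, the solver should reproduce \(u(x) = x^2\) to round-off.

```julia
using LinearAlgebra

# -u'' = f on [-1, 1] with u(-1) = a (Dirichlet) and u'(1) = b (Neumann),
# where the Neumann row comes from eliminating the ghost value
# u_{n+1} = u_{n-1} + 2 h b.
function laplacian_neumann_right(x, f, a, b)
    n = length(x)
    h = x[2] - x[1]
    L = zeros(n, n)
    rhs = f.(x)
    L[1, 1] = 1; rhs[1] = a               # Dirichlet row: u_1 = a
    for i in 2:n-1
        L[i, i-1:i+1] = [-1, 2, -1] / h^2 # interior rows: -u''
    end
    L[n, n-1:n] = [-2, 2] / h^2           # ghost-eliminated Neumann row
    rhs[n] += 2b / h
    L, rhs
end

# Manufactured test: u(x) = x^2, so f = -u'' = -2, u(-1) = 1, u'(1) = 2.
x = LinRange(-1, 1, 9)
L, rhs = laplacian_neumann_right(x, x -> -2.0, 1.0, 2.0)
u = L \ rhs
err = maximum(abs.(u - x .^ 2))
```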