My old option pricer (which you can download "refurbished" here) had a basic stochastic volatility (Heston) solver which never really passed the testing phase, so I kept it hidden. Besides, it was using SOR to solve the full system at each time-step, so it can't compete with the modern "state of the art" ADI variants. Or can it?
Well I decided to give it another try and see how it compares really after a few optimizing tweaks.
The Heston Partial Differential Equation
We are trying to solve the following PDE by discretizing it with the finite difference method:
$$\frac{\partial V}{\partial t} = \frac{1}{2} S^2 \nu\frac{\partial^2 V}{\partial S^2}+\rho \sigma S \nu \frac{\partial^2 V}{\partial S \partial \nu} + \frac{1}{2}\sigma^2\nu\frac{\partial^2 V}{\partial \nu^2} + (r_d-r_f)S \frac{\partial V}{\partial S}+ \kappa (\eta-\nu) \frac{\partial V}{\partial \nu}-r_d V $$
where $\nu$ is the variance of the underlying asset $S$ returns, $\sigma$ is the volatility of the $\nu$ process, $\rho$ the correlation between the $S$ and $ \nu$ processes and $\eta$ is the long-term variance to which $\nu$ is mean-reverting with rate $\kappa$. Here $S$ follows geometrical Brownian motion and $\nu$ the CIR (Cox, Ingersoll and Ross) mean-reverting process.
Time-discretization: Four-level fully implicit (4LFI, or FI-BDF3)
After we discretize the PDE we are left with an algebraic system of equations relating the values at each grid point (i,j) with those at its neighboring points. With the standard implicit scheme (Euler) the values at time n+1 are found as $ V_{i,j}^{(n+1)} = (B \cdot dt+V_{i,j}^{(n)})/(1-a_{i,j} \cdot dt) $ for each grid point (i,j). $B$ is a linear combination of $ V^{(n+1)} $ at (i,j)'s neighboring points and $a_{i,j}$ the coefficient of $ V_{i,j} $ arising from the spatial discretization. So we need to solve the system of unknowns $ V_{i,j}^{(n+1)} $.
The 3LFI (otherwise known as implicit BDF2) scheme just adds a little information from time point n-1 and can simply be described as: $ V_{i,j}^{(n+1)} = (B \cdot dt+2 \cdot V_{i,j}^{(n)}-0.5 \cdot V_{i,j}^{(n-1)})/(1.5-a_{i,j} \cdot dt) $, $a_{i,j}$ and $B$ being the same as before. This little addition makes the 3LFI/BDF2 scheme second order in time, whereas the standard implicit is only first order.
This scheme is strongly A-stable (?), so unlike something like Crank-Nicolson (CN) it has built-in oscillation-damping on top of being unconditionally stable. So I tested this first and while it worked pretty well, I found that it produced slightly larger errors than CN. That is while they're both second-order accurate, CN's error-constant was lower. Which is not surprising, central discretizations often produce higher accuracy than their one-sided counterparts of the same order. Moreover it was notably less time-converged compared to the ADI schemes using the same number of time-steps.
So I decided to go for higher-spec. Especially since it comes really cheap in terms of implementation. Adding an extra time-point gives us the 4LFI or BDF3 scheme, which is third order:
$$ BDF3 : \quad V_{i,j}^{(n+1)} = \left (B \cdot dt+3 \cdot V_{i,j}^{(n)}-\frac 3 2 \cdot V_{i,j}^{(n-1)}+\frac 1 3 \cdot V_{i,j}^{(n-2)}\right)/\left(\frac {11} 6 - a_{i,j} \cdot dt\right) \qquad (1)$$
While such a scheme is no longer strictly unconditionally stable (it's almost A-stable), it should still be almost as good as and should preserve the damping qualities of 3LFI/BDF2. As it happens, the results it produces are significantly more accurate than both the 3LFI/BDF2 and the ADI schemes for any number of time-steps. Third order beats second order easily here, despite being a one-sided one. Finally as an added bonus, it adds a little to the diagonal dominance of the iteration matrix of the SOR solver so it helps convergence as well.
Now since this is a multi-step scheme (it needs the values not just from the previous time level but also from two levels before that), we need to use something different for the first couple of time steps, a starting procedure. For the very first step we need a two-level (single step) scheme. We cannot use CN because this will allow spurious oscillations to occur given the non-smooth initial conditions of option payoff functions. Instead one can use the standard Implicit Euler scheme plus local extrapolation: we first use the IE scheme for 4 sub-steps of size dt/4 to get the values $ V_{fine}^1$ at the end of the first time step. We then repeat, this time using 2 sub-steps of size dt/2 to obtain $ V_{coarse}^1$ and get the final composite values for the first time step as $(4V_{fine}^1-V_{coarse}^1)/3 $. For the second time step we just use the 3LFI/BDF2.
Space-discretization and boundary conditions
Central finite differences are used for everything, except again for the v = 0 boundary, where we need to use a one-sided (three-point) upwind second-order discretization for the first v-derivative. Hout & Foulon [1] also use that upwind discretization in some cases where v > 1, but I stick to the central formula here. And of course I make sure that the strike falls in the middle between two grid points, which is something that occasionally people seem to neglect in a few publications I've seen. Suitable Neumann and/or Dirichlet boundary conditions are used everywhere except for the v = 0 variance boundary where we require that the PDE itself (with many terms being zero there) is satisfied, all as described in Hout & Foulon [1].
System solver
I used successive over-relaxation (SOR) to solve the resulting system because it's so easy to set up and accommodate early exercise features. On the other hand it is in general quite slow and moreover its speed may depend significantly on the chosen relaxation factor $\omega$. This is where the hand-made part comes in. I am using an SOR variant called TSOR together with a simple algorithm that adjusts $\omega$ as it goes, monitoring proceedings for signs of divergence as well (e.g. residual increasing instead of decreasing). I am not 100% sure my implementation is bug free, but it seems that the resulting SOR iteration matrix is not always such that it leads to convergence. There's a maximum $\omega$ which presumably makes the spectral radius of the iteration matrix less than one, but I've not pursued any further analysis on that, relying instead on the heuristic algorithm to keep it safe and close to optimal. All in all the modifications compared to the standard SOR amount to a few lines of code. I may write details of TSOR in another post but for the moment let's just say that it speeds things up about 3 times compared to standard SOR, plus it seems less dependent on $\omega$.
Well I decided to give it another try and see how it compares really after a few optimizing tweaks.
The Heston Partial Differential Equation
We are trying to solve the following PDE by discretizing it with the finite difference method:
$$\frac{\partial V}{\partial t} = \frac{1}{2} S^2 \nu\frac{\partial^2 V}{\partial S^2}+\rho \sigma S \nu \frac{\partial^2 V}{\partial S \partial \nu} + \frac{1}{2}\sigma^2\nu\frac{\partial^2 V}{\partial \nu^2} + (r_d-r_f)S \frac{\partial V}{\partial S}+ \kappa (\eta-\nu) \frac{\partial V}{\partial \nu}-r_d V $$
where $\nu$ is the variance of the underlying asset $S$ returns, $\sigma$ is the volatility of the $\nu$ process, $\rho$ the correlation between the $S$ and $ \nu$ processes and $\eta$ is the long-term variance to which $\nu$ is mean-reverting with rate $\kappa$. Here $S$ follows geometrical Brownian motion and $\nu$ the CIR (Cox, Ingersoll and Ross) mean-reverting process.
Time-discretization: Four-level fully implicit (4LFI, or FI-BDF3)
After we discretize the PDE we are left with an algebraic system of equations relating the values at each grid point (i,j) with those at its neighboring points. With the standard implicit scheme (Euler) the values at time n+1 are found as $ V_{i,j}^{(n+1)} = (B \cdot dt+V_{i,j}^{(n)})/(1-a_{i,j} \cdot dt) $ for each grid point (i,j). $B$ is a linear combination of $ V^{(n+1)} $ at (i,j)'s neighboring points and $a_{i,j}$ the coefficient of $ V_{i,j} $ arising from the spatial discretization. So we need to solve the system of unknowns $ V_{i,j}^{(n+1)} $.
The 3LFI (otherwise known as implicit BDF2) scheme just adds a little information from time point n-1 and can simply be described as: $ V_{i,j}^{(n+1)} = (B \cdot dt+2 \cdot V_{i,j}^{(n)}-0.5 \cdot V_{i,j}^{(n-1)})/(1.5-a_{i,j} \cdot dt) $, $a_{i,j}$ and $B$ being the same as before. This little addition makes the 3LFI/BDF2 scheme second order in time, whereas the standard implicit is only first order.
This scheme is strongly A-stable (?), so unlike something like Crank-Nicolson (CN) it has built-in oscillation-damping on top of being unconditionally stable. So I tested this first and while it worked pretty well, I found that it produced slightly larger errors than CN. That is while they're both second-order accurate, CN's error-constant was lower. Which is not surprising, central discretizations often produce higher accuracy than their one-sided counterparts of the same order. Moreover it was notably less time-converged compared to the ADI schemes using the same number of time-steps.
So I decided to go for higher-spec. Especially since it comes really cheap in terms of implementation. Adding an extra time-point gives us the 4LFI or BDF3 scheme, which is third order:
$$ BDF3 : \quad V_{i,j}^{(n+1)} = \left (B \cdot dt+3 \cdot V_{i,j}^{(n)}-\frac 3 2 \cdot V_{i,j}^{(n-1)}+\frac 1 3 \cdot V_{i,j}^{(n-2)}\right)/\left(\frac {11} 6 - a_{i,j} \cdot dt\right) \qquad (1)$$
While such a scheme is no longer strictly unconditionally stable (it's almost A-stable), it should still be almost as good as and should preserve the damping qualities of 3LFI/BDF2. As it happens, the results it produces are significantly more accurate than both the 3LFI/BDF2 and the ADI schemes for any number of time-steps. Third order beats second order easily here, despite being a one-sided one. Finally as an added bonus, it adds a little to the diagonal dominance of the iteration matrix of the SOR solver so it helps convergence as well.
Now since this is a multi-step scheme (it needs the values not just from the previous time level but also from two levels before that), we need to use something different for the first couple of time steps, a starting procedure. For the very first step we need a two-level (single step) scheme. We cannot use CN because this will allow spurious oscillations to occur given the non-smooth initial conditions of option payoff functions. Instead one can use the standard Implicit Euler scheme plus local extrapolation: we first use the IE scheme for 4 sub-steps of size dt/4 to get the values $ V_{fine}^1$ at the end of the first time step. We then repeat, this time using 2 sub-steps of size dt/2 to obtain $ V_{coarse}^1$ and get the final composite values for the first time step as $(4V_{fine}^1-V_{coarse}^1)/3 $. For the second time step we just use the 3LFI/BDF2.
Space-discretization and boundary conditions
Central finite differences are used for everything, except again for the v = 0 boundary, where we need to use a one-sided (three-point) upwind second-order discretization for the first v-derivative. Hout & Foulon [1] also use that upwind discretization in some cases where v > 1, but I stick to the central formula here. And of course I make sure that the strike falls in the middle between two grid points, which is something that occasionally people seem to neglect in a few publications I've seen. Suitable Neumann and/or Dirichlet boundary conditions are used everywhere except for the v = 0 variance boundary where we require that the PDE itself (with many terms being zero there) is satisfied, all as described in Hout & Foulon [1].
System solver
I used successive over-relaxation (SOR) to solve the resulting system because it's so easy to set up and accommodate early exercise features. On the other hand it is in general quite slow and moreover its speed may depend significantly on the chosen relaxation factor $\omega$. This is where the hand-made part comes in. I am using an SOR variant called TSOR together with a simple algorithm that adjusts $\omega$ as it goes, monitoring proceedings for signs of divergence as well (e.g. residual increasing instead of decreasing). I am not 100% sure my implementation is bug free, but it seems that the resulting SOR iteration matrix is not always such that it leads to convergence. There's a maximum $\omega$ which presumably makes the spectral radius of the iteration matrix less than one, but I've not pursued any further analysis on that, relying instead on the heuristic algorithm to keep it safe and close to optimal. All in all the modifications compared to the standard SOR amount to a few lines of code. I may write details of TSOR in another post but for the moment let's just say that it speeds things up about 3 times compared to standard SOR, plus it seems less dependent on $\omega$.