Since I do not currently have access to a proper production model (one that could, for example, employ a Local Stochastic Volatility model for the underlying processes), I will use simple GBM. The number of free input parameters (the dimensionality of the approximated function) is 28 (see below), which, combined with the pricing discontinuities at the autocall dates, still makes for a very challenging problem. My experience so far (training DNN's for simpler exotics) is that replacing GBM with a volatility model does not pose insurmountable difficulties for the DNN. I am confident that accuracy similar to (or higher than) the one showcased here can be achieved for the same MRBCA, at the expense of more (offline) effort in generating data and training the DNN.

The product class we are looking at (autocallable multi barrier reverse convertibles) is popular in the Swiss market. It is a structured product paying a guaranteed coupon throughout its lifetime, with a complex barrier option embedded. The latter is contingent on the worst performing of a basket of underlying assets. There is a down-and-in type continuously monitored barrier and an up-and-out discretely monitored one (at the autocall dates). The assets' performance (level) is measured as a percentage of their initial fixing values. Accordingly, the strike level *K* for the down-and-in payoff, the down-and-in barrier *B* and the early redemption (autocall) level *A* (the "up-and-out barrier") are quoted as percentages.

In short: The product pays the holder a guaranteed coupon throughout its lifetime (up to maturity or early redemption). If on any of the observation (autocall) dates the worst-performing asset level is above the early redemption level, the product expires immediately and the amount redeemed is 100% of the nominal value. If no early redemption event happens, then at maturity:


- If during the lifetime of the product the worst-performing asset level did not at any moment touch or cross the barrier level *B*, the amount redeemed is 100% of the nominal value.
- If the worst-performing asset level did touch or cross the barrier level *B* at some point and its final fixing level is above the strike level *K*, the amount redeemed is again 100% of the nominal value.
- If the worst-performing asset did touch or cross the barrier level *B* at some point and its final fixing level is below the strike level *K*, the amount redeemed is the percentage of the nominal equal to the worst-performing asset's performance (the ratio of its final to initial fixing level).
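The redemption logic at maturity can be summarized in a few lines of illustrative code; the function name and conventions are mine, with levels quoted as fractions of the initial fixings:

```python
import numpy as np

def mbrc_redemption_at_maturity(final_levels, barrier_hit, K=1.0):
    """Redemption amount (fraction of nominal) at maturity for a worst-of
    multi barrier reverse convertible, given it was not autocalled.

    final_levels : final fixing levels as fractions of the initial fixings
    barrier_hit  : True if the worst-of level touched/crossed barrier B at any time
    K            : strike level (1.0 = at-the-money, as assumed in this post)
    """
    worst = np.min(final_levels)
    if not barrier_hit:
        return 1.0     # barrier never touched: full nominal
    if worst >= K:
        return 1.0     # touched, but worst-of finished above the strike
    return worst       # touched and below strike: worst-of performance

# Example: barrier was hit and the worst asset finished at 85% of its initial fixing
print(mbrc_redemption_at_maturity(np.array([1.10, 0.85, 1.30, 0.95]), barrier_hit=True))
```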

I am not going to attempt an all-encompassing DNN representation of every possible MRBCA structure, but rather focus the effort on a particular subcategory. So what I am looking for is a pricing approximation for MRBCA's on the worst of 4 assets, with an original maturity of 2Y and semi-annual coupon payments and autocall dates. I also assume the product is struck at-the-money (i.e. the strike level *K* is 100%, the most usual case in practice) with an early redemption (autocall) level of 100% as well, again typical in the market. The latter two could of course also be included as variable inputs in the approximation. This may well be possible while maintaining the same accuracy, but I haven't tried it yet.

So the DNN approximation will be for the clean price of any such product (given the inputs described next) at any time after its inception, up to its maturity. Indeed in what follows, T denotes the time left to maturity.


- The asset level *S* (% of initial fixing), volatility *vol* and dividend yield *d* for each of the 4 underlying GBM processes.
- Seven-point discount factor curve (1D, 1W, 1M, 3M, 6M, 1Y, 2Y).
- Time left to maturity *T* (in years).
- Barrier level *B* (% of initial fixings).
- Coupon level *Cpn* (% p.a.).
- Correlation matrix (six distinct entries).
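Counting the inputs: 4 × (level, volatility, dividend yield) = 12, plus 7 discount factors, plus *T*, *B* and *Cpn*, plus the 6 distinct correlations gives the 28 dimensions. A sketch of how such a feature vector might be assembled (the layout is my choice, not necessarily the one used for the actual DNN):

```python
import numpy as np

def make_input_vector(S, vol, d, discount_factors, T, B, cpn, corr):
    """Assemble a 28-dimensional input vector: 4 x (level, vol, dividend yield),
    7 discount factors, T, B, Cpn, and the 6 distinct correlation entries.
    The ordering here is illustrative only."""
    S, vol, d = (np.asarray(a, dtype=float) for a in (S, vol, d))
    assert S.size == vol.size == d.size == 4
    dfs = np.asarray(discount_factors, dtype=float)
    assert dfs.size == 7
    rho = np.asarray(corr)[np.triu_indices(4, k=1)]  # 6 distinct off-diagonals
    x = np.concatenate([S, vol, d, dfs, [T, B, cpn], rho])
    assert x.size == 28
    return x
```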

The DNN is trained for wide ranges of its inputs to allow it to be used for a long time without the need for retraining. The approximation is only guaranteed to be good within the input ranges that it has been trained for. Those are shown below.

**Operational parameter ranges**

| Parameter | Min | Max |
|---|---|---|
| S_{i} | 20% | 500% |
| vol_{i} | 10% | 40% |
| d_{i} | 0 | 10% |
| T | 0.001 | 2 |
| r (disc. rates) | -2% | 2.50% |
| B | 40% | 80% |
| Cpn | 2% p.a. | 20% p.a. |
| ρ | -55% | 99% |

The original pricing model / function we aim to "mimic" is of course based on Monte Carlo simulation and was written in C++. I omitted things like date conventions and calendars for ease of implementation. The continuously monitored (American) down-and-in barrier feature is taken care of via the use of a probabilistic correction (Brownian Bridge). Given the assumed GBM processes, this not only perfectly eliminates any simulation bias, but also enables the use of large time-steps thus allowing for significant speedup in the generation of the training samples. The discount curve interpolation is based on cubic splines and auxiliary points. The simulation can be driven by either pseudo random numbers or quasi random sequences (Sobol). I chose the former for the generation of the training samples as it proved to be more beneficial for the learning ability of the DNN under my current setup.
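The Brownian Bridge correction mentioned above is the standard closed-form crossing probability for a Brownian bridge between two simulated points. A minimal sketch for a lower barrier under GBM (the per-step survival probability; function name and interface are mine):

```python
import numpy as np

def bridge_no_hit_prob(S1, S2, B, sigma, dt):
    """Probability that a GBM path does NOT breach the lower barrier B between
    two simulated points S1 and S2 (both above B), using the Brownian bridge
    result: P(hit) = exp(-2 * ln(S1/B) * ln(S2/B) / (sigma^2 * dt)).
    Multiplying these per-step survival probabilities along a path removes the
    discretisation bias and allows large time steps."""
    if S1 <= B or S2 <= B:
        return 0.0  # barrier breached at a monitoring point itself
    p_hit = np.exp(-2.0 * np.log(S1 / B) * np.log(S2 / B) / (sigma ** 2 * dt))
    return 1.0 - p_hit

# Survival over one large step (half a year) despite the coarse discretisation
print(bridge_no_hit_prob(1.0, 0.9, 0.6, 0.3, 0.5))
```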

Note that, in contrast with the use case in the previous post, here the training output data (the MC prices) are noisy and of limited accuracy. Does this represent a big problem for the DNN's ability to learn from them? It turns out the answer is: not really.


The DNN is trained by feeding it millions of *[vector(inputs), price]* pairs. This process in practice has to be repeated many times since there is no general formula for what works best in each case. The pricing accuracy of the training set samples does not have to be as high as the target accuracy. As it turns out the DNN has the ability to effectively smooth out the pricing data noise and come up with an accuracy that is higher than that on the individual prices it was trained on. Also the input space coverage does not have to be uniform as we may want for example to place more points where the solution changes rapidly in an effort to end up with a balanced error distribution.
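As a toy illustration of this smoothing effect (with a stand-in function and noise level, not the actual MRBCA data), a least-squares fit to noisy samples of a smooth function recovers it far more accurately than the individual samples themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100_000)
true = np.sin(2 * np.pi * x)                  # stand-in for the exact price function
noisy = true + rng.normal(0.0, 1e-2, x.size)  # "MC prices" with ~1e-2 standard error

coeffs = np.polyfit(x, noisy, deg=9)          # least-squares fit plays the DNN's role here
fit = np.polyval(coeffs, x)

print(np.abs(noisy - true).mean())            # per-sample noise level, ~8e-3
print(np.abs(fit - true).mean())              # fit error: well below the sample noise
```

The fitted curve averages the noise away across many samples, which is exactly why the training prices need not be as accurate as the target accuracy.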

When it comes to testing the resulting DNN approximation, though, we create a separate (out-of-sample) test set of highly accurate prices uniformly filling the input space. That is to say, we don't weight some areas of the solution (say near the barrier) more than others when we calculate the error metrics. We say this is the operational (inputs) range of the DNN, and we provide (or at least aim to provide) similar accuracy everywhere within that range. So the test set is created by drawing random inputs from uniform distributions within their respective ranges. The one exception is the correlation matrices, whose coefficients follow the distribution below. We then discard those matrices that include coefficients outside our target range of (-55% to 99%).

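One simple way to generate valid (positive semi-definite, unit-diagonal) random correlation matrices and apply the rejection step is sketched below; the exact sampling distribution used for the test set is not specified here, so this recipe is only indicative:

```python
import numpy as np

def random_correlation_matrix(n, rng):
    """One common recipe for a random valid correlation matrix: normalise
    A @ A.T for a Gaussian A. PSD and unit diagonal by construction."""
    A = rng.normal(size=(n, n + 1))
    C = A @ A.T
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)

rng = np.random.default_rng(42)
kept = []
while len(kept) < 1000:
    C = random_correlation_matrix(4, rng)
    off = C[np.triu_indices(4, k=1)]          # the 6 distinct coefficients
    if off.min() > -0.55 and off.max() < 0.99:  # rejection step from the post
        kept.append(C)
```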

The overall accuracy achieved by the DNN is measured by the usual Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics. We can also look at the error distribution to get an idea of how good the approximation is. What we cannot easily do is say what lies far in the tails of that distribution, or in other words provide some sort of limit for the maximum possible error. In contrast to the traditional MC model, there is no theoretical confidence interval for the DNN error.

The MAE and RMSE are calculated against a reference test set of 65K MC prices, each generated using 32 million Sobol-driven paths (with Brownian Bridge construction). Such prices are found (when re-calculating a subset using 268 million Sobol paths) to have an accuracy of 4×10^{-6}, which is well below the target accuracy (about 1×10^{-4}, or 1 cent on a nominal of $100). The inputs were generated again using (22-dimensional, correlations excluded) Sobol points, in an effort to best represent the space. The average model price for this test set is 0.874 (87.4%).
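For reference, the two error metrics are computed in the obvious way; with prices quoted as fractions of the nominal, an error of 1×10^{-4} is 1 cent per $100:

```python
import numpy as np

def error_metrics(approx_prices, reference_prices):
    """MAE and RMSE of the approximation against high-accuracy reference prices."""
    err = np.asarray(approx_prices) - np.asarray(reference_prices)
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

# Two toy prices, each off by 1x10^-4 of nominal, i.e. one cent per $100
mae, rmse = error_metrics([0.8741, 0.8699], [0.8740, 0.8700])
```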

In order to try and get an idea for the worst-case errors I tested the DNN against a much bigger (but less accurate) test set of 16.7 million Sobol points.

For the results presented here the DNN was trained on 80 million *[vector(inputs), price]* samples carefully chosen so as to ensure that error levels are as uniform as possible across the input space. At this training set size the convergence rate (error decrease with each doubling of training set size) was showing some signs of slowing down, but there was still room for improvement. Using a few hundred million samples would still be straightforward and would yield even better accuracy.

Still the overall quality of the approximation is excellent. The mean error is less than a cent and generally does not exceed 3 cents. The speed is as expected many orders of magnitude higher than an MC simulation with similar standard error (see below). The timings are for a single CPU core. Of course if GPU's are used instead the speed can still be improved significantly.


**Deep Neural Network Pricing Performance**

| Metric | Value |
|---|---|
| MAE | 6×10^{-5} |
| RMSE | 9×10^{-5} |
| Maximum absolute error in 16.7M test samples | 1.5×10^{-2} |
| CPU time per price (1 core) | 6×10^{-6} secs |

In order to get similar accuracy from the traditional MC model one needs about 400K antithetic paths. With the present implementation this takes about 0.35 secs on 1 CPU core, which is about 60,000 times slower than the DNN. If the MC pricing employed some volatility model needing fine time steps, the speedup factor could easily be in the order of millions (the DNN speed would remain the same).

The MC model prices a lot of the samples with almost zero error. This is because for many of the random input parameter vectors the solution is basically deterministic and the product behaves either like a bond or is certain to be redeemed early.

By far the most challenging dimension of the 28-dimensional function we are approximating here is the time to expiry T. The (clean) product price can be discontinuous at the autocall dates, posing a torture test for any numerical method. This is illustrated below, where I am plotting a few sample solutions across T (keeping all other input parameters constant). **These "pathological" cases correspond to the random input parameter vectors that resulted in the worst DNN approximation errors among the 16.7 million reference set cases (top 5 worst errors).** The MC price plots are based on 40,000 valuation points using 132K Sobol-driven paths per valuation. It took about 10 mins to create each plot utilizing all 12 cores of a CPU. The corresponding 40,000 DNN approximations took < 0.2 sec on a single core.


Looking at these plots, it comes as no great surprise that the DNN struggles here. Considering the vast variety of shapes the solution can take, it is nonetheless seriously impressive that the DNN copes as well as it does overall. That said, the maximum errors above are about 1.5% (not quite visible, located within those ultra-narrow dips a few hours from the autocall dates), which is larger than I would have liked. Still, for use in XVA-type calculations and intraday portfolio valuation monitoring, the performance is more than adequate as is. For use in a production environment one would need to be even more stringent in ensuring the maximum errors do not exceed a certain threshold. When testing the DNN against the much smaller 65K reference set, the maximum error was an order of magnitude smaller (about 0.2%, or 20 cents). Looking at 100M cases may reveal an even worse case than the 1.5% error found in the 16.7M set. Nonetheless, there are ways to identify and target the problematic areas of the input parameter space. I am thus confident that the maximum errors can be brought down further, together with the mean error metrics, by increasing and further refining the synthetic training set.

In conclusion, we can say that the DNN has passed this second much more difficult test as well. There was never a doubt that the approximation accuracy increases with increasing training data. The question in my mind was rather "is the sufficient amount of training (for the DNN to produce a worthy replacement of the traditional MC and PDE-based pricing) practical in terms of time and cost"? Given the experience gathered so far I would say the answer is yes. The present results were achieved mainly on a top spec desktop with only limited use of cloud resources. Approximating fully fledged models incorporating local and/or stochastic volatility will require more computational power, but the offline effort would still correspond to reasonable time and cost. To this end, a third post in this series would look at the case of FX TARF pricing under an LV or LSV model.

P.S. The results summarily presented in these last two posts are the culmination of a lot of work and experimentation that took the better part of a year and thousands of CPU/GPU hours.

The last few years have seen increased interest in Machine Learning – Neural Networks (NN’s) in finance. Here I will focus specifically on the application of NN’s to function approximation, so basically derivatives pricing and consequently risk calculations. The goal is to either “turbo-boost” models/calculations that are already in production systems, or enable the use of alternative/better models that were previously not practical due to the high computational cost. I’ve been hearing claims of millions of times faster calculations and that the Universal Approximation Theorem guarantees that all functions can be approximated accurately by the simplest kind of net, provided enough training samples. But as usual the devil is in the details; like exactly how many training samples are we talking about to achieve a typical production system level of accuracy? I was wondering, what if to guarantee acceptable accuracy one would need impractically large training sets (and hence data generation and training times)? And never mind millions of times faster (perhaps more realistic for quantum computers when they arrive?), I would be satisfied with 100 or even 10 times faster, provided it was easy enough to implement and deploy.

There have been waves of renewed interest in NN's for decades, but their recent resurgence is in large part due to them being made more accessible via Python packages like TensorFlow and PyTorch, which hide away the nitty-gritty that would otherwise dishearten most recent users. So given the low barrier to entry, and having been spurred on by a couple of people, I decided to check out this seemingly all-conquering method. Moreover, this seems like an engineering exercise; one needs to be willing to try a lot of different combinations of “hyperparameters”, use suitable training sample generation techniques and bring in experience/tricks from traditional methods, all with the aim of improving the end result. Not to mention "free" time and relatively cheap, sustainable electricity. That’s me then this year, I thought.

I am not sure what the state of adoption of such NN applications in finance is right now. Looking at relevant papers, they all seem to be fairly recent and more of the proof-of-concept type. What puzzles me in particular is the narrow input parameter ranges the NN's are trained on. Surely one would need more coverage than that in practice? Consequently there is talk that such (NN) models would need to be retrained from time to time, when market parameters get out of the “pre-trained” range. Now I may be missing something here. First, I think it would be less confusing to call them "NN approximations" instead of "NN models", since they simply seek to reproduce the output of existing models.

So I decided I want hard answers: can I approximate a model (function) in the whole (well almost) of its practically useful parameter space so that there’s no need to retrain it, like never (unless the original is changed of course)? To this aim I chose a volatility model calibration exercise as my first case study. Which is convenient because I had worked on this before. Note that a benefit of the NN approach (if it turns out they do the job well) is that one could make the calibration of any model super-fast, and thus enable such models as alternatives to the likes of Heston and SABR. The latter are popular exactly because they have fast analytical solutions or approximations that make calibration to market vanillas possible in practical timescales.

To demonstrate said benefit, the logical thing to do here would be to come up with a turbo-charged version of the non-affine model demo calibrator. The PDE method used in there to solve for vanilla prices under those models is highly optimized and could be used to generate the required millions of training samples for the NN in a reasonable amount of time. To keep things familiar though and save some time, I will just try this on the Heston model whose training samples can be generated a bit faster still (I used QLib’s implementation for this). If someone is interested in a high accuracy, production-quality robust calibration routine of any other volatility model, feel free to get in touch to discuss.

Like I said above, the trained parameter range is much wider than anything I’ve seen published so far but kind of arbitrary. I chose it to cover the wide moneyness (S/K) range represented by the 246 SPX option chain A from [1]. Time to maturity ranges from a few days up to 3 years. The model parameters should cover most markets. The network has about 750,000 trainable parameters, which may be too many, but one does not know beforehand what accuracy can be achieved with what architecture. It is possible that a smaller (and thus even faster) network can give acceptable results. 32 million optimally placed training samples were used. I favored accuracy over speed here, if anything to see just how accurate the NN approximation can practically get. But also because higher accuracy minimizes the chance of the optimizer converging to different locales (apparent local minima) depending on the starting parameter vector (see [3] for more on this).

Overall this is a vanilla calibration set-up, where a standard optimizer (Levenberg-Marquardt) is used to minimize the root mean square error between the market and model IV's, with the latter provided by the (Deep) NN approximation. There are other more specialized approaches involving NN's designed specifically for the calibration exercise, see for example the nice paper by Horvath et al. [4]. But I tried to keep things simple here. So how does it perform then?
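Before looking at the numbers, here is a schematic of that calibration loop, with `surrogate_iv` standing in for the trained NN approximation (the closed form used here is a made-up toy so the example is self-contained; only the structure — Levenberg-Marquardt on IV residuals against a fast surrogate — mirrors the setup described above):

```python
import numpy as np
from scipy.optimize import least_squares

def surrogate_iv(params, m, T):
    """Toy stand-in for the trained DNN surrogate mapping Heston parameters
    (v0, vbar, kappa, xi, rho) plus (moneyness, expiry) to an implied vol.
    Purely illustrative; the real surrogate is the network itself."""
    v0, vbar, kappa, xi, rho = params
    var_t = np.maximum(vbar + (v0 - vbar) * np.exp(-kappa * T), 1e-8)
    return np.sqrt(var_t) + 0.1 * rho * np.log(m) / (1.0 + T) + 0.05 * xi * np.log(m) ** 2

# Synthetic "market" IVs generated from known parameters
m = np.tile(np.linspace(0.8, 1.2, 5), 4)
T = np.repeat(np.array([0.1, 0.5, 1.0, 2.0]), 5)
true_params = np.array([0.04, 0.05, 3.0, 0.5, -0.7])
market_iv = surrogate_iv(true_params, m, T)

# Levenberg-Marquardt on the IV residuals, exactly as in a vanilla calibration,
# except each function evaluation costs microseconds instead of a PDE/MC solve
fit = least_squares(lambda p: surrogate_iv(p, m, T) - market_iv,
                    x0=[0.02, 0.04, 2.0, 1.0, -0.5], method="lm")
```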

**Neural Network Approximation specification**

*Operational parameter ranges*

| Parameter | min | max |
|---|---|---|
| S/K | 0.5 | 5 |
| T | 0.015 | 3 |
| r | -0.02 | 0.1 |
| d | 0 | 0.1 |
| v_{0} | 0.002 | 0.5 |
| v̅ | 0.005 | 0.25 |
| κ | 1 | 20 |
| ξ | 0.1 | 10 |
| ρ | -0.95 | 0.1 |

*Performance*

| Metric | Value |
|---|---|
| Mean Absolute IV Error | 9.3×10^{-6} |

The mean absolute error is about 1×10^{-3} implied volatility percentage points (i.e. 0.1 implied volatility basis points).

The maximum error over a test set of 2 million (out of sample) random points was 5 IV basis points but that is in an area of the parameter hypercube of no interest in practice. For example on the moneyness - expiration plane, the error distribution looks like this:

Obviously for individual use cases the NN specification would be customized for a particular market, thus yielding even better accuracy and/or higher speed. Either way I think this is a pretty satisfactory result and it can be improved upon further if one allows for more resources.

In terms of calibration then, how does that IV accuracy translate to predicted model parameter accuracy? I will use real-world market data that I had used before to test my PDE-based calibrator; the two SPX option chains from [1] and the DAX chain from [2]. As can be seen in Table 1, the calibration is very accurate and takes a fraction of a second. In practice one would use the last result as the starting point which should help the optimizer converge faster still.

For those who need hard evidence that this actually works as advertised, there's the self-contained console app demo below to download. The options data files for the 3 test chains are included. The calibrator always starts from the same parameter vector (see Table 1) and uses only the CPU, to keep things simple and facilitate comparisons with traditional methods. I have also included a smaller NN approximation that is only slightly less accurate on average, yet more than twice as fast.

**Table 1 — SPX Chain A from [1] (246 options)**

| | Exact | NN | error |
|---|---|---|---|
| v_{0} | 0.007316 | 0.007315 | (0.01%) |
| v̅ | 0.03608 | 0.03608 | (0.00%) |
| κ | 6.794 | 6.794 | (0.00%) |
| ξ | 2.044 | 2.044 | (0.00%) |
| ρ | -0.7184 | -0.7184 | (0.00%) |
| IV RMSE (b.p.) | 128.24 | 128.23 | 0.01 |
| CPU time | | 0.55 s | |

**SPX Chain B from [1] (68 options)**

| | Exact | NN | error |
|---|---|---|---|
| v_{0} | 0.04575 | 0.04576 | (0.02%) |
| v̅ | 0.06862 | 0.06862 | (0.00%) |
| κ | 4.906 | 4.903 | (0.06%) |
| ξ | 1.526 | 1.525 | (0.07%) |
| ρ | -0.7128 | -0.7129 | (-0.01%) |
| IV RMSE (b.p.) | 101.35 | 101.37 | 0.02 |
| CPU time | | 0.16 s | |

**DAX Chain from [2] (102 options)**

| | Exact | NN | error |
|---|---|---|---|
| v_{0} | 0.1964 | 0.1964 | (0.00%) |
| v̅ | 0.07441 | 0.07440 | (0.01%) |
| κ | 15.78 | 15.80 | (0.13%) |
| ξ | 3.354 | 3.356 | (0.06%) |
| ρ | -0.5118 | -0.5118 | (0.00%) |
| IV RMSE (b.p.) | 131.72 | 131.72 | 0.00 |
| CPU time | | 0.26 s | |

So to conclude, it is fair to say that the NN (or should I say Machine Learning) approach passed the first real test I threw at it with relative ease. Yes, one needs to invest time to get a feeling of what works and come up with ways to optimize since this is basically an engineering problem. But the results show that at least in this case one can get all the accuracy practically needed in a reasonable amount of (offline) time and resources.

Finally let's briefly mention here the two main perceived issues with this approach: How does one guarantee accuracy everywhere in the input parameter hyperspace (i.e. looking at the tails of the error distribution)? I agree this is an issue but there are ways to increase confidence, especially in relatively simple cases like the one here. The other is lack of transparency and/or interpretability.

[1] Y. Papadopoulos, A. Lewis (2018), “A First Option Calibration of the GARCH Diffusion Model by a PDE Method.”, arXiv:1801.06141v1 [q-fin.CP].

[2] Kangro, R., Parna, K., and Sepp, A., (2004), “Pricing European Style Options under Jump Diffusion Processes with Stochastic Volatility: Applications of Fourier Transform,” Acta et Commentationes Universitatis Tartuensis de Mathematica 8, p. 123-133.

[3] Cui, Y., del Bano Rollin, S., Germano, G. (2017), "Full and fast calibration of the Heston stochastic volatility model", European Journal of Operational Research, 263(2), p. 625–638

[4] Horvath, B., Muguruza, A., Tomas, M. (2019). "Deep learning volatility.", arXiv:1901.09647v2 [q-fin.MF]


In my last couple of posts I spoke about how a fairly simple PDE / finite differences approach can actually enable fast and robust option calibrations of non-affine SV models. I also posted a little console app that demonstrates the approach for the GARCH diffusion model. I have since played around with that app a little more, so here I'm giving a second version that can calibrate the following 5 non-affine SV models (plus Heston for comparison).

For the GARCH diffusion or power-law model (and the PDE pricing engine used for all the models implemented in this demo) see [1]. For the Inverse Gamma (aka "Bloomberg") model see for example [2]. The XGBM model was suggested to me by Alan Lewis, who has been working on an exact solution for it (to be published soon); here pricing is done via the PDE engine. The ACE-X1 model is one of the models I've tried calibrating, and it seems to be doing a very good job in various market conditions. All the above models calibrate to a variance (or volatility) process that is arguably more realistic than that of the Heston model (which, when calibrated, very often has zero as the most probable long-run value of its variance).

Please bear in mind that the above demo is just that, i.e. not production-level. In [1] the PDE engine was developed just for the GARCH diffusion model and Excel was used for the optimization. I then quickly coupled the code with a Levenberg-Marquardt implementation and adjusted the PDE engine for the other models without much testing (or specific optimizations). That said, it works pretty well in general, with calibration times ranging from a few seconds up to a minute. It offers three speed / accuracy settings, but even with the fastest setting calibrations should be more accurate (and much faster) than any Monte-Carlo based implementation. A production version for a chosen model would be many times faster still. Note that you will also need to download the VC++ Redistributable for VS2013. The 64-bit version (which is a little faster) also requires the installation of Intel's MKL library. The demo is free but please acknowledge if you use it and do share your findings.

EDIT April 2020: After downloading and running this on my new Windows 10 laptop I saw that the console was not displaying the inputs as intended (it was empty). To get around this please right click on the top of the console window, then click on Properties and there check "Use legacy console". Then close the console and re-launch.


As a small teaser, here's the performance of these models (in terms of IV-RMSE, recalibrated every day) for SPX options during two months at the height of the 2008 crisis. Please note that I used option data (from www.math.ku.dk/~rolf/Svend/) which include only options within ±30% in moneyness. Expirations range from 1M to 3Y. Including further out-of-the-money options does change the relative performance of the models, as does picking a different market period. That being said, my tests so far show that the considered models produce better IV fits than Heston in most cases, as well as representing arguably more realistic dynamics. They therefore seem better candidates than Heston to be combined with local volatility and/or jumps.

| | Heston | GARCH | Power_law_0.8 | ACE-X1 | I-Ga | XGBM |
|---|---|---|---|---|---|---|
| Average IV-RMSE | 0.91% | 0.81% | 0.82% | 0.66% | 0.73% | 0.45% |

[1] Y. Papadopoulos, A. Lewis, “A First Option Calibration of the GARCH Diffusion Model by a PDE Method.,” arXiv:1801.06141v1 [q-fin.CP], 2018.

[2] N. Langrené, G. Lee and Z. Zili, “Switching to non-affine stochastic volatility: A closed-form expansion for the Inverse Gamma model,” arXiv:1507.02847v2 [q-fin.CP], 2016.


For those interested there's now a detailed report of this joint collaboration with Alan Lewis on the arXiv: A First Option Calibration of the GARCH Diffusion Model by a PDE Method. Alan's blog can be found here.

EDIT: For the results reported in the paper, Excel's solver was used for the optimization. I've now quickly plugged the PDE engine into M. Lourakis' Levenberg-Marquardt implementation (levmar) and built a basic demo so that perhaps people can try to calibrate the model to their data. It doesn't offer many options, just a fast / accurate switch; the fast option is typically plenty accurate as well. So, if there's some dataset you may have used for calibrating say the Heston model, it would be interesting to see how the corresponding GARCH diffusion fit compares. Irrespective of the fit, GARCH diffusion is arguably a preferable model, not least because it typically implies more plausible dynamics than Heston. Does this also translate to more stable parameters on recalibrations? Are the fitted (Q-measure) parameters closer to those obtained from the real (P-measure) world? If the answers to the above questions are mostly positive, then coupling the model with local volatility and/or jumps could give a better practical solution than what the industry uses today. (Of course one could just as easily do that with the optimal-*p* model instead of GARCH diffusion, as demonstrated in my previous post.) Something for the near future.

If you do download the calibrator and use it on your data, please do share your findings (or any problems you may encounter), either by leaving a comment below, or by sending me an email. As a bonus, I've also included the option to calibrate another (never previously calibrated) non-affine model, the general power-law model with p = 0.8 (sitting between Heston and GARCH diffusion, see [1]).


Note that (unless you have Visual Studio 2013 installed) you will also need to download the VC++ Redistributable for VS2013. The 64-bit version (which is a little faster) also requires the installation of Intel's MKL library.

EDIT April 2020: After downloading and running this on my new Windows 10 laptop I saw that the console was not displaying the inputs as intended (it was empty). To get around this please right click on the top of the console window, then click on Properties and there check "Use legacy console". Then close the console and re-launch.

I am also including in the download a sample dataset I found in [2] (DAX index IV surface from 2002), so that you can readily test the calibrator. I used it to calibrate both Heston (also calibrated in [2], together with many other affine models) and GARCH diffusion. In contrast to the two datasets we used in the paper, in this case GARCH diffusion (RMSE = 1.14%) "beats" Heston (RMSE = 1.32%). This calibration takes about 5 secs, which is faster than the times we report in the paper; the reason is that the data we considered there include some very far out-of-the-money options that slow things down, as they require higher resolution. The Levenberg-Marquardt algo is also typically faster than Excel's solver for this problem. It is also "customizable", in the sense that one can adjust the grid resolution during the calibration based on the changing (converging) parameter vector. Still, this version is missing a further optimization that I haven't implemented yet, which I expect to reduce the time by another factor of 2-3.

The fitted parameters are:

GARCH: v0 = 0.1724, vBar = 0.0933, kappa = 7.644, xi = 7.096, rho = -0.5224.

Heston: v0 = 0.1964, vBar = 0.0744, kappa = 15.78, xi = 3.354, rho = -0.5118.

Note that both models capture the short-term smile/skew pretty well (aided by the large fitted xi's, aka vol-of-vols), but then result in a skew that decays (flattens) too fast for the longer expirations.

[1] Y. Papadopoulos, A. Lewis, "A First Option Calibration of the GARCH Diffusion Model by a PDE Method," arXiv:1801.06141 [q-fin.CP], 2018.

[2] R. Kangro, K. Parna, A. Sepp, "Pricing European-Style Options under Jump Diffusion Processes with Stochastic Volatility: Applications of Fourier Transform," Acta et Commentationes Universitatis Tartuensis de Mathematica 8, 2004, pp. 123-133.

However, other voices suggest that this isn't necessarily true and that models of the Heston type have simply not been used to their full potential. They say why use a deterministic starting point $v_0$ for the variance when the process is really hidden and stochastic? Instead, they propose to give such traditional SV models a "hot start", that is assume that the variance today is given by some distribution and not a fixed value. Mechkov [1] shows that when the Heston model is used like that it is indeed capable of "exploding" smiles as expiries tend to zero. Jacquier & Shi [2] present a study of the effect of the assumed initial distribution type.

The idea seems elegant and it's the kind of "trick" I like, because it's simple to apply to an existing solver, so it doesn't hurt trying it out. And it gets particularly straightforward when the option price is found through a PDE solution: the solution is automatically returned for the whole range of possible initial variance values (corresponding to the finite difference grid in the variance direction), so the randomized price is just a weighted average of values that are already available.

So here I'm going to try this out on a calibration to a chain of SPX options to see what it does. But why limit ourselves to Heston? Because it has a fast semi-analytical solution for vanillas, you say. I say I can use a fast and accurate PDE solver instead, like the one I briefly tested in my previous post. Furthermore, is there any reason to believe that a square root diffusion specification for the variance should fit the market, or describe its dynamics better? Maybe linear diffusion could work best, or something in between. The PDE solver allows us to use any variance power *p* in the diffusion specification.

Sounds complicated? It actually worked on the first try. Here's what I got for the end of Q1 2017, just using the PDE engine and Excel's solver for the optimization. The calibration involved 262 options across 9 different expiries ranging from 1W to 3Y and took a few minutes to complete. If one restricts the calibration to mainly short-term expiries then better results are obtained, but I wanted to see the overall fit when very short and long expiries are fitted simultaneously. I am also showing how the plain Heston model fares, which on the face of it is not bad, apart from the very short (1W) smile. Visually, what it does is try to "wiggle its way" into matching the short-term smiles. The wiggle seems perfectly set up for the 3W smile, but then it turns out excessive for the other expiries. The ROD model on the other hand avoids excessive "wiggling" and still manages to capture the steep smiles of the short expiries pretty well. The optimal model power *p* comes out of the calibration itself.

But the RMSE only tells half the story. The Feller ratio corresponding to Heston's fitted parameters is 0.09, which basically means that by far the most probable (risk-neutral) long-run volatility value is zero. In other words, the assumed volatility distribution is not plausible. The randomization idea is not without issues either, despite the impressive improvement in fit: the optimizer calibrated to a correlation coefficient of -1 for this experiment, which seems extreme and not quite realistic.

By the way, this experiment is part of some research I've been doing in collaboration with Alan Lewis, the results of which will be available/published soon.

This year though I've worked a lot more on such solvers and I now realise that those run times are clearly not what one should expect, nor aim for, with (fine-tuned) implementations. So just how fast should your numerical solver of the Heston (or similar) PDE be? What kind of run time should be expected for high-accuracy solutions? The answer is:

| | κ | η | σ | ρ | rd | rf | T | K | νo |
|---|---|---|---|---|---|---|---|---|---|
| Case 0 | 5 | 0.16 | 0.9 | 0.1 | 0.1 | 0 | 0.25 | 10 | 0.0625 |
| Case 1 | 1.5 | 0.04 | 0.3 | -0.9 | 0.025 | 0 | 1 | 100 | 0.0625 |
| Case 2 | 3 | 0.12 | 0.04 | 0.6 | 0.01 | 0.04 | 1 | 100 | 0.09 |
| Case 3 | 0.6067 | 0.0707 | 0.2928 | -0.7571 | 0.03 | 0 | 3 | 100 | 0.0625 |
| Case 4 | 2.5 | 0.06 | 0.5 | -0.1 | 0.0507 | 0.0469 | 0.25 | 100 | 0.0625 |
| Case 5 | 3 | 0.04 | 0.01 | -0.7 | 0.05 | 0 | 0.25 | 100 | 0.09 |

My current "state-of-the-art" solver uses second-order spatial discretization and the Hundsdorfer-Verwer ADI scheme. For the results below I used the solver "as is", so no fine-tuning for each case; everything (grid construction) was decided automatically by the solver. In order to give an idea of the accuracy achieved overall for each case, I plot the solution error in the asset direction (so across the moneyness spectrum). The plots are cut off where the option value becomes too small; the actual grids used by the solver extend further to the right. I am showing results for two different grid resolutions (NS x NV x NT), grid A (60 x 40 x 30) and grid B (100 x 60 x 50). The timings were taken on an i7-920 PC from 2009 in single-threaded mode. Obviously one can expect at least double the single-threaded speed from a modern high-spec machine. (ADI can be parallelized as well with almost no extra effort; a parallel efficiency of about 80% is achieved using basic OpenMP directives.) The errors in each case are calculated by comparing with the exact (semi-analytic) values obtained using QuantLib at the highest precision. Note that I am using relative errors, which are harder to bring down as the option value tends to zero. But when one wants to use the PDE solver as the pricing engine for a calibration, it is important to price far out-of-the-money options with low relative errors in order to get accurate implied volatilities and properly capture any smile behavior. The present solver can indeed be used to calibrate the Heston model (or any other model in this family) accurately in less than a minute, and in many cases in just a few seconds.
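As an aside on why the relative-error criterion matters: a given absolute price error translates into a much larger implied-vol error far out-of-the-money, where vega is tiny, so a tiny OTM price must be resolved to a small relative error. A quick sketch (a Black-Scholes call plus a Brent-based implied-vol inverter, both written here purely for illustration):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol(price, S, K, T, r):
    return brentq(lambda s: bs_call(S, K, T, r, s) - price, 1e-4, 3.0)

S, T, r, sigma, abs_err = 100.0, 0.5, 0.0, 0.2, 0.005

# Same absolute price error applied ATM and far OTM
iv_err_atm = implied_vol(bs_call(S, 100.0, T, r, sigma) + abs_err,
                         S, 100.0, T, r) - sigma
iv_err_otm = implied_vol(bs_call(S, 140.0, T, r, sigma) + abs_err,
                         S, 140.0, T, r) - sigma
print(iv_err_atm, iv_err_otm)   # the OTM implied-vol error is much larger
```

The implied-vol error is roughly the price error divided by vega, which is why the wings demand the tightest pricing.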

So, if you have a Heston PDE solver then this is an efficiency reference you can use to benchmark against. Price the 6 options above and compare error plots and timings with those below. Let me know what you get. If you're getting larger errors than I do with the above resolutions, I'll tell you why that is!

There is of course another important quality for a PDE solver, robustness. The scheme I used here (H-V) is fairly robust, but can still produce some spurious oscillations when one uses too low NT/NS. I may expand this post testing other schemes in the near future.


As an example let's try to price a daily monitored up-and-out put option. I'll use simple Black-Scholes to demonstrate, but the qualitative behaviour would be very similar under other models. In order to show the effect clearly I'll start with a uniform grid. The discretization uses central differences and is thus second order in space (asset S), and Crank-Nicolson is used in time. That will sound alarming if you're aware of C-N's inherent inability to damp spurious oscillations caused by discontinuities, but a bit of Rannacher treatment will take care of that (see here). In any case, in order to take time discretization out of the picture here, I used 50000 time steps for the results below, so there's no time-error (no oscillation issues either) and thus the plotted error is purely due to the S-discretization. The placement of grid points relative to a discontinuity has a significant effect on the result. Having a grid point falling exactly on the barrier will produce different behaviour as opposed to having the barrier falling mid-way between two grid points. So Figure 1 has the story. The exact value (4.53888216 in case someone's interested) was calculated on a very fine grid. It can be seen that the worst we can do is place a grid point on the barrier and solve with no smoothing. The error using the coarsest grid (which still has 55 points up to the strike and is close to what we would maybe use in practice) is clearly unacceptable (15.5%). The best we can do without smoothing (averaging) is to make sure the barrier falls in the middle between two grid points. This can be seen to significantly reduce the error, but only once we've sufficiently refined the grid (curve (b)). We then see what can be achieved by placing the barrier on a grid point and smoothing by averaging as described above. Curve (c) shows that linear smoothing already greatly improves things compared to the previous effort in curve (b). Finally, curve (d) shows that quadratic smoothing adds some extra accuracy still.

S = 98, K = 110, B = 100, T = 0.25, vol = 0.16, r = 0.03, 63 equi-spaced monitoring dates (last one at T)

This case is also one which greatly benefits from the use of a non-uniform grid which concentrates more points near the discontinuity. The slides below show exactly that. Quadratic smoothing with the barrier on a grid point is used. First is the solution error plot for a uniform grid with dS = 2, which corresponds to the first point from the left of curve (d) in figure 1. The second slide shows the error we get when a non-uniform grid of the same size (and hence the same execution cost) is used. The error curve has been pretty much flattened even on this coarsest grid. The error has now gone from 15.5% on a uniform grid with no smoothing, down to 0.01% on a graded grid with smoothing applied, for the same computational effort. Job done. Here's the function I used for it: SmoothOperator.cpp.
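SmoothOperator.cpp is the actual implementation. Purely as an illustration of the simplest variant (cell-averaging, i.e. linear smoothing), here is a Python sketch that replaces each nodal value of the discontinuous payoff by its average over the surrounding grid cell:

```python
import numpy as np

# Cell-averaging ("linear") smoothing of a discontinuous initial condition:
# each nodal value is replaced by the average of the payoff over the cell
# [x_i - h/2, x_i + h/2], computed here by brute-force quadrature.
def smooth_ic(payoff, grid, nsub=201):
    h = grid[1] - grid[0]                # uniform grid assumed
    out = np.empty_like(grid)
    for i, x in enumerate(grid):
        s = np.linspace(x - h / 2, x + h / 2, nsub)
        out[i] = payoff(s).mean()
    return out

# The option value just after a monitoring date of an up-and-out put:
# killed above the barrier B, hence discontinuous exactly at B
B, K = 100.0, 110.0
payoff = lambda S: np.maximum(K - S, 0.0) * (S < B)

grid = np.linspace(90.0, 110.0, 11)      # barrier B lands on a grid point
v_raw = payoff(grid)
v_smooth = smooth_ic(payoff, grid)
print(v_raw[5], v_smooth[5])             # 0.0 raw vs the cell average at S=B
```

Quadratic smoothing weights the integrand instead of averaging it uniformly, but the mechanics are the same: the grid only ever "sees" an integral of the discontinuous data, not a pointwise sample.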

Now there are generally two ways of reducing, or in special cases even getting rid of, this simulation bias. The first is to shift the barrier closer to the asset spot in order to compensate for the times when we "blink". The question is by how much the barrier should be shifted. This was first treated by Broadie, Glasserman & Kou [1] for the Black-Scholes model, where they introduced a continuity correction based on the "magical" number 0.5826. Check it out if you haven't. This trick works pretty well when the spot is not very near the barrier and/or when there are many monitoring dates. In the opposite case though, it can be quite bad and produce errors of 5% or more (Gobet, [2]). This gets worse when there is a steep drop in value near the barrier, as in a down-and-out put, or an up-and-out call with low volatility.

The second way is to use an analytic expression for the probability of hitting the barrier between two realized points in a simulated asset path. This is usually referred to as the probabilistic method, or the Brownian bridge technique, since it uses the probability of Brownian motion hitting a point conditional on two fixed end points. As already implied, this probability is known analytically for Brownian motion (see [2]), and consequently for GBM as well. So for the usual GBM asset model (Black-Scholes), this technique can remove the simulation bias completely. One can just use the probability directly, multiplying each path's payoff by the total path survival probability (the product of the survival probabilities of all the path's segments).

As can be seen, the probabilistic method (enabled by checking "continuous monitoring") allows exact Monte Carlo pricing of this continuously monitored up-and-out call (for which of course the exact analytic solution is available; it is shown as well, marked by the blue line). By contrast, the barrier-shifting method misprices this option by quite some margin (3.4%), as can be seen in the second slide, where the probabilistic correction is unchecked and instead the barrier has been shifted downwards according to BGK's (not so magic here) formula.

Now the analytical (Brownian bridge) hitting probability is only available for a BM/GBM process. But what if we have a not-so-simple process? Well, as long as the discretization scheme we use makes the asset locally BM/GBM over each time step (always the case when we use an Euler scheme), we can still apply the bridge probability segment by segment as an approximation.
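Here's a minimal sketch of the survival-probability weighting for an up-and-out call under GBM; the per-segment hit probability is the standard Brownian-bridge formula in log space, and the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Survival probability of one GBM path segment w.r.t. an upper barrier B:
# the Brownian-bridge hit probability in log space (see Gobet [2]).
def bridge_survival(S0, S1, B, sigma, dt):
    p_hit = np.exp(-2.0 * np.log(B / S0) * np.log(B / S1) / (sigma**2 * dt))
    p_hit = np.where((S0 >= B) | (S1 >= B), 1.0, p_hit)  # knocked out anyway
    return 1.0 - p_hit

# Up-and-out call, illustrative parameters
S0, K, B, r, sigma, T = 100.0, 100.0, 120.0, 0.0, 0.2, 0.25
nsteps, npaths = 63, 100_000
dt = T / nsteps

S = np.full(npaths, S0)
surv = np.ones(npaths)              # continuous-monitoring weight per path
alive = np.ones(npaths, bool)       # plain discrete-monitoring indicator
for _ in range(nsteps):
    Z = rng.standard_normal(npaths)
    S_new = S * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
    surv *= bridge_survival(S, S_new, B, sigma, dt)
    alive &= S_new < B
    S = S_new

payoff = np.maximum(S - K, 0.0)
price_bb = np.exp(-r * T) * np.mean(payoff * surv)      # bias removed
price_plain = np.exp(-r * T) * np.mean(payoff * alive)  # biased high
print(price_bb, price_plain)
```

The plain discrete check misses crossings between monitoring points, so it knocks out too few paths and overprices the knockout; the bridge weighting corrects exactly this.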

| | κ | η | σ | ρ | rd | rf | T | K | U | So | νo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Case 1 | 5 | 0.16 | 0.9 | 0.1 | 0.1 | 0 | 0.25 | 100 | 120 | 100 | 0.0625 |
| Case 2 | 5 | 0.16 | 0.9 | 0.1 | 0.1 | 0 | 0.25 | 100 | 135 | 130 | 0.0625 |
| Case 3 | 1.5 | 0.04 | 0.3 | -0.9 | 0.025 | 0 | 0.25 | 100 | 115 | 100 | 0.0625 |
| Case 4 | 1.5 | 0.04 | 0.3 | -0.9 | 0.025 | 0 | 0.25 | 100 | 135 | 130 | 0.0625 |
| Case 5 | 3 | 0.12 | 0.04 | 0.6 | 0.01 | 0.04 | 0.25 | 100 | 120 | 100 | 0.09 |
| Case 6 | 2.5 | 0.06 | 0.5 | -0.1 | 0.0507 | 0.0469 | 0.5 | 100 | 120 | 100 | 0.0625 |
| Case 7 | 6.21 | 0.019 | 0.61 | -0.7 | 0.0319 | 0 | 0.5 | 100 | 110 | 100 | 0.010201 |

| Case | Solver | Price | Difference |
|---|---|---|---|
| 1 | PDE/FDM cont. monitoring (exact) | 1.8651 | - |
| | MC with 63 timesteps & PCC | 1.8652 | 0.01% |
| | Plain MC with 63 timesteps | 2.1670 | 16% |
| 2 | PDE/FDM cont. monitoring (exact) | 2.5021 | - |
| | MC with 63 timesteps & PCC | 2.5032 | 0.04% |
| | Plain MC with 63 timesteps | 3.4159 | 37% |
| 3 | PDE/FDM cont. monitoring (exact) | 2.1312 | - |
| | MC with 63 timesteps & PCC | 2.1277 | -0.16% |
| | Plain MC with 63 timesteps | 2.3369 | 10% |
| 4 | PDE/FDM cont. monitoring (exact) | 3.6519 | - |
| | MC with 63 timesteps & PCC | 3.6394 | -0.34% |
| | Plain MC with 63 timesteps | 4.6731 | 28% |
| 5 | PDE/FDM cont. monitoring (exact) | 1.6247 | - |
| | MC with 63 timesteps & PCC | 1.6249 | 0.01% |
| | Plain MC with 63 timesteps | 1.8890 | 16% |
| 6 | PDE/FDM cont. monitoring (exact) | 1.7444 | - |
| | MC with 125 timesteps & PCC | 1.7438 | -0.03% |
| | Plain MC with 125 timesteps | 1.9209 | 10.1% |
| 7 | PDE/FDM cont. monitoring (exact) | 1.9856 | - |
| | MC with 125 timesteps & PCC | 1.9790 | -0.33% |
| | Plain MC with 125 timesteps | 2.0839 | 4.95% |

Finally, regarding the variance value used in the probability formulas: my brief testing showed that the best results were obtained using the variance at whichever of the two nodes *S*[i] and *S*[i+1] is closer to the barrier. This was the choice for the results of Table 2.

Once I set up a more accurate long stepping discretization scheme, I may test to see how well this approximation holds when one uses fewer/longer time steps.

[1] M. Broadie, P. Glasserman, S. Kou, "A continuity correction for discrete barrier options," *Mathematical Finance*, Vol. 7 (4), 1997, pp. 325-348.

[2] E. Gobet, "Advanced Monte Carlo methods for barrier and related exotic options," in A. Bensoussan, Q. Zhang, P. Ciarlet (eds.), *Mathematical Modeling and Numerical Methods in Finance*, Handbook of Numerical Analysis, Elsevier, 2009, pp. 497-528.

Well, I decided to give it another try and see how it really compares after a few optimizing tweaks.

We are trying to solve the following PDE by discretizing it with the finite difference method:

$$\frac{\partial V}{\partial t} = \frac{1}{2} S^2 \nu\frac{\partial^2 V}{\partial S^2}+\rho \sigma S \nu \frac{\partial^2 V}{\partial S \partial \nu} + \frac{1}{2}\sigma^2\nu\frac{\partial^2 V}{\partial \nu^2} + (r_d-r_f)S \frac{\partial V}{\partial S}+ \kappa (\eta-\nu) \frac{\partial V}{\partial \nu}-r_d V $$

where $\nu$ is the variance of the underlying asset $S$ returns, $\sigma$ is the volatility of the $\nu$ process, $\rho$ the correlation between the $S$ and $\nu$ processes, and $\eta$ the long-term variance to which $\nu$ mean-reverts with rate $\kappa$. Here $S$ follows geometric Brownian motion and $\nu$ the CIR (Cox-Ingersoll-Ross) mean-reverting process.
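As a sketch of what the spatial discretization looks like, here is the Heston operator of the PDE above evaluated with second-order central differences at an interior node, mixed derivative included. The sanity check uses V(S, ν) = S·ν, for which central differences happen to be exact:

```python
import numpy as np

# Second-order central-difference evaluation of the Heston spatial operator
# at an interior node (i, j), mixed derivative included.
def heston_L(V, S, v, i, j, kappa, eta, sigma, rho, rd, rf):
    hS, hv = S[1] - S[0], v[1] - v[0]            # uniform grid assumed
    VS  = (V[i+1, j] - V[i-1, j]) / (2 * hS)
    Vv  = (V[i, j+1] - V[i, j-1]) / (2 * hv)
    VSS = (V[i+1, j] - 2 * V[i, j] + V[i-1, j]) / hS**2
    Vvv = (V[i, j+1] - 2 * V[i, j] + V[i, j-1]) / hv**2
    VSv = (V[i+1, j+1] - V[i+1, j-1]
           - V[i-1, j+1] + V[i-1, j-1]) / (4 * hS * hv)
    return (0.5 * S[i]**2 * v[j] * VSS + rho * sigma * S[i] * v[j] * VSv
            + 0.5 * sigma**2 * v[j] * Vvv + (rd - rf) * S[i] * VS
            + kappa * (eta - v[j]) * Vv - rd * V[i, j])

# Sanity check on V(S, v) = S*v, whose operator value is known in closed form
S = np.linspace(50.0, 150.0, 101)
v = np.linspace(0.01, 0.51, 51)
V = np.outer(S, v)
kappa, eta, sigma, rho, rd, rf = 1.5, 0.04, 0.3, -0.9, 0.025, 0.0
i, j = 50, 25
num = heston_L(V, S, v, i, j, kappa, eta, sigma, rho, rd, rf)
exact = (rho * sigma * S[i] * v[j] + (rd - rf) * S[i] * v[j]
         + kappa * (eta - v[j]) * S[i] - rd * S[i] * v[j])
print(num, exact)
```

Stacking this expression for every interior node (implicitly in time) is what produces the algebraic system discussed next.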

After we discretize the PDE we are left with an algebraic system of equations relating the values at the grid points $(S_i, \nu_j)$ at each time level.

The first scheme I considered is the three-level fully implicit one (3LFI), i.e. BDF2.

This scheme is strongly A-stable, so unlike something like Crank-Nicolson (CN) it has built-in oscillation damping on top of being unconditionally stable. So I tested this first and, while it worked pretty well, I found that it produced slightly larger errors than CN. That is, while both are second-order accurate, CN's error constant is lower. This is not surprising: central discretizations often produce higher accuracy than their one-sided counterparts of the same order. Moreover, it was notably less time-converged than the ADI schemes using the same number of time steps.

So I decided to go for a higher-spec scheme, especially since it comes really cheap in terms of implementation. Adding an extra time-point gives us the third-order backward differentiation formula, BDF3:

$$ BDF3 : \quad V_{i,j}^{(n+1)} = \left (B \cdot dt+3 \cdot V_{i,j}^{(n)}-\frac 3 2 \cdot V_{i,j}^{(n-1)}+\frac 1 3 \cdot V_{i,j}^{(n-2)}\right)/\left(\frac {11} 6 - a_{i,j} \cdot dt\right) \qquad (1)$$

While such a scheme is no longer strictly unconditionally stable (it's almost A-stable), it should still be nearly as robust as 3LFI/BDF2 and preserve its damping qualities. As it happens, the results it produces are significantly more accurate than both 3LFI/BDF2 and the ADI schemes.
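The update formula (1) is easy to sanity-check on the scalar test equation y' = a·y, where B = 0 and a_{i,j} = a; for a third-order scheme, halving the time step should cut the error by roughly a factor of 8:

```python
import numpy as np

# Equation (1) applied to the scalar test problem y' = a*y, i.e. B = 0 and
# a_{i,j} = a. Exact starting values are used to isolate the BDF3 error.
def bdf3_solve(a, y0, T, n):
    dt = T / n
    t = np.arange(n + 1) * dt
    y = np.empty(n + 1)
    y[:3] = y0 * np.exp(a * t[:3])   # in practice: IE sub-steps + extrapolation
    for k in range(2, n):
        y[k + 1] = (3.0 * y[k] - 1.5 * y[k - 1]
                    + y[k - 2] / 3.0) / (11.0 / 6.0 - a * dt)
    return y[-1]

a, y0, T = -2.0, 1.0, 1.0
exact = y0 * np.exp(a * T)
e1 = abs(bdf3_solve(a, y0, T, 40) - exact)
e2 = abs(bdf3_solve(a, y0, T, 80) - exact)
ratio = e1 / e2
print(ratio)   # ~8 for a third-order scheme
```

In the PDE setting the scalar a becomes the diagonal coefficient at each node and B collects the off-diagonal neighbor terms, exactly as in (1).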

Now since this is a multi-step scheme (it needs the values not just from the previous time level but also from two levels before that), we need something different for the first couple of time steps: a starting procedure. For the very first step we need a two-level (single-step) scheme. We cannot use CN, because that would allow spurious oscillations to occur given the non-smooth initial conditions of option payoff functions. Instead one can use the standard implicit Euler (IE) scheme plus local extrapolation: we first use the IE scheme for 4 sub-steps of size dt/4, combining the results via local extrapolation to raise the accuracy of the step.

Central finite differences are used for everything, except again for the

I used successive over-relaxation (SOR) to solve the resulting system, because it's so easy to set up and to accommodate early exercise features. On the other hand, it is in general quite slow, and moreover its speed may depend significantly on the chosen relaxation factor $\omega$. This is where the hand-made part comes in: I am using an SOR variant called TSOR, which tunes the relaxation factor as the iterations proceed.
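As a self-contained illustration of projected SOR (without any of the TSOR tuning), here is a plain PSOR solve of an American put under 1D Black-Scholes with implicit Euler stepping; the relaxation factor ω is fixed and all parameter values are illustrative:

```python
import numpy as np

# Projected SOR (PSOR) for an American put under 1D Black-Scholes with
# implicit Euler time stepping. The early-exercise floor is applied inside
# each SOR sweep, so PDE and constraint hold simultaneously at convergence.
def american_put_psor(K, T, r, sigma, NS=120, NT=100, omega=1.5, tol=1e-8):
    Smax = 3.0 * K
    S = np.linspace(0.0, Smax, NS + 1)
    h, dt = S[1] - S[0], T / NT
    payoff = np.maximum(K - S, 0.0)
    V = payoff.copy()
    i = np.arange(1, NS)
    alpha = 0.5 * sigma**2 * S[i]**2 / h**2      # diffusion coefficient
    beta = 0.5 * r * S[i] / h                    # convection coefficient
    # implicit Euler system: a*V[i-1] + b*V[i] + c*V[i+1] = V_old[i]
    a = -dt * (alpha - beta)
    b = 1.0 + dt * (2.0 * alpha + r)
    c = -dt * (alpha + beta)
    for _ in range(NT):
        rhs = V[1:-1].copy()
        Vn = V.copy()
        Vn[0], Vn[-1] = K, 0.0                   # put boundary values
        while True:                              # PSOR sweeps
            err = 0.0
            for k in range(NS - 1):              # node index k+1
                gs = (rhs[k] - a[k] * Vn[k] - c[k] * Vn[k + 2]) / b[k]
                new = max(Vn[k + 1] + omega * (gs - Vn[k + 1]), payoff[k + 1])
                err = max(err, abs(new - Vn[k + 1]))
                Vn[k + 1] = new
            if err < tol:
                break
        V = Vn
    return S, V

S, V = american_put_psor(100.0, 100.0 * 0.0 + 0.5, 0.05, 0.2)
print(np.interp(100.0, S, V))    # ATM American put value
```

The `max(..., payoff)` inside the sweep is the projection: each updated node is immediately floored by the payoff, so its neighbors "feel" the constraint within the same sweep.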

As I mentioned above, I wanted to compare this general-purpose, old-school method to what people use nowadays for Heston (and other parabolic PDEs with mixed derivatives), which will usually be an ADI variant like Craig-Sneyd, Modified Craig-Sneyd, or Hundsdorfer-Verwer. To this end I did not implement said methods myself; I downloaded and used QuantLib, which does implement them. It's the first time I've used QuantLib and I was pleasantly surprised by the available functionality (but less so by the lack of quantitative-level documentation). So I will be comparing with their implementation.

I will use the following 6 sets of parameters found in the relevant literature in order to make a few quick comparisons:

| | κ | η | σ | ρ | rd | rf | T | K | νo | Ref |
|---|---|---|---|---|---|---|---|---|---|---|
| Case 0 | 5 | 0.16 | 0.9 | 0.1 | 0.1 | 0 | 0.25 | 10 | 0.0625 | [2] |
| Case 1 | 1.5 | 0.04 | 0.3 | -0.9 | 0.025 | 0 | 1 | 100 | 0.0625 | [1] |
| Case 2 | 3 | 0.12 | 0.04 | 0.6 | 0.01 | 0.04 | 1 | 100 | 0.09 | [1] |
| Case 3 | 0.6067 | 0.0707 | 0.2928 | -0.7571 | 0.03 | 0 | 3 | 100 | 0.0625 | [1] |
| Case 4 | 2.5 | 0.06 | 0.5 | -0.1 | 0.0507 | 0.0469 | 0.25 | 100 | 0.0625 | [1] |
| Case 5 | 3 | 0.04 | 0.01 | -0.7 | 0.05 | 0 | 0.25 | 100 | 0.09 | [3] |

The first case has been used by Ikonen & Toivanen [2] and other authors, cases 1-4 are from in 't Hout & Foulon [1], and the last one is from Sullivan & Sullivan [3], who note that such a set would be typical of a calibration of the model to equity market options. Note the very small volatility of variance $\sigma$, which makes the PDE strongly convection-dominated in the $\nu$-direction.

So let's start with a very unscientific comparison. Table 2 shows the prices, errors and timings for at-the-money puts at the current variance levels $\nu_o$ of table 1, as calculated by the present solver and the modified Craig-Sneyd ADI scheme in QLib. This is just a quick and dirty way to get a first idea, since there are more differences than just the method; QLib discretizes in log(S), for one.

Finally, a fine grid was chosen deliberately, since this is where SOR-based solvers are usually reported to suffer most in efficiency when compared with splitting schemes like ADI. The stopping criterion was set tight enough that the prices are converged to all six digits shown. This is definitely tighter than the average discretization error warrants, and thus unnecessary, so in real-life situations the present solver could be set up to be a little faster still.

| Case | Solver | Price | Abs. error | CPU secs |
|---|---|---|---|---|
| 0 | Present | 0.501468 | 0.000002 | 4.1 |
| | QLib MGS | 0.501469 | 0.000003 | 7.5 |
| | Exact | 0.5014657 | - | - |
| 1 | Present | 7.46150 | 0.000234 | 6.2 |
| | QLib MGS | 7.46175 | 0.000015 | 7.5 |
| | Exact | 7.461732 | - | - |
| 2 | Present | 14.4157 | 0.000008 | 7.4 |
| | QLib MGS | 14.4166 | 0.000861 | 7.5 |
| | Exact | 14.41569 | - | - |
| 3 | Present | 12.0821 | 0.001076 | 6.6 |
| | QLib MGS | 12.0830 | 0.000122 | 7.5 |
| | Exact | 12.08315 | - | - |
| 4 | Present | 4.71629 | 0.000007 | 3.8 |
| | QLib MGS | 4.71637 | 0.000072 | 7.5 |
| | Exact | 4.716294 | - | - |
| 5 | Present | 4.83260 | 0.000002 | 4.3 |
| | QLib MGS | 4.83267 | 0.000069 | 7.5 |
| | Exact | 4.832600 | - | - |

As I've said above this is really a dirty comparison, but basically I just wanted to see how much slower the present scheme is: 3 times, 5 times, or 10? So then, to my surprise, the present method seems to be overall on par speed-wise with the ADI method, faster even, for the chosen number of time steps. This is in contrast to what one reads in the literature, where SOR is always the "slow" method one uses to compare with the more efficient modern methods. Not the case here with the present TSOR-based solver.

QLib by default concentrates the grid points around the strike.

A better picture of the quality of the solution is possible if we look at the error distribution across the asset price range.

When we price an option by solving a PDE we get the entire solution surface over the grid's $(S, \nu)$ range, not just the value at a single point.

So what do we see in this (used frequently in literature) case ? Two things: First that the non-uniform grid construction in log(

Another advantage of SOR when it comes to option pricing, apart from its simplicity, is the ease with which one can incorporate early exercise features. It's literally one extra line of code (OK, maybe a couple more, counting the equally trivial changes in the boundary conditions). So let's have a look at the accuracy of the results produced by the present method. I will again use test case 0 from table 1, which has been used extensively in the literature for American option pricing under Heston (see for example [2]). We should note here that with the ADI schemes there are various ways to handle early exercise. One of the best is probably the operator splitting method of Ikonen & Toivanen in [2], although in that paper they use it with different time-discretizations. I will compare with their results below, but first I will again make a quick comparison with QLib's American Heston pricing functionality, using the same ADI scheme (MGS) as before. QLib does not use a "proper" method for handling early exercise; it simply applies the American constraint explicitly, after the prices have been obtained as European at each time step (the explicit payoff method). Such an approach cannot give very accurate results. With SOR on the other hand (in this context called PSOR, i.e. Projected SOR), the American constraint is really "blended into" the solution. We apply it every time we update the value of a grid point, and the next point we update can "feel" the effect of the previous updates, i.e. it can feel that the values of the previous points have been floored by the payoff. And this is repeated many times (as many as the iterations needed for the solution to converge for that time step), which leads to all the points properly "sharing" what each needs to know about the early exercise effect. So at the end of each time step the PDE and the constraint are satisfied at each point simultaneously.
This is unlike the explicit, a posteriori enforcement of the constraint in QLib, which unavoidably cancels some of the good work the ADI had done before to exactly satisfy the discretized PDE at each point. The result of these different ways of incorporating early exercise can be seen below.

| Nt | Present BDF3-PTSOR: Price | Abs. error | CPU secs | QLib MGS-EP: Price | Abs. error | CPU secs |
|---|---|---|---|---|---|---|
| 50 | 0.519890 | 0.000004 | 0.33 | 0.519376 | 0.000557 | 0.28 |
| 100 | 0.519893 | 0.000001 | 0.44 | 0.519653 | 0.000280 | 0.51 |
| 200 | 0.519894 | 0.000000 | 0.43 | 0.519793 | 0.000140 | 0.97 |
| 400 | 0.519894 | 0.000000 | 0.73 | 0.519863 | 0.000070 | 1.9 |
| 800 | 0.519894 | 0.000000 | 1.1 | 0.519900 | 0.000033 | 3.7 |
| 1600 | 0.519894 | 0.000000 | 1.6 | 0.519918 | 0.000015 | 7.3 |
| 20000 | 0.519894 | 0 | - | 0.519934 | 0 | - |

Table 3 shows the pricing of the American put of case 0 from table 1 (also used for such tests in [2]) with a fixed spatial grid of 200 x 100 and increasing number of time steps. The time-converged value was calculated in both cases using 20000 time steps. Note that the converged values are not the same for the two solvers because they are using different spatial discretizations.

As we can see, apart from the first-order convergence, the explicit payoff method of QLib also comes with a large error constant. Even with 1600 time steps its error is greater than that of the present solver with only 50 time steps. That said, the predictably first-order convergence of the QLib approach means that one can use Richardson extrapolation and get much improved accuracy.
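For a first-order scheme that extrapolation is just a two-point formula; a toy sketch (the constants below are illustrative, with the limit chosen to mimic the converged value in Table 3):

```python
# Richardson extrapolation for a first-order-in-time method: from P(dt) and
# P(dt/2) the leading O(dt) error term cancels in 2*P(dt/2) - P(dt).
def richardson_first_order(p_coarse, p_fine):
    return 2.0 * p_fine - p_coarse

# Toy error model P(dt) = P_star + C*dt (constants are illustrative)
P_star, C = 0.519934, 0.3
P = lambda dt: P_star + C * dt
p_extrap = richardson_first_order(P(0.01), P(0.005))
print(p_extrap)   # recovers P_star for this pure first-order error model
```

In practice the error also contains higher-order terms, so the extrapolated value is not exact, but the leading O(dt) part is removed.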

The other thing to note is that the present solver becomes much faster than the ADI when the number of time steps is increased. Unfortunately this proves to be a rather useless advantage since, given its high temporal accuracy, one would not need a large number of time steps anyway.

Ikonen & Toivanen in [2] provide extensive results for the same test case, which include the use of the standard (P)SOR with three second-order time schemes (Crank-Nicolson, BDF2 and Runge-Kutta), as well as their operator splitting (OS) method coupled with a multigrid (MG) solver. Their space discretization is very similar to the one used here (second-order central schemes for first and second derivative terms), although they do add some diffusion in areas of the solution where convection dominates, in order to avoid positive non-diagonal coefficients. To this end they also use a special scheme for the discretization of the mixed derivative term. This extra safety should in general cost some accuracy. Still, I think it's probably not unreasonable to make direct accuracy comparisons with their results using the same (uniform) grid sizes. I also use the same computational domain as they do.

| (NS, Nν, Nt) | PTSOR-BDF3: Error | Term. error | Avg. iter | Avg. ω | CPU | PSOR-CN: Error | Avg. iter | ω | CPU | OS-CN w/ MG: Error | CPU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (40,16,16) | 4.85E-03 | - | 9.3 | 1.11 | 0.01 | 1.55E-02 | 10.4 | 1.4 | 0 | 1.49E-02 | 0 |
| (40,32,16) | 4.99E-03 | - | 10.4 | 1.27 | 0.01 | 1.61E-02 | 12.9 | 1.4 | 0 | 1.55E-02 | 0 |
| (80,16,16) | 1.10E-03 | - | 14.7 | 1.21 | 0.02 | 4.57E-03 | 22.1 | 1.6 | 0.01 | 4.06E-03 | 0.01 |
| (80,32,16) | 1.33E-03 | - | 13.9 | 1.36 | 0.03 | 4.44E-03 | 22.1 | 1.6 | 0.02 | 3.69E-03 | 0.02 |
| (80,32,32) | 1.39E-03 | - | 8.3 | 1.26 | 0.03 | 4.19E-03 | 14.4 | 1.5 | 0.04 | 3.77E-03 | 0.04 |
| (80,64,32) | 1.42E-03 | - | 10.6 | 1.36 | 0.05 | 1.72E-03 | 16.1 | 1.6 | 0.09 | 3.83E-03 | 0.09 |
| (160,32,32) | 3.11E-04 | 7% | 11.6 | 1.4 | 0.06 | 1.36E-03 | 32.9 | 1.7 | 0.17 | 9.79E-04 | 0.08 |
| (160,64,32) | 3.45E-04 | - | 16.9 | 1.59 | 0.14 | 1.30E-03 | 32 | 1.7 | 0.33 | 8.78E-04 | 0.17 |
| (160,64,64) | 3.64E-04 | - | 8.4 | 1.44 | 0.14 | 1.15E-03 | 20.1 | 1.7 | 0.42 | 9.19E-04 | 0.33 |
| (160,128,64) | 3.74E-04 | 2% | 10.7 | 1.58 | 0.34 | 1.16E-03 | 21.9 | 1.7 | 0.92 | 9.13E-04 | 0.75 |
| (320,64,64) | 1.09E-04 | 10% | 11.7 | 1.6 | 0.36 | 4.22E-04 | 49.9 | 1.8 | 2.1 | 2.43E-04 | 0.74 |
| (320,128,64) | 1.10E-04 | 9% | 13.21 | 1.66 | 0.84 | 4.05E-04 | 45.4 | 1.8 | 3.7 | 2.17E-04 | 1.6 |
| (320,128,128) | 1.05E-04 | 4% | 8 | 1.57 | 1 | 3.28E-04 | 26.9 | 1.7 | 4.5 | 2.24E-04 | 2.7 |
| (320,256,128) | 1.04E-04 | 3% | 10.2 | 1.68 | 2.6 | 3.30E-04 | 29.8 | 1.8 | 9.8 | 2.46E-04 | 5.5 |
| (640,128,128) | 4.19E-05 | 23% | 10.9 | 1.66 | 2.8 | 1.43E-04 | 65.3 | 1.8 | 21.3 | 6.88E-05 | 6.6 |
| (640,256,128) | 4.38E-05 | 27% | 11.7 | 1.71 | 6.5 | 1.42E-04 | 69.4 | 1.8 | 45.2 | 6.71E-05 | 13.6 |

The error shown in table 4 is the

So the one thing we can directly compare is the number of SOR iterations needed. These are lower across the board for the present set-up compared to the one in [2]. More interesting is the difference on the finer grids, where the standard SOR requires increasingly many iterations to converge. Not so with the present solver, where the required number of iterations remains more or less constant, a result of both the TSOR variant and the BDF3 scheme. Consequently, the present solver would seem to be significantly faster than the equivalent SOR solver in [2], about 6 to 7 times faster on the fine grids, based on iteration counts but also on CPU times (note that the CPU time per iteration seems to be about the same in both cases). It could also be faster than the multigrid solver of [2], but again, these are two completely different implementations from different times and different people, so such a comparison can only be seen as vaguely indicative. Note also that the perceived speed-up factor of 6-7 on the finer grids is quite sensitive to the exact termination criterion. If, for example, I use a stricter criterion so that the SOR termination error becomes 5% or 10% instead, then the speed-up factor looks more like 3-4. I don't know exactly how converged the PSOR solutions in [2] are, the authors noting that "the error due to the termination of iterations is smaller than the discretization error. On the other hand, the criterion is not unnecessary tight which would increase the CPU time without increasing accuracy".
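
For concreteness, the plain projected SOR sweep being compared against looks like the sketch below: this is the standard PSOR as in [2], not the TSOR variant (which I'm still not describing). `A`, `b` and `g` stand for the discretized system matrix, right-hand side and payoff constraint at a given time level; this is a toy NumPy version, not production code:

```python
import numpy as np

def psor(A, b, g, x0, omega=1.4, tol=1e-8, max_iter=10_000):
    """Projected SOR for the linear complementarity problem of an
    American option: find x >= g with A x >= b and (x - g)'(A x - b) = 0."""
    x = x0.copy()
    n = len(b)
    iters = 0
    for iters in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            # plain Gauss-Seidel value for component i ...
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            # ... over-relaxed and projected onto the payoff constraint
            x_new = max(g[i], x[i] + omega * (gs - x[i]))
            change = max(change, abs(x_new - x[i]))
            x[i] = x_new
        if change < tol:
            break
    return x, iters
```

In the Heston setting `A` would be the (banded) matrix coming from the 2D discretization at each time step; any small symmetric positive definite matrix will do for testing the sweep itself.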

In terms of accuracy achieved, table 4 shows that the present solver gives errors that are 2 to 4 times lower. This is most probably due to the extra diffusion added by the special discretization used in [2], as mentioned above. Indeed, looking at some European option prices also reported in [2], even though second-order convergence is observed, the overall error seems higher than what I get with my discretization on the same grids. In terms of temporal convergence, the present method seems close to first order for low numbers of time steps.

Finally, table 5 shows benchmark values for the 10 points considered in table 4, for future reference. Many publications have presented their estimated benchmarks for those 10 points, and since I like calculating benchmarks, here are mine (obviously the most accurate!). Please note though that these are not the reference values used to calculate the errors in table 4, since those needed to be for the same computational domain of

with NS = 5120, Nν = 1920, Nt = 4500.

Table 5: Benchmark values at the 10 (S, ν) points of table 4.

| ν \ S | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- |
| 0.0625 | 2.0000000 | 1.1076218 | 0.5200330 | 0.2136783 | 0.0820444 |
| 0.25 | 2.0783684 | 1.3336383 | 0.7959825 | 0.4482762 | 0.2428072 |

In this rather lengthy post I pitted an experimental, easy-to-implement solver for partial differential equations with mixed derivatives against more advanced and complicated methods. There is really nothing new about it, except maybe the TSOR variant (which I chose not to explain for the time being). I also don't think I've seen the BDF3 scheme used for the solution of the Heston PDE before, so someone may find that interesting as well. The first results show the present combo to be very competitive against the modified Craig-Sneyd ADI implementation in QuantLib and significantly faster than other SOR-based solvers reported in the literature, such as the one in [2] (or the one in [3], which I've also compared against). I will not say that it really is as "good" as the ADI methods, since I have not yet thoroughly tested it for robustness. In terms of stability/oscillation damping it should fare well, but that needs to be confirmed too. It certainly has its quirks: it "likes" certain grid proportions better (in terms of CPU/accuracy ratio), it speeds up with small time steps and slows down with larger ones, and it needs some heuristic logic for adjusting the relaxation factor. All down to the iteration matrix of course, but it's not difficult to set it up for near-optimal operation. I should also note that the implementation (QLib) I tested against may not be representative of how fast an ADI method can be, so the timings above may be misleading.

All this took me quite a bit more time to prepare than I thought it would, so I'll stop here. When I feel like more numerical games I may add another post testing stability and robustness, as well as non-uniform grids. Does the use of central spatial discretizations all around mean that we may get spurious oscillations in convection-dominated pricing problems? If so, replacing the central finite differences with second-order upwind ones where needed is really straightforward. It would also come at no extra performance cost here, unlike with ADI, where it would mean replacing the fast tridiagonal solvers with slower band-diagonal ones. Also, my non-uniform grid implementation currently seems in many cases up to two times slower than the present (uniform-grid) implementation, partly due to more arithmetic operations and partly to the SOR iteration matrix being less "favourable" for the same grid size.
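
For the record, the second-order upwind stencil I have in mind is the standard one-sided three-point formula $f'(x) \approx (3f_i - 4f_{i-1} + f_{i-2})/(2h)$ (for flow in the $+x$ direction). A quick illustrative sketch checking that it converges at the same order as the central difference:

```python
import numpy as np

def d1_central(f, x, h):
    """Second-order central first derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d1_upwind2(f, x, h):
    """Second-order one-sided (upwind-biased) first derivative for flow in +x:
    uses only the two points behind x."""
    return (3.0 * f(x) - 4.0 * f(x - h) + f(x - 2.0 * h)) / (2.0 * h)

f, x0, exact = np.exp, 0.5, np.exp(0.5)
for h in (0.1, 0.05):
    print(h, abs(d1_central(f, x0, h) - exact), abs(d1_upwind2(f, x0, h) - exact))
# Both errors shrink roughly 4x when h is halved (second order); the wider
# upwind stencil would widen ADI's tridiagonal systems into banded ones,
# but costs nothing extra in a point-iterative solver.
```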

[1] K. J. in 't Hout & S. Foulon, ADI finite difference schemes for option pricing in the Heston model with correlation, *International Journal of Numerical Analysis and Modeling*, 7 (2) (2010), pp. 303-320.

[2] S. Ikonen & J. Toivanen, Operator splitting methods for pricing American options under stochastic volatility, *Numerische Mathematik*, 113 (2009), pp. 299-324.

[3] C. O'Sullivan & S. O'Sullivan, Pricing European and American Options in the Heston Model with Accelerated Explicit Finite Differencing Methods, *International Journal of Theoretical and Applied Finance*, Vol. 30, No. 4 (2013).


The other day I noticed this interesting graph lying in a corner of my desktop and thought I might as well post it here. It was part of the background tests I did while calculating the benchmark results for the driven cavity flow problem. In order to get an idea of how well I could expect my 2D Richardson extrapolation (RE) set-up to work under "perfect" conditions, I thought I would first apply it to a simpler, more benign problem. To this end I solved the 2D Laplace equation for the steady-state temperature distribution on a unit square plate, with the top side maintained at a constant unit temperature and the remaining three sides at zero (Dirichlet-type boundary conditions). This problem has an analytic solution in the form of an infinite (Fourier) series, which we can use to calculate the error of different numerical methods/schemes. So I was curious to see to what extent RE can improve on a second-order scheme's accuracy and especially how the results fare against a proper, natively fourth-order scheme.

I remember reading a long time ago that RE will indeed improve the accuracy of a second-order scheme, increasing its convergence order to $O({h^4})$ if implemented properly, but will not produce results as accurate as those of a scheme that is fourth-order accurate ($O({h^4})$ truncation error) by design. I had seen examples of this in 1D, but not in 2D, so I thought I'd do this test. Let's assume for now that the grid points are uniformly spaced, as was indeed the case in this test. The second-order scheme used is the standard 5-point stencil and the fourth-order scheme is the fantastic 9-point compact stencil.

RE implementation


The RE solution is a linear combination of two solutions calculated on two different grids (i.e. on two different sets of discrete points, or nodes if you prefer). In order to combine the two solutions we need to have them at the same locations, so if there's no common set of points we need to somehow project one solution onto the grid of the other, which means interpolation. Usually people perform RE by combining the solution on a grid with, say, N steps in each dimension (*h*=1/N) with that on a grid with half or double the spacing *h*. This way there's no need to interpolate if we're projecting the final RE result onto the coarser grid, because the finer grid already carries the solution at all the coarser grid's points. If we want the RE results "printed" at the finer grid's resolution instead, then we need to interpolate in order to "guess" the coarser grid's solution at the extra points the finer grid has. In the 2D case this means we would need to produce N$\times$N values based on information from only 25% as many (the 0.5N$\times$0.5N values of our coarse grid).

I prefer keeping the two grid resolutions closer to each other. Too close is not good either though, because then the two solutions are not "distinct" enough and the extrapolation loses accuracy. My tests here showed that a spacing ratio of 0.85 is about right. In this case, when we interpolate the results of the coarser (0.85N$\times$0.85N) grid "upwards", we generate our N$\times$N values using 72% as many values, as opposed to 25% in the half-spacing case. We now definitely need to interpolate of course, because the two grids' sets of points will be distinct. Simple (2D) local polynomial interpolation was used here. The important detail is that the interpolating polynomial's order of accuracy (degree) needs to be the same as or higher than the one we expect to obtain from the extrapolation. I used sixth-degree polynomials for the results in the graph below. That worked fine because the combination of the smoothly-varying field and the high grid resolutions meant that any 7 contiguous points used to fit the polynomials were always pretty much on a straight line. When the function we are solving for is less benign, using such high-order polynomials will in general make things worse due to Runge's phenomenon, and we would then need to cram in many more grid points to avoid this problem. In practice, non-uniform grids are often used anyway in order to properly resolve areas of steep gradients by concentrating more points there, regardless of whether we plan to extrapolate or not.
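
The extrapolation weights for a general spacing ratio follow from eliminating the leading error term: if $u_h = u + C h^p + O(h^{p+2})$, then solutions at spacings $h$ and $rh$ combine as $u \approx (u_{rh} - r^p u_h)/(1 - r^p)$. A single-point 1D sketch (no interpolation needed when extrapolating at one location; a central difference stands in for the solver):

```python
import numpy as np

def d2_central(f, x, h):
    """Second-order central approximation of f''(x)."""
    return (f(x - h) - 2.0 * f(x) + f(x + h)) / h**2

def richardson(u_coarse, u_fine, r, p=2):
    """Combine results at spacings h and r*h (r < 1), assuming order-p error."""
    return (u_fine - r**p * u_coarse) / (1.0 - r**p)

f, x0, exact = np.sin, 1.0, -np.sin(1.0)
h, r = 0.1, 0.85
u_c = d2_central(f, x0, h)        # coarse spacing h
u_f = d2_central(f, x0, r * h)    # fine spacing 0.85*h
u_re = richardson(u_c, u_f, r)

print(abs(u_c - exact))   # ~7e-4, the plain O(h^2) error
print(abs(u_re - exact))  # orders of magnitude smaller: only O(h^4) remains
```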

Results

So here's the graph I was talking about. It shows how the average absolute error (across all solution grid points) falls as the grid spacing *h* gets smaller. Well, not exactly all grid points: I've actually excluded the first 5% from the top side of the square domain downwards. The reason is that due to the singularities in the top two corners (the boundary conditions there change discontinuously, from one at the top to zero at the sides), the solution in their vicinity misbehaves. Here we don't care about that, so in order to get a clean picture of the orders of convergence under "normal" circumstances, I left those points out of the average error calculation. So, on to the results. Expectations were generally confirmed, or even exceeded in the case of the double RE. The latter basically combines two (fourth-order) RE results to extrapolate to a theoretically sixth-order accurate result. With double RE being an extrapolation based on other (single RE) extrapolations (all of which are based on interpolations!), I was not at all sure I'd get anywhere near sixth order. But sure enough, $O({h^{6+}})$ was obtained for this smooth field. As expected, even though the single RE actually achieved a slightly better order of convergence than the native fourth-order 9-point scheme in this case, the actual error it produced was significantly higher, about 22 times higher for the range of *h* used. The double RE though managed to beat the fourth-order scheme for actual accuracy achieved, producing lower errors when the grid is reasonably fine (for *h* < 0.01 roughly).
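
The double RE construction can be sketched in a few lines: two overlapping grid pairs give two fourth-order single-RE results, which are then combined once more with $p=4$ weights. A 1D single-point toy version (again with a central difference standing in for the PDE solver):

```python
import numpy as np

def d2_central(f, x, h):
    """Second-order central approximation of f''(x)."""
    return (f(x - h) - 2.0 * f(x) + f(x + h)) / h**2

def extrapolate(u_coarse, u_fine, r, p):
    """Eliminate the order-p error term between spacings h and r*h (r < 1)."""
    return (u_fine - r**p * u_coarse) / (1.0 - r**p)

f, x0, exact = np.sin, 1.0, -np.sin(1.0)
r, h = 0.85, 0.2

# three grids: spacings h, r*h and r^2*h
u0, u1, u2 = (d2_central(f, x0, s) for s in (h, r * h, r**2 * h))

re1 = extrapolate(u0, u1, r, p=2)    # fourth-order single RE
re2 = extrapolate(u1, u2, r, p=2)    # shifted pair, also fourth order
dre = extrapolate(re1, re2, r, p=4)  # double RE: theoretically sixth order

print(abs(u0 - exact), abs(re1 - exact), abs(dre - exact))
```

In 2D the same weights apply, with the extra complication that each combination first requires interpolating the coarser field, as described above.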
