Applied numerical methods

How fast and accurate is your FX TARF pricing engine? Part 2: Local Volatility Model

2/16/2023


 
In my previous post I talked about fast and accurate pricing of FX Target Redemption Forwards (TARF's) using the PDE/FDM method. I did that in a flat volatility (Black Scholes) setting to briefly show that even in this simple case there can be a big difference in the performance of the TARF pricing engine, depending on how careful one is; God (or the devil) is in the details.

Let me clarify that here I am obviously talking about accuracy in terms of the discretization error of the numerical solution, not the model error. Just like a Monte Carlo simulation needs as many optimizations (and iterations) as one can afford in order to get an acceptably converged result, the same holds for finite difference/element methods (FDM/FEM) - just replace iterations with grid/mesh size.

Setting up the PDE-based TARF pricing engine properly is not trivial in its implementation, and testing its convergence properties with flat volatility should be the first step. But of course in most applications a volatility model is necessary. So in this post I will just take the engine I built for the previous post and, instead of flat volatility, use a (space- and time-dependent) local volatility function a la Dupire. This is a pretty easy upgrade. After all, the engine's building blocks are a bunch of individual 1-D BS PDE solvers and all we need to do is allow the volatility to be a function; nothing else has to change. The only question is how much this would affect the computational efficiency of the engine. As it turns out, not much at all, provided your local volatility surface is not discontinuous or too wild (and you cache and re-use some coefficients between the solvers).
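To make the "allow the volatility to be a function" point concrete, here is a minimal sketch (with illustrative names, not the actual engine code) of how a 1-D solver's spatial operator might be assembled once the only change is a per-node volatility lookup:

```cpp
#include <vector>
#include <functional>
#include <cstddef>

// Hypothetical callable: sigmaLV(S, t) returns the local volatility at (S, t),
// e.g. via bilinear interpolation of the pre-computed LV surface.
using LocalVol = std::function<double(double S, double t)>;

// Build the tridiagonal coefficients of the 1-D pricing operator
//   0.5*sigma(S,t)^2*S^2*V_SS + (rd - rf)*S*V_S - rd*V
// on a (possibly non-uniform) S-grid at time t, using standard 3-point differences.
// With flat volatility 'sig' would be a constant; with local volatility it is simply
// evaluated per grid node -- nothing else in the solver has to change.
void buildOperator(const std::vector<double>& S, double t, double rd, double rf,
                   const LocalVol& sigmaLV,
                   std::vector<double>& lower, std::vector<double>& diag,
                   std::vector<double>& upper)
{
    const std::size_t n = S.size();
    lower.assign(n, 0.0); diag.assign(n, 0.0); upper.assign(n, 0.0);
    for (std::size_t i = 1; i + 1 < n; ++i) {
        const double hm  = S[i] - S[i - 1];
        const double hp  = S[i + 1] - S[i];
        const double sig = sigmaLV(S[i], t);                // the only LV-specific line
        const double dif = 0.5 * sig * sig * S[i] * S[i];   // diffusion coefficient
        const double drf = (rd - rf) * S[i];                // convection coefficient
        lower[i] = 2.0 * dif / (hm * (hm + hp)) - drf * hp / (hm * (hm + hp));
        upper[i] = 2.0 * dif / (hp * (hm + hp)) + drf * hm / (hp * (hm + hp));
        diag[i]  = -2.0 * dif / (hm * hp) + drf * (hp - hm) / (hm * hp) - rd;
    }
}
```

The coefficient caching mentioned above can then be as simple as building these arrays once per time level and sharing them across the A-level solvers, since nothing in them depends on the accrued amount.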

Since I do not currently have access to market data, my brief tests here will be based on a sample market implied volatility (IV) surface sourced from [1]:
Market data for USD/JPY from 11 June 2014 (spot = 102.00). Strikes implied assuming delta-neutral ATM type.
Mat   10Δ Put vol/strike   25Δ Put vol/strike   ATM vol/strike      25Δ Call vol/strike   10Δ Call vol/strike   dom rate   for rate
1M    6.08% /  99.7800     5.76% / 100.8834     5.53% / 101.9929    5.63% / 103.0908      5.81% / 104.1573      -0.04%     0.20%
3M    7.08% /  97.4668     6.53% /  99.7695     6.15% / 101.9921    6.20% / 104.1550      6.44% / 106.3104       0.07%     0.29%
6M    8.19% /  94.7459     7.42% /  98.4743     6.95% / 102.0008    7.00% / 105.4620      7.36% / 109.0553       0.16%     0.40%
1Y    9.90% /  90.0345     8.61% /  96.3376     7.95% / 102.0061    8.04% / 107.6629      8.69% / 114.0638       0.21%     0.52%

Local volatility surface construction

Without going into too much detail, the local volatility (LV) surface construction here is based on direct application of Dupire's formula (using IV's, not option values). To compute the derivatives involved I evaluated a fine grid of IV's via interpolation of the market IV surface. For that I used Gaussian kernel interpolation in the delta dimension (later converted to the strike scale). Such an approach by construction guarantees no arbitrage on single smiles and has an inherent extrapolation method (as suggested in [2]). In the time dimension I used cubic spline interpolation. For my sample IV surface this is just fine, but in general it could lead to calendar arbitrage, so in practice something more sophisticated would be used to guard against that. Either way, the LV surface construction is not the focus of this post, so I am assuming there is a good one available. Obviously the smoother the resulting surface, the smaller the negative effect on the finite difference scheme's discretization error.
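For reference, a standard implied-volatility form of Dupire's formula (here in Gatheral's total-variance notation; the exact form used in my code may differ in detail) reads

$$
\sigma_{LV}^{2}(K,T) \;=\; \frac{\partial_T w}{\;1-\dfrac{y}{w}\,\partial_y w+\dfrac{1}{4}\left(-\dfrac{1}{4}-\dfrac{1}{w}+\dfrac{y^{2}}{w^{2}}\right)\left(\partial_y w\right)^{2}+\dfrac{1}{2}\,\partial_{yy} w\;},
$$

where $w(y,T)=\sigma_{imp}^{2}(y,T)\,T$ is the total implied variance and $y=\ln(K/F_T)$ is the log-moneyness with respect to the forward, so rates and carry enter only through $F_T$. The kernel/spline interpolations described above are there to provide a smooth $w$ surface whose derivatives can be evaluated reliably.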

For the purpose of the present test I used 300 x 400 points in the strike and time dimensions respectively (shown below). Most of the points in the strike dimension are placed around the spot. The above LV calibration procedure takes about 20 milliseconds. This surface is then interpolated (bi-)linearly by each solver to find the required local volatility values on the pricing grid points. As for the quality of the calibration, using the same LV-enabled 1-D FD solvers that make up the TARF pricing engine to price the market vanillas produces an almost perfect fit in this case (IV RMSE about 0.001%).
[Figure: the calibrated local volatility surface]

Test results

So without further ado, let's look at the performance of the engine using local volatility. I will only use one test case for now (I might add more at a later edit), which my limited testing suggests is representative of the average performance. As can be seen, the average discretization error here is very low at about 0.16bp per unit notional (the maximum is 0.63bp) and the average CPU time is 60 milliseconds (timings exclude the LV calibration which as mentioned above takes about 20 milliseconds). It is obvious that the introduction of local volatility does not materially affect the efficiency of the PDE engine as showcased in the previous post for flat volatility (please read there for more details).

I will follow this up by introducing some sort of stochastic volatility as well, still aiming to keep the valuation time in the milliseconds (but we'll see about that).
Table 1. Performance of optimized PDE pricing engine for a Bullish TARF under the local volatility model (LV surface as shown above). Time to maturity = 1Y, number of fixings remaining = 52, target  level = 15, leverage factor = 2, KO type = Full pay, spot = 102, domestic rate = 0.21%, foreign rate = 0.52%. The individual PDE solvers' space-time grid resolution is NS x NT = 200 x 350. The benchmark values were obtained with very high grid resolution and extrapolation and should be fully converged for the digits displayed. ​
Strike   TARF value (engine)   Benchmark   abs(error)   CPU (sec)
94 17.772979 17.776078 3.1E-03 0.063
95 18.843706 18.843834 1.3E-04 0.069
96 17.691132 17.693761 2.6E-03 0.066
97 14.168072 14.168990 9.2E-04 0.057
98 5.400859 5.407167 6.3E-03 0.06
99 -13.002030 -13.003726 1.7E-03 0.06
100 -46.862257 -46.860844 1.4E-03 0.06
101 -99.130065 -99.131584 1.5E-03 0.06
102 -163.481890 -163.485062 3.2E-03 0.06
103 -235.729239 -235.728137 1.1E-03 0.06
104 -314.541043 -314.540662 3.8E-04 0.063
105 -398.791243 -398.791119 1.2E-04 0.061
106 -487.453001 -487.451061 1.9E-03 0.06
107 -579.580102 -579.579445 6.6E-04 0.06
108 -674.364545 -674.363590 9.5E-04 0.051
109 -771.158197 -771.157111 1.1E-03 0.049
110 -869.474221 -869.472986 1.2E-03 0.049

References

[1] N. Langrené, G. Lee and Z. Zili, “Switching to non-affine stochastic volatility: A closed-form expansion for the Inverse Gamma model,” arXiv:1507.02847v2 [q-fin.CP], 2016.
[2] Hakala, J. and U. Wystup, Foreign Exchange Risk, Risk Books, 2002

How fast and accurate is your FX TARF pricing engine?

12/16/2022


 
Wanting to take my mind off the constant bad news on this planet, I've been looking for little challenges. Pleased with my recent 5K runs dropping to around 24min (I know, real runners, it's poor), I thought I should match it with some sort of equivalent mental effort. And what better way to test your brain power, stamina and discipline than to get an answer to a real-world problem that has a PDE representation and solve that PDE numerically (wait, what?!). Anyway, spurred on by a recent interview question that made me realize I had focused too much on my machine learning adventures and pushed previous knowledge too far back in memory, I decided to price a Target Redemption Forward (TARF) as efficiently as possible. This was also a weak link in at least one of the places I've worked; I am not sure of the exact model or numerical method used (though I do know it was not a full LSV model), but they were sure taking a long time to price, in some cases more than 30 seconds.

The easy way to price these is of course via Monte Carlo simulation; it is simple to implement and straightforward to accommodate any added contract bells and whistles. But MC is slow and it's tough to get good Greeks out of it. There is of course another way. 

Pricing TARF's the PDE way

The TARF contract consists of a series of periodic payments (e.g. weekly or monthly) up to some maturity, but the contract can end prematurely when/if on a particular fixing date the specified target is reached. A bullish TARF for instance defines a strike K (usually set below the spot S at the time of issue) and a leverage factor g. At every fixing date the holder makes a profit of (S - K) if S is above K and a loss of g (K - S) if S fixes below K (per unit of notional); profits accrue towards the target (but losses do not).
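In symbols, per unit of notional, the cash flow at fixing $i$ and the accrual update are

$$
\text{payment}_i=(S_i-K)^{+}-g\,(K-S_i)^{+},\qquad A_i=A_{i-1}+(S_i-K)^{+}.
$$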

So the structure is knocked out when/if the accrued amount A reaches the target. This is a strong path dependency feature; but unlike accumulators, where the knock out depends on the path of the spot itself (which is already part of the numerical solution), here we need to somehow involve A in the solution. We do that by adding A as an extra pseudo-dimension, discretized by a number of individual 1-D PDE solvers (for Black Scholes, the local volatility (LV) model, or local regime switching). Each of these solvers corresponds to an A-level within the range of possible values (from zero to the target).

Solving backwards from maturity, we march the option value in time separately for each solver until we reach a fixing/settlement date (for simplicity here I assume that fixing and settlement dates coincide). There we have a jump condition that must be applied in order to continue solving. More specifically, for each S-grid point on each solver we determine the payment to be made based on the S value and the solver's A level. Adding positive payments to the solver's A gives us the implied A level that corresponds to each S-grid point of the solver just after (in normal time direction) the payment has been made. Given the implied A level we can find the option value there by interpolating from the solutions of the solvers whose A levels are nearest (all of which we have marched up to that time point). We do that with, say, cubic interpolation. The (initial, or rather final) values with which we start solving again towards the next fixing date are then found by adding the payment to the interpolated continuation value.
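Here is a rough sketch of that jump condition for a single solver (names and the A-direction interpolation helper are illustrative, not the actual engine code; it also anticipates the three knockout settlement conventions described in the test section further down):

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>

enum class KOType { NoPay, CappedPay, FullPay };

// Solutions "just after" the fixing (in the backward-in-time sense): one 1-D solution
// per A-level. interpA(Aimplied, iS) interpolates them in the A direction at S-grid
// index iS (e.g. cubically); here it is assumed to be provided by the engine.
struct ASliceSolutions {
    std::vector<double> Alevels;                 // A-levels of the individual solvers
    std::vector<std::vector<double>> V;          // V[j][iS]: solution of solver j
    double interpA(double Aimplied, std::size_t iS) const;  // hypothetical helper
};

// Apply the jump condition at one fixing date for the solver with accrual level A,
// overwriting its values V in place. S is the (shared) S-grid, K the strike, g the
// leverage factor and 'target' the (remaining) target.
void applyFixing(const std::vector<double>& S, double A, double K, double g,
                 double target, KOType ko, const ASliceSolutions& after,
                 std::vector<double>& V)
{
    for (std::size_t i = 0; i < S.size(); ++i) {
        const double gain = std::max(S[i] - K, 0.0);       // accrues towards the target
        const double loss = g * std::max(K - S[i], 0.0);   // does not accrue
        const double Aimplied = A + gain;

        if (Aimplied >= target) {
            // Knocked out at this fixing (gain > 0 here, hence loss == 0).
            // The final payment depends on the knockout type.
            double last = 0.0;                                 // No pay
            if (ko == KOType::CappedPay)    last = target - A; // fill the target exactly
            else if (ko == KOType::FullPay) last = gain;       // pay in full
            V[i] = last;                                       // contract ends, no continuation
        } else {
            // Still alive: payment plus continuation value interpolated at the implied A-level.
            V[i] = (gain - loss) + after.interpA(Aimplied, i);
        }
    }
}
```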

That is the outline (by the way, this "auxiliary state variable" method can be seen in action when pricing Asian options within the old option pricer on this site). I will stick with the basics here (no volatility model) in order to show that even in this simple setting the implementation leaves enough room for one to come up with a great, or not so great, pricing engine.

So each individual solver is solving the Black Scholes pricing PDE using the Crank Nicolson method with Rannacher time-stepping on a non-uniform S-grid that places more points near the strike. This standard building block is well-optimized and one can use it as per above to put together a basic implementation of the TARF pricing engine. I did this and the results, while still better than MC, left a lot to be desired in my view. Let's just say that a basic implementation results in a solution that does not behave optimally (and a lot worse than it does for Asian options by the way). So I tried to find ways to improve things. And here I will just give an indication of how much one can improve. The details that make this possible will not be the subject of this short post, but if you really want to know you can contact me (using the contact form or the email in this paper).

Test cases

Here I am showing a couple of the many tests I did that showcase the typical performance differential between an optimized and a "naive", basic implementation. The individual solvers used were identical (S-grid, boundary conditions, etc.) in both engine implementations. I was aiming for an accuracy of about $ 10^{-4}  $ and the time discretization error was smaller than that when NT = 500 time steps were used. I played a bit with the S resolution (NS) to this end. The spacing between the A (S-t) planes was roughly the same as the average S-grid spacing in both implementations (though uniform for the basic and non-uniform for the optimized). Note that there are generally three ways of handling the settlement of the fixing that results in the target being reached, i.e. three knockout types: a) No pay: payment is not made, b) Capped pay: there is partial payment so as to "fill" the target exactly, c) Full pay: payment is made in full. All three are of course trivially accommodated by the method.

The plots below show the spatial discretization error for the part of the solution that is of interest (the actual solver S-grid extends further) and assuming the accrued amount today is zero, i.e. A = 0. This may mean we are pricing at the time of issue, or at some time during the lifetime of the contract (in which case target stands for "remaining target"). The error profile is obtained by comparing the solution to that from an optimized engine using NS = 6000 and NT = 500 (this has really, really low spatial discretization error, < $10^{-8}$ to be precise). One CPU (Ryzen 5900X) core was used for the timings shown, though if need be the method is easily parallelized. Code is written in C++ and compiled with gcc on Linux. Interest rates are zero. 

First we look at a one year contract with weekly fixings/payments and volatility typical of many FX rates. The knockout type is no pay and the leverage factor is two. The optimized version achieves the desired accuracy in 0.09 seconds, while the basic version needs a lot higher spatial resolution and 5 seconds to come close to that.
[Figure: spatial discretization error across spot, 1Y weekly-fixing TARF, no pay, leverage 2: optimized vs. basic engine]
And now a two year contract and high underlying volatility (more common to precious metals). The knockout type in this case is full pay and the leverage factor is one. In this setting the optimized engine's relative advantage is more pronounced, achieving the desired level of accuracy in 0.06 seconds while the basic engine really struggles here and fails to come close even with a spatial resolution ten times higher. The keen observer will notice the basic problem here immediately.
[Figure: spatial discretization error across spot, 2Y high-volatility TARF, full pay, leverage 1: optimized vs. basic engine]
And the same in log scale:
[Figure: the same in log scale]
In case it is not clear to all readers, the plots above show that with a single split-second PDE solution we get very low error valuations for the whole range of spot values. This is of course under Black Scholes, but the same level of efficiency should carry over with a volatility model. Thus if the model and/or volatility dynamic assumed allows it, a whole spot ladder calculation should be possible in less than a second. 

By the way, if the aim is a single spot valuation, it is often (but not always) preferable to cluster the grid points around the current spot (typical for example when one solves the PDE in log space). In this case one can get  higher efficiency still, i.e. same discretization error in even less CPU time. 

Some benchmark valuations

While I am at it, why not post some highly accurate benchmark valuations ("benchmark" being a theme in this site). It may be difficult to believe (and in a way pointless, same way as making watches waterproof to depths no human can survive is pointless), but all digits provided should be correct. Perhaps if someone wants to perform basic sanity checks on their TARF pricing engine these can be of some help. So to be clear, assuming fixing and settlement dates coincide and are all dt = 1/52 yrs apart (say forcing an ACT/364 day count with no holidays affecting the weekly fixings dates), flattening the volatility surface and zeroing rates curves, should have you approaching the values below as you increase your pricing engine's resolution (or Monte Carlo iterations). 
Table 1. Benchmark valuations for a bullish Target Redemption Forward (TARF) under Black Scholes. In all cases the strike K = 1 and both domestic and foreign rates are zero.  The values were obtained with a highly optimized PDE-based pricing engine and all digits shown should be converged, except the last one which may be off by 1 or 2 points. 
Case T (yrs) num. fixings target lev. factor KO type spot BS vol TARF value
1 1 52 0.3 2 No Pay 1.06 10% -0.02096094
2 1 52 0.5 2 No Pay 1.06 10% -0.017514630
3 1 52 0.7 2 No Pay 1.06 10% 0.035902523
4 1 52 0.3 2 No Pay 1.1 10% 0.235120763
5 1 52 0.3 2 Capped pay 1.1 10% 0.286424762
6 1 52 0.3 2 Full pay 1.1 10% 0.335294058
7 1 52 0.3 1 No Pay 1.03 10% -0.29368756
8 1 52 0.3 1 No pay 1.1 20% -0.01224772
9 1 52 0.7 1 No pay 1.1 20% 0.037345388
10 1 52 0.7 1 Full pay 1.1 20% 0.152792822
11 2 104 0.3 1 No pay 1.35 40% 0.059232579
12 2 104 0.5 1 No pay 1.35 40% 0.315294774
13 2 104 0.5 1 Full pay 1.1 40% -3.81704655
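As a cross-check of the setup described above, a brute-force Monte Carlo along the following lines (a minimal sketch, not a production pricer) should slowly converge towards the tabulated values; it prints case 4 of Table 1 as an example:

```cpp
#include <cmath>
#include <random>
#include <algorithm>
#include <iostream>

enum class KOType { NoPay, CappedPay, FullPay };

// Plain Monte Carlo sanity check for the Black Scholes benchmark values in Table 1:
// flat volatility, zero rates, fixings dt = 1/52 apart, unit notional.
double tarfMC(double spot, double K, double vol, double g, double target,
              int numFixings, double dt, KOType ko, long nPaths, unsigned seed = 42)
{
    std::mt19937_64 gen(seed);
    std::normal_distribution<double> N01(0.0, 1.0);
    const double drift = -0.5 * vol * vol * dt;            // zero domestic/foreign rates
    const double sdev  = vol * std::sqrt(dt);

    double sum = 0.0;
    for (long p = 0; p < nPaths; ++p) {
        double S = spot, A = 0.0, value = 0.0;
        for (int i = 0; i < numFixings; ++i) {
            S *= std::exp(drift + sdev * N01(gen));
            const double gain = std::max(S - K, 0.0);
            const double loss = g * std::max(K - S, 0.0);
            if (A + gain >= target) {                       // target reached: knock out
                if (ko == KOType::CappedPay)    value += target - A;
                else if (ko == KOType::FullPay) value += gain;
                break;                                      // No pay: nothing added
            }
            A     += gain;
            value += gain - loss;                           // no discounting (zero rates)
        }
        sum += value;
    }
    return sum / static_cast<double>(nPaths);
}

int main()
{
    // Case 4: T = 1, 52 fixings, target 0.3, leverage 2, No pay, spot 1.1, vol 10%
    std::cout << tarfMC(1.1, 1.0, 0.10, 2.0, 0.3, 52, 1.0 / 52, KOType::NoPay, 2000000)
              << "  (benchmark 0.235120763)\n";
}
```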
I will try to follow this up with an update using some volatility model.

Neural Networks in quant finance for high accuracy approximations. Second case study: Structured products pricing - Multi Barrier Reverse Convertible Autocallable (MBRCA) on worst-performing of 4 assets

1/28/2022


 
As promised in the previous post, I am going to present here some first results on using Deep Neural Networks (DNN's) for producing ultra fast (and accurate) approximations of complex structured product pricing models. That means: take the model currently used (whatever it may be) that is computationally expensive (Monte Carlo) and replace it with a lightning fast equivalent with the same (or maybe higher?) accuracy as that used in the production environment. As anticipated, this proved a more challenging and time-consuming undertaking than the stochastic volatility model calibration of the previous post. The end result though is still very satisfactory and has room for improvement given more resources. This came as a surprise to me, given the much higher dimensionality and the discontinuous (in time) nature of the approximated (model pricing) function.

Since I do not currently have access to a proper production model (that could for example employ some Local Stochastic Volatility model for the underlying processes), I will use simple GBM. The number of free input parameters (dimensionality of the approximated function) is 28 (see below), which combined with the pricing discontinuities at the autocall dates still makes for a very challenging problem. My experience so far (on training DNN's for simpler exotics) is that replacing GBM with a volatility model does not pose insurmountable difficulties for the DNN. I am confident that similar (or higher) accuracy to the one showcased here can be achieved for the same MBRCA, at the expense of more (offline) effort in generating data and training the DNN.

The product class

The product class we are looking at (autocallable multi barrier reverse convertibles) is popular in the Swiss market. It is a structured product paying a guaranteed coupon throughout its lifetime with a complex barrier option embedded. The latter is contingent on the worst performing out of a basket of underlying assets. There is a down-and-in type continuously monitored barrier and an up-and-out discretely monitored one (at the autocall dates). The assets' performance (level) is measured as a percentage of their initial fixing values. Accordingly, the strike level K for the down-and-in payoff, the down-and-in barrier B and the early redemption (autocall) level A (the "up-and-out barrier") are quoted as percentages.

In short: The product pays the holder a guaranteed coupon throughout its lifetime (up to maturity or early redemption). If on any of the observation (autocall) dates the worst-performing asset level is above the early redemption level, the product expires immediately and the amount redeemed is 100% of the nominal value. If no early redemption event happens, then at maturity (a sketch of this redemption logic follows the list):
  1. If during the lifetime of the product the worst-performing asset level did not at any moment touch or cross the barrier level B, the amount redeemed is 100% of the nominal value.
  2. If the worst-performing asset level did touch or cross the barrier level B at some point and its final fixing level is above the strike level K, the amount redeemed is again 100% of the nominal value.
  3. If the worst-performing asset did touch or cross the barrier level B at some point and its final fixing level is below the strike level K, the amount redeemed is the percentage of the nominal equal to the worst-performing asset performance (ratio of its final to initial fixing level).
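A compact sketch of that redemption rule at maturity (illustrative function and variable names; the coupon leg and the early redemption check at the autocall dates are handled separately):

```cpp
#include <algorithm>
#include <vector>

// Redemption amount at maturity, as a fraction of nominal, for the worst-of-4 product
// described above. 'finalLevels' are the assets' final fixing levels and 'worstRunningMin'
// is the lowest worst-of level observed over the product's life (both as fractions of the
// initial fixings). Assumes no early redemption event has occurred.
double redemptionAtMaturity(const std::vector<double>& finalLevels,
                            double worstRunningMin, double barrierB, double strikeK)
{
    const double worstFinal = *std::min_element(finalLevels.begin(), finalLevels.end());
    if (worstRunningMin > barrierB) return 1.0;  // barrier never touched: 100% of nominal
    if (worstFinal >= strikeK)      return 1.0;  // touched, but worst asset ends above strike
    return worstFinal;                           // touched and below strike: worst performance
}
```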

The specific product to be approximated

I am not going to attempt an all-encompassing DNN representation of any possible MBRCA structure, but rather focus the effort on a particular subcategory. So what I am looking for is a pricing approximation for MBRCA's on the worst of 4 assets, with original maturity of 2Y and semi-annual coupon payment and autocall dates. I am also assuming the product is struck at-the-money (i.e. the strike level K is 100%, the most usual case in practice) with an early redemption (autocall) level of also 100%, again typical in the market. The latter two could of course also be included as variable inputs in the approximation. This may well be possible while maintaining the same accuracy but I haven't tried it yet.

So the DNN approximation will be for the clean price of any such product (given the inputs described next) at any time after its inception, up to its maturity. Indeed in what follows, T denotes the time left to maturity.

28 model inputs - features for the DNN training

  • The asset level S (% of initial fixing),  volatility vol and dividend yield d for each of the 4 underlying GBM processes.
  • Seven-point discount factor curve (1D, 1W, 1M, 3M, 6M, 1Y, 2Y).
  • Time left to maturity T (in years).
  • Barrier level B (% of initial fixings).
  • Coupon level  Cpn (% p.a.).
  • Correlation matrix (six distinct entries).

The DNN is trained for wide ranges of its inputs to allow it to be used for a long time without the need for retraining. The approximation is only guaranteed to be good within the input ranges that it has been trained for. Those are shown below, followed by a sketch of how the 28 inputs might be laid out.
Operational parameter ranges
                     Min        Max
  S_i                20%        500%
  vol_i              10%        40%
  d_i                0          10%
  T                  0.001      2
  r (disc. rates)    -2%        2.50%
  B                  40%        80%
  Cpn                2% p.a.    20% p.a.
  ρ                  -55%       99%
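For concreteness, one (purely illustrative) way the 28 inputs could be laid out, which also shows how the count adds up (4×3 + 7 + 1 + 1 + 1 + 6 = 28):

```cpp
#include <array>

// Hypothetical input layout for the DNN; values are assumed to lie in the ranges above.
struct MbrcaInputs {
    std::array<double, 4> S;     // asset levels, % of initial fixing
    std::array<double, 4> vol;   // GBM volatilities
    std::array<double, 4> d;     // dividend yields
    std::array<double, 7> df;    // discount factors at 1D, 1W, 1M, 3M, 6M, 1Y, 2Y
    double T;                    // time left to maturity (years)
    double B;                    // down-and-in barrier level
    double cpn;                  // coupon, % p.a.
    std::array<double, 6> rho;   // the six distinct correlation-matrix entries
};
```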

Pricing model implementation

​The original pricing model / function we aim to "mimic" is of course based on Monte Carlo simulation and was written in C++. I omitted things like date conventions and calendars for ease of implementation. The continuously monitored (American) down-and-in barrier feature is taken care of via the use of a probabilistic correction (Brownian Bridge). Given the assumed GBM processes, this not only perfectly eliminates any simulation bias, but also enables the use of large time-steps thus allowing for significant speedup in the generation of the training samples. The discount curve interpolation is based on cubic splines and auxiliary points. The simulation can be driven by either pseudo random numbers or quasi random sequences (Sobol). I chose the former for the generation of the training samples as it proved to be more beneficial for the learning ability of the DNN under my current setup.
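The probabilistic correction referred to is the standard one-dimensional Brownian bridge crossing probability for the log-spot, applied per asset and per time step; a minimal sketch (illustrative names):

```cpp
#include <cmath>

// Probability that a GBM path crossed the (lower) barrier B between two consecutive
// simulation dates, given that the end points S1 and S2 are both above B.
// sigma is the asset volatility and dt the time between the two observations.
double downCrossingProb(double S1, double S2, double B, double sigma, double dt)
{
    if (S1 <= B || S2 <= B) return 1.0;   // already at/below the barrier at an observation
    const double u = std::log(S1 / B) * std::log(S2 / B);
    return std::exp(-2.0 * u / (sigma * sigma * dt));
}

// Per step one can either accumulate the survival probability,
//   survival *= 1.0 - downCrossingProb(Sprev, Snext, B, sigma, dt);
// or draw a uniform and flag the barrier as hit when it falls below the crossing probability.
```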

Note that in contrast with the use case in the previous post, here the training output data (the MC prices) are noisy and of limited accuracy. Does this represent a big problem for the DNN's ability to learn from them? It turns out that the answer is not really.

DNN Training and validation

​The DNN is trained by feeding it millions of [vector(inputs), price] pairs. This process in practice has to be repeated many times since there is no general formula for what works best in each case. The pricing accuracy of the training set samples does not have to be as high as the target accuracy. As it turns out the DNN has the ability to effectively smooth out the pricing data noise and come up with an accuracy that is higher than that on the individual prices it was trained on. Also the input space coverage does not have to be uniform as we may want for example to place more points where the solution changes rapidly in an effort to end up with a balanced error distribution.

When it comes to testing the resulting DNN approximation though we create a separate (out of sample) test set of highly accurate prices uniformly filling the input space. This is to say we don't weigh some areas of the solution (say near the barrier) more than others when we calculate the error metrics. We say this is the operational (inputs) range of the DNN and we provide (or at least aim to) similar accuracy everywhere within that range. So the test set is created by drawing random inputs from uniform distributions within their respective ranges. The one exception being the correlation matrices whose coefficients follow the distribution below. We then discard those matrices that include coefficients outside our target range of (-55% to 99%).
[Figure: distribution used for the correlation coefficients of the test-set matrices]

The overall accuracy achieved by the DNN is measured by the usual Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics. We can also look at the error distribution to get an idea of how good the approximation is. What we cannot easily do is say what lies far in the tails of that distribution, or in other words provide some sort of limit for the maximum possible error. In contrast to the traditional MC model, there is no theoretical confidence interval for the DNN error.

The MAE and RMSE are calculated against a reference test set of 65K MC prices, each generated using 32 million Sobol-driven paths (with Brownian Bridge construction). Such prices are found (when re-calculating a subset using 268 million Sobol paths) to have an accuracy of 4.e-6, which is well below the target accuracy (about 1.e-4, or 1 cent in a nominal of $100). The inputs were generated again using (22-dimensional, correlations excluded) Sobol points, in an effort to best represent the space. The average model price for this test set is 0.874 (87.4%).

In order to try and get an idea for the worst-case errors I tested the DNN against a much bigger (but less accurate) test set of 16.7 million Sobol points. 

DNN approximation performance

For the results presented here the DNN was trained on 80 million [vector(inputs), price] samples carefully chosen so as to ensure that error levels are as uniform as possible across the input space. At this training set size the convergence rate (error decrease with each doubling of training set size) was showing some signs of slowing down, but there was still room for improvement. Using a few hundred million samples would still be straightforward and would yield even better accuracy.

Still the overall quality of the approximation is excellent. The mean error is less than a cent and generally does not exceed 3 cents. The speed is as expected many orders of magnitude higher than an MC simulation with similar standard error (see below). The timings are for a single CPU core. Of course if GPU's are used instead the speed can still be improved significantly.​
Deep Neural Network Pricing Performance
  MAE                                             6×10⁻⁵
  RMSE                                            9×10⁻⁵
  Maximum absolute error in 16.7M test samples    1.5×10⁻²
  CPU time per price (1 core)                     6×10⁻⁶ secs
[Figure: distribution of the DNN pricing errors]

In order to get similar accuracy from the traditional MC model one needs about 400K antithetic paths. With the present implementation this takes about 0.35 secs on 1 CPU core, which is about 60000 times slower than the DNN. If the MC pricing employed some volatility model needing fine time steps, the speedup factor could easily be in the order of millions (the DNN speed would remain the same).
[Figure: distribution of the reference MC model's pricing errors]
The MC model prices a lot of the samples with almost zero error. This is because for many of the random input parameter vectors the solution is basically deterministic and the product behaves either like a bond or is certain to be redeemed early.

By far the most challenging dimension in the 28-dimensional function we are approximating here is the time to expiry T. The (clean) product price can be discontinuous at the autocall dates, posing a torture test for any numerical method. This is illustrated below where I am plotting a few sample solutions across T (keeping all other input parameters constant). These "pathological" cases correspond to the random input parameter vectors that resulted in the worst DNN approximation errors among the 16.7 million reference set cases (top 5 worst errors). The MC price plots are based on 40000 valuation points using 132K Sobol-driven paths per valuation. It took about 10 mins to create each plot utilizing all 12 cores of a CPU. The corresponding 40000 DNN approximations took < 0.2 sec on a single core.
[Figures: clean price vs. time to maturity T for the worst-error input vectors, DNN approximation vs. MC]

Looking at these plots it comes as no great surprise that the DNN struggles here. Considering the vast variety of shapes the solution can take, it is nonetheless seriously impressive that the DNN can cope as well as it does overall. That said, the maximum errors above are about 1.5% (not quite visible, located within those ultra narrow dips a few hours from the auto-call dates), which is more than I would have been happy with. Still, for use in XVA type calculations and intraday portfolio valuation monitoring, the performance is more than adequate as is. For use in a production environment one would need to be even more stringent with ensuring the maximum errors do not exceed a certain threshold. When testing the DNN against the much smaller 65K reference set, the maximum error was an order of magnitude smaller (about 0.2%, or 20 cents). Looking at 100M cases may reveal an even worse case than the 1.5% error found in the 16.7M set. Nonetheless there are ways to identify and target the problematic areas of the input parameter space. I am thus confident the maximum errors can be brought down further together with the mean error metrics by increasing and further refining the synthetic training set.

In conclusion, we can say that the DNN has passed this second much more difficult test as well. There was never a doubt that the approximation accuracy increases with increasing training data. The question in my mind was rather "is the sufficient amount of training (for the DNN to produce a worthy replacement of the traditional MC and PDE-based pricing) practical in terms of time and cost"? Given the experience gathered so far I would say the answer is yes. The present results were achieved mainly on a top spec desktop with only limited use of cloud resources. Approximating fully fledged models incorporating local and/or stochastic volatility will require more computational power, but the offline effort would still correspond to reasonable time and cost. To this end, a third post in this series would look at the case of FX TARF pricing under an LV or LSV model.

P.S. The results summarily presented in these last two posts are the culmination of a lot of work and experimentation that took the better part of a year and thousands of CPU/GPU hours. ​

Neural Networks in quant finance for high accuracy approximations. First case study: Stochastic volatility model calibration (Heston)

11/5/2021


 

The last few years have seen increased interest in Machine Learning – Neural Networks (NN’s) in finance. Here I will focus specifically on the application of NN’s to function approximation, so basically derivatives pricing and consequently risk calculations. The goal is to either “turbo-boost” models/calculations that are already in production systems, or enable the use of alternative/better models that were previously not practical due to the high computational cost. I’ve been hearing claims of millions of times faster calculations and that the Universal Approximation Theorem guarantees that all functions can be approximated accurately by the simplest kind of net, provided enough training samples. But as usual the devil is in the details; like exactly how many training samples are we talking about to achieve a typical production system level of accuracy? I was wondering, what if to guarantee acceptable accuracy one would need impractically large training sets (and hence data generation and training times)? And never mind millions of times faster (perhaps more realistic for quantum computers when they arrive?), I would be satisfied with 100 or even 10 times faster, provided it was easy enough to implement and deploy.

There have been waves of renewed interest in NN's for decades, but their recent resurgence is in large part due to them being made more accessible via Python packages like TensorFlow and PyTorch that hide away the nitty gritty that would otherwise dishearten most of the recent users.
So given the low barrier to entry and having been spurred on by a couple of people, I decided to check out this seemingly all-conquering method. Moreover, this seems like an engineering exercise; one needs to be willing to try a lot of different  combinations of “hyperparameters”, use suitable training sample generation techniques and bring in experience/tricks from traditional methods, all with the aim to improve the end result. Not to mention "free" time and relatively cheap, sustainable electricity. That’s me then this year I thought. 

I am not sure what is the state of adoption of such NN applications in finance right now. Looking at relevant papers, they all seem to be fairly recent and more the proof-of-concept type. What puzzles me in particular is the narrow input parameter range used to train the NN's on. Surely one would need more coverage than that in practice? Consequently there is talk that such (NN) models would need to be retrained from time to time when market parameters get out of  the “pre-trained” range. Now I may be missing something here. First, I think it would be less confusing to call them "NN approximations" instead of "NN models" since they simply seek to reproduce the output of existing models. The application of NN’s in this context really produces a kind of look-up table to interpolate from. As for the limited range of parameters that is usually chosen for training in the relevant literature, I imagine it has to do with the fact that the resulting approximation's accuracy is of course higher like that. But this does not give me any assurance that the method would be suitable for real-world use, where one would need quite a bit higher accuracy still than what I have seen so far and also wider input ranges.

So I decided I want hard answers: can I approximate a model (function) in the whole (well almost) of its practically useful parameter space so that there’s no need to retrain it, like never (unless the original is changed of course)? To this aim I chose a volatility model calibration exercise as my first case study. Which is convenient because I had worked on this before. Note that a benefit of the NN approach (if it turns out they do the job well) is that one could make the calibration of any model super-fast, and thus enable such models as alternatives to the likes of Heston and SABR. The latter are popular exactly because they have fast analytical solutions or approximations that make calibration to market vanillas possible in practical timescales.
To demonstrate said benefit, the logical thing to do here would be to come up with a turbo-charged version of the non-affine model demo calibrator. The PDE method used in there to solve for vanilla prices under those models is highly optimized and could be used to generate the required millions of training samples for the NN in a reasonable amount of time. To keep things familiar though and save some time, I will just try this on the Heston model, whose training samples can be generated a bit faster still (I used QuantLib's implementation for this). If someone is interested in a high accuracy, production-quality robust calibration routine of any other volatility model, feel free to get in touch to discuss.

Like I said above, the trained parameter range is much wider than anything I've seen published so far, but kind of arbitrary. I chose it to cover the wide moneyness (S/K) range represented by the 246-option SPX chain A from [1]. Time to maturity ranges from a few days up to 3 years. The model parameters should cover most markets.
The network has about 750,000 trainable parameters, which may be too many, but one does not know beforehand what accuracy can be achieved with what architecture. It is possible that a smaller (and thus even faster) network can give acceptable results. 32 million optimally placed training samples were used. I favored accuracy over speed here, if anything to see just how accurate the NN approximation can practically get. But also because higher accuracy minimizes the chance of the  optimizer converging to different locales (apparent local minima) depending on the starting parameter vector (see [3] for more on this).  

Overall this is a vanilla calibration set-up, where a standard optimizer (Levenberg-Marquardt) is used to minimize the root mean square error between the market and model IV's, with the latter provided by the (Deep) NN approximation. There are other more specialized approaches involving NN's designed specifically for the calibration exercise, see for example the nice paper by Horvath et al. [4]. But I tried to keep things simple here. So how does it perform then?
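Schematically, the objective the optimizer sees looks like the sketch below. The HestonIVNet interface is hypothetical and stands in for the trained network; the residual function can be plugged into any least-squares routine (e.g. levmar):

```cpp
#include <vector>
#include <cstddef>

// Hypothetical interface to the trained network: maps (moneyness S/K, T, r, d) and the
// five Heston parameters (v0, vbar, kappa, xi, rho) to a Black Scholes implied volatility.
struct HestonIVNet {
    double impliedVol(double moneyness, double T, double r, double d,
                      const double heston[5]) const;   // assumed to exist
};

struct Quote { double moneyness, T, r, d, marketIV; };

// Residual vector for a least-squares optimizer: r_i = model IV (from the NN) - market IV.
// The optimizer minimizes sum_i r_i^2; the reported IV RMSE is sqrt(sum_i r_i^2 / n).
void calibrationResiduals(const double heston[5], const std::vector<Quote>& quotes,
                          const HestonIVNet& net, std::vector<double>& resid)
{
    resid.resize(quotes.size());
    for (std::size_t i = 0; i < quotes.size(); ++i) {
        const Quote& q = quotes[i];
        resid[i] = net.impliedVol(q.moneyness, q.T, q.r, q.d, heston) - q.marketIV;
    }
}
```

Each optimizer iteration then only requires cheap network evaluations over the chain's quotes, which is consistent with the sub-second calibration times in Table 1 below.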
Neural Network Approximation specification
Operational parameter ranges
    min max
  S/K 0.5 5
  T 0.015 3
  r -0.02 0.1
  d 0 0.1
  v0 0.002 0.5
  v̅ 0.005 0.25
  κ 1 20
  ξ 0.1 10
  ρ -0.95 0.1
Performance
  Mean Absolute IV Error 9.3×10⁻⁶

The mean absolute error is about 1.e−3 implied volatility percentage points (i.e.  0.1 implied volatility basis points).
[Figure: distribution of the absolute IV error over the test set]
The maximum error over a test set of 2 million (out of sample) random points was 5 IV basis points but that is in an area of the parameter hypercube of no interest in practice. For example on the moneyness - expiration plane, the error distribution looks like this:
[Figure: IV error distribution on the moneyness-expiration plane]

​Obviously for individual use cases the NN specification would be customized for a particular market, thus yielding even better accuracy and/or higher speed. Either way I think this is a pretty satisfactory result and it can be improved upon further if one allows for more resources.

​In terms of calibration then, how does that IV accuracy translate to predicted model parameter accuracy? I will use real-world market data that I had used before to test my PDE-based calibrator; the two SPX option chains from [1] and the DAX chain from [2].  As can be seen in Table 1, the calibration is very accurate and takes a fraction of a second. In practice one would use the last result as the starting point which should help the optimizer converge faster still.

For those who need hard evidence that this actually works as advertised, there's the self-contained console app demo below to download. The options data files for the 3 test chains are included. The calibrator always starts from the same parameter vector (see Table 1) and uses only the CPU to keep things simple and facilitate comparisons with traditional methods. I have also included a smaller, only slightly less accurate on average, NN approximation that is more than twice as fast.
Neural Network based Heston model calibrator demo v1.1 (Windows)
Table 1. Model parameters calibrated to real market data: comparison between NN-based and exact (using QuantLib at maximum accuracy settings) calibration. The starting parameter vector is (0.1, 0.1, 5, 2, -0.7).
                  SPX Chain A from [1] (246 options)    SPX Chain B from [1] (68 options)    DAX Chain from [2] (102 options)
                  Exact      NN         error           Exact      NN         error          Exact      NN         error
v0                0.007316   0.007315   (0.01%)         0.04575    0.04576    (0.02%)        0.1964     0.1964     (0.00%)
v̅                 0.03608    0.03608    (0.00%)         0.06862    0.06862    (0.00%)        0.07441    0.07440    (0.01%)
κ                 6.794      6.794      (0.00%)         4.906      4.903      (0.06%)        15.78      15.80      (0.13%)
ξ                 2.044      2.044      (0.00%)         1.526      1.525      (0.07%)        3.354      3.356      (0.06%)
ρ                 -0.7184    -0.7184    (0.00%)         -0.7128    -0.7129    (-0.01%)       -0.5118    -0.5118    (0.00%)
IV RMSE (b.p.)    128.24     128.23     0.01            101.35     101.37     0.02           131.72     131.72     0.00
CPU time (NN)     0.55 s                                0.16 s                               0.26 s

​So to conclude, it is fair to say that the NN (or should I say Machine Learning) approach passed the first real test I threw at it with relative ease. Yes, one needs to invest time to get a feeling of what works and come up with ways to optimize since this is basically an engineering problem. But the results show that at least in this case one can get all the accuracy practically needed in a reasonable amount of (offline) time and resources.

Finally, let's briefly mention here the two main perceived issues with this approach. How does one guarantee accuracy everywhere in the input parameter hyperspace (i.e. looking at the tails of the error distribution)? I agree this is an issue but there are ways to increase confidence, especially in relatively simple cases like the one here. The other is lack of transparency and/or interpretability.

COMING UP NEXT: The follow-up post will feature a much more ambitious test: accurate pricing approximation of Barrier Reverse Convertible Autocallables on 4 underlyings. The accuracy demands will be somewhat lower (reflecting the original Monte Carlo pricing), but the training data will be noisy and the dimensionality much higher. I anticipate significant difficulties in guaranteeing an acceptable level of accuracy everywhere given the rapidly changing price function near the barrier for small times to expiry and especially the true temporal discontinuities on the auto-call dates.  Can the NN deliver? Stay tuned to find out!

References

​[1] Y. Papadopoulos, A. Lewis (2018), “A First Option Calibration of the GARCH Diffusion Model by a PDE Method.”, arXiv:1801.06141v1 [q-fin.CP].
[2] Kangro, R., Parna, K., and Sepp, A., (2004), “Pricing European Style Options under Jump Diffusion Processes with Stochastic Volatility: Applications of Fourier Transform,” Acta et Commentationes Universitatis Tartuensis de Mathematica 8, p. 123-133.
​[3] Cui, Y., del Bano Rollin, S., Germano, G. (2017), "Full and fast calibration of the Heston stochastic volatility model", European Journal of Operational Research, 263(2), p. 625–638
​[4] Horvath, B., Muguruza, A., Tomas, M. (2019). "Deep learning volatility.", arXiv:1901.09647v2 [q-fin.MF]​

Fast option calibration: more non-affine stochastic volatility models.

2/27/2018


 
When it comes to stochastic volatility (SV) models (whether in their pure form, or typically in combination with local volatility and/or jumps), their adoption in practice has been largely driven by whether they offer an "analytical" solution for vanilla options. The popularity of the Heston model probably best exemplifies this. When such a solution doesn't exist, calibration (to the vanilla option market) is considered more difficult and time consuming. But what are the chances that a model that happens to have a (semi-) analytical solution is also the best for the job? It could happen of course, but obviously this isn't the right criterion when choosing the model. Another model may well fit the market's implied volatility surfaces better and describe its dynamics more realistically.

In my last couple of posts I spoke about how a fairly simple PDE / finite differences approach can actually enable fast and robust option calibrations of non-affine SV models. I also posted a little console app that demonstrates the approach for the GARCH diffusion model. I have since played around with that app a little more, so here I'm giving a second version that can calibrate the following 5 non-affine SV models (plus Heston for comparison).​
[Figure: specifications of the implemented stochastic volatility models]
For the GARCH diffusion or power-law model (and the PDE pricing engine used for all the models implemented in this demo) see [1].  For the Inverse Gamma (aka "Bloomberg") model see for example [2]. ​The XGBM model was suggested to me by Alan Lewis, who has been working on an exact solution for it (to be published soon). Here pricing is done via the PDE engine. The ACE-X1 model is one of the models I've tried calibrating, which seems to be doing a very good job in various market conditions. All the above models calibrate to a variance (or volatility) process that is arguably more realistic than that of the Heston model (which when calibrated, very often has zero as its most probable value in the long-run). 
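For orientation, the variance processes of the first few of these models take the familiar mean-reverting form (quoted from the cited references, so treat the exact parameterizations as indicative rather than as the demo's documentation):

$$
\begin{aligned}
\text{Heston:}\quad & dv_t=\kappa(\bar v - v_t)\,dt+\xi\sqrt{v_t}\,dW_t,\\
\text{GARCH diffusion:}\quad & dv_t=\kappa(\bar v - v_t)\,dt+\xi\,v_t\,dW_t,\\
\text{power-law:}\quad & dv_t=\kappa(\bar v - v_t)\,dt+\xi\,v_t^{\,p}\,dW_t,\quad p\in[0.5,\,1],
\end{aligned}
$$

with the Inverse Gamma model applying the analogous mean-reverting lognormal dynamics to the volatility rather than the variance (see [2]); the ACE-X1 and XGBM specifications are not reproduced here.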
Non-affine SV Option Calibrator demo (Windows)
Please bear in mind that the above demo is just that, i.e. not production-level. In [1] the PDE engine was developed just for the GARCH diffusion model and Excel was used for the optimization. I then quickly coupled the code with a Levenberg-Marquardt implementation and adjusted the PDE engine for the other models without much testing (or specific optimizations). That said,  it works pretty well in general, with calibration times ranging from a few seconds up to a minute. It offers three speed / accuracy settings, but even with the fastest setting calibrations should be more accurate (and much faster) than any Monte-Carlo based implementation. A production version for a chosen model would be many times faster still. Note that you will also need to download the VC++ Redistributable for VS2013. The 64-bit version (which is a little faster) also requires the installation of Intel's MKL library. The demo is free but please acknowledge if you use it and do share your findings.
EDIT April 2020: After downloading and running this on my new Windows 10 laptop I saw that the console was not displaying the inputs as intended (it was empty). To get around this please right click on the top of the console window, then click on Properties and there check "Use legacy console". Then close the console and re-launch.
As a small teaser, here's the performance of these models (in terms of IV-RMSE, recalibrated every day) for SPX options during two months at the height of the 2008 crisis. Please note that I used option data (from www.math.ku.dk/~rolf/Svend/) which include options +- 30% in moneyness only. Expirations range from 1M to 3Y. Including further out-of-the-money options does change the relative performance of the models. As does picking a different market period. That being said, my tests so far show that the considered models produce better IV fits than Heston in most cases, as well as representing arguably more realistic dynamics. Therefore they seem to be better candidates to be combined with local volatility and/or jumps than Heston.
[Figure: daily IV-RMSE of the calibrated models for SPX options over two months of the 2008 crisis]
                  Heston   GARCH   Power-law (p=0.8)   ACE-X1   Inverse Gamma   XGBM
Average IV-RMSE   0.91%    0.81%   0.82%               0.66%    0.73%           0.45%

References

[1] Y. Papadopoulos, A. Lewis, “A First Option Calibration of the GARCH Diffusion Model by a PDE Method.,” arXiv:1801.06141v1 [q-fin.CP],  2018.
[2] N. Langrené, G. Lee and Z. Zili, “Switching to non-affine stochastic volatility: A closed-form expansion for the Inverse Gamma model,” arXiv:1507.02847v2 [q-fin.CP], 2016.

Option calibration of non-affine stochastic volatility models. Case study: The GARCH diffusion model.

1/21/2018


 
My last two posts were really about just a couple of the many experiments I did as part of a project I had been working on. The aim was to design a method for efficient option calibrations of the GARCH diffusion stochastic volatility model. This (like other non-affine models) is usually thought of as impractical for implied (risk-neutral) calibrations because it lacks fast semi-analytic (characteristic function-based) vanilla option pricing formulas. But using an efficient PDE-based solver may be an alternative worth looking at. It may prove fast enough to be used for practical calibrations, or at the very least provide insight on the overall performance of (non-affine) models that have seen increased interest in recent years. Besides, analytical tractability definitely shouldn't be the main criterion for choosing a model.

For those interested there's now a detailed report of this joint collaboration with Alan Lewis on the arXiv: A First Option Calibration of the GARCH Diffusion Model by a PDE Method. 
Alan's blog can be found here.
GARCH Diffusion Option Calibrator demo (Windows)
EDIT: For the results reported in the paper, Excel's solver was used for the optimization. I've now quickly plugged the PDE engine into M. Lourakis' Levenberg-Marquardt implementation (levmar) and built a basic demo so that perhaps people can try to calibrate the model to their data. It doesn't offer many options, just a fast / accurate switch. The fast option is typically plenty accurate as well. So, if there's some dataset you may have used for calibrating say the Heston model, it would be interesting to see how the corresponding GARCH diffusion fit compares. Irrespective of the fit, GARCH diffusion is arguably a preferable model, not least because it typically implies more plausible dynamics than Heston. Does this also translate to more stable parameters on recalibrations? Are the fitted (Q-measure) parameters closer to those obtained from the real (P-measure) world? If the answers to the above questions are mostly positive, then coupling the model with local volatility and/or jumps could give a better practical solution than what the industry uses today. (Of course one could just as easily do that with the optimal-p model instead of GARCH diffusion, as demonstrated in my previous post.) Something for the near future.
​If you do download the calibrator and use it on your data, please do share your findings (or any problems you may encounter), either by leaving a comment below, or by sending me an email. As a bonus, I've also included the option to calibrate another (never previously calibrated) non-affine model, the general power-law model with p = 0.8 (sitting between Heston and GARCH diffusion, see [1]). 
Note that (unless you have Visual Studio 2013 installed) you will also need to download the VC++ Redistributable for VS2013. The 64-bit version (which is a little faster) also requires the installation of Intel's MKL library. 

EDIT April 2020: After downloading and running this on my new Windows 10 laptop I saw that the console was not displaying the inputs as intended (it was empty). To get around this please right click on the top of the console window, then click on Properties and there check "Use legacy console". Then close the console and re-launch.
I am also including in the download a sample dataset I found in [2] (DAX index IV surface from 2002), so that you can readily test the calibrator. I used it to calibrate both Heston (also calibrated in [2], together with many other affine models) and GARCH diffusion. In contrast to the two datasets we used in the paper, in this case GARCH diffusion (RMSE = 1.14%) "beats" Heston (RMSE = 1.32%). This calibration takes about 5 secs. This is faster than the times we report in the paper and the reason is that the data we considered there include some very far out-of-the-money options that slow things down as they require higher resolution. The Levenberg-Marquardt algo is also typically faster than Excel's solver for this problem. It is also "customizable", in the sense that one can adjust the grid resolution during the calibration based on the changing (converging) parameter vector. Still, this version is missing a further optimization that I haven't implemented yet, which I expect to reduce the time further by a factor of 2-3.
[Figure: fitted vs. market implied volatilities for the 2002 DAX chain]

​The fitted parameters are:
GARCH:  v0 = 0.1724, vBar = 0.0933, kappa = 7.644, xi = 7.096, rho = -0.5224.
Heston:  v0 = 0.1964, vBar = 0.0744, kappa = 15.78, xi = 3.354, rho = -0.5118.
Note that both models capture the short-term smile/skew pretty well (aided by the large fitted xi's, aka vol-of-vols), but then result in a skew that decays (flattens) too fast for the longer expirations.

References

[1] Y. Papadopoulos, A. Lewis, “A First Option Calibration of the GARCH Diffusion Model by a PDE Method.,” arXiv:1801.06141v1 [q-fin.CP],  2018.
[2] Kangro, R., Parna, K., and Sepp, A., (2004), “Pricing European Style Options under Jump Diffusion Processes with Stochastic Volatility: Applications of Fourier Transform,” Acta et Commentationes Universitatis Tartuensis de Mathematica 8, 123-133.

The ROD stochastic volatility 'model'

12/11/2017


 
It is well known that pure diffusion stochastic volatility (SV) models like Heston struggle to reproduce the steep smiles observed in the implied volatilities of short-term index options. Trying to remedy this deficiency, the usual approach is to introduce jumps. More recently and as an alternative to jumps, some researchers have been experimenting with the use of fractional Brownian motion as the driver for the volatility process. Volatility, they say, is rough. Such a model is apparently able to capture steep short-term smiles. Basically both these approaches assume that pure diffusion models are not up to the task, so alternatives are needed.

However, other voices suggest that this isn't necessarily true and that models of the Heston type have simply not been used to their full potential. They say why use a deterministic starting point $v_0$ for the variance when the process is really hidden and stochastic? Instead, they propose to give such traditional SV models a "hot start", that is assume that the variance today is given by some distribution and not a fixed value. Mechkov [1] shows that when the Heston model is used like that it is indeed capable of "exploding" smiles as expiries tend to zero. Jacquier & Shi [2] present a study of the effect of the assumed initial distribution type.

The idea seems elegant and it's the kind of "trick" I like, because it's simple to apply to an existing solver so it doesn't hurt trying it out. And it gets particularly straightforward when the option price is found through a PDE solution. Then the solution is automatically returned for the whole range of possible initial variance values (corresponding to the finite difference grid in the v-direction). So all one has to do in order to find the "hot-started" option price at some asset spot $ S_0 $ is average the solution across the $ S=S_0 $ grid line using the assumed distribution. Very simple. Of course we have to choose the initial distribution family-type and then we can find the specific parameters of that distribution that best fit our data. How? We can include them in the calibration.
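A minimal sketch of that averaging step (illustrative names; the density family is whatever one chooses to calibrate, and the engine would reuse the v-grid it already has):

```cpp
#include <vector>
#include <cstddef>

// "Hot start": instead of reading the 2-D PDE solution at a single (S0, v0) point,
// average the S = S0 grid line over an assumed initial-variance density.
// vGrid are the finite-difference v-levels and VgridLine[j] = V(S0, v_j).
double hotStartPrice(const std::vector<double>& vGrid,
                     const std::vector<double>& VgridLine,
                     double (*density)(double v))
{
    double price = 0.0, norm = 0.0;
    for (std::size_t j = 0; j + 1 < vGrid.size(); ++j) {
        // trapezoidal quadrature of density(v) * V(S0, v) over the v-grid
        const double dv = vGrid[j + 1] - vGrid[j];
        const double w0 = density(vGrid[j]), w1 = density(vGrid[j + 1]);
        price += 0.5 * dv * (w0 * VgridLine[j] + w1 * VgridLine[j + 1]);
        norm  += 0.5 * dv * (w0 + w1);
    }
    return price / norm;   // renormalize in case the density is not fully covered by the grid
}
```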

So here I'm going to try this out on a calibration to a chain of SPX options to see what it does. But why limit ourselves to Heston? Because it has a fast semi-analytical solution for vanillas you say. I say I can use a fast and accurate PDE solver instead, like the one I briefly tested in my previous post. Furthermore, is there any reason to believe that a square root diffusion specification for the variance should fit the market, or describe its dynamics better? Maybe linear diffusion could work best, or something in between. The PDE solver allows us to use any variance power p for the diffusion coefficient between 0.5 and 1, so why not let the calibration decide the optimal p, together with the initial distribution parameters and the other model parameters. Let's call this the Randomized Optimal Diffusion (ROD) SV "model".
 
Sounds complicated? It actually worked on the first try. Here's what I got for the end of Q1 2017, just using the PDE engine and Excel's solver for the optimization. The calibration involved 262 options of 9 different expiries ranging from 1W to 3Y and took a few minutes to complete. If one restricts the calibration to mainly short-term expiries then better results are obtained, but I wanted to see the overall fit when very short and long expiries are fitted simultaneously. I am also showing how the plain Heston model fares. Which on the face of it is not bad, apart from the very short (1W) smile. Visually what it does is try to "wiggle its way" into matching the short-term smiles. The wiggle seems perfectly set up for the 3W smile, but then it turns out excessive for the other expiries. The ROD model on the other hand avoids excessive "wiggling" and still manages to capture the steep smiles of the short expiries pretty well. The optimal model power p here was found to be 0.84. This is responsible for the less wiggly behavior compared to Heston's p = 0.5. The ability to capture the steep short-term smiles is solely due to the hot-start. And it's the hot-start that "allows" for that higher p. To put it another way, by making it easy to fit the short-term smiles, it then allows for a better fit for all expiries. The overall RMSE for the Heston model is 1.56% while for the ROD model it's 0.86%.

But the RMSE only tells half the story. The Feller ratio corresponding to Heston's fitted parameters is 0.09, which basically means that the by far most probable (risk-neutral) long-run volatility value is zero. In other words, the assumed volatility distribution is not plausible. The randomization idea is not without issues either, despite the impressive improvement in fit. The optimizer calibrated to a correlation coefficient of -1 for this experiment, which seems extreme and not quite realistic.
By the way, this experiment is part of some research I've been doing in collaboration with Alan Lewis, the results of which will be available/published soon.

References

[1] Serguei Mechkov. 'Hot-Start' Initialization of the Heston Model.
[2] Antoine Jacquier, Fangwei Shi. The randomized Heston model.
0 Comments

How efficient is your Heston (or similar) PDE solver?

12/4/2017

0 Comments

 
Last year I wrote an overly long post comparing the performance of an old Heston PDE solver of mine with those available in QuantLib. I was surprised to see that despite not using ADI techniques, it still seemed to produce very accurate solutions in a few seconds, just like QuantLib's ADI solvers. Published papers from the last 10 years or so also indicate that the time needed to get high-accuracy prices from this 2D PDE is a few seconds, give or take (on 5-10 year old CPUs). I've even seen a paper from last year presenting a "very efficient", state-of-the-art finite element method that gets Heston vanilla prices in 30 (!) seconds.
This year though I've worked a lot more on such solvers and I now realise that those run times are clearly not what one should expect, nor aim for, with fine-tuned implementations. So just how fast should your numerical solver of the Heston (or similar) PDE be? What kind of run time should be expected for high-accuracy solutions? The answer is: a few milliseconds. Back in that Aug 2016 post I used some test cases from various papers and I'll use those again here, all puts:
  κ η σ ρ rd rf T K v0
Case 0 5 0.16 0.9 0.1 0.1 0 0.25 10 0.0625
Case 1 1.5 0.04 0.3 -0.9 0.025 0 1 100 0.0625
Case 2 3 0.12 0.04 0.6 0.01 0.04 1 100 0.09
Case 3 0.6067 0.0707 0.2928 -0.7571 0.03 0 3 100 0.0625
Case 4 2.5 0.06 0.5 -0.1 0.0507 0.0469 0.25 100 0.0625
Case 5 3 0.04 0.01 -0.7 0.05 0 0.25 100 0.09
My current "state-of-the-art" solver uses second-order spatial discretization and the Hundsdorfer-Verwer ADI scheme. For the results below I used the solver "as is", with no fine-tuning for each case; everything (including the grid construction) is decided automatically by the solver. In order to give an idea of the accuracy achieved overall for each case, I plot the solution error in the asset direction (so across the moneyness spectrum). The plots are cut off where the option value becomes too small; the actual grids used by the solver extend further to the right. I am showing results for two different grid resolutions (NS x NV x NT): grid A (60 x 40 x 30) and grid B (100 x 60 x 50). The timings were taken on an i7-920 PC from 2009 in single-threaded mode, so one can obviously expect at least double the single-thread speed from a modern high-spec machine. (ADI can also be parallelized with almost no extra effort; a parallel efficiency of about 80% is achieved using basic OpenMP directives.) The errors in each case are calculated by comparing with the exact (semi-analytic) values obtained using QuantLib at the highest precision. Note that I am using relative errors, which are harder to bring down as the option value tends to zero. But when one wants to use the PDE solver as the pricing engine for a calibration, it is important to price far out-of-the-money options with low relative errors in order to get accurate implied volatilities and properly capture any smile behavior. The present solver can indeed be used to calibrate the Heston model (or any other model in this family) accurately in less than a minute, and in many cases in just a few seconds.
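For reference, the Hundsdorfer-Verwer scheme in its standard formulation reads as follows, where $F = F_0 + F_1 + F_2$ is the semi-discretized operator split into the mixed-derivative part $F_0$ (treated explicitly) and the S- and v-direction parts $F_1$, $F_2$ (treated implicitly), and $\theta$ is the scheme parameter:

$$
\begin{aligned}
Y_0 &= u_{n-1} + \Delta t\, F(u_{n-1}),\\
Y_j &= Y_{j-1} + \theta\, \Delta t\, \big[ F_j(Y_j) - F_j(u_{n-1}) \big], \quad j = 1, 2,\\
\widetilde{Y}_0 &= Y_0 + \tfrac{1}{2}\, \Delta t\, \big[ F(Y_2) - F(u_{n-1}) \big],\\
\widetilde{Y}_j &= \widetilde{Y}_{j-1} + \theta\, \Delta t\, \big[ F_j(\widetilde{Y}_j) - F_j(Y_2) \big], \quad j = 1, 2,\\
u_n &= \widetilde{Y}_2.
\end{aligned}
$$

Each implicit stage only requires solving tridiagonal systems along one spatial direction at a time, which is what keeps the cost per time step so low.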

So, if you have a Heston PDE solver then this is an efficiency reference you can use to benchmark against. Price the 6 options above and compare error plots and timings with those below. Let me know what you get. If you're getting larger errors than I do with the above resolutions, I'll tell you why that is!

There is of course another important quality for a PDE solver: robustness. The scheme used here (H-V) is fairly robust, but it can still produce some spurious oscillations when NT is too low relative to NS. I may expand this post with tests of other schemes in the near future.
0 Comments

Smoothing discontinuities - Discrete barrier option pricing

1/8/2017

0 Comments

 
As briefly discussed here, the presence of discontinuities negatively affects the accuracy and the achieved order of convergence of finite difference (and other discretization) methods. In some cases the effect is so significant that one really needs to apply special treatment. One such case in finance is discretely monitored barrier option pricing. At each monitoring date the solution profile along the asset axis is flattened beyond the barrier (for knock-out type options), to zero or to the rebate amount. Thankfully the treatment needed is very simple and consists in adjusting the value at the grid point nearest to the barrier to its average over the cell extending half a (grid) spacing to its left and half a spacing to its right. We thus need the area under the solution over that cell, using the known values at the nearest grid points. We can simply assume piecewise linear variation between adjacent grid values and use the trapezoidal rule (I'll call that linear smoothing). Or, if we want to be a little more sophisticated, we can fit a 2nd order polynomial to the three nearest points and integrate that (let's call it quadratic smoothing).
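Here is a minimal sketch of what that averaging can look like in code, under my reading of the procedure: a uniform S-grid with the barrier sitting exactly on node iB, and the cell average computed from the stored nodal values (not necessarily the exact implementation behind the results below):

```cpp
#include <vector>

// Linear smoothing: replace the value at the barrier node iB by its cell average
// over [S_iB - h/2, S_iB + h/2], assuming piecewise-linear variation between
// the nodal values (trapezoidal rule).
void linearSmoothing(std::vector<double>& u, std::size_t iB)
{
    u[iB] = (u[iB - 1] + 6.0 * u[iB] + u[iB + 1]) / 8.0;
}

// Quadratic smoothing: same cell average, but using the parabola fitted
// through the three nearest nodal values.
void quadraticSmoothing(std::vector<double>& u, std::size_t iB)
{
    u[iB] = (u[iB - 1] + 22.0 * u[iB] + u[iB + 1]) / 24.0;
}
```

The coefficients follow directly from integrating the piecewise-linear (respectively quadratic) interpolant over the half-cell on each side of the barrier node.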

As an example, let's price a daily monitored up and out put option. I'll use plain Black-Scholes to demonstrate, but the qualitative behaviour would be very similar under other models. In order to show the effect clearly I'll start with a uniform grid. The discretization uses central differences and is thus second order in space (asset S), and Crank-Nicolson is used in time. That will sound alarming if you're aware of C-N's inherent inability to damp spurious oscillations caused by discontinuities, but a bit of Rannacher treatment takes care of that (see here). In any case, in order to take the time discretization out of the picture, I used 50000 time steps for the results below, so there is no time error (and no oscillation issues either) and the plotted error is purely due to the S-discretization. The placement of grid points relative to a discontinuity has a significant effect on the result: having a grid point fall exactly on the barrier produces different behaviour than having the barrier fall mid-way between two grid points. Figure 1 tells the story. The exact value (4.53888216, in case someone's interested) was calculated on a very fine grid. It can be seen that the worst we can do is place a grid point on the barrier and solve with no smoothing. The error on the coarsest grid (which still has 55 points up to the strike and is close to what we might use in practice) is clearly unacceptable (15.5%). The best we can do without smoothing (averaging) is to make sure the barrier falls in the middle between two grid points. This significantly reduces the error, but only once we've sufficiently refined the grid (curve (b)). We then see what can be achieved by placing the barrier on a grid point and smoothing by averaging as described above. Curve (c) shows that linear smoothing already greatly improves things compared to curve (b). Finally, curve (d) shows that quadratic smoothing adds some extra accuracy still.
​
Figure 1. Solution grid convergence for a European, daily monitored up and out put, using a uniform grid in S.
S = 98, K = 110, B = 100, T = 0.25, vol = 0.16, r = 0.03, 63 equi-spaced monitoring dates (last one at T)
This case also greatly benefits from the use of a non-uniform grid which concentrates more points near the discontinuity. The slides below show exactly that. Quadratic smoothing with the barrier on a grid point is used. First is the solution error plot for a uniform grid with dS = 2, which corresponds to the first point from the left of curve (d) in figure 1. The second slide shows the error we get when a non-uniform grid of the same size (and hence the same execution cost) is used. The error curve has been pretty much flattened, even on this coarsest grid. The error has thus gone from 15.5% on a uniform grid with no smoothing down to 0.01% on a graded grid with smoothing applied, for the same computational effort. Job done. Here's the function I used for it: SmoothOperator.cpp.
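For completeness, here is a sketch of one common way to build such a graded grid, a hyperbolic-sine stretching around the barrier; the names are illustrative and this is not necessarily the exact construction behind the results above:

```cpp
#include <cmath>
#include <vector>

// Grid on [Smin, Smax] with points clustered around the barrier B.
// A uniform grid in the stretched coordinate xi maps to a sinh-graded grid in S;
// a smaller c clusters points more tightly around B. In practice one would also
// make sure a node lands exactly on B (or exactly half a spacing away from it).
std::vector<double> sinhGradedGrid(double Smin, double Smax, double B,
                                   std::size_t n, double c)
{
    std::vector<double> S(n + 1);
    const double xiMin = std::asinh((Smin - B) / c);
    const double xiMax = std::asinh((Smax - B) / c);
    for (std::size_t i = 0; i <= n; ++i) {
        const double xi = xiMin + (xiMax - xiMin) * double(i) / double(n);
        S[i] = B + c * std::sinh(xi);
    }
    return S;
}
```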
0 Comments

Monte Carlo pricing of continuous barrier options with Heston

11/28/2016

0 Comments

 
Pricing discretely monitored barrier options with Monte Carlo comes naturally. One just mimics what would really happen, which is to check whether the underlying asset has crossed the barrier at specific times only. The "simulation" bias is thus zero, and pricing can be accurate as long as the discretization bias is low (preferably using something better than Euler). But things change when the contract specifies continuous monitoring. The simulation cannot mimic the reality of continuous monitoring, since we can only use a finite number of time steps at which to check whether the barrier has been crossed. One would be forgiven for thinking that using many time steps (say, corresponding to daily monitoring) would be almost equivalent to true continuous monitoring, but this isn't the case. The probability that the asset could have crossed the barrier in between the implemented time steps is still significant, and increasing the number of time steps makes the solution converge only slowly towards the continuous limit. Such a naive simulation will therefore always overestimate the value of a continuous knock-out barrier option, since it ignores the probability that the option knocks out while we are not looking (i.e. in between time steps).

Now there are generally two ways of reducing, or in special cases even eliminating, this simulation bias. The first is to shift the barrier closer to the asset spot in order to compensate for the times when we "blink". The question is by how much the barrier should be shifted. This was first treated by Broadie, Glasserman & Kou [1] for the Black-Scholes model, where they introduced a continuity correction based on the "magical" number 0.5826. Check it out if you haven't. This trick works pretty well when the spot is not very near the barrier and/or when we have many monitoring dates. But in the opposite case it can be quite bad and produce errors of 5% or more (Gobet, [2]). This gets worse when there is a steep drop in value near the barrier, as in a down and out put, or an up and out call with low volatility.
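For reference, the BGK correction states that a barrier option monitored at $m$ equally spaced dates with barrier $B$ is approximately equal to its continuously monitored counterpart with the barrier shifted away from the spot:

$$ V_m(B) \approx V_{\text{cont}}\!\left(B\, e^{\pm \beta \sigma \sqrt{T/m}}\right), \qquad \beta = -\frac{\zeta(1/2)}{\sqrt{2\pi}} \approx 0.5826, $$

with the plus sign for an up barrier and the minus sign for a down barrier. Used the other way round, to mimic a continuously monitored up barrier with a discretely monitored simulation, one shifts the barrier towards the spot, $B \to B\, e^{-\beta \sigma \sqrt{\Delta t}}$.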

The second way is to use an analytic expression for the probability of hitting the barrier between two realized points of a simulated asset path. This is usually referred to as the probabilistic method, or the Brownian bridge technique, since it uses the probability of a Brownian motion hitting a level conditional on two fixed end points. As already implied, this probability is known analytically (see [2]) for Brownian motion (and consequently for GBM as well). So for the usual GBM asset model (Black-Scholes), this technique can remove the simulation bias completely. One can use the probability directly, multiplying each path's payoff by the total path survival probability (the product of the survival probabilities of each path segment S[i], S[i+1]). This is preferable when applicable, since it results in lower simulation variance. Alternatively, one can sample this probability through a uniform variate draw and use it to decide whether the barrier has been crossed or not; this is the only option when we have to perform regression for American exercise. All this is implemented for the Black-Scholes model in the current version of the pricer, which you can download here to see this technique in action (see also here). Below is just a simple example.
As can be seen, the probabilistic method (enabled by checking "continuous monitoring") allows exact Monte Carlo pricing of this continuously monitored up and out call (for which, of course, the exact analytic solution is available; it is shown as well, marked by the blue line). By contrast, the barrier-shifting method misprices this option by quite some margin (3.4%), as can be seen in the second slide, where the probabilistic correction is unchecked and instead the barrier has been shifted downwards according to BGK's (not so magic here) formula.
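For reference, the crossing probability used for each path segment: for an upper barrier $B$ and realized values $S_i$, $S_{i+1}$ (both below $B$), with log-volatility $\sigma$ over a step $\Delta t$, the Brownian bridge gives

$$ p_i = \exp\!\left( -\frac{2 \ln(B/S_i)\, \ln(B/S_{i+1})}{\sigma^2\, \Delta t} \right), $$

and the total survival probability multiplying the path's payoff is $\prod_i (1 - p_i)$, with $p_i = 1$ whenever either endpoint is at or above the barrier.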
Same trick but now with Heston

Now, the analytical (Brownian bridge) hitting probability is only available for a BM/GBM process. But what if we have a not-so-simple process? Well, as long as the discretization scheme we use for the asset makes it locally BM/GBM (always the case when we use Euler to discretize S, or log(S)), then we can still use this trick. It won't remove the simulation bias completely, but it may do a good job nonetheless. So I thought I'd try it with Heston and see how it performs. I only have two discretization schemes set up, the Full Truncation Euler (FTE) and the Quadratic Exponential with martingale correction (QE-M). Both are considered short-stepping schemes, though QE-M can do pretty well with long steps too, especially for vanillas, less so for path-dependent options; FTE definitely needs short time steps to give acceptable accuracy. Both schemes, though, use normal increments to march log(S) in time, conditional on the variance path realization which comes first. So we can still use our BB probability formulas like we did with the Black-Scholes model. The only catch is that the BB formulas assume constant variance over the step, while here the variance is stochastic (not to mention correlated with the asset process), so this will only be an approximation. For each asset path segment S[i], S[i+1] we have the corresponding variances v[i] and v[i+1], which we can use in the BB probability formula.
Now, since we want to see to what extent the simulation (monitoring) bias can be removed, we'd like to get the discretization bias out of the way so that it doesn't cloud the results. For this reason I used the QE-M scheme with daily time steps. The test cases considered are in Table 1, all up and out calls for 5 different Heston parameter sets, chosen from the literature without too much thought, although I did try to set up case 4 to be a kind of torture test, and the Feller condition is violated in case 7. Those two cases are the ones with the largest errors, as can be seen in Table 2. The "exact" price for the continuously monitored options was calculated with the PDE/FDM solver presented in the previous post. The price when the monitoring is discrete (daily) is also shown for comparison. A large number of Sobol-driven paths was used, so that the (Q)MC prices are converged to all digits shown (the last digit may be off by one). As can be seen, the method does yield accurate approximations of the correct continuous price, the highest error being 0.35%. Note also that even with daily monitoring the discrete-version prices (plain MC, no correction) are quite different from the continuous ones (by up to 37%), showcasing the need for the probabilistic (BB) correction when it's a continuous barrier price we're after.
Table 1. Test cases parameter sets for Up and Out call options.
  κ η σ ρ rd rf T K U S0 v0
Case 1 5 0.16 0.9 0.1 0.1 0 0.25 100 120 100 0.0625
Case 2 5 0.16 0.9 0.1 0.1 0 0.25 100 135 130 0.0625
Case 3 1.5 0.04 0.3 -0.9 0.025 0 0.25 100 115 100 0.0625
Case 4 1.5 0.04 0.3 -0.9 0.025 0 0.25 100 135 130 0.0625
Case 5 3 0.12 0.04 0.6 0.01 0.04 0.25 100 120 100 0.09
Case 6 2.5 0.06 0.5 -0.1 0.0507 0.0469 0.5 100 120 100 0.0625
Case 7 6.21 0.019 0.61 -0.7 0.0319 0 0.5 100 110 100 0.010201
Table 2. Effectiveness of Monte Carlo with the Probabilistic Continuity Correction for the pricing of cont. monitored barrier options under the Heston model.  The MC prices are converged and the time-discretization error is negligible.
  Solver Price Difference
Case 1 PDE/FDM cont. monitoring (exact) 1.8651 -
MC with 63 timesteps & PCC 1.8652 0.01%
Plain MC with 63 timesteps  2.1670 16%
Case 2 PDE/FDM cont. monitoring (exact) 2.5021 -
MC with 63 timesteps & PCC 2.5032 0.04%
Plain MC with 63 timesteps  3.4159 37%
Case 3 PDE/FDM cont. monitoring (exact) 2.1312 -
MC with 63 timesteps & PCC 2.1277 -0.16%
Plain MC with 63 timesteps  2.3369 10%
Case 4 PDE/FDM cont. monitoring (exact) 3.6519 -
MC with 63 timesteps & PCC 3.6394 -0.34%
Plain MC with 63 timesteps  4.6731 28%
Case 5 PDE/FDM cont. monitoring (exact) 1.6247 -
MC with 63 timesteps & PCC 1.6249 0.01%
Plain MC with 63 timesteps  1.8890 16%
Case 6 PDE/FDM cont. monitoring (exact) 1.7444 -
MC with 125 timesteps & PCC 1.7438 -0.03%
Plain MC with 125 timesteps  1.9209 10.1%
Case 7 PDE/FDM cont. monitoring (exact) 1.9856 -
MC with 125 timesteps & PCC 1.9790 -0.33%
Plain MC with 125 timesteps  2.0839 4.95%
Finally, regarding the variance value used in the probability formulas: my brief testing showed that the best results were obtained when using the variance at whichever of the two nodes S[i], S[i+1] is closer to the barrier. This was the choice for the results of Table 2.
Once I have set up a more accurate long-stepping discretization scheme, I may test how well this approximation holds when one uses fewer/longer time steps.
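In the meantime, here is a minimal sketch of the per-path correction as described above (illustrative names, not my actual implementation): an up and out option with upper barrier B, using in each segment the variance of whichever node is nearer the barrier.

```cpp
#include <cmath>
#include <vector>

// Survival probability of one simulated Heston path with respect to an upper barrier B.
// S: asset path (n+1 points), v: variance path (same size), dt: time step size.
double pathSurvivalProbability(const std::vector<double>& S,
                               const std::vector<double>& v,
                               double B, double dt)
{
    double survival = 1.0;
    for (std::size_t i = 0; i + 1 < S.size(); ++i) {
        if (S[i] >= B || S[i + 1] >= B) return 0.0;   // knocked out at a simulated node
        // variance of the endpoint closer to the barrier (choice used for Table 2)
        const double var = (B - S[i] < B - S[i + 1]) ? v[i] : v[i + 1];
        // Brownian-bridge probability that log(S) crossed log(B) inside the step
        const double pCross = std::exp(-2.0 * std::log(B / S[i]) * std::log(B / S[i + 1])
                                       / (var * dt));
        survival *= (1.0 - pCross);
    }
    return survival;
}
// Each path's discounted payoff is then multiplied by this survival probability
// before averaging across paths.
```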


References

[1] Broadie M., Glasserman P. & Kou S. A continuity correction for discrete barrier options. Mathematical Finance, Vol. 7 (4) (1997), pp. 325-348.
[2] Emmanuel Gobet. Advanced Monte Carlo methods for barrier and related exotic options. In: Bensoussan A., Zhang Q., Ciarlet P. (eds.), Mathematical Modeling and Numerical Methods in Finance, Handbook of Numerical Analysis, Elsevier, 2009, pp. 497-528.
0 Comments