Applied numerical methods

Higher-order accuracy: Richardson extrapolation vs native higher-order scheme in 2D

4/15/2016


The other day I noticed this interesting graph lying in a corner of my desktop and I thought I might as well post it here. It was part of the background tests I did while I was calculating the benchmark results for the driven cavity flow problem. In order to get a first idea of how well I could expect my 2D Richardson extrapolation (RE) set-up to work under "perfect" conditions, I thought I would apply it first to a simpler, more benign problem. To this end I solved the 2D Laplace equation for the steady-state temperature distribution on a unit square plate, with the top side maintained at a constant unit temperature and the remaining three sides at zero (Dirichlet-type boundary conditions). This problem has an analytic solution in the form of an infinite (Fourier) series, which we can use to calculate the error of different numerical methods/schemes. So I was curious to see to what extent RE can improve on a second-order scheme's accuracy and especially how the results fare against a proper, natively fourth-order scheme.
I remember reading a long time ago that RE will indeed improve the accuracy of a second-order scheme, raising its convergence order to $O({h^4})$ if implemented properly, but will not produce results as accurate as those of a scheme that is fourth-order accurate ($O({h^4})$ truncation error) by design. I had seen examples of this in 1D, but I hadn't seen results for 2D, so I thought I'd do this test. Let's assume for now that the grid points are uniformly spaced, as was indeed the case in this test. The second-order scheme used is the standard 5-point stencil and the fourth-order scheme is the fantastic 9-point compact stencil.
Figure: $O({h^2})$ scheme, 5-point stencil
Figure: $O({h^4})$ scheme, 9-point stencil
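For reference, applied to the Laplace equation on a uniform grid of spacing h (with $u_{i,j}$ denoting the nodal value at $(x_i,y_j)$), the two stencils take their standard forms:
$$\frac{u_{i+1,j}+u_{i-1,j}+u_{i,j+1}+u_{i,j-1}-4u_{i,j}}{h^2}=0 \quad \text{(5-point)}$$
$$\frac{4\left(u_{i+1,j}+u_{i-1,j}+u_{i,j+1}+u_{i,j-1}\right)+u_{i+1,j+1}+u_{i+1,j-1}+u_{i-1,j+1}+u_{i-1,j-1}-20\,u_{i,j}}{6h^2}=0 \quad \text{(9-point compact)}$$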

RE implementation
The RE solution is a linear combination of two solutions calculated on two different grids (representing the solution on two different sets of discrete points, or nodes if you prefer). In order to combine the two solutions we need to have them at the same locations, so if there's no common set of points we need to somehow project one solution onto the grid of the other. In other words, we need to interpolate. Usually people perform RE by combining the solution on a grid with, say, N steps in each dimension (h=1/N) with that on a grid with half or double the spacing h. This way there's no need for interpolation if we're projecting our final RE result onto the coarser grid, because the finer grid already has the solution at all of the coarser grid's points. If we want the RE results "printed" at the finer grid's resolution instead, then we need to interpolate in order to "guess" what the coarser grid's solution is at those extra points that the finer grid has. In the 2D case this means we would need to produce N$\times$N values based on information from only 25% as many (the 0.5$\times$0.5$\times$N$\times$N values of our coarse grid).
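In equation form (the standard two-grid combination, stated here just for clarity): if a scheme of order p produces solutions $u_{h_1}$ and $u_{h_2}$ on grids with spacings $h_1 < h_2$, both expressed at the same set of points, then
$$u_{RE}=\frac{h_2^p\,u_{h_1}-h_1^p\,u_{h_2}}{h_2^p-h_1^p},$$
which cancels the leading $O(h^p)$ error term. For the familiar half-spacing case ($p=2$, $h_2=2h_1$) this reduces to $u_{RE}=(4u_{h_1}-u_{h_2})/3$.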

I prefer keeping the two grid resolutions closer to each other. Too close is not good either though, because then the two solutions are not "distinct" enough and the extrapolation loses accuracy. My tests here showed that a spacing ratio of 0.85 is about right. In this case, when we interpolate the results of the coarser (0.85$\times$0.85$\times$N$\times$N) grid "upwards", we generate our N$\times$N values using 72% as many values, as opposed to 25% in the half-spacing case. We now definitely need to interpolate of course, because the two grids' sets of points are going to be distinct. Simple (2D) local polynomial interpolation was used here. The important detail is that the interpolating polynomial's order of accuracy (degree) needs to be the same as or higher than the order we expect to obtain from the extrapolation. I used sixth-degree polynomials for the results in the graph below. That worked fine because the combination of the smoothly-varying field and the high grid resolutions meant that any 7 contiguous points used to fit the polynomials were always pretty much on a straight line. When the function we are trying to find a solution for is less benign, using such high-order polynomials will in general make things worse due to Runge's phenomenon. We would then need to cram many more grid points in order to avoid this problem. In practice, non-uniform grids are often used anyway in order to properly resolve areas of steep gradients by concentrating more points there, regardless of whether we plan to extrapolate or not.
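As a rough illustration of the workflow just described, here is a minimal sketch (my own reconstruction, not the code used for the post): it assumes a hypothetical solve_laplace(N) that returns the second-order solution on an (N+1)$\times$(N+1) uniform grid over the unit square, and it substitutes a quintic spline for the local sixth-degree polynomial interpolation.

import numpy as np
from scipy.interpolate import RectBivariateSpline

def richardson_2d(solve_laplace, N_fine, ratio=0.85, p=2):
    # solve_laplace(N) is assumed to return the O(h^p) solution on the
    # (N+1) x (N+1) uniform grid covering the unit square.
    N_coarse = int(round(ratio * N_fine))
    h1, u1 = 1.0 / N_fine, solve_laplace(N_fine)      # fine-grid solution
    h2, u2 = 1.0 / N_coarse, solve_laplace(N_coarse)  # coarse-grid solution

    # Project the coarse solution onto the fine grid's points. The post fits
    # local sixth-degree polynomials; a quintic spline is used here instead.
    xc = np.linspace(0.0, 1.0, N_coarse + 1)
    xf = np.linspace(0.0, 1.0, N_fine + 1)
    u2_on_fine = RectBivariateSpline(xc, xc, u2, kx=5, ky=5)(xf, xf)

    # Combine the two solutions so that the leading O(h^p) error term cancels.
    return (h2**p * u1 - h1**p * u2_on_fine) / (h2**p - h1**p)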
Results
So here's the graph I was talking about. It shows how the average absolute error (across all solution grid points) falls as we make the grid spacing h smaller. Well, not exactly all grid points: I've actually excluded the first 5% from the top side of the square domain downwards. The reason is that due to the singularities in the top two corners (the boundary conditions there change discontinuously from one at the top to zero at the sides), the solution in their vicinity misbehaves. Here we don't care about that, so in order to get a clean picture of the orders of convergence under "normal" circumstances, I left those points out of the average error calculation. So on to the results. Expectations were generally confirmed, or even exceeded in the case of the double RE. The latter basically combines two (fourth-order) RE results to extrapolate to a theoretically sixth-order accurate result. With double RE being an extrapolation based on other (single RE) extrapolations (all of which are based on interpolations!), I was not at all sure I'd get anywhere near sixth order. But sure enough, $O({h^{6+}})$ was obtained for this smooth field. As expected, even though the single RE actually achieved a slightly better order of convergence than the native fourth-order 9-point scheme in this case, the actual error it produced was significantly higher, about 22 times higher to be exact for the range of h used. The double RE though managed to beat the fourth-order scheme for actual accuracy achieved, producing lower errors when the grid is reasonably fine (for h < 0.01 roughly).
Figure: Average absolute error versus grid spacing h for the second-order 5-point scheme, the fourth-order 9-point scheme, single RE and double RE.
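For completeness, the bookkeeping behind the graph (the error average that skips the top 5% band near the singular corners, and the observed order of convergence between successive grid spacings) amounts to something like the following; the helper names are mine, not from the original code.

import numpy as np

def mean_abs_error(u_num, u_exact, y, y_cut=0.95):
    # Average |error| over the grid, excluding the band y > y_cut
    # near the two singular top corners.
    mask = y <= y_cut
    return np.mean(np.abs(u_num - u_exact)[mask])

def observed_order(h, err):
    # Convergence order between successive (h, error) pairs:
    # p ~ log(e_i / e_{i+1}) / log(h_i / h_{i+1}).
    h, err = np.asarray(h, float), np.asarray(err, float)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])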

Monte Carlo wins: How can randomly placed points perform better than a regular grid for integral evaluation in higher dimensions?

10/8/2015


I decided to run a short numerical experiment the other day after reading, thinking about and trying to offer an intuitive answer to this question: Why does Monte Carlo integration work better than naive numerical integration in high dimensions? The first thing that comes to mind is the "curse of dimensionality": if we want to keep adding points uniformly to a regular grid, then the total number of points increases exponentially with the dimension d, and a regular grid is just not a viable option. Say for example that we want to evaluate the integral of a function that depends on the value that some quantity takes on each month of the year. This is a problem of dimension 12. Why? Because each evaluation of the function requires a set of 12 different variables (one for each month). Now if we wanted the same accuracy that we would get in 2-D from say a 10$\times$10 (really coarse!) grid (total grid points $N=100$), we would need a total of $N=10^{12}$, i.e. a trillion points. That would be a lot of work to get a poor result. To get a noticeable increase in accuracy we could go up to 20 points per dimension (which would still be pretty coarse). But even that modest increase would mean that we would then need $N=20^{12}$ points, which is 4096 times more than before! What if our function depended on daily observations (d=365)? Let's not even go there.
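Just to spell out the arithmetic (nothing more than $N=n^d$ for n points per dimension):

# Total grid points for n points per dimension in d dimensions: N = n**d.
for n in (10, 20):
    print(n, n ** 2, n ** 12)
# 10 -> 100 (2-D) and 1,000,000,000,000 (d = 12: a trillion points)
# 20 -> 400 (2-D) and 4,096,000,000,000,000 (20**12 / 10**12 = 2**12 = 4096x more)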

The actual question though was not referring to that fundamental problem that regular grids face in high dimensions, but rather asked more specifically: "Why does Monte Carlo's random point selection lead to more accurate integral evaluations than a regular grid approach using the same total number of points N?". This may certainly seem counter-intuitive, as one could be forgiven for assuming that the regular grid point placement should represent the space better than a random one and thus lead to better integral estimation. I looked into this for a second (OK, maybe an hour) and, as it happens, maybe surprisingly, the regular grid placement is just not optimal for integration. This was part of my answer:

"Placing points on a regular grid is not the best you can do integration-wise, as it leaves large holes that can be avoided by smarter placement. Think of a 3D regular grid. When you look at it from any side, i.e. when you look at a 2D projection of the points, all the points from the third dimension fall on the same place (they overlap) and thus leave large holes that don't aid the integration. The same happens in higher dimensions, so for better integral estimation any lower-dimensional face (projection) should be as homogeneously and uniformly filled with points as possible, without having points overlapping and creating holes. This is what Quasi-Monte Carlo (QMC) sequences try to do. They place points (deterministically) at locations that (should) avoid leaving them overlapping in lower dimensional projections and for this reason they can beat the regular grid even in 2D and 3D. 
Think then of Monte Carlo as having the same advantage as QMC (i.e. leaving less glaring holes in lower-dimensional faces, exactly because it lacks that regular structure that is responsible for the holes) but with the extra disadvantage of not placing those points very uniformly and homogeneously (like a regular grid and QMC do). At some point (at around d > 4) the advantage wins over the disadvantage and MC starts beating the regular grid."

So does a typical numerical experiment support the above claims?

Estimating Pi via Monte Carlo evaluation of the volume of an n-ball

Let's take a closer look at the introductory Monte Carlo experiment of trying to calculate the value of $\pi$ through a "dartboard" approach. This is usually done by estimating the ratio of the area of a circle to the area of the square that just circumscribes it, by randomly "throwing darts" (simulated by generating random point coordinates (x,y)) inside the square and calculating the proportion of the total points that fall into the inscribed circle. This ratio can also be calculated through calculus in terms of $\pi$ as $\pi/4$. At this point the "unknown" $\pi$ is defined as the ratio of a circle's circumference to its diameter.

Here we are going to do the same, but using a sphere and a 5-D ball instead. The purpose is not really to find $\pi$, but to contrast the efficiency of Monte Carlo integration against a regular Riemann sum and also Quasi-Monte Carlo (QMC) as the dimension goes up. In other words, to see which particular point-placing strategy gives the better results, i.e. a more accurate estimation of $\pi$ for a given number N of sampling points. Monte Carlo places the points in a way that intends to be random (but is really deterministic, since we use computer algorithms (pseudo-random number generators) to generate them). For a Riemann sum we use a regular grid that places $N^{1/d}$ points uniformly along each axis. Finally, QMC places the points seemingly randomly but really deterministically, seeking some desirable space-distribution properties.
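A compact reconstruction of the experiment could look like the sketch below. This is not the original code: the ball of radius 1/2 is assumed to be centred in the unit hypercube, scipy's Halton sequence stands in for whichever QMC sequence was actually used, and the volume ratios $\pi/6$ (d=3) and $\pi^2/60$ (d=5) discussed further down are hard-coded.

import numpy as np
from scipy.stats import qmc

def inside_fraction(pts):
    # Fraction of points falling inside the d-ball of radius 1/2
    # centred at (1/2, ..., 1/2) in the unit hypercube.
    return np.mean(np.sum((pts - 0.5) ** 2, axis=1) <= 0.25)

def pi_estimate(frac, d):
    # d=3: V_ball / V_cube = pi/6    ->  pi ~ 6 * frac
    # d=5: V_ball / V_cube = pi^2/60 ->  pi ~ sqrt(60 * frac)
    return 6.0 * frac if d == 3 else np.sqrt(60.0 * frac)

def compare(d=3, n_per_dim=10, seed=0):
    n = n_per_dim ** d                                  # same N for every strategy
    rng = np.random.default_rng(seed)
    mc = rng.random((n, d))                             # Monte Carlo
    halton = qmc.Halton(d=d, seed=seed).random(n)       # Quasi-Monte Carlo
    x = (np.arange(n_per_dim) + 0.5) / n_per_dim        # mid-point regular grid
    grid = np.stack(np.meshgrid(*([x] * d)), axis=-1).reshape(-1, d)
    return {name: pi_estimate(inside_fraction(p), d)
            for name, p in (("MC", mc), ("QMC", halton), ("mid-point grid", grid))}

print(compare(d=3, n_per_dim=10))    # N = 1000
print(compare(d=5, n_per_dim=10))    # N = 100000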

Figures 1 and 2 show the results, which seem to confirm the theory. It can be seen that in 3-D, MC (whose error should be of order $O(1/\sqrt{N})$) still loses to the mid-point regular grid ($O(1/{N^{2/3}})$), as expected, and beats the plain regular grid ($O(1/{N^{1/3}})$), again as expected. It can also be seen that QMC beats the mid-point regular grid even in 3-D, as was suggested in my answer above, presumably because the combination of its uniformity and its hole-filling behaviour in lower-dimensional projections beats the "perfect" regularity (and point "wasting") of the grid. In 5-D, MC (still $O(1/\sqrt{N})$) starts overtaking the mid-point regular grid (now $O(1/{N^{2/5}})$) but still loses to QMC, again as expected.


Figure 1. Pi estimation absolute error through calculation of a sphere volume with different sampling point placement strategies.
Figure 2. Pi estimation absolute error through calculation of a 5-ball volume with different sampling point placement strategies.

These graphs might look "wrong" at first sight, with the MC error shown as a straight line and the mid-point regular grid as a jagged one! That's because I have plotted the mean absolute error for each experiment size N for the MC simulation, instead of plotting just a single simulation's error progression with increasing N. This can be determined numerically for each experiment size, for example by obtaining 10000 estimations $\hat \pi$ from N=1024-point experiments and finding the average of $|\hat \pi-\pi|$. Or we can use this as an opportunity to work a bit of statistics and calculate the expectation $\mathbb E$($|\hat \pi-\pi|$) analytically. Here's how: calculus tells us that the ratio of the volume of the sphere to the volume of the cube circumscribing it is $\pi/6$. So $\pi$ can be estimated as 6 times the proportion of the total N points that fall into the sphere. One "dart throw" then corresponds to a binomial variate which takes a value of 6 if it (a point) falls into the sphere and zero otherwise. We also know the probabilities of these outcomes, which are $\pi/6$ and $(1-\pi/6)$ respectively. Then the mean of our variate is of course $\pi$ and its variance is $\sigma_b^2=$ $(6-\pi)^2(\pi/6)+(0-\pi)^2(1-\pi/6)=$ $2.996657^2$. Then we note that the estimator $\hat\pi$ is just the sum of N such variates divided by N. The Central Limit Theorem tells us that for large N, $\hat\pi$ will follow a normal distribution like $\hat\pi\sim \mathcal{N}\left(\mu=\pi,\sigma_{\pi N}^2=\frac{\sigma_b^2}{N}\right)$. It follows that $\hat\pi-\pi\sim \mathcal{N}\left(\mu=0,\sigma_{\pi N}^2=\frac{\sigma_b^2}{N}\right)$. And finally the absolute value of this (i.e. the absolute error of our MC experiment using N points) follows a half-normal distribution (a special case of the folded normal distribution) and as such will have a mean of $\sigma_{\pi N}\sqrt2/\sqrt\pi$ and a variance of $\sigma_{\pi N}^2(1-2/\pi)$. These formulas were used for the MC mean and an indication of the max (3 standard deviations) error in the graph. The mid-point regular grid error on the other hand does show some fluctuation depending on the exact number of points used per dimension: one point more or less leads to a slightly different error magnitude line. The plain regular grid does not show this.
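As a quick sanity check on that last derivation, a few lines of simulation (assuming the same radius-1/2 sphere centred in the unit cube, N = 1024 darts, 2000 repetitions) reproduce the half-normal mean closely:

import numpy as np

N, trials = 1024, 2000
rng = np.random.default_rng(1)
pts = rng.random((trials, N, 3))                       # all dart throws at once
inside = np.sum((pts - 0.5) ** 2, axis=2) <= 0.25      # inside the sphere?
pi_hat = 6.0 * inside.mean(axis=1)                     # one pi estimate per trial

sigma_b = np.sqrt((6 - np.pi)**2 * (np.pi / 6) + np.pi**2 * (1 - np.pi / 6))
print(np.abs(pi_hat - np.pi).mean())                   # simulated mean |error|
print(sigma_b / np.sqrt(N) * np.sqrt(2 / np.pi))       # half-normal prediction ~0.0747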

For the 5-D experiment the equivalent calculus result relates the volume of the 5-ball inscribed in the unit 5-hypercube to $\pi^2$ instead of $\pi$, as $\pi^2/60$. So our dartboard experiment will actually give us an estimate of $\pi^2$. Then the estimate of $\pi$ would just be the square root of that, right? Well almost. Similarly to the sphere case we give a value of 60 if a point falls into the 5-ball and 0 if it doesn't. We also know (a posteriori!) the probability of these outcomes which are $\pi^2/60$ and $(1-\pi^2/60)$ respectively. The single "dart throw" is then a binomial variate with mean $\pi^2$ and variance $\sigma_b^2=(60-\pi^2)^2(\pi^2/60)+$ $(0-\pi^2)^2$ $(1-\pi^2/60)$ $=22.2433624^2$. And when we throw a large number of darts N, then the average of their outcomes follows a normal distribution like $\hat{\pi^2}\sim \mathcal{N}\left(\mu=\pi^2,\sigma_{\pi^2 N}^2=\frac{\sigma_b^2}{N}\right)$. But is $\mathbb E(\sqrt{\hat{\pi^2}})$ equal to $\pi$? Not really. The square root of our $\pi^2$ estimator follows a different distribution with expectation generally smaller than $\pi$, depending on N (Jensen's inequality ensures this is the case, combined with the fact that the square root is a concave function). For N=100, the expectation of our experiment would actually be 3.121. So even if we had say 1 billion estimations of $\pi$ from our 100-dart-throwing into the 5-hypercube experiment, the average of all those estimations would come out as 3.121, not 3.14... So in reality our 5-D experiment does not provide us with an unbiased way of estimating $\pi$, like the 2-D and 3-D experiments do. And assuming we really didn't know the value of $\pi$ then we wouldn't know how big the bias was either. This bias though is rapidly removed as N goes up, for N=1024 the expectation becomes 3.13964 and for N = 1048256 it is 3.141591 (see below how to actually calculate this when we know $\pi$ already). Moreover, this inherent bias in our estimator is always much smaller than the mean absolute error of the experiment (as shown in figures 1 and 2). And as all three methods of point-sampling integration are affected the same, the picture of their comparative performance is not affected.
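A quick way to see where a number like 3.121 comes from, short of evaluating the exact integral in the next paragraph, is a second-order Taylor (delta-method) approximation of the square root around $\mu=\pi^2$:
$$\mathbb E\left(\sqrt{\hat{\pi^2}}\right)\approx\sqrt{\mu}-\frac{\sigma_{\pi^2 N}^2}{8\,\mu^{3/2}}=\pi-\frac{\sigma_b^2}{8N\pi^3},$$
which gives 3.1216 for N=100 and 3.13964 for N=1024, in line with the values quoted.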

Finally, out of curiosity, let's see how we can calculate the actual expectation of our 5-D experiment's $\pi$ estimator $\pi_{est}$, $\mathbb E(\pi_{est})$ = $\mathbb E(\sqrt{\hat{\pi^2}})$. Since $\hat{\pi^2}\sim \mathcal{N}\left(\mu=\pi^2,\sigma_{\pi^2 N}^2=\frac{\sigma_b^2}{N}\right)$, the cumulative distribution function (CDF) of $\pi_{est}$ can be seen to be: $F_{\pi_{est}}(x) = $ $\Pr\left(\pi_{est}\leqslant x\right) = \Pr\left(\hat{\pi^2} \leqslant x^2\right) = $ $\Phi\left(\frac{x^2-\pi^2}{\sigma_b/{\sqrt N}}\right) $ where $\Phi$ is the standard normal CDF. The probability density function (PDF) of $\pi_{est}$ is then $ f_{\pi_{est}}(x) = F_{\pi_{est}}^\prime(x)$ $=\frac{2x}{\sigma_b/{\sqrt N}}\phi\left(\frac{x^2-\pi^2}{\sigma_b/{\sqrt N}}\right) $ where $\phi$ is the standard normal PDF. Substituting for $\phi$ we get $ f_{\pi_{est}}(x) = \sqrt {2N/{\pi\sigma_b^2}} \cdot x \cdot \exp\left(-N(x^2-\pi^2)^2 / {2\sigma_b^2}\right)$. Then: $$\mathbb E(\pi_{est}) = \int_0^\infty{x \cdot f_{\pi_{est}}(x)\, dx} = \int_0^\infty{\sqrt {\frac{2N}{\pi\sigma_b^2}} \cdot x^2 \cdot \exp\left(\frac{-N(x^2-\pi^2)^2} {2\sigma_b^2}\right)dx}$$ I could not find an analytical solution to this integral, but one can always calculate it numerically with something like Simpson's rule, which is what I did for the figures I quoted above.
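For what it's worth, one way that numerical evaluation could be set up (a sketch, with Simpson's rule applied on a truncated interval wide enough to contain essentially all of the density's mass):

import numpy as np
from scipy.integrate import simpson

sigma_b = np.sqrt((60 - np.pi**2)**2 * (np.pi**2 / 60)
                  + np.pi**4 * (1 - np.pi**2 / 60))     # = 22.2433...

def expected_pi_est(N, nodes=20001):
    s = sigma_b / np.sqrt(N)                            # std of the pi^2 estimator
    x = np.linspace(1e-9, np.pi + 10 * s, nodes)        # truncated integration range
    f = np.sqrt(2 / (np.pi * s**2)) * x * np.exp(-(x**2 - np.pi**2)**2 / (2 * s**2))
    return simpson(x * f, x=x)                          # E(pi_est) = integral of x f(x)

for N in (100, 1024):
    print(N, expected_pi_est(N))                        # ~3.121 and ~3.1396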