Friday, 25 May 2018

In probability and statistics, the truncated normal distribution is the probability distribution derived from that of a normally distributed random variable by bounding the random variable from either below or above (or both). The truncated normal distribution has wide applications in statistics and econometrics. For example, it is used to model the probabilities of the binary outcomes in the probit model and to model censored data in the Tobit model.





Definition

Suppose $X \sim N(\mu, \sigma^2)$ has a normal distribution and lies within the interval $X \in (a, b)$, where $-\infty \leq a < b \leq \infty$. Then $X$ conditional on $a < X < b$ has a truncated normal distribution.

Its probability density function $f$, for $a \leq x \leq b$, is given by

$$f(x; \mu, \sigma, a, b) = \frac{\phi\left(\frac{x-\mu}{\sigma}\right)}{\sigma\left(\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)\right)}$$

and by $f = 0$ otherwise.

Here,

$$\phi(\xi) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\xi^2\right)$$

is the probability density function of the standard normal distribution and $\Phi(\cdot)$ is its cumulative distribution function

$$\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right).$$

By definition, if $b = \infty$, then $\Phi\left(\tfrac{b-\mu}{\sigma}\right) = 1$, and similarly, if $a = -\infty$, then $\Phi\left(\tfrac{a-\mu}{\sigma}\right) = 0$.
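The density above can be evaluated directly with the standard library. The following is a minimal sketch, not a reference implementation; the function names `phi`, `Phi`, and `truncnorm_pdf` are chosen here for illustration, and it assumes a real, positive $\sigma$.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via erf; returns 0 or 1 at -inf or +inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_pdf(x, mu, sigma, a, b):
    """Density of N(mu, sigma^2) conditioned on a <= X <= b (sigma > 0 assumed)."""
    if x < a or x > b:
        return 0.0
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    return phi((x - mu) / sigma) / (sigma * (Phi(beta) - Phi(alpha)))

# Example: density at x = 0 for N(0.5, 1.5^2) truncated to [-1, 2].
print(truncnorm_pdf(0.0, 0.5, 1.5, -1.0, 2.0))
```

Passing `float("-inf")` or `float("inf")` as a bound recovers the one-sided and untruncated cases, since `math.erf` accepts infinite arguments.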


The above formulae show that when $-\infty < a < b < +\infty$ the scale parameter $\sigma^2$ of the truncated normal distribution is allowed to assume negative values. The parameter $\sigma$ is in this case imaginary, but the function $f$ is nevertheless real, positive, and normalizable. The scale parameter $\sigma^2$ of the canonical normal distribution must be positive because the distribution would not be normalizable otherwise. The doubly truncated normal distribution, on the other hand, can in principle have a negative scale parameter (which is different from the variance, see summary formulae), because no such integrability problems arise on a bounded domain. In this case the distribution cannot be interpreted as a canonical normal conditional on $a < X < b$, of course, but it can still be interpreted as a maximum-entropy distribution with first and second moments as constraints, and it has an additional peculiar feature: it presents two local maxima instead of one, located at $x = a$ and $x = b$.





Moments

If the random variable has been truncated only from below, some probability mass has been shifted to higher values, giving a first-order stochastically dominating distribution and hence increasing the mean to a value higher than the mean $\mu$ of the original normal distribution. Likewise, if the random variable has been truncated only from above, the truncated distribution has a mean less than $\mu$.

Regardless of whether the random variable is bounded above, below, or both, the truncation is a mean-preserving contraction combined with a mean-changing rigid shift, and hence the variance of the truncated distribution is less than the variance $\sigma^2$ of the original normal distribution.

Let $\alpha = (a - \mu)/\sigma$ and $\beta = (b - \mu)/\sigma$:

Two-sided truncation

$$\operatorname{E}(X \mid a < X < b) = \mu + \sigma \frac{\phi\left(\frac{a-\mu}{\sigma}\right) - \phi\left(\frac{b-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)} = \mu + \sigma \frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)}$$

$$\operatorname{Var}(X \mid a < X < b) = \sigma^2 \left[ 1 + \frac{\alpha \phi(\alpha) - \beta \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)} - \left( \frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)} \right)^2 \right]$$

Care must be taken in the numerical evaluation of these formulas: evaluating them directly can result in catastrophic cancellation when the interval $[a, b]$ does not include $\mu$. They can be rewritten in ways that avoid this issue.
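As an illustration, the two-sided formulas can be evaluated directly as below. This is a naive sketch that assumes finite bounds and an interval not too far from $\mu$; it does not apply any rewrite that guards against cancellation. The name `truncnorm_mean_var` is chosen here for illustration.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncnorm_mean_var(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) conditioned on a < X < b (finite a, b)."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    Z = Phi(beta) - Phi(alpha)              # probability mass kept by the truncation
    ratio = (phi(alpha) - phi(beta)) / Z
    mean = mu + sigma * ratio
    var = sigma**2 * (1.0 + (alpha * phi(alpha) - beta * phi(beta)) / Z - ratio**2)
    return mean, var

# Standard normal truncated to [-1, 1].
mean, var = truncnorm_mean_var(0.0, 1.0, -1.0, 1.0)
print(mean, var)
```

For the symmetric example the mean is 0 and the variance is roughly 0.29, below the untruncated variance of 1, in line with the contraction argument above.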
One-sided truncation (of lower tail)

In this case $b = \infty$, so $\phi(\beta) = 0$, $\Phi(\beta) = 1$, and

$$\operatorname{E}(X \mid X > a) = \mu + \sigma \phi(\alpha)/Z,$$

$$\operatorname{Var}(X \mid X > a) = \sigma^2 \left[ 1 + \alpha \phi(\alpha)/Z - (\phi(\alpha)/Z)^2 \right],$$

where $Z = 1 - \Phi(\alpha)$.

One-sided truncation (of upper tail)

$$\operatorname{E}(X \mid X < b) = \mu - \sigma \frac{\phi(\beta)}{\Phi(\beta)}$$

$$\operatorname{Var}(X \mid X < b) = \sigma^2 \left[ 1 - \beta \frac{\phi(\beta)}{\Phi(\beta)} - \left( \frac{\phi(\beta)}{\Phi(\beta)} \right)^2 \right].$$
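The one-sided formulas translate into code in the same way. The sketch below uses hypothetical function names and computes the tail probabilities with the complementary error function `math.erfc`, a choice made here (not taken from the source) to limit rounding error when the cut point lies far in a tail.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lower_truncated_mean_var(mu, sigma, a):
    """Mean and variance of N(mu, sigma^2) conditioned on X > a."""
    alpha = (a - mu) / sigma
    Z = 0.5 * math.erfc(alpha / SQRT2)      # equals 1 - Phi(alpha)
    lam = phi(alpha) / Z
    return mu + sigma * lam, sigma**2 * (1.0 + alpha * lam - lam**2)

def upper_truncated_mean_var(mu, sigma, b):
    """Mean and variance of N(mu, sigma^2) conditioned on X < b."""
    beta = (b - mu) / sigma
    P = 0.5 * math.erfc(-beta / SQRT2)      # equals Phi(beta)
    r = phi(beta) / P
    return mu - sigma * r, sigma**2 * (1.0 - beta * r - r**2)

# Half-normal check: a standard normal truncated at 0 from below.
print(lower_truncated_mean_var(0.0, 1.0, 0.0))
```

For the half-normal example the mean equals $\sqrt{2/\pi}$ and the variance equals $1 - 2/\pi$, which offers a quick sanity check on the formulas.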

Barr and Sherrill (1999) give a simpler expression for the variance of one-sided truncations. Their formula is in terms of the chi-square CDF, which is implemented in standard software libraries. Bebu and Mathew (2009) provide formulas for (generalized) confidence intervals around the truncated moments.

A recursive formula

As for the non-truncated case, there is a recursive formula for the truncated moments.




Simulating

A random variate $x$ defined as $x = \Phi^{-1}\left(\Phi(\alpha) + U \cdot (\Phi(\beta) - \Phi(\alpha))\right)\sigma + \mu$, with $\Phi$ the standard normal cumulative distribution function, $\Phi^{-1}$ its inverse, and $U$ a uniform random number on $(0, 1)$, follows the distribution truncated to the range $(a, b)$. This is simply the inverse transform method for simulating random variables. Although it is one of the simplest methods, it can either fail when sampling in the tail of the normal distribution or be much too slow. In practice, therefore, alternative simulation methods are often needed.
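A minimal sketch of the inverse transform method, using `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$. The function name, the `rng` parameter, the explicit error for intervals with no representable mass, and the final clamp are illustrative choices made here, not part of the method as described.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mu = 0, sigma = 1

def sample_truncnorm_inverse(mu, sigma, a, b, rng=random):
    """Draw one value from N(mu, sigma^2) truncated to (a, b) by inverse transform."""
    lo = _STD.cdf((a - mu) / sigma)   # Phi(alpha)
    hi = _STD.cdf((b - mu) / sigma)   # Phi(beta)
    if not hi > lo:
        # Far in a tail both CDF values round to the same float: the failure mode noted above.
        raise ValueError("interval has no representable probability mass")
    while True:
        p = lo + rng.random() * (hi - lo)
        if 0.0 < p < 1.0:             # inv_cdf is only defined on the open interval (0, 1)
            x = mu + sigma * _STD.inv_cdf(p)
            return min(max(x, a), b)  # clamp away rounding error at the bounds

rng = random.Random(0)
draws = [sample_truncnorm_inverse(0.0, 1.0, -1.0, 1.0, rng) for _ in range(5)]
print(draws)
```

For an interval such as $(10, 11)$ with $\mu = 0$ and $\sigma = 1$, both CDF values round to 1 in double precision and the sketch raises the error, which illustrates why tail-specific samplers are used instead.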

One such truncated normal generator (implemented in Matlab and in R as trandn.R) is based on an acceptance-rejection idea due to Marsaglia. Despite the slightly suboptimal acceptance rate of Marsaglia (1964) in comparison with Robert (1995), Marsaglia's method is typically faster, because it does not require the costly numerical evaluation of the exponential function.
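The tail rejection scheme usually attributed to Marsaglia (1964) can be sketched as follows for a standard normal conditioned on $Z > a$ with $a > 0$. This is a simplified illustration under that assumption, not the `trandn` implementation: it proposes from a Rayleigh distribution restricted to $(a, \infty)$ and accepts a proposal $x$ with probability $a/x$. The function name and the `rng` parameter are chosen here.

```python
import math
import random

def sample_upper_tail_marsaglia(a, rng=random):
    """Draw Z ~ N(0, 1) conditioned on Z > a, for a > 0, by tail rejection."""
    if a <= 0.0:
        raise ValueError("this sketch assumes a > 0")
    while True:
        u1 = 1.0 - rng.random()                    # in (0, 1], so log(u1) is finite
        x = math.sqrt(a * a - 2.0 * math.log(u1))  # Rayleigh proposal restricted to x >= a
        if rng.random() * x <= a:                  # accept with probability a / x
            return x

rng = random.Random(1)
print([sample_upper_tail_marsaglia(3.0, rng) for _ in range(3)])
```

A draw for general $\mu$ and $\sigma$ with lower bound $a'$ follows as $\mu + \sigma z$, where $z$ is sampled with $a = (a' - \mu)/\sigma$. The acceptance test needs no exponential, although the proposal step still evaluates a logarithm and a square root.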

For more on simulating a draw from the truncated normal distribution, see Robert (1995), Lynch (2007) Section 8.1.3 (pages 200-206), and Devroye (1986). The msm package in R has a function, rtnorm, that generates draws from a truncated normal. The truncnorm package in R also has functions to draw from a truncated normal.

Chopin (2011) proposed an algorithm inspired by the Ziggurat algorithm of Marsaglia and Tsang (1984, 2000), which is usually considered the fastest Gaussian sampler, and which is also very close to Ahrens's algorithm (1995). Implementations can be found in C, C++, Matlab and Python.

Sampling from the multivariate truncated normal distribution is considerably more difficult. Exact or perfect simulation is only feasible in the case of truncation of the normal distribution to a polytope region. In more general cases, Damien and Walker (2001) introduce a general methodology for sampling truncated densities within a Gibbs sampling framework. Their algorithm introduces one latent variable and, within a Gibbs sampling framework, it is more computationally efficient than the algorithm of Robert (1995).




See also

  • Normal distribution
  • Truncated distribution



References

  • Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 0-13-066189-9.
  • Johnson, Norman L.; Kotz, Samuel (1970). Continuous Univariate Distributions-1, chapter 13. John Wiley & Sons.
  • Lynch, Scott (2007). Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. New York: Springer. ISBN 978-1-4419-2434-6.
  • Robert, Christian P. (1995). "Simulation of truncated normal variables". Statistics and Computing. 5 (2): 121-125. arXiv:0907.4010. doi:10.1007/BF00143942.
  • Barr, Donald R.; Sherrill, E. Todd (1999). "Mean and variance of truncated normal distributions". The American Statistician. 53 (4): 357-361. doi:10.1080/00031305.1999.10474490.
  • Bebu, Ionut; Mathew, Thomas (2009). "Confidence intervals for limited moments and truncated moments in normal and lognormal models". Statistics and Probability Letters. 79: 375-380. doi:10.1016/j.spl.2008.09.006.
  • Damien, Paul; Walker, Stephen G. (2001). "Sampling truncated normal, beta, and gamma densities". Journal of Computational and Graphical Statistics. 10 (2): 206-215. doi:10.1198/10618600152627906.
  • Chopin, Nicolas (2011). "Fast simulation of truncated Gaussian distributions". Statistics and Computing. 21 (2): 275-288. doi:10.1007/s11222-009-9168-1.
  • Burkardt, John. "The Truncated Normal Distribution" (PDF). Department of Scientific Computing website. Florida State University. Retrieved 15 February 2018.

Source of the article : Wikipedia
