Stochastic (Monte Carlo) estimation of the Type I error probability in Dixon's Q-test

[based on: C. E. Efstathiou, "Estimation of Type I Error Probability from Experimental Dixon's Q Parameter on Testing for Outliers within Small Size Data Sets", Talanta, 69(5), 1068-1071 (2006)]

The traditional way of performing the Q-test is based on the use of tabulated critical Q values. However, as is common to all significance tests performed this way, it cannot provide us with the exact value of the Type I error probability (p). Hence, if e.g. the suspect value can be rejected at the 95% confidence level but must be retained at 99%, we only know that: 0.01 < p < 0.05.

Statistical software packages for performing significance tests have made the use of tabulated critical values obsolete, since their usual output is the value of p itself, computed internally.

Unfortunately, in the case of the Q-test, the mathematics required for the calculation of p is particularly arduous and special numerical integration techniques are needed. In his original publication [Ann. Math. Stat., 22 (1951) 68], Dixon gave analytical expressions of the type Q = F(p) only for N = 3 and 4. These functions can be rearranged into the form p = f(Q), which is used in this applet.

Thus, for N = 3 the equation used is:

whereas for N = 4 the equation used is (provided that p < 0.5; otherwise the Monte Carlo approach is used):
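Both equations are displayed as images in the applet and are not reproduced here. As a sketch for the N = 3 case only, the following closed form is consistent with the standard two-tailed critical Q values (0.941, 0.970 and 0.994 for p = 0.10, 0.05 and 0.01); it is a reconstruction offered here as an assumption, not copied from the applet, and the function name is my own:

```python
import math

def p_value_n3(q):
    """Two-tailed Type I error probability p for Dixon's Q-test with N = 3.

    Reconstructed closed form (an assumption, not taken verbatim from the
    applet); valid for 0.5 <= q <= 1. Note that for N = 3 the larger of the
    two gaps is always at least half the range, so p_value_n3(0.5) == 1.
    """
    return (6.0 / math.pi) * math.acos((q + 1.0) / (2.0 * math.sqrt(q * q - q + 1.0)))
```

As a sanity check, the formula returns p = 1 exactly at Q = 0.5 and p = 0 at Q = 1, the two logical extremes for N = 3.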

For N > 4 there are no similar analytical expressions, but we can use a simple Monte Carlo approach for the estimation of p. [Note: for a short introduction to Monte Carlo techniques, see applet: Drunken sailor's random walk].

The algorithm used is as follows:

(i) A set of N random values, all drawn from the same normally distributed population, is generated, and the Q value corresponding to this random set (Qrand) is calculated.

(ii) If this Qrand value is greater than the input Qexp value, then a counter (C) is incremented.

(iii) Steps (i) and (ii) are repeated Nsim times (Nsim : number of simulations).

(iv) The ratio C / Nsim is the estimated value of p.

The flow-chart of this algorithm is shown to the right:  
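The steps above can be sketched in Python. This is a sketch under stated assumptions: NumPy is used for vectorized sampling, and Qrand is computed two-sidedly as the larger of the two end gaps divided by the range (i.e., testing whichever extreme is more suspect); the function name and parameters are my own:

```python
import numpy as np

def estimate_p(q_exp, n, n_sim=300_000, seed=0):
    """Monte Carlo estimate of the Type I error probability p for Dixon's Q-test.

    Draws n_sim sets of n standard-normal values, computes Q for each set
    (larger end gap divided by the range -- the two-sided form, an assumption
    here), and returns the fraction of sets with Qrand > q_exp, i.e. C / Nsim.
    """
    rng = np.random.default_rng(seed)
    x = np.sort(rng.standard_normal((n_sim, n)), axis=1)   # step (i): random sets
    data_range = x[:, -1] - x[:, 0]
    gap = np.maximum(x[:, 1] - x[:, 0], x[:, -1] - x[:, -2])
    q_rand = gap / data_range                              # Qrand for each set
    return np.mean(q_rand > q_exp)                         # steps (ii)-(iv): C / Nsim
```

For example, `estimate_p(0.970, 3)` should come out close to 0.05, the tabulated two-tailed significance of Q = 0.970 for N = 3.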

The reasoning behind this algorithm is that all Nsim Qrand values have been obtained from sets of N random values that are, by definition, "outlier-free", since they have all been drawn from the same normal population. Therefore, the fraction of Qrand values greater than the examined Qexp value represents the probability that a Q value of this magnitude occurs normally, i.e. in the absence of outliers.


Obviously, a large number Nsim of simulations is needed to obtain a reasonably accurate value of p. In the present applet, Nsim = 300,000. This number yields p values accurate to at least 2 significant figures within a reasonably short calculation period (typically 5-10 s).
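Since the counter C is binomially distributed, the standard error of the estimate C / Nsim is sqrt(p(1 - p) / Nsim). A quick check (plain Python, nothing applet-specific; the function name is my own) supports the two-significant-figure claim near the commonly interesting region p ≈ 0.05:

```python
import math

def mc_standard_error(p, n_sim):
    """Standard error of a Monte Carlo proportion estimate C / n_sim."""
    return math.sqrt(p * (1.0 - p) / n_sim)

# Near p = 0.05 with Nsim = 300,000 the estimate is good to about +/- 0.0004,
# so the second significant figure of p is reliable.
se = mc_standard_error(0.05, 300_000)
```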

NOTE: Normally distributed random numbers with mean μ and standard deviation σx can be readily produced by using the equation shown to the right as a "random-number generator":

rj is a random number uniformly distributed between 0 and 1. Such random numbers (actually: pseudo-random) are provided by most high-level computer languages. As n increases, the generated x values tend toward a normal distribution; typically, n = 12. This "generator" is based on the Central Limit Theorem (see applet: Central Limit Theorem).
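As a sketch of this standard central-limit generator: the sum of n uniform(0, 1) numbers has mean n/2 and variance n/12, so for the typical n = 12 the standardization simplifies and x = μ + σx(Σ rj − 6). The function name and the use of Python's random module are my own choices:

```python
import math
import random

def normal_from_uniforms(mu=0.0, sigma=1.0, n=12):
    """Approximately normal deviate via the Central Limit Theorem.

    Sums n uniform(0, 1) numbers; the sum has mean n/2 and variance n/12,
    so it is standardized and then scaled to the requested mu and sigma.
    For the typical n = 12 this reduces to mu + sigma * (sum - 6).
    """
    s = sum(random.random() for _ in range(n))
    z = (s - n / 2.0) / math.sqrt(n / 12.0)   # approximately standard normal
    return mu + sigma * z
```

Note that this generator can never produce values beyond ±6σ for n = 12, which is harmless in practice but distinguishes it from an exact normal generator.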