1010 USP39-NF34 ANALYTICAL DATA INTERPRETATION AND TREATMENT

No distributional assumptions were made on the data in Table 1, as the purpose of this Appendix is to illustrate the calculations involved in a precision study.


Table 2. The Predicted Impact of the Test Plan (No. of Runs and No. of Replicates per Run) on the Precision of the Mean

No. of Runs   No. of Replicates per Run   Variance of the Mean   SD of the Mean   % RSDᵃ
     1                    1                      1.251               1.118          1.11
     1                    2                      1.200               1.095          1.09
     1                    3                      1.183               1.088          1.08
     2                    1                      0.625               0.791          0.78
     2                    2                      0.600               0.775          0.77
     2                    3                      0.592               0.769          0.76

ᵃ A mean value of 100.96, based on the 15 data points presented in Table 1, was used (as the divisor) to compute the %RSD.
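The rows of Table 2 are consistent with the usual two-component model, in which the variance of the mean equals the between-run variance divided by the number of runs plus the within-run variance divided by the total number of replicates. The sketch below illustrates that calculation. The two variance components (1.149 and 0.102) are not stated in this section; they are assumptions back-calculated from the first two rows of the table, and the function names are illustrative only.

```python
import math

# ASSUMPTION: variance components back-calculated from Table 2
# (1.251 = between + within, 1.200 = between + within / 2).
VAR_BETWEEN_RUN = 1.149
VAR_WITHIN_RUN = 0.102
GRAND_MEAN = 100.96  # mean of the 15 results in Table 1 (footnote a)


def variance_of_mean(runs, replicates):
    """Predicted variance of the mean for a plan of `runs` x `replicates`."""
    return VAR_BETWEEN_RUN / runs + VAR_WITHIN_RUN / (runs * replicates)


def predict(runs, replicates):
    """Return (variance, SD, %RSD) of the mean for the given test plan."""
    var = variance_of_mean(runs, replicates)
    sd = math.sqrt(var)
    rsd = 100.0 * sd / GRAND_MEAN
    return var, sd, rsd


if __name__ == "__main__":
    for runs in (1, 2):
        for reps in (1, 2, 3):
            var, sd, rsd = predict(runs, reps)
            print(f"{runs} run(s), {reps} replicate(s): "
                  f"variance={var:.3f}  SD={sd:.3f}  %RSD={rsd:.2f}")
```

Under these assumed components, adding a second run roughly halves the variance of the mean, whereas adding replicates within a run changes it only slightly, which is the pattern the table shows.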


APPENDIX C: EXAMPLES OF OUTLIER TESTS FOR ANALYTICAL DATA


Given the following set of 10 measurements: 100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, and 95.7 (mean = 99.5, standard deviation = 1.369), are there any outliers?


Generalized Extreme Studentized Deviate (ESD) Test


This is a modified version of the ESD Test that allows for testing up to a previously specified number, r, of outliers from a normally distributed population. For the detection of a single outlier (r = 1), the generalized ESD procedure is also known as Grubbs' test. Grubbs' test is not recommended for the detection of multiple outliers. Let r equal 2, and n equal 10.

Stage 1 (n = 10): Normalize each result by subtracting the mean from each value and dividing this difference by the standard deviation (see Table 3).⁴

Table 3. Generalized ESD Test Results

                 n = 10                      n = 9
           Data     Normalized         Data     Normalized
           100.3      +0.555           100.3      +1.361
           100.2      +0.482           100.2      +0.953
           100.1      +0.409           100.1      +0.544
           100.0      +0.336           100.0      +0.136
           100.0      +0.336           100.0      +0.136
           100.0      +0.336           100.0      +0.136
            99.9      +0.263            99.9      −0.272
            99.7      +0.117            99.7      −1.089
            99.5      −0.029            99.5      −1.905
            95.7      −2.805
Mean =     99.54                        99.97
SD =       1.369                        0.245

⁴ The difference between each value and the mean is termed the residual. Other Studentized residual outlier tests exist in which the residual, instead of being divided by the standard deviation, is divided by the standard deviation times the square root of (n − 1)/n.

Take the absolute value of these results, select the maximum value (R1 = 2.805), and compare it to a previously specified tabled critical value λ1 (2.290) based on the selected significance level (for example, 5%). The maximum value is larger than the tabled value and is identified as being inconsistent with the remaining data. Sources for λ-values are included in many statistical textbooks. Caution should be exercised when using any statistical table to ensure that the correct notations (i.e., level of acceptable error) are used when extracting table values.

Stage 2 (n = 9): Remove the observation corresponding to the maximum absolute normalized result from the original data set, so that n is now 9. Again, find the mean and standard deviation (Table 3, right two columns), normalize each value, and take the absolute value of these results. Find the maximum of the absolute values of the 9 normalized results (R2 = 1.905), and compare it to λ2 (2.215). The maximum value is not larger than the tabled value.

Conclusion: The result from the first stage, 95.7, is declared to be an outlier, but the result from the second stage, 99.5, is not an outlier.
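The two stages above can be sketched in a few lines of Python. This is a minimal illustration, not a validated implementation: the function name and return layout are hypothetical, the critical values 2.290 and 2.215 are simply those quoted in the text for n = 10, r = 2 at the 5% level, and the final rule (the number of outliers is the largest stage whose statistic exceeds its critical value) is the usual convention for the generalized ESD test rather than something this section spells out.

```python
import statistics

DATA = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
# Tabled critical values quoted in the text (5% level, n = 10, r = 2).
CRITICAL_VALUES = [2.290, 2.215]


def generalized_esd(data, critical_values):
    """Stepwise generalized ESD test.

    critical_values[i] is the tabled value for stage i + 1; it must be
    looked up for the chosen n, r, and significance level.
    Returns (stages, outliers), where each stage is a tuple
    (stage number, suspect value, R statistic, critical value, exceeds?).
    """
    remaining = list(data)
    stages = []
    for stage, crit in enumerate(critical_values, start=1):
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)  # sample SD, n - 1 divisor
        # The suspect is the value with the largest absolute residual.
        suspect = max(remaining, key=lambda x: abs(x - mean))
        r_stat = abs(suspect - mean) / sd
        stages.append((stage, suspect, r_stat, crit, r_stat > crit))
        remaining.remove(suspect)
    # ASSUMPTION: usual rule, outlier count = largest stage that exceeds.
    n_outliers = max((s[0] for s in stages if s[4]), default=0)
    outliers = [s[1] for s in stages[:n_outliers]]
    return stages, outliers


stages, outliers = generalized_esd(DATA, CRITICAL_VALUES)

if __name__ == "__main__":
    for stage, suspect, r_stat, crit, exceeds in stages:
        print(f"Stage {stage}: suspect={suspect}, R={r_stat:.3f}, "
              f"critical={crit}, exceeds={exceeds}")
    print("Outliers:", outliers)
```

For a different sample size, number of suspected outliers, or significance level, the critical values must be replaced with the corresponding tabled values.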

Dixon-Type Tests

Dixon's Test can be one-sided or two-sided, depending on an a priori decision as to whether outliers will be considered on one side only. As with the ESD Test, Dixon's Test assumes that the data, in the absence of outliers, come from a single normal population. Following the strategy used for the ESD Test, we proceed as if there were no a priori decision as to side, and so use a two-sided Dixon's Test. From examination of the example data, we see that it is the two smallest values that are to be tested as outliers. Dixon provides for testing for two outliers simultaneously; however, these procedures are beyond the scope of this Appendix. The stepwise procedure discussed below is not an exact procedure for testing for the second outlier, because the result of the second test is conditional upon the first. Because the sample size is also reduced in the second stage, the end result is a procedure that usually lacks the sensitivity of Dixon's exact procedures.

Stage 1 (n = 10): The results are ordered on the basis of their magnitude (i.e., Xn is the largest observation, X(n−1) is the second largest, etc., and X1 is the smallest observation). Dixon's Test has different ratios based on the sample size (in this example, n = 10). To declare X1 an outlier, the following ratio, r11, is calculated by the formula:

r11 = (X2 − X1)/(X(n−1) − X1)

A different ratio would be employed if the largest data point were tested as an outlier. The r11 result is compared to an r11, 0.05 value in a table of critical values. If r11 is greater than r11, 0.05, then X1 is declared an outlier. For the above set of data, r11 = (99.5 − 95.7)/(100.2 − 95.7) = 0.84. This ratio is greater than r11, 0.05, which is 0.52979 at the 5% significance level for a two-sided Dixon's Test. Sources for r11, 0.05 values are included in many statistical textbooks.⁵

Stage 2 (n = 9): Remove the smallest observation from the original data set, so that n is now 9. The same r11 equation is used, but a new critical r11, 0.05 value for n = 9 is needed (r11, 0.05 = 0.56420). Now r11 = (99.7 − 99.5)/(100.2 − 99.5) = 0.29, which is less than r11, 0.05 and not significant at the 5% level.

Conclusion: Therefore, 95.7 is declared to be an outlier but 99.5 is not an outlier.
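The stepwise Dixon calculation above can be reproduced with the short sketch below. It covers only the case worked in the text (the r11 ratio for the smallest observation); the function name is hypothetical, and the critical values 0.52979 (n = 10) and 0.56420 (n = 9) are those quoted in the text for a two-sided test at the 5% level.

```python
DATA = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
# Two-sided 5% critical values for r11, as quoted in the text, keyed by n.
R11_CRITICAL = {10: 0.52979, 9: 0.56420}


def dixon_r11_smallest(data):
    """r11 ratio for testing the smallest observation X1.

    r11 = (X2 - X1) / (X(n-1) - X1), with the data sorted ascending.
    """
    x = sorted(data)
    if len(x) < 4:
        raise ValueError("r11 needs at least 4 observations")
    return (x[1] - x[0]) / (x[-2] - x[0])


# Stage 1: test the smallest of the 10 results.
r_stage1 = dixon_r11_smallest(DATA)
stage1_outlier = r_stage1 > R11_CRITICAL[len(DATA)]

# Stage 2: drop the smallest result and test the next smallest (n = 9).
reduced = sorted(DATA)[1:]
r_stage2 = dixon_r11_smallest(reduced)
stage2_outlier = r_stage2 > R11_CRITICAL[len(reduced)]

if __name__ == "__main__":
    print(f"Stage 1: r11 = {r_stage1:.2f}, outlier = {stage1_outlier}")
    print(f"Stage 2: r11 = {r_stage2:.2f}, outlier = {stage2_outlier}")
```

Testing the largest observation, or using a different sample size, would require a different ratio and different tabled critical values, as the text notes.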

Hampel's Rule

Step 1—The first step in applying Hampel's Rule is to normalize the data. However, instead of subtracting the mean from each data point and dividing the difference by the standard deviation, the median is subtracted from each data

⁵ The critical values for r in this example are taken from Reference 2 in Appendix G, Outlier Tests.