1010 USP39-NF34 ANALYTICAL DATA INTERPRETATION AND TREATMENT

No distributional assumptions were made on the data in Table 1, as the purpose of this Appendix is to illustrate the calculations involved in a precision study.

Table 2. The Predicted Impact of the Test Plan (No. of Runs and No. of Replicates per Run) on the Precision of the Mean

No. of Runs   No. of Replicates per Run   Variance of the Mean   SD of the Mean   % RSD (a)
1             1                           1.251                  1.118            1.11
1             2                           1.200                  1.095            1.09
1             3                           1.183                  1.088            1.08
2             1                           0.625                  0.791            0.78
2             2                           0.600                  0.775            0.77
2             3                           0.592                  0.769            0.76

(a) A mean value of 100.96, based on the 15 data points presented in Table 1, was used (as the divisor) to compute the %RSD.
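The entries in Table 2 can be approximated with the usual variance-components expression for the mean of a balanced design: between-run variance divided by the number of runs, plus within-run variance divided by the total number of determinations. The sketch below is illustrative only. The two variance components are assumptions back-calculated from the Table 2 entries themselves, not values quoted in this section, and the function name is hypothetical. Under those assumptions the output agrees with the tabulated values to within about 0.01.

```python
import math

# ASSUMPTION: variance components back-calculated from the Table 2 entries;
# they are not quoted from the source text.
VAR_BETWEEN_RUNS = 1.149
VAR_WITHIN_RUN = 0.102
GRAND_MEAN = 100.96  # mean of the 15 data points in Table 1 (table footnote)


def precision_of_mean(n_runs, n_reps):
    """Return (variance, SD, %RSD) of the mean for a given test plan."""
    variance = VAR_BETWEEN_RUNS / n_runs + VAR_WITHIN_RUN / (n_runs * n_reps)
    sd = math.sqrt(variance)
    rsd_percent = 100.0 * sd / GRAND_MEAN
    return variance, sd, rsd_percent


for runs in (1, 2):
    for reps in (1, 2, 3):
        var, sd, rsd = precision_of_mean(runs, reps)
        print(f"{runs} run(s), {reps} replicate(s) per run: "
              f"variance={var:.3f}, SD={sd:.3f}, %RSD={rsd:.2f}")
```

Under these assumed components, adding a second run roughly halves the variance of the mean, whereas adding replicates within a run changes it only slightly, because the between-run component dominates.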

APPENDIX C: EXAMPLES OF OUTLIER TESTS FOR ANALYTICAL DATA

Given the following set of 10 measurements: 100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, and 95.7 (mean = 99.5, standard deviation = 1.369), are there any outliers?

Generalized Extreme Studentized Deviate (ESD) Test

This is a modified version of the ESD Test that allows for testing up to a previously specified number, r, of outliers from a normally distributed population. For the detection of a single outlier (r = 1), the generalized ESD procedure is also known as Grubbs' test. Grubbs' test is not recommended for the detection of multiple outliers. In this example, r = 2 and n = 10.

Stage 1 (n = 10): Normalize each result by subtracting the mean from each value and dividing this difference by the standard deviation (see Table 3; see also footnote 4).

Table 3. Generalized ESD Test Results

          n = 10                    n = 9
          Data      Normalized      Data      Normalized
          100.3     +0.555          100.3     +1.361
          100.2     +0.482          100.2     +0.953
          100.1     +0.409          100.1     +0.544
          100.0     +0.336          100.0     +0.136
          100.0     +0.336          100.0     +0.136
          100.0     +0.336          100.0     +0.136
          99.9      +0.263          99.9      -0.272
          99.7      +0.117          99.7      -1.089
          99.5      -0.029          99.5      -1.905
          95.7      -2.805
Mean =    99.54                     99.97
SD =      1.369                     0.245

4 The difference between each value and the mean is termed the residual. Other Studentized residual outlier tests exist where the residual, instead of being divided by the standard deviation, is divided by the standard deviation times the square root of (n - 1)/n.

Take the absolute value of these results, select the maximum value (R1 = 2.805), and compare it to a previously specified tabled critical value λ1 (2.290) based on the selected significance level (for example, 5%). The maximum value is larger than the tabled value, so the corresponding observation is identified as being inconsistent with the remaining data. Sources for λ-values are included in many statistical textbooks. Caution should be exercised when using any statistical table to ensure that the correct notation (i.e., level of acceptable error) is used when extracting table values.

Stage 2 (n = 9): Remove the observation corresponding to the maximum absolute normalized result from the original data set, so that n is now 9. Again, find the mean and standard deviation (Table 3, right two columns), normalize each value, and take the absolute value of these results. Find the maximum of the absolute values of the 9 normalized results (R2 = 1.905), and compare it to λ2 (2.215). The maximum value is not larger than the tabled value.

Conclusion: The result from the first stage, 95.7, is declared to be an outlier, but the result from the second stage, 99.5, is not an outlier.
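The two stages above can be reproduced with a short script. This is a minimal sketch, not a validated implementation: the function name is hypothetical, the critical values are simply those quoted in the text for n = 10 and r = 2 at the 5% level, and the decision rule used (the number of outliers is the largest stage at which the statistic exceeds its critical value) is the usual generalized ESD convention rather than something stated explicitly in this Appendix.

```python
import statistics

data = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
# Critical values quoted in the text (n = 10, r = 2, 5% significance level).
critical_values = [2.290, 2.215]


def generalized_esd(values, lambdas):
    """Run one ESD stage per critical value.

    Returns (stages, outliers), where each stage is a tuple
    (suspect value, test statistic R, critical value).
    """
    remaining = list(values)
    stages = []
    for lam in lambdas:
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)  # sample SD (n - 1 divisor)
        # The suspect is the observation farthest from the current mean.
        suspect = max(remaining, key=lambda x: abs(x - mean))
        r_stat = abs(suspect - mean) / sd
        stages.append((suspect, r_stat, lam))
        remaining.remove(suspect)  # drop it before the next stage
    # ASSUMPTION: usual generalized ESD rule, i.e. the outlier count is the
    # largest stage index whose statistic exceeds its critical value.
    n_outliers = 0
    for i, (_, r_stat, lam) in enumerate(stages, start=1):
        if r_stat > lam:
            n_outliers = i
    outliers = [suspect for suspect, _, _ in stages[:n_outliers]]
    return stages, outliers


stages, outliers = generalized_esd(data, critical_values)
for i, (suspect, r_stat, lam) in enumerate(stages, start=1):
    print(f"Stage {i}: suspect={suspect}, R={r_stat:.3f}, critical={lam}")
print("Declared outliers:", outliers)
```

With the example data, the script flags 95.7 at the first stage and does not flag 99.5 at the second, in agreement with the conclusion above.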

Dixon-Type Tests

Dixon's Test can be one-sided or two-sided, depending on an a priori decision as to whether outliers will be considered on one side only. As with the ESD Test, Dixon's Test assumes that the data, in the absence of outliers, come from a single normal population. Following the strategy used for the ESD Test, we proceed as if there were no a priori decision as to side, and so use a two-sided Dixon's Test. From examination of the example data, we see that it is the two smallest that are to be tested as outliers. Dixon provides for testing for two outliers simultaneously; however, these procedures are beyond the scope of this Appendix. The stepwise procedure discussed below is not an exact procedure for testing for the second outlier, because the result of the second test is conditional upon the first. And because the sample size is also reduced in the second stage, the end result is a procedure that usually lacks the sensitivity of Dixon's exact procedures.

Stage 1 (n = 10): The results are ordered on the basis of their magnitude (i.e., Xn is the largest observation, Xn-1 is the second largest, etc., and X1 is the smallest observation). Dixon's Test has different ratios based on the sample size (in this example, with n = 10), and to declare X1 an outlier, the following ratio, r11, is calculated by the formula:

r11 = (X2 - X1)/(Xn-1 - X1)

where X2 is the second smallest observation and Xn-1 is the second largest.

A different ratio would be employed if the largest data point was tested as an outlier. The r11 result is compared to an r11,0.05 value in a table of critical values. If r11 is greater than r11,0.05, then X1 is declared an outlier. For the above set of data, r11 = (99.5 - 95.7)/(100.2 - 95.7) = 0.84. This ratio is greater than r11,0.05, which is 0.52979 at the 5% significance level for a two-sided Dixon's Test. Sources for r11,0.05 values are included in many statistical textbooks (see footnote 5).

Stage 2: Remove the smallest observation from the original data set, so that n is now 9. The same r11 equation is used, but a new critical r11,0.05 value for n = 9 is needed (r11,0.05 = 0.56420). Now r11 = (99.7 - 99.5)/(100.2 - 99.5) = 0.29, which is less than r11,0.05 and not significant at the 5% level.

Conclusion: Therefore, 95.7 is declared to be an outlier but 99.5 is not an outlier.
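The stepwise Dixon calculation can be sketched in the same way. The function name below is hypothetical, the ratio is the r11 form implied by the worked numbers in the text (second smallest minus smallest, divided by second largest minus smallest), and the critical values are those quoted above for the two-sided test at the 5% level. It is meant only as an arithmetic check, not as a general-purpose Dixon implementation.

```python
def dixon_r11_low(values):
    """Dixon r11 ratio for testing the smallest observation X1 as an outlier."""
    x = sorted(values)
    # (X2 - X1) / (Xn-1 - X1): gap to the nearest neighbour over a trimmed range
    return (x[1] - x[0]) / (x[-2] - x[0])


data = [100.0, 100.1, 100.3, 100.0, 99.7, 99.9, 100.2, 99.5, 100.0, 95.7]
# Two-sided 5% critical values quoted in the text, keyed by sample size.
critical = {10: 0.52979, 9: 0.56420}

# Stage 1: test the smallest of the 10 observations.
stage1 = dixon_r11_low(data)
stage1_is_outlier = stage1 > critical[len(data)]

# Stage 2: drop the smallest observation and test the new smallest value.
reduced = sorted(data)[1:]
stage2 = dixon_r11_low(reduced)
stage2_is_outlier = stage2 > critical[len(reduced)]

print(f"Stage 1: r11 = {stage1:.2f}, outlier = {stage1_is_outlier}")
print(f"Stage 2: r11 = {stage2:.2f}, outlier = {stage2_is_outlier}")
```

As noted in the text, the second stage is conditional on the first, so this stepwise use of r11 is not Dixon's exact procedure for two outliers.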

Hampel's Rule

Step 1: The first step in applying Hampel's Rule is to normalize the data. However, instead of subtracting the mean from each data point and dividing the difference by the standard deviation, the median is subtracted from each data

5 The critical values for r in this example are taken from Reference 2 in Appendix G, Outlier Tests.