1010 USP39-NF34 ANALYTICAL DATA INTERPRETATION AND TREATMENT

In summary, rejecting or retaining an apparent outlier can be a serious source of bias. The nature of the testing, as well as scientific understanding of the manufacturing process and the analytical procedure, must be considered when determining the source of the apparent outlier. An outlier test can never take the place of a thorough laboratory investigation. Rather, it is performed only when the investigation is inconclusive and no deviations in the manufacture or testing of the product were noted. Even if such statistical tests indicate that one or more values are outliers, they should still be retained in the record. Including or excluding outliers in calculations to assess conformance to acceptance criteria should be based on scientific judgment and the internal policies of the manufacturer. It is often useful to perform the calculations with and without the outliers to evaluate their impact.

Outliers that are attributed to measurement process mistakes should be reported (i.e., footnoted), but not included in further statistical calculations. When assessing conformance to a particular acceptance criterion, it is important to define whether the reportable result (the result that is compared to the limits) is an average value, an individual measurement, or something else. If, for example, the acceptance criterion was derived for an average, then it would not be statistically appropriate to require individual measurements to also satisfy the criterion because the variability associated with the average of a series of measurements is smaller than that of any individual measurement.
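The two points above (running the calculations with and without a suspect value, and the smaller variability of an average compared with an individual measurement) can be illustrated with a minimal sketch. The data and the helper name `summarize` are hypothetical and are not taken from this chapter; the standard error of the mean is computed as s/sqrt(n), which assumes independent measurements.

```python
from math import sqrt
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample SD, standard error of the mean) for a list of results."""
    n = len(values)
    s = stdev(values)          # sample standard deviation (n - 1 denominator)
    return mean(values), s, s / sqrt(n)

# Hypothetical assay results (% of label claim); the last value is the suspect one.
results = [99.8, 100.1, 100.3, 99.9, 100.2, 95.0]

with_suspect = summarize(results)
without_suspect = summarize(results[:-1])

print("with suspect value:    mean=%.2f sd=%.2f se=%.2f" % with_suspect)
print("without suspect value: mean=%.2f sd=%.2f se=%.2f" % without_suspect)
```

In both summaries the standard error of the mean is smaller than the standard deviation of the individual results, which is why a criterion derived for an average should not be applied unchanged to individual measurements. The suspect value stays in `results`, so the full record is preserved whichever summary is reported.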


COMPARISON OF ANALYTICAL PROCEDURES


It is often necessary to compare two procedures to determine if their average results or their variabilities differ by an amount that is deemed important. The goal of a procedure comparison experiment is to generate adequate data to evaluate the equivalency of the two procedures over a range of values. Some of the considerations to be made when performing such comparisons are discussed in this section.

ÎÒÃǾ­³£ÐèÒª±È½ÏÁ½ÖÖ£¨·ÖÎö£©·½·¨ÒÔÈ·¶¨ËüÃÇµÄÆ½¾ù½á¹û»ò±äÒìÐÔÊÇ·ñ´æÔÚÖØÒª²îÒì¡£·½·¨±È½ÏʵÑéµÄÄ¿µÄÊÇ»ñµÃ×ã¹»µÄÊý¾Ý£¬ÒÔ±ãÆÀ¼ÛÔÚÒ»¶¨·¶Î§ÄÚÁ½ÖÖ·½·¨µÄµÈЧÐÔ¡£ÏÂÃæµÄÄÚÈݸø³öÁËÔÚ½øÐÐÕâÖֱȽÏʱӦ¸Ã×ö³öµÄ¿¼ÂÇ¡£

Precision

Precision is the degree of agreement among individual test results when the analytical procedure is applied repeatedly to a homogeneous sample. For an alternative procedure to be considered to have "comparable" precision to that of a current procedure, its precision (see Analytical Performance Characteristics in <1225>, Validation) must not be worse than that of the current procedure by an amount deemed important. A decrease in precision (or increase in variability) can lead to an increase in the number of results expected to fail required specifications. On the other hand, an alternative procedure providing improved precision is acceptable.

¾«ÃܶÈÊÇָʹÓ÷ÖÎö·½·¨¶Ô¾ùÖÊÑù±¾½øÐÐÖØ¸´²â¶¨Ê±£¬¸÷ʵÑé½á¹ûÒ»Öµij̶ȡ£ÒòΪһ¸öÌæ´ú·½·¨Ó¦µ±±»ÈÏΪ¾ßÓÐÓëÏÖÐз½·¨¡°ÏàËÆ¡±µÄ¾«Ãܶȣ¬Æä¾«Ãܶȣ¨²Î¼û<1225>ÖзÖÎöÐÔÄÜÊôÐÔ£¬È·ÈÏ£©ÓëÏÖÓз½·¨Ïà±È±ØÐë²»ÄÜ´æÔÚÃ÷ÏԵIJîÒì¡£¾«ÃܶȵÄϽµ£¨»òÕß˵±äÒìµÄÔö´ó£©¿Éµ¼Ö²»·ûºÏ¹æ¶¨ÖÊÁ¿±ê×¼µÄʵÑé½á¹ûÊýÁ¿Ôö¼Ó¡£ÁíÒ»·½Ã棬ÌåÏÖ³ö¸ü¼Ñ¾«ÃܶȵÄÌæ´ú·½·¨ÊÇ¿ÉÒÔ½ÓÊܵġ£

One way of comparing the precision of two procedures is by estimating the variance for each procedure (the sample variance, s², is the square of the sample standard deviation) and calculating a one-sided upper confidence limit for the ratio of (true) variances, where the ratio is defined as the variance of the alternative procedure to that of the current procedure. An example, with this assumption, is outlined in Appendix D. The one-sided upper confidence limit should be compared to an upper limit deemed acceptable, a priori, by the analytical laboratory. If the one-sided upper confidence limit is less than this upper acceptable limit, then the precision of the alternative procedure is considered acceptable in the sense that the use of the alternative procedure will not lead to an important loss in precision. Note that if the one-sided upper confidence limit is less than one, then the alternative procedure has been shown to have improved precision relative to the current procedure.
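The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Appendix D: it assumes independent, approximately normal results, and it takes the F critical value as an input read from a table rather than computing it. The function name, the data, the critical value 3.18, and the acceptable limit 4.0 are all hypothetical choices for the example.

```python
from statistics import variance

def variance_ratio_upper_limit(alternative, current, f_crit):
    """One-sided upper confidence limit for var(alternative) / var(current).

    f_crit is the tabulated upper critical value of the F distribution with
    (len(current) - 1, len(alternative) - 1) degrees of freedom at the chosen
    confidence level, e.g. the 0.95 quantile for a 95% one-sided limit.
    """
    ratio = variance(alternative) / variance(current)   # s_alt^2 / s_cur^2
    return ratio * f_crit

# Hypothetical replicate results from each procedure (10 per procedure).
alternative = [100.2, 99.7, 100.5, 99.9, 100.8, 99.5, 100.1, 100.4, 99.8, 100.3]
current = [100.1, 99.9, 100.3, 100.0, 100.4, 99.8, 100.2, 99.7, 100.1, 100.0]

# Assumed table value for F(0.95; 9, 9); confirm against your own table.
upper_limit = variance_ratio_upper_limit(alternative, current, f_crit=3.18)

ACCEPTABLE_UPPER_LIMIT = 4.0   # hypothetical limit, fixed a priori by the laboratory
print("upper confidence limit for variance ratio: %.2f" % upper_limit)
print("precision acceptable:", upper_limit < ACCEPTABLE_UPPER_LIMIT)
print("precision shown to be improved:", upper_limit < 1.0)
```

Multiplying the observed ratio by the upper critical value with the degrees of freedom in (current, alternative) order is equivalent to dividing it by the lower critical value in (alternative, current) order, so only a standard upper-tail table is needed.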


The confidence interval method just described is preferred to applying the two-sample F-test to test the statistical significance of the ratio of variances. To perform the two-sample F-test, the calculated ratio of sample variances would be compared to a critical value based on tabulated values of the F distribution for the desired level of confidence and the number of degrees of freedom for each variance. Tables providing F-values are available in most standard statistical textbooks. If the calculated ratio exceeds this critical value, a statistically significant difference in precision is said to exist between the two procedures. However, if the calculated ratio is less than the critical value, this does not prove that the procedures have the same or equivalent level of precision; but rather that there was not enough evidence to prove that a statistically significant difference did, in fact, exist.


Accuracy

Comparison of the accuracy (see Analytical Performance Characteristics in <1225>, Validation) of procedures provides information useful in determining if the new procedure is equivalent, on the average, to the current procedure. A simple method for making this comparison is by calculating a confidence interval for the difference in true means, where the difference is estimated by the sample mean of the alternative procedure minus that of the current procedure.

The confidence interval should be compared to a lower and upper range deemed acceptable, a priori, by the laboratory. If the confidence interval falls entirely within this acceptable range, then the two procedures can be considered equivalent, in the sense that the average difference between them is not of practical concern. The lower and upper limits of the confidence interval only show how large the true difference between the two procedures may be, not whether this difference is considered tolerable. Such an assessment can be made only within the appropriate scientific context. This approach is often referred to as TOST (two one-sided tests; see Appendix F).
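The interval-based comparison described above can be sketched as follows. This is a minimal illustration, not the procedure of Appendix F: it assumes independent samples with a common variance (pooled estimate), and it takes the t critical value from a table as an input. The function names, data, critical value 1.734, and acceptable range are hypothetical. A two-sided 90% interval is used here because it is the interval conventionally paired with two one-sided tests at the 5% level.

```python
from math import sqrt
from statistics import mean, variance

def mean_difference_ci(alternative, current, t_crit):
    """Confidence interval for mean(alternative) - mean(current).

    Assumes independent samples with a common variance (pooled estimate).
    t_crit is the tabulated t critical value with
    len(alternative) + len(current) - 2 degrees of freedom.
    """
    n_a, n_c = len(alternative), len(current)
    pooled_var = ((n_a - 1) * variance(alternative)
                  + (n_c - 1) * variance(current)) / (n_a + n_c - 2)
    half_width = t_crit * sqrt(pooled_var * (1 / n_a + 1 / n_c))
    diff = mean(alternative) - mean(current)
    return diff - half_width, diff + half_width

def within_acceptable_range(ci, lower, upper):
    """True when the whole interval lies inside [lower, upper]."""
    return lower <= ci[0] and ci[1] <= upper

# Hypothetical replicate results from each procedure (10 per procedure).
alternative = [100.2, 99.7, 100.5, 99.9, 100.8, 99.5, 100.1, 100.4, 99.8, 100.3]
current = [100.1, 99.9, 100.3, 100.0, 100.4, 99.8, 100.2, 99.7, 100.1, 100.0]

# Assumed table value for t(0.95; 18 df); confirm against your own table.
ci = mean_difference_ci(alternative, current, t_crit=1.734)

# Hypothetical acceptable range of -1.0 to +1.0, fixed a priori by the laboratory.
print("interval for difference in means: (%.3f, %.3f)" % ci)
print("procedures equivalent on average:", within_acceptable_range(ci, -1.0, 1.0))
```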

The confidence interval method just described is preferred to the practice of applying a t-test to test the statistical significance of the difference in averages. One way to perform the t-test is to calculate the confidence interval and to examine whether or not it contains the value zero. The two procedures have a statistically significant difference in averages if the confidence interval excludes zero. A statistically significant difference may not be large enough to have practical importance to the laboratory because it may have arisen as a result of highly precise data or a larger sample size. On the other hand, it is possible that no statistically significant difference is found, which happens when the confidence interval includes zero, and yet an important practical difference cannot be ruled out. This might occur, for example, if the data are highly variable or the sample size is too small. Thus, while the outcome of the t-test indicates whether or not a statistically significant difference has been observed, it is not informative with regard to the presence or absence of a difference of practical importance.
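The distinction between statistical significance and practical importance can be made concrete with a small sketch. The function name, the two intervals, and the acceptable range of -1.0 to 1.0 are hypothetical; the intervals are given directly rather than computed from data.

```python
def interpret_interval(ci, lower, upper):
    """Classify a confidence interval for the difference in means.

    Returns (statistically_significant, practically_equivalent):
    - significant: the interval excludes zero (what the t-test reports);
    - equivalent: the interval lies entirely inside [lower, upper].
    """
    low, high = ci
    significant = low > 0 or high < 0
    equivalent = lower <= low and high <= upper
    return significant, equivalent

# Hypothetical intervals judged against a hypothetical acceptable range of -1.0 to 1.0.
cases = {
    "precise data, small shift": (0.2, 0.4),
    "noisy data, zero included": (-2.5, 3.0),
}
for label, ci in cases.items():
    print(label, interpret_interval(ci, -1.0, 1.0))
```

The first interval is statistically significant yet lies well inside the acceptable range, so the difference is of no practical concern. The second shows no statistically significant difference, but it extends beyond the acceptable range in both directions, so an important difference cannot be ruled out.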


Determination of Sample Size


Sample size determination is based on the comparison of the accuracy and precision of the two procedures³ and is similar to that for testing hypotheses about average differences in the former case and variance ratios in the latter case, but the meaning of some of the input is different. The first component to be specified is δ, the largest acceptable difference between the two procedures that, if achieved, still leads to the conclusion of equivalence. That is, if the two procedures differ by no more than δ, on the average, they are considered acceptably similar. The comparison can be two-sided as just expressed, considering a difference of δ in either direction, as would be used when comparing means. Alternatively, it can be one-sided as in the case of comparing variances where a decrease in variability is acceptable and equivalency is concluded if the ratio of the variances (new/current, as a proportion) is not more than 1.0 + δ. A researcher will need to state δ based on knowledge of the current procedure and/or its use, or it may be calculated. One consideration, when there are specifications to satisfy, is that the new procedure should not differ by so much from the current procedure as to risk generating out-of-specification results. One then chooses δ to have a low likelihood of this happening by, for example, comparing the distribution of data for the current procedure to the specification limits. This could be done graphically or by using a tolerance interval, an example of which is given in Appendix E. In general, the choice for δ must depend on the scientific requirements of the laboratory.
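To show how δ drives the sample size for the two-sided comparison of means, the sketch below uses a common normal-approximation formula for two one-sided tests, n = 2(σ/δ)²(z₁₋α + z₁₋β/₂)² per procedure. This formula is an assumption brought in for illustration and is not necessarily the one used in this chapter's appendices. It further assumes that the true difference between the procedures is zero, that both share a known standard deviation σ, and that the defaults of α = 0.05 and 80% power are acceptable. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def tost_sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate number of results per procedure for an equivalence comparison.

    sigma: assumed common standard deviation of individual results.
    delta: largest acceptable difference in means (same units as sigma).
    Uses a normal approximation and assumes the true difference is zero.
    """
    z = NormalDist().inv_cdf           # standard normal quantile function
    beta = 1 - power
    n = 2 * (sigma / delta) ** 2 * (z(1 - alpha) + z(1 - beta / 2)) ** 2
    return ceil(n)

# Hypothetical inputs: the required n grows quickly as delta shrinks relative to sigma.
for delta in (2.0, 1.0, 0.5):
    print("delta = %.1f -> n per procedure = %d"
          % (delta, tost_sample_size_per_group(sigma=1.0, delta=delta)))
```

Because the approximation replaces t quantiles with normal quantiles, it tends to understate the requirement when n is small; treat the result as a starting point rather than a final figure.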


³ In general, the sample size required to compare the precision of two procedures will be greater than that required to compare the accuracy of the procedures.
