HOJ HOME | Chiefs Reports | Osgood Day | Cartilage Regeneration and Repair, Where Are We?
A Harvard Orthopaedic Presence in China
|
Scientific Articles | Alumni

The Evaluation of Orthopaedic Diagnostic Tests

David Zurakowski, PhD • James Di Canzio, MS

Department of Orthopaedic Surgery and Biostatistics • The Children's Hospital

          Most orthopaedic surgeons are familiar with the basic concepts of specificity and sensitivity and use them to determine the usefulness of specific diagnostic tests in clinical practice. On the other hand, the more clinically applicable concepts of positive and negative predictive values are probably underutilized. In addition, many diagnostic tests used in orthopaedics are assigned a grade rather than a simple positive or negative result. To measure the sensitivity and specificity of a graded test, one must first select a threshold grade at or above which the result will be considered positive and below which it will be considered negative. Such thresholds need not be arbitrary: the optimum cutoff point can be determined by choosing a cutoff that optimizes the combination of sensitivity and specificity. The ability of a graded test to discriminate between two possible results can be evaluated using a receiver operating characteristic (ROC) curve.

          This paper will review the calculation and use of these tools using hypothetical data regarding Magnetic Resonance Imaging (MRI) and Lachman's test in patients in whom the status of the anterior cruciate ligament (ACL) has been determined definitively using arthroscopy.

Dichotomous Diagnostic Tests

           The simplest diagnostic test is one where the results of a test, such as x-ray or MRI, are used to classify patients into two groups according to the presence or absence of an injury or disease. Tests with only two possible outcomes are known as dichotomous tests. Table 1 shows the MRI and arthroscopy data from a hypothetical example. The question that arises in the clinical setting is, "How good is MRI at distinguishing torn and intact ACL's?" In other words, "To what degree can I rely on the interpretation of MRI in making judgements about the status of a patient's knee?"

                        Arthroscopic Findings
MRI             Torn ACL (+)    Intact ACL (-)    Total knees
Abnormal (+)        394               32              426
Normal (-)           27              101              128
Total knees         421              133              554
Table 1. Hypothetical Example

 

Sensitivity and Specificity

           One method of measuring the value of MRI in detecting ACL tears is to calculate the proportion of torn and intact ACL's that were correctly classified by MRI. These proportions are known as the sensitivity and specificity of a test, respectively.

Sensitivity is calculated as the proportion of torn ACL's that were correctly classified by MRI. In this example, of the 421 knees with ACL tears, 394 were correctly evaluated. The sensitivity of MRI in the detection of ACL tears is therefore 94% (Sensitivity = 394/421 = 0.94). In other words, 94% of ACL tears were correctly classified as torn using MRI.

          Specificity is calculated as the proportion of intact ACL's that were correctly classified by MRI. Of the 133 knees with an intact ACL, 101 were correctly classified. The specificity of MRI in the evaluation of the presence or absence of an ACL tear is therefore 76% (Specificity = 101/133 = 0.76). This means that 76% of intact ACL's were correctly classified as intact by MRI.
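The two proportions above can be reproduced directly from the counts in Table 1. The following is a minimal sketch; the variable and function names are illustrative and not part of the original article.

```python
# Counts from Table 1 (hypothetical MRI vs. arthroscopy data).
true_pos = 394   # torn ACL, abnormal MRI
false_pos = 32   # intact ACL, abnormal MRI
false_neg = 27   # torn ACL, normal MRI
true_neg = 101   # intact ACL, normal MRI


def sensitivity(tp, fn):
    """Proportion of torn ACLs that the test classifies as torn."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of intact ACLs that the test classifies as intact."""
    return tn / (tn + fp)


sens = sensitivity(true_pos, false_neg)   # 394 / 421
spec = specificity(true_neg, false_pos)   # 101 / 133
print(f"Sensitivity = {sens:.2f}")
print(f"Specificity = {spec:.2f}")
```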

Positive and Negative Predictive Values

          Sensitivity and specificity tell us the rate of true positive and true negative results, but they do not measure how well MRI predicts injury to the ACL. What we want is an estimate of the probability that the MRI will give the correct result. Such estimates are provided by the positive and negative predictive values.

          Positive predictive value (PPV) is the probability that an ACL interpreted as torn on MRI is actually torn. Negative predictive value (NPV) is the probability that an ACL interpreted as intact on MRI is actually intact. Calculations of the PPV and NPV require an estimate of the prevalence of an ACL tear in the clinical population. This reflects the fact that the rarer an injury is, the more confident we can be that a normal MR correctly indicates absence of injury, and the less confident that an abnormal MR indicates presence of injury. If the prevalence of the injury is low, the PPV will be low even if both the sensitivity and specificity are high. In other words, there is a greater potential for false positive tests when evaluating uncommon injuries, even when the rate of true positives and true negatives is high.

 The PPV and NPV can be calculated using two general equations that utilize Bayes' theorem [1]:
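In their standard Bayes' theorem form, the predictive values can be written as below. This is the textbook formulation rather than a copy of the authors' typeset equations, but it reproduces the values quoted in this example.

```latex
\[
\mathrm{PPV} =
  \frac{\text{sensitivity} \times \text{prevalence}}
       {\text{sensitivity} \times \text{prevalence}
        + (1 - \text{specificity}) \times (1 - \text{prevalence})}
\]
\[
\mathrm{NPV} =
  \frac{\text{specificity} \times (1 - \text{prevalence})}
       {\text{specificity} \times (1 - \text{prevalence})
        + (1 - \text{sensitivity}) \times \text{prevalence}}
\]
```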

Assuming that the prevalence of an ACL tear among patients presenting with knee pain is 0.20 or 20%, the PPV and NPV are:
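Substituting the sensitivity (0.94) and specificity (0.76) calculated from Table 1 and the assumed prevalence of 0.20 into the standard Bayes form of the predictive values gives the following worked calculation. The layout is a reconstruction under that assumption, not the authors' original typesetting.

```latex
\[
\mathrm{PPV} = \frac{(0.94)(0.20)}{(0.94)(0.20) + (1 - 0.76)(1 - 0.20)}
             = \frac{0.188}{0.380} \approx 0.49
\]
\[
\mathrm{NPV} = \frac{(0.76)(1 - 0.20)}{(0.76)(1 - 0.20) + (1 - 0.94)(0.20)}
             = \frac{0.608}{0.620} \approx 0.98
\]
```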


           Among knees with abnormal MR results, we expect 49% to actually have an ACL tear. Among knees with normal MR results, we expect 98% to have intact ACL's. Recall that PPV decreases as prevalence decreases. In this example, if the prevalence of an ACL tear is 5%, the PPV is only 17%. Therefore, in this hypothetical example in which MRI demonstrates relatively high sensitivity and specificity, the PPV will be low if the injury is relatively rare.
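The dependence of the predictive values on prevalence can be checked numerically. The sketch below assumes the standard Bayes' theorem form of PPV and NPV and uses the Table 1 counts; the function name `predictive_values` is illustrative.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a dichotomous test via Bayes' theorem."""
    true_pos = sens * prevalence                # diseased, test positive
    false_pos = (1 - spec) * (1 - prevalence)   # healthy, test positive
    true_neg = spec * (1 - prevalence)          # healthy, test negative
    false_neg = (1 - sens) * prevalence         # diseased, test negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity of MRI from Table 1.
sens, spec = 394 / 421, 101 / 133

ppv_20, npv_20 = predictive_values(sens, spec, 0.20)
ppv_05, npv_05 = predictive_values(sens, spec, 0.05)

print(f"Prevalence 20%: PPV = {ppv_20:.2f}, NPV = {npv_20:.2f}")
print(f"Prevalence  5%: PPV = {ppv_05:.2f}")
```

Lowering the prevalence from 20% to 5% leaves sensitivity and specificity unchanged, yet the PPV falls from roughly one half to roughly one sixth.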


Graded Tests

           In orthopaedics, many diagnostic tests do not give a clear-cut positive or negative result. Many tests yield a grade or a score, and the orthopaedist must then decide upon a cutoff value to classify the result as indicating either the presence or absence of a condition. For example, suppose all 554 hypothetical patients underwent a clinical examination that included the Lachman test. Each patient would be assigned a grade ranging from 0 to 3, based on anterior tibial translation compared to the opposite uninjured knee: grade 0 represents no anterior tibial translation; grade 1 represents a 0 to 5 mm translational difference with a firm endpoint; grade 2 represents a 5 to 10 mm translational difference with a soft endpoint; and grade 3 represents greater than 10 mm of translation with no endpoint [2]. The hypothetical distribution of Lachman grades is shown in Table 2.

           If the orthopaedist considered all grades of 1 or higher as indicative of an ACL tear, the sensitivity would be 406/421 = 96% and the specificity would be 77/133 = 58%. If the cutoff value were raised to 2, the sensitivity would be 338/421 = 80% and the specificity would be 126/133 = 95%. Likewise, a cutoff of 3 would result in a sensitivity of 146/421 = 35% and a specificity of 133/133 = 100%. Note that as the threshold changes, sensitivity and specificity move in opposite directions. While taking into consideration the relative consequences of false negative and false positive results, the orthopaedist must choose a cutoff that produces an acceptable combination of sensitivity and specificity. A test with good diagnostic performance is one that has a cutoff value at which both sensitivity and specificity are reasonably high.

                    Arthroscopic Findings
Lachman Grade    Torn ACL (+)    Intact ACL (-)
0                     15               77
1                     68               49
2                    192                7
3                    146                0
Total knees          421              133
Table 2. Lachman Test Data
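The sensitivity and specificity at each Lachman cutoff can be tabulated from the Table 2 counts, as in the sketch below. The final step picks the cutoff that maximizes sensitivity + specificity - 1 (Youden's index); that criterion is an assumption introduced here for illustration, since it weights false positives and false negatives equally, whereas the article leaves the choice to clinical judgement.

```python
# Table 2 counts by Lachman grade (list index = grade 0..3).
torn = [15, 68, 192, 146]
intact = [77, 49, 7, 0]


def sens_spec_at_cutoff(cutoff):
    """Treat grades >= cutoff as a positive (torn) result."""
    sens = sum(torn[cutoff:]) / sum(torn)        # torn knees graded >= cutoff
    spec = sum(intact[:cutoff]) / sum(intact)    # intact knees graded < cutoff
    return sens, spec


results = {cutoff: sens_spec_at_cutoff(cutoff) for cutoff in (1, 2, 3)}
for cutoff, (sens, spec) in results.items():
    print(f"Cutoff >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")

# Illustrative criterion only: maximize Youden's index (sens + spec - 1).
best_cutoff = max(results, key=lambda c: sum(results[c]) - 1)
print(f"Cutoff with the highest Youden index: {best_cutoff}")
```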

Receiver Operating Characteristic Curves

           The ability of a graded test to discriminate between results (torn vs. intact ACL's) can be measured by a receiver operating characteristic (ROC) curve [3]. An ROC graph shows the relationship between sensitivity (y-axis) and 100 - specificity (x-axis), both expressed as percentages, plotted at each possible cutoff. In other words, the ROC curve describes the test's performance as the relationship between the true-positive rate and the false-positive rate. If a test discriminates well, its ROC curve rapidly approaches a true-positive rate or sensitivity of 100%. On the other hand, a test that discriminates poorly has a diagonal ROC curve. Diagnostic performance is evaluated by the area under the ROC curve [4]. The steeper the curve, the greater the area and the better the discrimination of the test. In the case of perfect discrimination, the area under the curve will equal 1.0, while an area of 0.5 indicates discrimination equivalent to a coin toss or random guessing. Area under the ROC curve can be determined by commercially available software packages [5]. The hypothetical data on the Lachman test generated a steep ROC curve (Figure 1). The area under the curve is 0.92, indicating that the Lachman test is very good at discriminating between torn and normal ACL's.

Figure 1. ROC curve generated using hypothetical Lachman test data. The area under the curve is a measure of the diagnostic performance of the Lachman test to discriminate between torn and intact ACL's.
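For readers without access to dedicated software, the ROC points and the area under the curve can be approximated from the Table 2 counts. The sketch below uses the trapezoidal rule, which is an assumption about the method (the article does not state how the area was computed); with these data it agrees with the reported value of 0.92 after rounding.

```python
# Table 2 counts by Lachman grade (list index = grade 0..3).
torn = [15, 68, 192, 146]
intact = [77, 49, 7, 0]


def roc_points(torn, intact):
    """Return (false-positive rate, true-positive rate) at every cutoff,
    ordered from the strictest cutoff to the most lenient one."""
    n_torn, n_intact = sum(torn), sum(intact)
    points = [(0.0, 0.0)]  # cutoff above the highest grade: nothing is positive
    for cutoff in range(len(torn) - 1, -1, -1):
        tpr = sum(torn[cutoff:]) / n_torn        # sensitivity
        fpr = sum(intact[cutoff:]) / n_intact    # 1 - specificity
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(torn, intact)
auc = trapezoid_auc(points)
print(f"Area under the ROC curve = {auc:.2f}")
```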

Conclusions

          Statistical evaluations of diagnostic tests reflect clinical judgements made routinely in orthopaedic practice. Researchers should be encouraged to report these statistics, and clinicians should understand and utilize them in their decision making.

David Zurakowski, Ph.D. is Principal Statistician at The Children's Hospital
James Di Canzio, M.S. is a Biostatistician at The Children's Hospital

Address correspondence to:
David Zurakowski, PhD; The Children's Hospital; 300 Longwood Avenue; Boston, MA 02115

References
1. Lurie JD, Sox HC. Spine update. Principles of medical decision making. Spine 1999;24:493-8.
2. Liu SH, Osti L, Henry M, Bocchi L. The diagnosis of acute complete tears of the anterior cruciate ligament. J Bone Joint Surg 1995;77B:586-8.
3. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-98.
4. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
5. Kairisto V, Poola A. Software for illustrative presentation of basic clinical characteristics of laboratory tests. GraphROC for Windows. Scand J Clin Lab Invest 1995;55(Suppl 222):43-60.
