Most orthopaedic
surgeons are familiar with the basic concepts of specificity and sensitivity
and use them to determine the usefulness of specific diagnostic tests
in clinical practice. On the other hand, the more clinically applicable
concepts of positive and negative predictive values are probably
underutilized. In addition, many diagnostic tests used in orthopaedics are
assigned a grade rather than a simple positive or negative result. To
measure the sensitivity and specificity of a graded test, one must first
select a threshold grade above which the result will be considered positive
and below which it will be considered negative. Such thresholds need
not be arbitrary: the optimum cutoff is the one that gives the best
combination of sensitivity and specificity.
The ability of a graded test to discriminate between two possible results
can be evaluated using a receiver operating characteristic (ROC) curve.
This paper will review the calculation and use of these tools using hypothetical
data regarding magnetic resonance imaging (MRI) and Lachman's test in
patients in whom the status of the anterior cruciate ligament (ACL)
has been determined definitively using arthroscopy.
Dichotomous Diagnostic Tests
The simplest
diagnostic test is one where the results of a test, such as x-ray or
MRI, are used to classify patients into two groups according to the
presence or absence of an injury or disease. Tests with only two possible
outcomes are known as dichotomous tests. Table 1
shows the MRI and arthroscopy data from a hypothetical example. The
question that arises in the clinical setting is, "How good is MRI
at distinguishing torn from intact ACLs?" In other words, "To
what degree can I rely on the interpretation of MRI in making judgements
about the status of a patient's knee?"
|              | Arthroscopic findings         |             |
| MRI          | Torn ACL (+) | Intact ACL (-) | Total knees |
|--------------|--------------|----------------|-------------|
| Abnormal (+) | 394          | 32             | 426         |
| Normal (-)   | 27           | 101            | 128         |
| Total knees  | 421          | 133            | 554         |
Table 1. Hypothetical Example
Sensitivity and Specificity
One method
of measuring the value of MRI in detecting ACL tears is to calculate
the proportions of torn and intact ACLs that were correctly classified
by MRI. These proportions are known as the sensitivity and specificity
of a test, respectively.
Sensitivity is calculated as the proportion of torn ACLs that were
correctly classified by MRI. In this example, of the 421 knees with
ACL tears, 394 were correctly classified as torn. The sensitivity of MRI in the
detection of ACL tears is therefore 94% (Sensitivity = 394/421 = 0.94).
In other words, 94% of ACL tears were correctly classified as torn using
MRI.
Specificity
is calculated as the proportion of intact ACLs that were correctly
classified by MRI. Of the 133 knees with an intact ACL, 101 were correctly
classified. The specificity of MRI in the evaluation of the presence
or absence of an ACL tear is therefore 76% (Specificity = 101/133 =
0.76). This means that 76% of intact ACLs were correctly classified
as intact by MRI.
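The two calculations above can be reproduced in a few lines of Python. This is only an illustrative sketch using the counts in Table 1; the variable and function names are chosen here for illustration and are not part of the original paper.

```python
# Counts taken from Table 1 (MRI result versus arthroscopic finding).
true_pos = 394   # MRI abnormal, ACL torn at arthroscopy
false_pos = 32   # MRI abnormal, ACL intact at arthroscopy
false_neg = 27   # MRI normal, ACL torn at arthroscopy
true_neg = 101   # MRI normal, ACL intact at arthroscopy


def sensitivity(tp, fn):
    """Proportion of torn ACLs that MRI classified as torn."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of intact ACLs that MRI classified as intact."""
    return tn / (tn + fp)


print(round(sensitivity(true_pos, false_neg), 2))  # 0.94
print(round(specificity(true_neg, false_pos), 2))  # 0.76
```

Note that each measure uses only one column of Table 1: sensitivity uses the 421 torn knees and specificity the 133 intact knees.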
Positive and Negative Predictive Values
Sensitivity
and specificity tell us the rate of true positive and true negative
results, but they do not measure how well MRI predicts injury to the
ACL. What we want is an estimate of the probability that the MRI will
give the correct result. Such estimates are provided by the positive
and negative predictive values.
Positive
predictive value (PPV) is the probability that an ACL interpreted as
torn on MRI is actually torn. Negative predictive value (NPV) is the
probability that an ACL interpreted as intact on MRI is actually intact.
Calculations of the PPV and NPV require an estimate of the prevalence
of an ACL tear in the clinical population. This reflects the fact that
the rarer an injury is, the more confident we can be that a normal MRI
correctly indicates absence of injury and the less confident that an
abnormal MRI indicates presence of injury. If the prevalence of the injury
is low, the PPV will be low even if both the sensitivity and specificity
are high. In other words, there is a greater potential for false positive
tests when evaluating uncommon injuries, even when the rate of true
positives and true negatives is high.
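The dependence of the predictive values on prevalence can be illustrated with a short Python sketch. It assumes the standard Bayes' theorem relationship between sensitivity, specificity and prevalence. The function name and the prevalences of 0.10 and 0.01 are hypothetical values chosen for illustration; the first prevalence (421/554) is simply the proportion of torn ACLs in the Table 1 sample.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence.

    Uses Bayes' theorem; each term is the expected fraction of the
    population falling in that cell of the 2 x 2 table.
    """
    true_pos = sens * prevalence
    false_neg = (1 - sens) * prevalence
    true_neg = spec * (1 - prevalence)
    false_pos = (1 - spec) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity of MRI from Table 1.
sens, spec = 394 / 421, 101 / 133

# Sample prevalence from Table 1, then two hypothetical lower prevalences.
for prevalence in (421 / 554, 0.10, 0.01):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

At the sample prevalence the function returns the same values as reading Table 1 row by row (PPV = 394/426, NPV = 101/128). As the prevalence falls, the PPV falls and the NPV rises, even though the sensitivity and specificity are unchanged.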
|