Clinical Epidemiology and Biostatistics: A Primer for Orthopaedic Surgeons Part 1
Mininder S. Kocher MD MPH, David Zurakowski PhD
CHILDREN'S HOSPITAL ORTHOPAEDIC INSTITUTE FOR CLINICAL EFFECTIVENESS, HARVARD MEDICAL SCHOOL, HARVARD SCHOOL OF PUBLIC HEALTH, BOSTON, MA
Introduction
Clinical epidemiology and biostatistics are the basic sciences
of clinical research. This series of articles will provide a
basic primer of clinical epidemiology and biostatistics for the
orthopaedic surgeon.
The evidence-based medicine and patient-derived outcomes
assessment movements burst onto the scene of clinical
medicine in the 1980s and 1990s as a result of contemporaneous
medical, societal, and economic influences. Work by
Wennberg and colleagues revealed large small-area variations in
clinical practice, with some patients thirty times more likely to
undergo an operative procedure than other patients with identical
symptoms merely because of their geographic location1-6.
Further critical research suggested that up to 40% of some
surgical procedures might be inappropriate and that up to 85%
of common medical treatments were not rigorously validated7-9.
Meanwhile, the costs of health care were rapidly rising to over
two billion dollars per day, increasing from 5.2% of the gross
domestic product in 1960 to 16.2% in 199710. Health maintenance
organizations and managed care emerged. In addition,
increasing federal, state, and consumer oversight were brought
to bear on the practice of clinical medicine.
These forces have led to an increased focus on the clinical
effectiveness of care. Clinical epidemiology provides the methodology
to assess the clinical effectiveness of care. Part I of this
series, presented here, provides an overview of the concepts
of study design, hypothesis testing, measures of treatment
effect, and diagnostic performance. Evidence-based medicine,
outcomes assessment, data, and statistical analysis will be
covered in Part II, to be published in next year's edition of The
Orthopaedic Journal at Harvard Medical School. Examples
from the orthopaedic literature and a glossary of terminology
are provided.
Study Design
In observational studies researchers observe patient groups
without allocation of the intervention, whereas in experimental
studies researchers allocate the treatment. Experimental studies
involving humans are called trials. Research studies may be retrospective,
meaning that the direction of inquiry is backwards
from the cases and that the events of interest transpired before
the onset of the study, or they may be prospective, meaning
that the direction of inquiry is forward from the cohort inception
and that the events of interest transpire after the onset of
the study (Fig. 1). Cross-sectional studies are used to survey
one point in time.
All research studies are susceptible to invalid conclusions
due to bias, confounding, and chance. Bias is the nonrandom
systematic error in the design or conduct of a study. Bias usually
is not intentional; however, it is pervasive and insidious.
Forms of bias can corrupt a study at any phase, including
patient selection (selection and membership bias), study performance
(performance, information, and nonresponder bias),
and outcome determination (detection, recall, acceptability,
and interviewer bias). A confounder is a variable having independent
associations with both the independent (predictor) and
dependent (outcome) variables, thus potentially distorting their
relationship. Frequent confounders in clinical research include
gender, age, socioeconomic status, and comorbidities. As discussed
below in the section on hypothesis testing, chance may
lead to invalid conclusions based on the probability of type-I
and type-II errors, which are related to p values and power.
The adverse effects of bias, confounding, and chance
can be minimized by study design and statistical analysis.
Prospective studies minimize selection, information, recall,
and nonresponder bias. Randomization minimizes selection
bias and equally distributes confounders. Blinding can decrease
bias, and matching can decrease confounding. Confounders
can sometimes be controlled post hoc with use of stratified
analysis or multivariable methods. The effects of chance can be
minimized by an adequate sample size based on power calculations
and appropriate alpha levels. The ability of study design
to optimize validity while minimizing bias, confounding, and
chance is recognized by the hierarchical levels of evidence and
grades of recommendations established by the U.S. Preventive
Services Task Force and the Oxford Centre for Evidence-Based
Medicine on the basis of study design (Table I).
TABLE I. Levels of Evidence and Grades of Recommendation

Level   Description
1a      Systematic Review (with homogeneity) of Randomized Clinical Trials
1b      Individual Randomized Clinical Trial (with narrow confidence interval); Individual Inception Prospective Cohort Study with ≥80% follow-up
1c      All-or-None Case Series
2a      Systematic Review (with homogeneity) of Cohort Studies
2b      Individual Cohort Study; Low-Quality Randomized Clinical Trial
2c      "Outcomes" Research; Ecological Studies
3a      Systematic Review (with homogeneity) of Case-Control Studies
3b      Individual Case-Control Study
4       Case Series; Low-Quality Cohort and Case-Control Studies
5       Expert Opinion

Grade   Description
A       Consistent Level 1 Studies
B       Consistent Level 2 or 3 Studies; Extrapolations from Level 1 Studies
C       Level 4 Studies; Extrapolations from Level 2 or 3 Studies
D       Level 5 Studies; Troublingly Inconsistent or Inconclusive Studies of any Level
Observational study designs include case series, case-control
studies, cross-sectional surveys, and cohort studies.
A case series is a retrospective, descriptive account of a
group of patients with interesting characteristics or a series of
patients who have undergone an intervention. A case series of
one patient is a case report. Case series are easy to construct
and can provide a forum for the presentation of interesting or
unusual observations. However, case series are often anecdotal,
are subject to many possible biases, lack a hypothesis, and are
difficult to compare with other series. Thus, case series are
usually viewed as a means of generating hypotheses for further
studies but are not viewed as conclusive. A case-control study
is one in which the investigator identifies patients with an
outcome of interest and controls without the outcome and
then looks back retrospectively to identify possible causes or
risk factors. The effects in a case-control study are frequently
reported with use of the odds ratio. Case-control studies
are efficient (particularly for the evaluation of unusual
conditions or outcomes) and are relatively easy to perform.
However, an appropriate control group may be difficult
to identify, and preexisting high-quality medical records
are essential. Moreover, case-control studies are very
susceptible to multiple biases (particularly selection and
detection bias). Cross-sectional surveys are often used
to determine the prevalence of disease or to identify coexisting
associations in patients with a particular condition at
one particular point in time. Surveys are also frequently
performed to determine preferences and treatment
patterns. Because cross-sectional studies represent a
snapshot in time, they may be misleading if the research
question involves the disease process over time. Surveys
also present unique challenges in terms of adequate
response rate, representative samples, and acceptability
bias. A traditional cohort study is one in which a population
of interest is identified and followed prospectively in order to
determine outcomes and associations with risk factors. Cohort
studies are optimal for studying the incidence, course, and risk
factors of a disease because they are longitudinal, meaning that
a group of subjects is followed over time. The effects in a cohort
study are frequently reported in terms of relative risk. Because
these studies are prospective, they can optimize follow-up and
data quality and can minimize bias associated with selection,
information, and measurement. In addition, they have the
correct time-sequence to provide strong evidence regarding
associations. However, these studies are costly, are logistically
demanding, often require long time-periods for completion,
and are inefficient for the assessment of unusual outcomes or
diseases.
Experimental trials may involve the use of concurrent
controls, sequential controls (cross-over trials), or historical
controls. The randomized clinical trial (RCT) with concurrent
controls is the gold standard of clinical evidence as it provides
the most valid conclusions (internal validity) by minimizing
the effects of bias and confounding. A rigorous randomization
with enough patients is the best means of avoiding confounding.
The performance of an RCT involves the construction of a protocol
document that explicitly establishes eligibility criteria, sample
size, informed consent, randomization, stopping rules, blinding,
measurement, and data analysis. Because allocation is random,
selection bias is minimized and confounders (known and unknown)
are theoretically equally distributed between groups. Blinding
minimizes performance, detection, interviewer, and acceptability
bias. Intention-to-treat analysis minimizes nonresponder
and transfer bias, while sample-size determination
ensures adequate power. The intention-to-treat principle states
that all patients should be analyzed within the treatment group
to which they were randomized in order to preserve the goals
of randomization. Although the RCT is the epitome of clinical
research designs, the disadvantages of RCTs include their
expense, logistics, and time to completion. Accrual of patients
and acceptance by clinicians may be problematic. With rapidly evolving technology, a new technique may quickly become well accepted, making an existing RCT obsolete or a potential RCT
difficult to accept. Ethically, RCTs require clinical equipoise
(equality of treatment options in the clinician's judgment) for
enrollment, interim stopping rules to avoid harm and evaluate
adverse events, and truly informed consent. Finally, while RCTs
have excellent internal validity, some have questioned their
generalizability (external validity) because the practice pattern
and the population of patients enrolled in an RCT may be overly
constrained and nonrepresentative.
Ethical considerations are intrinsic to the design and conduct
of clinical research studies. Informed consent is of paramount
importance and it is the focus of much of the activity
of Institutional Review Boards. Investigators should be familiar
with the Nuremberg Code and the Declaration of Helsinki as
they pertain to ethical issues of risks and benefits, protection of
privacy, and respect for autonomy11,12.
Hypothesis Testing
The purpose of hypothesis testing is to permit generalizations
from a sample to the population from which it came.
Hypothesis testing confirms or refutes the assertion that the
observed findings did not occur by chance alone but rather
occurred because of a true association between variables. By
default, the null hypothesis of a study asserts that there is no
significant association between variables whereas the alternative
hypothesis asserts that there is a significant association. If
the findings of a study are not significant we cannot reject the
null hypothesis, whereas if the findings are significant we can
reject the null hypothesis and accept the alternative hypothesis.
TABLE II. Possible Outcomes of a Study

                                 Truth
Experiment            Not Significant        Significant
Not Significant       Correct                Type-II (β) error
Significant           Type-I (α) error       Correct
Thus, all research studies that are based on a sample make
an inference about the truth in the overall population. By constructing
a 2 x 2 table of the possible outcomes of a study (Table
II), we can see that the inference of a study is correct if a significant
association is not found when there is no true association
or if a significant association is found when there is a true
association. However, a study can have two types of errors. A type-I or alpha (α) error occurs when a significant association is found when there is no true association (resulting in a "false positive" study that rejects a true null hypothesis). A type-II or beta (β) error occurs when no significant association is found when there is a true association (resulting in a "false negative" study that rejects a true alternative hypothesis).
The P value refers to the probability of the type-I (α) error. By convention, the alpha level of significance is set at 0.05, which means we accept the finding of a significant association if there is less than a one in twenty chance that the observed association was due to chance alone. Thus, the P value, which is calculated from a statistical test, is a measure of the strength of the evidence from the data against the null hypothesis. If the P value is less than the alpha level, then the evidence against the null hypothesis is strong enough to reject it and to conclude that the result is statistically significant. P values are used frequently in clinical research and are given great importance by journals and readers; however, there is a strong movement in biostatistics to de-emphasize P values because a significance level of P<0.05 is arbitrary, a strict cutoff point can be misleading (there is little difference between P=0.049 and P=0.051, yet only the former is considered "significant"), the P value gives no information about the strength of the association, and the P value may be statistically significant without being clinically important. Alternatives to the traditional reliance on P values include the use of variable alpha levels of significance based on the consequences of the type-I error and the reporting of P values without using the term "significant." Use of 95% confidence intervals in lieu of P values has gained acceptance, as these intervals convey information regarding the significance of findings (two 95% confidence intervals that do not overlap indicate a significant difference), the magnitude of differences, and the precision of measurement (indicated by the range of the 95% confidence interval). Whereas the P value is often interpreted as being either statistically significant or not, the 95% confidence interval provides a range of values that allows the reader to interpret the implications of the results. In addition, while P values have no units, confidence intervals are presented in the units of the variable of interest, which aids interpretation of the results.
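To make the contrast concrete, here is a minimal sketch (hypothetical data; NumPy and SciPy are assumptions, not part of the article) that reports both a P value and a 95% confidence interval for a two-group comparison:

```python
# Two-sample t-test reported both ways: P value alone versus a 95%
# confidence interval for the difference in means. Data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=70, scale=10, size=40)  # e.g., outcome scores
group_b = rng.normal(loc=75, scale=10, size=40)

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# 95% CI for the difference in means (pooled standard error, matching the
# equal-variance t-test above).
n1, n2 = len(group_a), len(group_b)
df = n1 + n2 - 2
sp2 = ((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / df
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = group_b.mean() - group_a.mean()
margin = stats.t.ppf(0.975, df) * se

print(f"P = {p_value:.3f}; difference = {diff:.1f} "
      f"(95% CI {diff - margin:.1f} to {diff + margin:.1f})")
```

Note how the confidence interval conveys the direction, magnitude, and precision of the difference in the units of the outcome, whereas the P value alone does not.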
Power is the probability of finding a significant association if one truly exists and is defined as 1 minus the probability of type-II (β) error. By convention, acceptable power is set at ≥80%, which means there is a ≤20% chance that the study will demonstrate no significant association when there is a true association. In practice, when a study demonstrates a significant association, the potential error of concern is the type-I (α) error as expressed by the P value. However, when a study demonstrates no significant association, the potential error of concern is the type-II (β) error as expressed by power. That is, in a study that demonstrates no significant effect, there may truly be no significant effect, or there may actually be a significant effect that the study was underpowered to detect because the sample size was too small. Thus, in a study that demonstrates no significant effect, the power of the study should be reported. The calculations for power analyses differ depending on the statistical methods utilized for analysis; however, four elements are always involved in a power analysis: α, β, effect size, and sample size (n). Effect size is the difference that one wants to be able to detect with the given α and β. It is based on a clinical sense of how large a difference would be clinically meaningful. Small sample sizes, small effect sizes, and large variance decrease the power of a study. An understanding of power issues is important in clinical research to minimize resources when planning a study and to ensure the validity of a study. Sample-size calculations are performed when planning a study: typically, power is set at 80%, alpha is set at 0.05, the effect size and variance are estimated from pilot data or the literature, and the equation is solved for the necessary sample size. Power analysis is performed after a study: typically, alpha is set at 0.05; the sample size, effect size, and variance of the actual study are used; and the study's power is determined.
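As an illustration of both calculations, here is a minimal sketch for a two-sample comparison of means (the effect size, alpha, power, and sample size are hypothetical, and the statsmodels library is an assumption, not something the article prescribes):

```python
# A priori sample-size calculation and post hoc power analysis for a
# two-sample t-test. All numbers are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Planning a study: solve for sample size given alpha, power, and a
# standardized effect size (Cohen's d) estimated from pilot data.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size: about {n_per_group:.0f} patients per group")

# After a study: solve for the power achieved with the actual sample size.
achieved = analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30)
print(f"Power with 30 patients per group: {achieved:.2f}")
```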
Diagnostic Performance
A diagnostic test can result in four possible scenarios: (1)
true positive if the test is positive and the disease is present, (2)
false positive if the test is positive and the disease is absent, (3)
true negative if the test is negative and the disease is absent,
and (4) false negative if the test is negative and the disease is
present (Table III). The sensitivity of a test is the percentage (or
proportion) of patients who have the disease that are classified
positive (true positive rate). A test with 97% sensitivity implies
that of 100 patients with disease, ninety-seven will have a
positive test. Sensitive tests have a low false-negative rate. A negative result on a highly sensitive test rules disease out (SnNout).
The specificity of a test is the percentage (or proportion)
of patients without the disease who are classified negative (true
negative rate). A test with 91% specificity implies that of 100
patients without the disease, ninety-one will have a negative test.
Specific tests have a low false-positive rate. A positive result on
a highly specific test rules disease in (SpPin). Sensitivity and
specificity can be combined into a single parameter, the likelihood
ratio (LR), which is the probability of a true positive divided by
the probability of a false positive. Sensitivity and specificity
can be established in studies in which the results of a diagnostic
test are compared with the gold standard of diagnosis in the same
patients—for example, by comparing the results of magnetic resonance
imaging with arthroscopic findings13.
TABLE III. Four Possible Results of a Diagnostic Test

                  Disease Positive       Disease Negative
Test Positive     a (true positive)      b (false positive)
Test Negative     c (false negative)     d (true negative)
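The parameters described above follow directly from the cells of Table III. Here is a minimal sketch (the counts are hypothetical, chosen to match the 97% sensitivity and 91% specificity examples in the text):

```python
# Sensitivity, specificity, and likelihood ratio from the a/b/c/d cells
# of Table III. Counts are hypothetical.
a, b = 97, 9    # true positives, false positives
c, d = 3, 91    # false negatives, true negatives

sensitivity = a / (a + c)                      # true-positive rate
specificity = d / (b + d)                      # true-negative rate
positive_lr = sensitivity / (1 - specificity)  # P(true pos.) / P(false pos.)

print(f"Sensitivity = {sensitivity:.2f}")       # 0.97
print(f"Specificity = {specificity:.2f}")       # 0.91
print(f"Likelihood ratio = {positive_lr:.1f}")  # about 10.8
```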
Sensitivity and specificity are technical parameters of diagnostic
testing performance and have important implications for
screening and clinical practice guidelines14,15; however, they are
less relevant in the typical clinical setting because the clinician
does not know whether or not the patient has the disease. The
clinically relevant questions are the probability that a patient
has the disease given a positive result (positive predictive value)
and the probability that a patient does not have the disease
given a negative result (negative predictive value). The positive
and negative predictive values are probabilities that require an estimate of the prevalence of the disease in the population and can be calculated using equations that utilize Bayes' theorem.
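Here is a minimal sketch of that calculation (the helper function and the prevalence values are hypothetical; the sensitivity and specificity reuse the examples above):

```python
# Positive and negative predictive values from sensitivity, specificity,
# and prevalence via Bayes' theorem. All values are hypothetical.
def predictive_values(sens, spec, prevalence):
    ppv = (sens * prevalence) / (
        sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = (spec * (1 - prevalence)) / (
        spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

# The same test yields very different predictive values as prevalence changes.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.97, 0.91, prevalence)
    print(f"Prevalence {prevalence:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```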
There is an inherent trade-off between sensitivity and
specificity. Because there is typically some overlap between
the diseased and nondiseased groups with respect to a test
distribution, the investigator can select a positivity criterion
with a low false-negative rate (to optimize sensitivity) or one
with a low false-positive rate (to optimize specificity) (Fig. 2).
In practice, positivity criteria are selected on the basis of the
consequences of a false-positive or a false-negative diagnosis.
If the consequences of a false-negative diagnosis outweigh
the consequences of a false-positive diagnosis of a condition
(such as septic arthritis of the hip in children16), a more sensitive
criterion is chosen (Fig. 3). This relationship between the
sensitivity and specificity of a diagnostic test can be portrayed
with use of a receiver operating characteristic (ROC) curve.
An ROC graph shows the relationship between the true-positive rate (sensitivity) on the y-axis and the false-positive rate (1 - specificity) on the x-axis, plotted at each possible cutoff (Figure 4). If a test discriminates well, its ROC curve approaches a true-positive rate of 100% and a false-positive rate of 0%. On the
other hand, a test that discriminates poorly has a diagonal ROC
curve (45-degree line). Overall, the diagnostic performance can
be evaluated by the area under the ROC curve. In the case of
perfect discrimination the area under the curve will equal 1.0,
while an area of 0.5 indicates random guessing17.
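The following minimal sketch traces an ROC curve and computes the area under it (the data are simulated, and the use of scikit-learn is an assumption for illustration):

```python
# ROC curve and area under the curve for a hypothetical continuous test
# with overlapping score distributions in diseased and nondiseased groups.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
disease = np.repeat([0, 1], 100)                      # 0 = absent, 1 = present
scores = np.concatenate([rng.normal(0.0, 1.0, 100),   # nondiseased
                         rng.normal(1.5, 1.0, 100)])  # diseased

# One (false-positive rate, true-positive rate) point per possible cutoff.
fpr, tpr, thresholds = roc_curve(disease, scores)
auc = roc_auc_score(disease, scores)
print(f"Area under the ROC curve = {auc:.2f}")  # 1.0 = perfect, 0.5 = chance
```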
Measures of Effect
Measures of likelihood include probability and odds. Probability
is a number, between 0 and 1, that indicates how likely an event
is to occur based on the number of events per the number of trials.
The probability of heads on a coin toss is 0.5. Odds are the ratio
of the probability of an event occurring to the probability of the
event not occurring. The odds of flipping heads on a coin toss are 1 (0.5/0.5). Because probability and odds are related, one can be converted to the other: odds = probability/(1 - probability), and probability = odds/(1 + odds).
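A minimal sketch of that conversion (the helper functions are illustrative, not from the article):

```python
# Converting between probability and odds as described above.
def odds_from_probability(p):
    return p / (1 - p)

def probability_from_odds(odds):
    return odds / (1 + odds)

print(odds_from_probability(0.5))   # heads on a coin toss: odds = 1.0
print(probability_from_odds(3.0))   # odds of 3 correspond to p = 0.75
```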
Relative risk (RR) can be determined in a prospective cohort study, where RR equals
the incidence of disease in the exposed cohort divided by the incidence
of disease in the nonexposed cohort (Table IV). A similar measurement
in a retrospective case-control study (where incidence cannot
be determined) is the odds ratio (OR), which is the ratio of the
odds of having the disease in the study group compared with
the odds of having the disease in the control group (Table IV).
Factors that are likely to increase the incidence, prevalence,
morbidity, or mortality of a disease are called risk factors.
The effect of a factor that reduces the probability of an adverse
outcome can be quantified by the relative risk reduction (RRR),
the absolute risk reduction (ARR), and the number needed to
treat (NNT) (Table IV). The effect of a factor that increases the
probability of an adverse outcome can be quantified by the relative
risk increase (RRI), the absolute risk increase (ARI), and
the number needed to harm (NNH) (Table IV).
TABLE IV. Calculation of Measures of Effect

                      Adverse Events     No Adverse Events
Experimental Group    a                  b
Control Group         c                  d
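Using the standard definitions implied by Table IV, here is a minimal sketch with hypothetical counts (the formulas are the conventional ones rather than a verbatim reproduction of the article's table):

```python
# Measures of effect from the a/b/c/d cells of Table IV. Counts are hypothetical.
a, b = 10, 90   # experimental group: adverse events, no adverse events
c, d = 20, 80   # control group: adverse events, no adverse events

risk_experimental = a / (a + b)   # adverse event rate, experimental group
risk_control = c / (c + d)        # adverse event rate, control group

relative_risk = risk_experimental / risk_control  # RR (cohort study)
odds_ratio = (a * d) / (b * c)                    # OR (case-control study)
arr = risk_control - risk_experimental            # absolute risk reduction
rrr = arr / risk_control                          # relative risk reduction
nnt = 1 / arr                                     # number needed to treat

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
print(f"ARR = {arr:.2f}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```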
Glossary
Absolute Risk Reduction (ARR): Difference in risk of adverse outcomes between experimental and control participants in a trial.
Alpha (Type I) Error: Error in hypothesis testing where a significant association is found when there is no true significant association (rejecting a true null hypothesis). The alpha level is the threshold of statistical significance established by the researcher (P<0.05 by convention).
Beta (Type II) Error: Error in hypothesis testing where no significant association is found when there is a true significant association (rejecting a true alternative hypothesis).
Bias: Systematic error in the design or conduct of a study. Threatens validity of the study.
Blinding: Element of study design in which patients and/or investigators do not know who is in the treatment group and who is in the control group. The term masking is often used.
Case-Control Study: Retrospective observational study design which involves identifying cases with outcome of interest and controls without outcome, and then looking back to see if they had exposure of interest.
Case Series: Retrospective observational study design which describes a series of patients with an outcome of interest or who have undergone a particular treatment. No control group.
Confidence Interval (CI): Quantifies the precision of measurement. Usually reported as 95% CI, which is the range of values within which there is a 95% probability that the true value lies.
Confounding: A variable having independent associations with both the dependent and independent variables, thus potentially distorting their relationship.
Cohort Study: Prospective observational study design which involves the identification of group(s), with the exposure or condition of interest, and then following the group(s) forward for the outcome of interest.
Controlling for: Term used to describe when confounding variables are adjusted in the design or analysis of a study in order to minimize confounding.
Crossover Study: Prospective experimental study design which involves the allocation of two or more experimental treatments one after the other in a specified or random order to the same group of patients.
Cross-Sectional Study: Observational study design which assesses a defined population at a single point in time for both exposure and outcome (survey).
Dependent Variable: Outcome or response variable.
Distribution: Values and frequency of a variable (Gaussian, binomial, skewed).
Effect Size: The magnitude of a difference considered to be clinically meaningful. Used in power analysis to determine the required sample size.
Experimental Study: Study design in which treatment is allocated (trial).
Failure: Generic term used for an event.
Hypothesis: A statement that will be accepted or rejected based on the evidence in a study.
Incidence: Proportion of new cases of a specific condition in the population at risk during a specified time interval.
Independent Events: Events whose occurrence has no effect on the probability of each other.
Independent Variable: Variable associated with the outcome of interest that contributes information about the outcome in addition to that provided by other variables considered simultaneously.
Intention to Treat Analysis: Method of analysis in randomized clinical trials in which all patients randomly assigned to a treatment group are analyzed in that treatment group, whether or not they received that treatment or completed the study.
Interaction: Relationship between two independent variables such that they have a different effect on the dependent variable.
Likelihood Ratio (LR): Likelihood that a given test result would be expected in a patient with the condition compared to a patient without the condition. Ratio of true-positive rate to false-positive rate.
Matching: Process of making two groups homogeneous for possible confounding factors.
Meta-Analysis: An evidence-based systematic review that uses quantitative methods to combine the results of several independent studies to produce summary statistics.
Multiple Comparisons: Pairwise group comparisons involving more than one P-value.
Negative Predictive Value (NPV): Probability of not having the disease given a negative diagnostic test. Requires an estimate of prevalence.
Null Hypothesis: Default testing hypothesis assuming no difference between groups.
Number Needed to Treat (NNT): Number of patients needed to treat in order to achieve one additional favorable outcome.
Observational Study: Study design in which treatment is not allocated.
Odds: Probability that event will occur divided by probability that event will not occur.
Odds Ratio: Ratio of the odds of having condition/outcome in experimental group to the odds of having the condition/outcome in the control group (case-control study).
One-Tailed Test: Test in which the alternative hypothesis specifies a deviation from the null hypothesis in one direction only.
Placebo: Inactive substance used to reduce bias by simulating the treatment under investigation.
Positive Predictive Value (PPV): Probability of having the disease given a positive diagnostic test. Requires an estimate of prevalence.
Power: Probability of finding a significant association when one truly exists (1 - probability of type II (β) error). By convention, power of 80% or greater is considered sufficient.
Prevalence: Proportion of individuals with a disease or characteristic in the study population of interest.
Probability: A number, between 0 and 1, indicating how likely an event is to occur.
Prospective Study: Direction of inquiry is forward from cohort. Events transpire after study onset.
P-Value: Probability of type I (α) error. If the P-value is small, then it is unlikely that the results observed are due to chance.
Randomized Clinical Trial (RCT): Prospective experimental study design which randomly allocates eligible patients to experimental vs control groups or different treatment groups.
Random Sample: A sample of subjects from the population such that each has equal chance of being selected.
Receiver Operating Characteristic (ROC) Curve: Graph showing the test's performance as the relationship between the true-positive rate and the false-positive rate.
Regression: Statistical technique for determining the relationship among a set of variables.
Relative Risk (RR): Ratio of incidence of disease or outcome in exposed versus incidence in unexposed cohorts (cohort study).
Relative Risk Reduction (RRR): Proportional reduction in adverse event rates between experimental and control groups in a trial.
Retrospective Study: Direction of inquiry is backwards from cases. Events transpired before study onset.
Sample: Subset of the population.
Selection Bias: Systematic error in sampling the population.
Sensitivity: Proportion of patients who have the outcome who are classified positive.
Sensitivity Analysis: Method in decision analysis used to determine how varying different components of a decision tree or model change the conclusions.
Specificity: Proportion of patients without the outcome who are classified negative.
Validity: Degree to which a questionnaire or instrument measures what it is intended to measure.
Notes:
Dr. Kocher is an Instructor in Orthopaedic Surgery, Harvard Medical School, and Director of the Program in Clinical Effectiveness, Harvard School of Public Health and Department of Orthopaedics, Children's Hospital, Boston MA.
Dr. Zurakowski is the Principal Statistician in the Department of Orthopaedic Surgery, Children's Hospital, Boston, MA.
Address Correspondence To: Mininder S. Kocher, M.D., M.P.H. Department of Orthopaedic Surgery Children's Hospital 300 Longwood Avenue Boston, MA 02115 617.355.4849/ 617.739.3338 (fax) e-mail: mininder.kocher@tch.harvard.edu
References:
- Wennberg J, Gittelsohn A: Small area variations in health care delivery. Science, 182(117): 1102-8, 1973.
- Wennberg J, Gittelsohn A: Variations in medical care among small areas. Sci Am, 246(4): 120-34, 1982.
- Wennberg JE: Dealing with medical practice variations: a proposal for action. Health Aff (Millwood), 3(2): 6-32, 1984.
- Wennberg JE: Outcomes research: the art of making the right decision. Internist, 31(7): 26, 28, 1990.
- Wennberg JE: Practice variations: why all the fuss? Internist, 26(4): 6-8, 1985.
- Wennberg JE, Bunker JP, Barnes B: The need for assessing the outcome of common medical practices. Annu Rev Public Health, 1: 277-95, 1980.
- Chassin MR: Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA, 258(18): 2533-7, 1987.
- Kahn KL, Kosecoff J, Chassin MR, Flynn MF, Fink A, Pattaphongse N, Solomon DH, Brook RH: Measuring the clinical appropriateness of the use of a procedure. Can we do it? Med Care, 26(4): 415-22, 1988.
- Park RE, Fink A, Brook RH, Chassin MR, Kahn KL, Merrick NJ, Kosecoff J, Solomon DH: Physician ratings of appropriate indications for three procedures: theoretical indications vs indications used in practice. Am J Public Health, 79(4): 445-7, 1989.
- Millenson ML. Demanding Medical Excellence. Chicago: University of Chicago Press, 1997.
- Katz J. The Nuremberg Code and the Nuremberg Trial. JAMA, 276:1662-6, 1996.
- World Medical Association. Declaration of Helsinki: Recommendations guiding physicians in biomedical research involving human subjects. JAMA, 277:925-6, 1997.
- Kocher MS, DiCanzio J, Zurakowski D, Micheli LJ. Diagnostic performance of clinical examination and selective magnetic resonance imaging in the evaluation of intra-articular knee disorders in children and adolescents. Am J Sports Med, 2001, 29(3): 292-296.
- Kocher MS. Ultrasonographic screening for developmental dysplasia of the hip: An epidemiologic analysis. Part I. Am J Orthop, 2000, 29(12): 929-933.
- Kocher MS. Ultrasonographic screening for developmental dysplasia of the hip: An epidemiologic analysis. Part II. Am J Orthop, 2001, 30(1):19-24.
- Kocher MS, Zurakowski D, Kasser JR. Differentiating between septic arthritis and transient synovitis of the hip in children: An evidence-based clinical prediction algorithm. J Bone Joint Surg, 1999, 81A:1662-1670.
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143:29-36, 1982.