Canadian Medical Association Journal 1995; 152: 351-357
Abstract
In the third of a series of four articles, the authors illustrate the calculation of measures of association and discuss their usefulness in clinical decision making. From the rates of death or other "events" in experimental and control groups in a clinical trial, we can calculate the relative risk (RR) of the event after the experimental treatment, expressed as a percentage of the risk without such treatment. The absolute risk reduction (ARR) is the difference between the groups in the risk of an event. The relative risk reduction is the percentage of the baseline risk (the risk of an event in the control patients) removed as a result of therapy. The odds ratio (OR), which is the measure of choice in case-control studies, is the ratio of the odds of an event in the experimental group to the odds in the control group. The OR and the RR provide limited information in reporting the results of prospective trials because they do not reflect changes in the baseline risk. The ARR and the number needed to treat, which tells clinicians how many patients must be treated to prevent one event, reflect both the baseline risk and the relative risk reduction. If the timing of events is important, for example, to determine whether treatment extends life, survival curves are used to show when events occur.
The reader familiar with the first two articles in this series will, when presented with the results of
a clinical trial, know how to discover the range within which the treatment effect likely lies. This
treatment effect is worth considering if it comes from a study that is valid [1]. In this
article, we explore the ways investigators and representatives of pharmaceutical companies may
present the results of a trial.
When clinicians look at the results of clinical trials they are interested in the association between a treatment and an outcome. There may be no association; for example, there may be no difference in mean values of an indicator such as blood pressure between groups, or the same risk of an adverse event such as death in both groups. Alternatively, the trial results may show a decreased risk of adverse outcomes in patients receiving the experimental treatment. In a study examining a putatively harmful agent there may be no increase in risk among patients exposed to the agent in comparison with those in a control group, or there may be an association between exposure and an adverse event, which suggests that the agent is indeed harmful. In this article, we examine how one can express the magnitude of these associations.
When investigators present results that show a difference in the mean value of a clinical measurement between two groups, the interpretation is usually straightforward. However, when they present results that show the proportion of patients who suffered an adverse event in each group, interpretation may be more difficult. In this situation they may express the strength of the association as a relative risk, an absolute risk reduction or an odds ratio. Understanding these measures is challenging and important; they will provide the focus of this article. We will examine the relative merits of the different measures of association and show how they can lead clinicians to different conclusions.
Introducing the 2 × 2 table
A crucial concept in analysing the efficacy of therapeutic interventions is the "event." Analysis
often examines the proportion of patients who suffered a particular outcome (the "event") in the
treatment and control groups. This is always true when the outcome is clearly a dichotomous
variable, that is, a discrete event that either occurs or does not occur. Examples of dichotomous
outcomes are the occurrence of negative events, such as stroke, myocardial infarction, death or
recurrence of cancer, or positive events, such as ulcer healing or resolution of symptoms. Not
only an event's occurrence but also its timing may be important. We will return to this issue later.
Even if the results are not of a yes-or-no form, investigators sometimes choose to present them as if they were. Investigators may present variables such as duration of exercise before chest pain develops, number of episodes of angina per month, change in lung function or number of visits to the emergency room as mean values in each of the two groups. However, they may also transform these values into dichotomous data by specifying a threshold or degree of change that constitutes an important improvement or deterioration and then examining the proportion of patients above and below this threshold. For example, investigators in one study used forced expiratory volume in 1 second (FEV1) to assess the efficacy of therapy with corticosteroids taken orally by patients with a chronic stable airflow limitation; they defined an "event" as an improvement in FEV1 of more than 20% over the baseline value [2].
The results of trials with dichotomous outcomes can usually be presented in the form of a 2 × 2 table (Table 1). For instance, in a randomized trial investigators compared rates of death among patients with bleeding esophageal varices controlled by either endoscopic ligation or sclerotherapy [3]. After a mean follow-up period of 10 months, 18 of 64 patients assigned to ligation died, as did 29 of 65 patients assigned to sclerotherapy. Table 2 summarizes the data from this trial in a 2 × 2 table.
Table 1: Sample 2 × 2 table
Exposure | Outcome: Yes | Outcome: No
Yes | A | B
No | C | D

Table 2: Results from a randomized trial comparing treatment of bleeding esophageal varices with endoscopic sclerotherapy and with ligation*
Intervention | Death, no. of patients | Survival, no. of patients | Total no. of patients treated
Ligation | 18 | 46 | 64
Sclerotherapy | 29 | 36 | 65
*Reprinted with permission from N Engl J Med 1992; 326: 1527-1532.
Relative risk
The first thing we can determine from the 2 × 2 table is that the risk of an event (death, in this
case) was 28.1% (18/64) in the ligation group and 44.6% (29/65) in the
sclerotherapy group. The ratio of these risks is called the relative risk (RR) or the risk ratio. This
value tells us the risk of the event after the experimental treatment (in this case, ligation), as a
percentage of the original risk (in this case, the risk of death after sclerotherapy). From Table 1,
the formula for calculating the RR from the data gathered is [A/(A + B)]/[C/(C + D)]. In our
example, the RR of death after receiving initial ligation compared with sclerotherapy is 18/64 (the
risk in the ligation group) divided by 29/65 (the risk in the sclerotherapy group), which equals
63%. That is, the risk of death after ligation is about two thirds as great as the risk of death after
sclerotherapy.
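The calculation above can be reproduced in a few lines. This is a minimal sketch using the counts from Table 2; the variable and function names are ours, chosen for illustration.

```python
# Counts from Table 2: deaths and group sizes.
deaths_ligation, total_ligation = 18, 64
deaths_sclero, total_sclero = 29, 65


def risk(events, total):
    """Proportion of patients in a group who had the event."""
    return events / total


risk_ligation = risk(deaths_ligation, total_ligation)  # A / (A + B)
risk_sclero = risk(deaths_sclero, total_sclero)        # C / (C + D)

# Relative risk: risk with the experimental therapy over risk with the control.
relative_risk = risk_ligation / risk_sclero

print(f"risk, ligation:      {risk_ligation:.1%}")
print(f"risk, sclerotherapy: {risk_sclero:.1%}")
print(f"relative risk:       {relative_risk:.2f}")
```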
Absolute risk reduction
The difference in the risk of the outcome between patients who have undergone one therapy and
those who have undergone another is called the absolute or attributable risk reduction (ARR) or
the risk difference. The formula for its calculation, from Table 1, is [C/(C + D)] - [A/(A + B)].
This measure tells us the percentage of patients who are spared the adverse outcome as a result of
having received the experimental rather than the control therapy. In our example, the ARR is
0.446 minus 0.281, which equals 0.165, or 16.5%.
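As a quick check of this subtraction, the following sketch computes the ARR directly from the Table 2 counts. The variable names are illustrative only.

```python
# Risks of death from Table 2.
risk_control = 29 / 65   # sclerotherapy, C / (C + D)
risk_treated = 18 / 64   # ligation, A / (A + B)

# Absolute risk reduction: control risk minus experimental risk.
arr = risk_control - risk_treated

print(f"ARR = {arr:.3f} ({arr:.1%})")
```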
Relative risk reduction
Another measure used to assess the effectiveness of treatment is relative risk reduction (RRR).
One considers first the risk of an adverse event among patients taking the placebo or, if two
therapies are being compared, the risk among patients receiving the standard or inferior therapy.
This is called the baseline risk. The relative risk reduction is an estimate of the percentage of
baseline risk that is removed as a result of the therapy; it is calculated as the ARR between the
treatment and control groups, divided by the absolute risk among patients in the control group;
from Table 1, {[C/(C + D)] - [A/(A + B)]}/[C/(C + D)]. In our example, the RRR is calculated by
dividing 16.5% (the ARR) by 44.6% (the risk among patients receiving sclerotherapy), which
equals 37%. One may also derive the RRR by subtracting the RR from 1. In our example, the
RRR is equal to 1 minus 0.63, or 0.37 (37%).
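The two routes to the RRR described above (ARR divided by baseline risk, and 1 minus the RR) can be checked against each other. A minimal sketch with the Table 2 counts, using names of our own choosing:

```python
# Risks of death from Table 2.
baseline_risk = 29 / 65   # sclerotherapy (control) group
treated_risk = 18 / 64    # ligation (experimental) group

# Route 1: ARR as a fraction of the baseline risk.
arr = baseline_risk - treated_risk
rrr_from_arr = arr / baseline_risk

# Route 2: one minus the relative risk.
rr = treated_risk / baseline_risk
rrr_from_rr = 1 - rr

print(f"RRR via ARR / baseline: {rrr_from_arr:.2f}")
print(f"RRR via 1 - RR:         {rrr_from_rr:.2f}")
```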
Odds ratio
Instead of looking at the risk of an event, we could estimate the odds of an event occurring. In
our example, the odds of death after ligation are 18 (death) versus 46 (survival), or 18/46 (A/B),
and the odds of death after sclerotherapy are 29 versus 36 (C/D). The formula for the ratio of
these odds, called, not surprisingly, the odds ratio (OR), is (A/B)/(C/D). In our example, this
calculation yields (18/46)/(29/36), which equals 0.49.
The OR is probably less familiar to physicians than risk or RR. However, the OR is usually the measure of choice in the analysis of case-control studies. In general, the OR has certain optimal statistical properties that make it the fundamental measure of association in many types of studies [4]. These statistical advantages may be particularly important when data from several studies are combined, as they are in a meta-analysis. Among such advantages, the comparison of risk represented by the OR does not depend on whether the investigator chose to determine the risk of an event occurring (e.g., death) or not occurring (e.g., survival). This is not true for relative risk. In some situations the OR and the RR will be close, for example, in case-control studies of a rare disease.
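The symmetry property mentioned above can be seen numerically. The sketch below, which uses the Table 2 counts and illustrative names, computes the OR and the RR once for death and once for survival: the two ORs are exact reciprocals of each other, whereas the two RRs are not.

```python
# Table 2 counts: (deaths, survivors) in each group.
lig_dead, lig_alive = 18, 46
scl_dead, scl_alive = 29, 36

# Odds ratio with death as the event, then with survival as the event.
or_death = (lig_dead / lig_alive) / (scl_dead / scl_alive)
or_survival = (lig_alive / lig_dead) / (scl_alive / scl_dead)

# Relative risk with death as the event, then with survival as the event.
lig_total = lig_dead + lig_alive
scl_total = scl_dead + scl_alive
rr_death = (lig_dead / lig_total) / (scl_dead / scl_total)
rr_survival = (lig_alive / lig_total) / (scl_alive / scl_total)

print(f"OR (death) = {or_death:.2f}, 1 / OR (survival) = {1 / or_survival:.2f}")
print(f"RR (death) = {rr_death:.2f}, 1 / RR (survival) = {1 / rr_survival:.2f}")
```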
RR versus OR versus ARR: Why the fuss?
The important distinction among the ARR, the RR and the OR may be illustrated by modifying
the death rates in each of the two treatment groups shown in Table 2. In the explanation that
follows, the reader should note that the effect on the various expressions of risk depends on the
way the death rates are changed. We could alter the death rates by the same absolute amount in
each group, by the same relative amount, or in some other way.
There is some evidence that, when treatment reduces the rate of death, the reduction in rates or proportion of deaths will often be similar in each subgroup of patients [5]. In our example, if we assume that the number of patients who died decreased by 50% in both groups, the risk of death in the ligation group would decrease from 28% to 14% and in the sclerotherapy group from 44.6% to 22.3%. The RR would be 14/22.3 or 0.63, the same as before. The OR would be (9/55)/(14.5/50.5) or 0.57, which differs moderately from the OR based on the higher death rate (0.49), and is closer to the RR. The ARR would decrease from 16.5% to approximately 8%. Thus, a decrease in the proportion of patients who died in both groups by a factor of two leaves the RR unchanged, results in a moderate increase in the OR and reduces the ARR by a factor of two. This example highlights the fact that the same RR can be associated with very different ORs and ARRs. A major change in the risk of an adverse event without treatment (or, as in this case, with the inferior treatment) will not be reflected in the RR and only modestly in the OR; in contrast, the ARR changes markedly with a change in the baseline risk.
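The halving scenario can be worked through as follows. This is a sketch under the stated assumption that deaths fall by exactly 50% in both groups (9 of 64 and 14.5 of 65); the helper `measures` is ours, and results computed from unrounded risks may differ in the last digit from hand-rounded figures.

```python
def measures(risk_treated, risk_control):
    """Return RR, ARR and OR for a pair of risks (proportions between 0 and 1)."""
    odds_treated = risk_treated / (1 - risk_treated)
    odds_control = risk_control / (1 - risk_control)
    return {
        "rr": risk_treated / risk_control,
        "arr": risk_control - risk_treated,
        "or": odds_treated / odds_control,
    }


original = measures(18 / 64, 29 / 65)   # death rates as observed in Table 2
halved = measures(9 / 64, 14.5 / 65)    # assumed: half as many deaths in each group

for label, m in (("original", original), ("halved", halved)):
    print(f"{label:9s} RR = {m['rr']:.2f}  ARR = {m['arr']:.3f}  OR = {m['or']:.2f}")
```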
Hence, the RR and the OR do not tell us the magnitude of the absolute risk. An RR of 33% may mean that the treatment reduces the risk of an adverse outcome from 3% to 1% or from 60% to 20%. The clinical implications of these risk reductions are very different. Consider a therapy with severe side effects. If such side effects occur in 5% of patients treated, and the treatment reduces the probability of an adverse outcome from 3% to 1%, we probably will not institute this therapy. However, we may be willing to accept this incidence of side effects if the therapy reduces the probability of an adverse outcome from 60% to 20%. In the latter situation, of every 100 patients treated 40 would benefit and 5 would suffer side effects, a trade-off that most would consider worthwhile.
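The trade-off in this paragraph can be tabulated per 100 patients treated. The sketch below uses only the percentages given in the text; the function name is illustrative.

```python
def per_100_treated(risk_without, risk_with, side_effect_rate):
    """Events prevented and side effects incurred per 100 patients treated."""
    prevented = (risk_without - risk_with) * 100
    harmed = side_effect_rate * 100
    return prevented, harmed


# Same RR (one third) in both scenarios, very different baseline risks.
scenarios = {
    "low baseline (3% to 1%)": (0.03, 0.01),
    "high baseline (60% to 20%)": (0.60, 0.20),
}

results = {}
for label, (without, with_treatment) in scenarios.items():
    prevented, harmed = per_100_treated(without, with_treatment, 0.05)
    results[label] = (prevented, harmed)
    rr = with_treatment / without
    print(f"{label}: RR = {rr:.2f}, prevented = {prevented:.0f}, side effects = {harmed:.0f}")
```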
The RRR behaves the same way as the RR: it does not reflect the change in the underlying risk in the control population. In our example, if the incidence of adverse events decreased by approximately 50% in both groups, the RRR would be the same as it was at the previous incidence rate: (22.3 - 14)/22.3 or 0.37. The RRR therefore shares with the RR the disadvantage of not reflecting the baseline risk.
These observations depend on the assumption that the death rates in the two groups change by the same proportion. If these changes are not proportional the conclusions may be different. For instance, suppose that the rates of death between the two groups differ by 10 percentage points; for example, if the death rates are 80% and 90%, respectively, the RR is 0.8/0.9 or 89%, the RRR 11%, the ARR 10% and the OR 0.44. If the rates of death then decrease by 50 percentage points in each group, to 30% and 40% respectively, the RR would be 0.3/0.4 or 75%, the RRR 25%, the ARR 10% and the OR 0.64. In this case, the ARR remains constant and thus does not reflect the change in the magnitude of risk without therapy. In contrast, the other indices differ in the two cases and hence reflect the change in the baseline risk.
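The contrasting case, in which both death rates shift by the same number of percentage points, can be checked in the same way. A minimal sketch using the rates quoted in this paragraph; the helper name is ours.

```python
def measures(risk_treated, risk_control):
    """Return RR, RRR, ARR and OR for a pair of risks (proportions between 0 and 1)."""
    rr = risk_treated / risk_control
    odds_treated = risk_treated / (1 - risk_treated)
    odds_control = risk_control / (1 - risk_control)
    return {
        "rr": rr,
        "rrr": 1 - rr,
        "arr": risk_control - risk_treated,
        "or": odds_treated / odds_control,
    }


high = measures(0.80, 0.90)   # death rates of 80% and 90%
low = measures(0.30, 0.40)    # both rates lowered by 50 percentage points

for label, m in (("80% vs 90%", high), ("30% vs 40%", low)):
    print(f"{label}: RR = {m['rr']:.2f}  RRR = {m['rrr']:.2f}  "
          f"ARR = {m['arr']:.2f}  OR = {m['or']:.2f}")
```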
Number needed to treat
The number needed to treat (NNT) is the most recently introduced measure of treatment
efficacy [7]. Let us return to our 2 × 2 tables for a short exercise. In Table 2 we see
that the risk of death in the ligation group is 28.1% and in the sclerotherapy group 44.6%.
Therefore, treating 100 patients with ligation rather than sclerotherapy will save the lives of
between 16 and 17 patients, as shown by the ARR. If treating 100 patients prevents 16 adverse
events, how many patients do we need to treat to prevent 1 event? The answer is 100 divided by
16, which yields approximately 6. This is the NNT. One can also arrive at this number by taking
the reciprocal of the ARR (1/ARR). Since the NNT is related to the ARR, it is not surprising that
the NNT also changes with a change in the underlying risk.
The NNT is directly related to the proportion of patients in the control group who suffer an adverse event. For instance, if the incidence of these events (the baseline risk) decreased by a factor of two and the RRR remained constant, treating 100 patients with ligation would mean that 8 events had been avoided, and the NNT would double, from 6 to 12. In general, the NNT changes inversely in relation to the baseline risk. If the risk of an adverse event doubles, we need treat only half as many patients to prevent the same number of adverse events; if the risk decreases by a factor of four, we must treat four times as many patients to achieve the same result.
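The inverse relation between the NNT and the baseline risk can be sketched as follows, holding the RRR fixed at the value observed in Table 2. The function name and the unrounded NNT values are ours; in practice the NNT is reported as a whole number.

```python
def nnt(baseline_risk, rrr):
    """Number needed to treat: reciprocal of the ARR implied by baseline risk and RRR."""
    arr = baseline_risk * rrr
    return 1 / arr


baseline = 29 / 65                    # risk of death with sclerotherapy
rrr = 1 - (18 / 64) / (29 / 65)       # relative risk reduction with ligation

nnt_original = nnt(baseline, rrr)
nnt_half = nnt(baseline / 2, rrr)     # baseline risk halved, RRR unchanged
nnt_double = nnt(baseline * 2, rrr)   # baseline risk doubled, RRR unchanged

print(f"NNT at observed baseline: {nnt_original:.1f}")
print(f"NNT at half the baseline: {nnt_half:.1f}")
print(f"NNT at twice the baseline: {nnt_double:.1f}")
```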
Back to the 2 × 2 table
The data we have presented so far could have been derived from the original 2 × 2 table (Table 2).
The ARR and its reciprocal, the NNT, incorporate the influence of any change in baseline risk, but
they do not tell us the magnitude of the baseline risk. For example, an ARR of 5% (and a
corresponding NNT of 20) may represent reduction of the risk of death from 10% to 5% or from
50% to 45%. The RR and RRR do not take into account the baseline risk, and the clinical utility
of these measures suffers as a result.
Whichever way we choose to express the efficacy of a treatment, we must keep in mind that the 2 × 2 table reflects results at a given time. Therefore, our comments on the RR, the ARR, the RRR, the OR and the NNT must be qualified by giving them a time frame. For example, we must say that use of ligation rather than sclerotherapy for a mean period of 10 months resulted in an ARR of 17% and an NNT of 6. The results could be different if the duration of observation was very short, in which case there was little time for an event such as death to occur, or very long, in which case it is much more likely that an event will occur (e.g., if the outcome is death, after 100 years of follow-up all of the patients will have died).
Confidence intervals
We have presented all of the measures of association for treatment with ligation versus
sclerotherapy as if they represented the true effect. As we pointed out in the previous article in
this series, the results of any experiment are an estimate of the truth. The true effect of treatment
may actually be greater or less than what we observed. The confidence interval tells us, within the
bounds of plausibility, how much greater or smaller the true effect is likely to be. Confidence
intervals can be calculated for each of the measures of association we have discussed.
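As one illustration, a 95% confidence interval for the ARR in Table 2 can be approximated with the usual normal (Wald) formula for a difference between two proportions. This method is our assumption; the text does not say which method should be used, and other methods give somewhat different limits.

```python
import math

# Table 2 counts.
deaths_control, n_control = 29, 65   # sclerotherapy
deaths_treated, n_treated = 18, 64   # ligation

p_control = deaths_control / n_control
p_treated = deaths_treated / n_treated
arr = p_control - p_treated

# Standard error of a difference between two independent proportions
# (Wald approximation, assumed here for illustration).
se = math.sqrt(
    p_control * (1 - p_control) / n_control
    + p_treated * (1 - p_treated) / n_treated
)

z = 1.96  # normal quantile for a two-sided 95% interval
lower, upper = arr - z * se, arr + z * se

print(f"ARR = {arr:.1%}, approximate 95% CI {lower:.1%} to {upper:.1%}")
```

With only 129 patients the interval is wide, and its lower limit lies just above zero.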
Survival data
As we pointed out, the analysis of a 2 × 2 table is an examination of the data at a specific time.
Such analysis is satisfactory if we are investigating events that occur within relatively short
periods and if all patients are followed for the same duration. However, in longer-term studies we
are interested not only in the number of events but also in their timing. We may, for instance, wish
to know whether therapy for a fatal condition such as severe congestive heart failure or
unresectable lung cancer delays death.
When the timing of events is important, the results can be presented in several 2 × 2 tables constructed at certain points after the beginning of the study. In this sense, Table 2 showed the situation after a mean of 10 months of follow-up. Similar tables could be constructed to show the fate of all patients at given times after their enrolment in the trial, i.e., at 1 week, 1 month, 3 months or whatever intervals we choose. An analysis of accumulated data that takes into account the timing of events is called survival analysis. Despite the name, such analysis is not restricted to deaths; any discrete event may be studied in this way.
The survival curve of a group of patients shows the status of the patients at different times after a defined starting point [8]. In Fig. 1, we show an example of a survival curve taken from a trial of treatments of bleeding varices. Although the mean follow-up period in this trial was 286 days, the survival curve extends beyond this time, presumably to a point at which the number of patients still at risk is sufficient to make reasonably confident predictions. At a later point, prediction would become very imprecise because there would be too few patients to estimate the probability of survival. This imprecision can be captured by confidence intervals or bands extending above and below the survival curves.
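One common way to build such a curve is the Kaplan-Meier (product-limit) estimate, which we use here as an illustration; the text does not state which method produced Fig. 1. The follow-up times below are hypothetical and are not data from the varices trial.

```python
def kaplan_meier(times, died):
    """Return [(time, estimated survival probability)] at each time a death occurred.

    times: follow-up in days for each patient.
    died:  True if the patient died at that time, False if follow-up simply ended
           (censored).
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, d in zip(times, died) if d})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)             # still under observation
        deaths = sum(1 for ti, d in zip(times, died) if d and ti == t)
        survival *= 1 - deaths / at_risk                         # product-limit step
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data for 10 patients (days, died?).
times = [30, 60, 60, 90, 120, 150, 200, 286, 300, 400]
died = [True, True, False, True, False, True, False, False, True, False]

curve = kaplan_meier(times, died)
for t, s in curve:
    print(f"day {t:3d}: estimated survival {s:.2f}")
```

The last step of the curve rests on only two patients still at risk, which is why estimates late in follow-up are imprecise.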
Hypothesis tests can be applied to survival curves, the null hypothesis being that there is no difference between two curves. In the first article in this series, we described how an analysis based on hypothesis testing can be adjusted or corrected for differences in the two groups at the baseline. If one group were older (and thus had a higher risk of the adverse outcome) or had less severe disease (and thus had a lower risk), the investigators could conduct an analysis that takes into account these differences. Such an analysis tells us, in effect, what would have happened if the two groups had comparable risks of adverse outcomes at the start of the trial.
Case-control studies
The examples we have used so far have been prospective randomized controlled trials. In such
trials we start with an experimental group of patients who are subject to an intervention and a
control group of patients who are not. The investigators follow the patients over time and record
the incidence of events. The process is similar in prospective cohort studies, although in this study
design the "exposure" or treatment is not controlled by the investigators. Instead of being
assigned to receive or not receive the intervention, patients are chosen, sampled or classified
according to whether they were or were not exposed to the treatment or risk factor. In both
randomized trials and prospective cohort studies we can calculate risks, ARRs and RRs.
In case-control studies participants are chosen or sampled not according to whether they have been exposed to the treatment or risk factor but on the basis of whether they have experienced an event. Participants start the study with or without the event rather than with or without the exposure or intervention. Patients with the adverse outcome, be it stroke, myocardial infarction or cancer, are compared with control patients who have not suffered the outcome. The investigators wish to determine if any factor seems to be more common in one of these groups than in the other.
In one case-control study investigators examined whether the use of sun-beds or sun-lamps increased the risk of melanoma [9]. They identified 583 patients with melanoma and 608 control patients. The control and case patients had similar distributions of age, sex and region of residence. The results for men and women were presented separately (those for men are shown in Table 3).
Table 3: Results from a case-control study of the association between melanoma and the use of sun-beds and sun-lamps*
Ever exposed to sun-beds or sun-lamps | No. of case patients | No. of control patients
Yes | 67 | 41
No | 210 | 242
*Reproduced with permission from Walter SD, Marrett LD, From L, et al: The association of cutaneous malignant melanoma with the use of sunbeds and sunlamps. Am J Epidemiol 1990; 131: 232-243.
If the information in Table 3 came from a prospective cohort study or randomized controlled trial we could begin by calculating the risk of an event in the experimental and control groups. However, this would not make sense in a case-control study because the number of patients who did not have melanoma was chosen by the investigators. For calculation of the RR we need to know the population at risk, and this information is not available in a case-control study.
The only measure of association that makes sense in a case-control study is the OR. One can investigate whether the odds of having been exposed to sun-beds or sun-lamps among the patients with melanoma are the same as the odds of exposure among the control patients. In the study the odds were 67/210 in the patients with melanoma and 41/242 in the control patients. The odds ratio is therefore (67/210)/(41/242) or 1.88 (95% confidence interval [CI] 1.20 to 2.98), which suggests an association between the use of sun-beds or sun-lamps and melanoma. The fact that the CI does not include 1.0 means that the association is unlikely to be due to chance.
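The point estimate can be reproduced from the Table 3 counts. The sketch below, with illustrative names, also shows that the ratio of exposure odds equals the cross-product ratio of the table, so the same number is obtained whichever way the odds are arranged. It does not attempt to reproduce the published confidence interval, whose method is not described here.

```python
# Table 3 counts (men): exposure status among cases and controls.
cases_exposed, cases_unexposed = 67, 210
controls_exposed, controls_unexposed = 41, 242

# Odds of exposure in each group.
odds_cases = cases_exposed / cases_unexposed
odds_controls = controls_exposed / controls_unexposed
odds_ratio = odds_cases / odds_controls

# Equivalent cross-product form of the same ratio.
cross_product = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)

print(f"odds of exposure, cases:    {odds_cases:.3f}")
print(f"odds of exposure, controls: {odds_controls:.3f}")
print(f"odds ratio:                 {odds_ratio:.2f}")
```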
Even if the association were not due to chance, this does not necessarily mean that the sun-beds or sun-lamps were the cause of melanoma in these patients. Potential explanations could include higher recollection of use of these devices among patients with melanoma (recall bias), longer exposure to sun among these patients or different skin colour. (In fact, in this study the investigators addressed many of these possible explanations.) Confirmatory studies would be needed to be confident that exposure to sun-beds or sun-lamps was the cause of melanoma.
Which measure of association is best?
In randomized trials and cohort studies, investigators can usually choose from several measures of
association. Which should the reader hope to see? We believe that the best option is to show all of
the data, in the form of 2 × 2 tables or life tables (deaths or other events during follow-up
presented in tabular form), and then consider both the relative and absolute figures. As the reader
examines the results, she or he will find the ARR and its reciprocal, the NNT, the most useful
measures for deciding whether to institute treatment. As we have discussed, the RR and the RRR
do not take baseline risk into account and can therefore be misleading.
In fact, clinicians make different decisions depending on the way the results are reported. Clinicians consistently judge a therapy to be less effective when the results are presented in the form of the NNT than when any other measure of association is used [10-13].
Interpreting study results
We complete this exposition by reviewing the results of a landmark study, the Lipid Research
Clinics Coronary Primary Prevention Trial, of the usefulness of therapy to lower serum
cholesterol levels [14]. In this randomized, placebo-controlled trial the investigators
tested the hypothesis that a reduction in cholesterol levels reduces the incidence of coronary heart
disease (CHD). They followed 3806 asymptomatic middle-aged men with primary
hypercholesterolemia (serum cholesterol levels above the 95th percentile), of whom one third
were smokers, for a mean period of 7.4 years. Patients in one group received cholestyramine
(24 g/d) and those in the other a placebo. The main outcome measures (events) were
death due to CHD and nonfatal myocardial infarction. After 7.4 years of follow-up the results
showed an ARR of 1.71% (95% CI -0.11%
to 3.53%) and an NNT of 58 (the 95% CI for the NNT would include the fact that the
therapy causes one death in 935 treated patients and requires treatment of 28 patients to save one
life). The original report did not provide CIs for the RR and the ARR. We used the original data
to calculate these measures and the associated CIs, so our point estimates differ slightly from the
adjusted estimates given in the original report.
The risk of an event was 9.8% among the patients taking a placebo and 8.1% among those receiving cholestyramine. The RR of an event for those taking cholestyramine versus those taking a placebo was 83% (95% CI 68% to 101%). The use of cholestyramine was associated with a 17% reduction in the incidence of an event (RRR), with a 95% CI from a 33% reduction in risk to a 1% increase in risk, and with prevention of 17 primary events per 1000 patients treated. Therefore, 58 patients (100/1.7) needed to be treated for 7 years to prevent one primary event.
In addition to calculating the NNT, one could also consider resources expended to prevent an event. The cost of a month's supply of cholestyramine is $120.49. The cost of the drug required to prevent one event is 58 (the NNT) × 7 years of follow-up × 12 months per year × $120.49 for a 1-month supply = $587 027.28. Alternatively, to prevent one event, patients need to take 24 g/d × 58 (NNT) × 365 days per year × 7 years of follow-up = 3 556 560 g, or approximately 3.56 tonnes of cholestyramine to swallow.
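The arithmetic in this paragraph is easy to verify. A minimal sketch using only the figures quoted above; the variable names are ours.

```python
nnt = 58                  # patients treated to prevent one event
years = 7                 # duration of treatment used in the text
cost_per_month = 120.49   # dollars for a 1-month supply
dose_g_per_day = 24       # grams of cholestyramine per day

# Drug cost to prevent one event.
total_cost = nnt * years * 12 * cost_per_month

# Total mass of drug taken to prevent one event.
total_grams = dose_g_per_day * nnt * 365 * years
total_tonnes = total_grams / 1_000_000

print(f"drug cost per event prevented: ${total_cost:,.2f}")
print(f"drug taken per event prevented: {total_grams:,} g ({total_tonnes:.2f} tonnes)")
```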
If one considered only patients with a lower risk of CHD (younger men, women, nonsmokers and those with cholesterol levels that are elevated but not above the 95th percentile) the NNT would rise. It is not surprising that advertisements promoting the use of cholesterol-lowering drugs cite the RRR rather than the ARR or the NNT and do not mention the cost per event prevented.
The results of this study provide another caution for the clinician. The results we have described are based on the incidence of both fatal and nonfatal coronary events. However, the death rates shown in this study were similar in the two groups: there were 71 deaths among patients receiving placebo and 68 among patients receiving cholestyramine. Furthermore, when investigators have examined all trials of drug therapy for lowering cholesterol, they have found a possible association between administration of these agents and death from causes other than cardiovascular disease [15]. As this result highlights, the wary user of the medical literature must be sure that all relevant outcomes are reported [16].
ARRs are easy to calculate, as is their reciprocal, the NNT. If the NNT is not presented in trial results, clinicians who wish to get the best sense of the effect of an intervention should take the trouble to determine the number of patients they need to treat to prevent an event as well as the cost and toxic effects associated with treatment of that number of patients. These measures will help clinicians to weigh the benefits and costs of treatments.
References