Summary
Nowadays, guidelines are derived from the findings of randomized controlled therapeutic trials. However, an overall significant P value does not exclude that some patients may be harmed by or will not respond to the therapeutic agent being studied. Trials in patients with a low risk of events and/or a limited chance of providing significant differences in therapeutic effects require a large patient population to demonstrate a beneficial effect. Composite efficacy endpoints are often employed to obviate the need for a large patient population when low rates of events or limited therapeutic efficacy are anticipated. Results of randomized controlled therapeutic trials are commonly expressed in terms of relative risk reduction, whereas absolute risk reduction allows the calculation of the “number needed to treat” to prevent an adverse outcome. The number needed to treat is a far more clinically relevant variable than relative risk reduction. The clinician’s mission is to match treatment to patient with the goal of achieving optimal therapeutic response. Drug-safety monitoring is also of major importance to avoid exposing patients to irreversible adverse effects. Unfortunately, drug-safety monitoring is often overlooked in routine clinical practice. Finally, the lack of long-term therapeutic data (> 5–10 years) is an unsolved dilemma, as most trials are limited to a duration of a few months or years.
Résumé
Over time, randomized controlled trials have improved therapeutic management. However, a significant P value should not obscure the fact that the effect is averaged over the whole population, which in reality comprises both responders and non-responders to the treatment being tested. Trials that enrol patients at low risk of events and/or test treatments of limited efficacy require ever larger populations to demonstrate a benefit. The development of composite endpoints that combine multiple criteria in fact reflects the anticipation of a low risk of events or of limited therapeutic efficacy. Trial results are often expressed as relative risk reduction, whereas absolute risk reduction is more relevant and also allows estimation of the number of patients needed to treat to prevent one event. The role of clinician-researchers is to refine the therapeutic response and patient selection in order to increase the proportion of responders. The other important mission of the therapist is pharmacovigilance, which allows detection of major, sometimes irreversible, adverse effects that should lead to withdrawal of the drug; in practice, it is often neglected. In addition, a major unresolved problem is the lack of long-term data (> 5–10 years), as most trials are limited to a few months or years of observation.
Interpretation of randomized controlled trials
The implementation of randomized controlled trials (RCTs) in recent decades has allowed a shift away from the historical practice of “empirical” therapy to that of “evidence-based medicine”. Double-blind trials with random allocation to intervention groups allow an objective comparison of a group receiving a potentially active or innovative agent with a control group (placebo or reference agent). The British Medical Research Council is credited with the first randomized controlled trial, published in 1948; streptomycin was shown to drastically reduce mortality from pulmonary tuberculosis, compared with bed rest alone (55 streptomycin-treated patients versus 52 controls; mortality rates of 7% versus 27% at 6 months).
The findings of RCTs have since allowed a dramatic improvement in therapeutic management, and are currently the backbone of therapeutic guidelines. However, an overall beneficial response, as evidenced by a significant P value, does not indicate that all patients benefit from the active intervention. Some patients do not benefit from the intervention, and some may even be harmed by it. When the therapeutic efficacy of the agent is limited and/or patients are at low risk of events, RCTs require very large populations to detect a relative reduction in events of only 20–30% with a statistical power of 90–95%. On the other hand, when the agent shows a substantial clinical benefit compared with placebo or the reference drug, RCTs may be terminated early by the data safety monitoring board before full enrolment. In brief, calculation of sample size depends highly on the expected effect of the active intervention. Overall, when analysing the findings of RCTs, statistical significance does not necessarily mean clinical relevance. Conversely, a large treatment effect can be demonstrated in a very small trial. For instance, in 1958, Barritt & Jordan carried out an RCT comparing heparin with placebo in the treatment of pulmonary embolism. The trial was interrupted after enrolment of the 35th patient because of five deaths and five recurrences in the 19 control patients, and no death or recurrence in the 16 patients who were randomized to heparin. Similarly, a trial conducted in 1995 investigating fibrinolysis in the treatment of massive pulmonary embolism was interrupted after randomization of eight patients, because there were four deaths in the group receiving heparin alone versus none in the group receiving streptokinase followed by heparin. The CONSENSUS trial, which evaluated the potential benefit of angiotensin-converting enzyme (ACE) inhibitors in severe heart failure (HF) (New York Heart Association functional class IV), required randomization of only 253 patients to either enalapril 2.5–40 mg/day or placebo.
After an average follow-up of 6 months, there were 33 fatal outcomes (26%) in the enalapril group versus 55 (44%) in the placebo group (relative risk reduction [RRR] 40%; P = 0.002). In the SOLVD-Treatment (SOLVD-T) study, 2569 patients with less severe HF (New York Heart Association functional class II–III) were randomized to enalapril 2.5–20 mg/day or to placebo. After an average follow-up of 41 months, SOLVD-T showed a statistically significant reduction in mortality with enalapril: 452 (35.2%) deaths in the enalapril group versus 510 (39.7%) in the placebo group (hazard ratio: 0.84; 95% confidence interval [CI]: 0.74 to 0.95; P = 0.0036). In contrast, the SOLVD-Prevention (SOLVD-P) trial, which investigated the use of enalapril 2.5–20 mg/day in 4228 asymptomatic HF patients with left ventricular (LV) ejection fraction ≤ 35%, did not find a statistically significant reduction in mortality; after an average follow-up of 37 months there were 313 deaths (14.8%) in the enalapril group and 334 deaths (15.8%) in the placebo group (hazard ratio: 0.92; 95% CI: 0.79 to 1.08; P = 0.30). The findings of the CONSENSUS, SOLVD-T and SOLVD-P trials illustrate how closely the rate of events affects the sample size that is needed to demonstrate the benefit of a given intervention. In terms of pathophysiology, these studies show that the efficacy of ACE inhibitors is probably dependent on the intensity of activation of the renin-angiotensin system in congestive HF. It must be noted that different scales are used in the survival curves of the CONSENSUS, SOLVD-T and SOLVD-P studies to highlight the area between the two survival curves.
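The dependence of sample size on event rate and expected effect can be illustrated with a minimal Python sketch. It assumes the standard normal-approximation formula for comparing two proportions with a two-sided test, and it reuses the observed mortality rates quoted above as if they had been the planning assumptions; this is not how the CONSENSUS or SOLVD investigators actually sized their trials. The function name and the default alpha and power are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate number of patients per arm to compare two event proportions.

    Normal-approximation formula for a two-sided test:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Mortality rates quoted in the text, used here as hypothetical planning values
n_consensus = sample_size_per_group(0.44, 0.26)    # CONSENSUS-like rates
n_solvd_t = sample_size_per_group(0.397, 0.352)    # SOLVD-T-like rates
n_solvd_p = sample_size_per_group(0.158, 0.148)    # SOLVD-P-like rates

print(n_consensus, n_solvd_t, n_solvd_p)
```

Under these assumptions, the required number of patients per arm rises from the low hundreds to the tens of thousands as the absolute difference in mortality shrinks from 18 percentage points to 1.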
The use of composite efficacy endpoints in RCTs requires that attention be paid to the severity and pathophysiological mechanisms of the components of the composite endpoints. The combination of clinical criteria with different levels of clinical significance (e.g. death and hospitalization for chest pain) or the combination of clinical and paraclinical criteria, such as in the PREAMI study, which used a composite primary endpoint of death, HF hospitalization and LV remodelling (≥ 8% increase in LV end-diastolic volume), should be avoided. In brief, although composite primary endpoints are still frequently debated, their components must share a common pathophysiological mechanism or a similar clinical severity (such as non-fatal vascular events at the level of the heart, brain and lower limb). In fact, the use of a composite efficacy endpoint as the primary endpoint reflects the anticipation of a low rate of events or of limited therapeutic efficacy of the agent.
An adequate control group is an essential part of the design of an RCT. When evaluating a new therapeutic agent, it is preferable to use placebo for the control group. However, when an agent with proven efficacy already exists, ethical issues prevent the use of a placebo, and the most effective agent for the control group should then be used. Pharmaceutical sponsors should not select a weak agent or strategy for the control group to maximize the likelihood of positive findings. For example, the HORIZONS-AMI trial compared bivalirudin alone to heparin plus a glycoprotein IIb/IIIa inhibitor in patients with ST-segment elevation myocardial infarction undergoing primary percutaneous coronary intervention; patients were randomized to receive a 40-minute intravenous administration of bivalirudin, a direct thrombin inhibitor with a half-life of 25 minutes, or intravenous administration of unfractionated heparin (half-life of 60–90 minutes) for several days plus intravenous administration of a glycoprotein IIb/IIIa inhibitor (either abciximab infusion over 12 hours with 24–48 hours of antiplatelet activity or eptifibatide infusion over 12–18 hours with a half-life of 150 minutes). As expected, there was a significant reduction in major bleeding events and a greater event-free survival at 30 days with bivalirudin than with heparin plus glycoprotein IIb/IIIa inhibitors.
Similarly, in the CHAMPION PHOENIX trial, 11,145 patients undergoing either urgent or elective percutaneous coronary intervention were randomized to receive the potent intravenous adenosine diphosphate receptor antagonist cangrelor, immediately followed by a loading dose of clopidogrel 600 mg at the end of the infusion, or intravenous placebo combined with a loading dose of either 300 mg or 600 mg of clopidogrel; as expected, stent thrombosis at 48 hours developed in only 0.8% of the patients in the cangrelor group, versus 1.4% of the patients in the placebo group, 25% of whom had received the 300 mg clopidogrel loading dose (odds ratio: 0.62; 95% CI: 0.43 to 0.90; P = 0.01). Another example of inadequate choice of a control agent is the RELAX-AHF trial, where serelaxin (a recombinant human relaxin-2, a novel vasoactive peptide hormone) was compared with placebo rather than with intravenous administration of well-titrated isosorbide dinitrate in patients hospitalized for acute decompensated HF.
However, when placebo is used as the control group, cases may arise in which investigators become aware, during the trial period, of which group patients have been randomized into, which may compromise the double-blind nature of a therapeutic trial, and result in a biased interpretation of findings. This applies particularly to agents with well-known side effects. Therapeutic trials of selective serotonin reuptake inhibitors in depression are difficult to blind; those patients receiving inert placebo become immediately recognisable, as they do not experience the classical side effects of selective serotonin reuptake inhibitors (such as nausea, vomiting or a dry mouth). The use of an “active” placebo that mimics the adverse effects of the active medication may preserve the blind aspect of the trial and avoid overestimation of the efficacy of antidepressants in the treatment of depression. In the SHIFT trial, ivabradine was tested against placebo in patients with chronic HF, but did not reduce mortality. The blind aspect of the trial may have been broken in some patients who, while randomized to ivabradine, experienced a marked decrease in heart rate. Similarly, it can be difficult to ensure blinding in trials investigating surgical procedures (because of the nature of the intervention itself), raising the necessity for sham-controlled trials. Renal denervation was associated with highly promising findings for the treatment of hypertension in several unblinded trials. However, the promising findings could not be confirmed by the SYMPLICITY HTN-3 trial, which included a sham procedure in the patients with resistant hypertension who were randomized to the control arm. Systolic blood pressure decreased by 14.13 ± 23.93 mmHg with renal denervation and by 11.74 ± 25.94 mmHg with the sham procedure (difference: −2.39 mmHg; 95% CI: −6.89 to 2.12; P = 0.26) in the 535 patients who were randomized.
However, it must be emphasized that ethical issues prevent certain therapeutic strategies originating from empirical medicine from being subjected to evidence-based evaluation; in other words, these cannot be compared with placebo in a classical RCT. For example, loop diuretics cannot be omitted when assessing new agents for the treatment of acute pulmonary oedema, and aortic valve replacement cannot be compared with conservative management in patients with symptomatic aortic stenosis who are fit for surgery. Similarly, pulmonary thromboendarterectomy, when indicated, is ethically required in the treatment of patients with chronic thromboembolic pulmonary hypertension.
Results of RCTs are often expressed as relative risk reduction (RRR: [absolute risk (AR) placebo − AR verum]/AR placebo). Absolute risk reduction (ARR: AR placebo − AR verum) allows calculation of the “number needed to treat” (NNT = 1/ARR) in order to prevent one additional adverse outcome. The NNT gives clinicians a better sense of the effectiveness of the intervention. When an RCT shows a reduction in myocardial infarction incidence from 2% to 1% over a follow-up period of 4 years, the ARR is 1%, the RRR is 50% and the NNT is 100 (1/1%) in 4 years or 400 patients per year. In other words, there is a 98% chance that myocardial infarction will not occur with placebo, and a 99% chance that it will not occur with active treatment. The CONSENSUS trial found an RRR of mortality of 40.9%, an ARR of 18% and an NNT of 5–6 patients for a period of 6 months; in contrast, the SOLVD-T trial found an RRR of deaths of 16%, an ARR of 4.5% and an NNT of 22 over 41 months or 76/year. Overall, in studies investigating treatment with ACE inhibitors, the NNTs to prevent one death attributable to any cause or to cardiovascular causes were inversely related to the annual risk of all-cause or cardiovascular mortality, respectively. The Syst-Eur trial enrolled 4695 patients aged > 60 years with systolic hypertension (systolic blood pressure 160–219 mmHg and diastolic blood pressure < 95 mmHg). Patients were randomized to placebo or nitrendipine 10–40 mg, with the possible addition of enalapril 5–20 mg and hydrochlorothiazide 12.5–25.0 mg. The Syst-Eur trial reported that the treatment of 1000 patients may prevent 29 strokes and 53 major cardiovascular events, corresponding to NNTs of 34 and 19, respectively, over a period of 5 years.
The HOPE-3 study, which randomized 12,705 patients with intermediate cardiovascular risk to placebo or primary prevention by rosuvastatin 10 mg per day, with an average follow-up of 5.6 years, found that the primary outcome did not occur in 95.2% of placebo patients and in 96.3% of active treatment patients (RRR = 24%, ARR = 1.1%, NNT = 91 or 509/year). Recently the Salford Lung Study, which randomized 2799 patients to the treatment regimen of combined fluticasone furoate and vilanterol or usual care, showed a relative reduction of 8.4% in the risk of exacerbations (P = 0.02); 1.74 exacerbations per year in the fluticasone furoate-vilanterol group compared with 1.90 per year with usual care. The NNT was 625 patients for 1 year to avoid one exacerbation.
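The relation between ARR, RRR and NNT can be made concrete with a short Python sketch. The function and key names are illustrative; risks are taken as the crude proportions quoted in the text, and the conversion to "patients treated for one year" simply multiplies the NNT by the follow-up in years, which assumes a benefit that accrues evenly over time.

```python
def risk_measures(risk_control, risk_treated, years=None):
    """Return ARR, RRR and NNT from the event risks in the two arms.

    ARR = risk_control - risk_treated
    RRR = ARR / risk_control
    NNT = 1 / ARR (patients treated over the whole follow-up)
    """
    arr = risk_control - risk_treated
    result = {"ARR": arr, "RRR": arr / risk_control, "NNT": 1 / arr}
    if years is not None:
        # Assumption: benefit accrues evenly, so NNT scales with follow-up
        result["NNT_for_one_year"] = result["NNT"] * years
    return result


# Myocardial infarction example from the text: 2% versus 1% over 4 years
mi = risk_measures(0.02, 0.01, years=4)

# CONSENSUS mortality at 6 months: 44% with placebo versus 26% with enalapril
consensus = risk_measures(0.44, 0.26)

for label, res in (("MI example", mi), ("CONSENSUS", consensus)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

The two examples share a similar order of relative reduction (50% and about 41%) but differ widely in NNT (100 versus 5–6), which is why the absolute figures are more informative at the bedside.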
Another way of presenting the findings of RCTs is to estimate the average gain in life expectancy between active treatment and placebo or, in other words, the average postponement of the endpoint across all treated patients; this corresponds to the area between the survival curves of patients in the treatment and control arms. Smoking cessation, regular physical exercise or cardiac transplantation increase life expectancy by several years, while statin therapy only increases mean life expectancy by a few weeks or months. Unfortunately, this method of representing results is rarely used.
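As a rough illustration of the "area between the survival curves", the sketch below integrates two survival curves with the trapezoidal rule and takes the difference. The survival probabilities are hypothetical values chosen for the example, not data from any trial cited here, and linear interpolation between yearly time points is a simplifying assumption (Kaplan-Meier curves are step functions).

```python
def area_under_curve(times, survival):
    """Trapezoidal area under a survival curve (mean survival up to the last time)."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (survival[i] + survival[i - 1]) / 2
    return area


# Hypothetical survival probabilities at yearly time points (not trial data)
years = [0, 1, 2, 3, 4, 5]
placebo = [1.00, 0.95, 0.90, 0.85, 0.80, 0.75]
treated = [1.00, 0.96, 0.92, 0.88, 0.84, 0.80]

# Area between the curves = average event-free time gained per treated patient
gain_years = area_under_curve(years, treated) - area_under_curve(years, placebo)
print(f"Average gain over 5 years: {gain_years:.3f} years "
      f"({gain_years * 52:.1f} weeks)")
```

In this made-up example, a 5-percentage-point survival difference at 5 years translates into an average gain of 0.125 years, or about 6.5 weeks per treated patient.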
It is worth quoting the recent report from Finegold et al., who calculated the probability distribution of lifespan gained from primary prevention with statin use. These authors found that expected statin benefit is far from uniform. Although > 90% of people gain no added years from the intervention, a few (< 10%) gain far more than their risk stratum (around 100 months). The difference between low- and high-risk people is not in the extent of the lifespan gain, but in the proportion (ranging from 2.9% to 9.9%) who benefit. These findings strongly advocate patient-centred decision-making.
Another important methodological matter is the issue of truncated RCTs, that is, trials stopped early for apparent benefit. In fact, early stoppage consistently overestimates the effect of active treatment on outcome. The overestimation is particularly pronounced (> 25%) for truncated trials where fewer than 500 events occurred, such as the JUPITER, CARDS and ASCOT-LLA trials, investigating the use of statins as primary prevention.
Only findings related to the primary endpoint (on which the calculation of the sample size was based) can be considered as statistically robust. Findings related to secondary endpoints or analyses of subgroups are only hypothesis-generating and need new studies to confirm or refute them. For example, the ELITE-1 study comparing losartan and captopril in HF patients with reduced LV ejection fraction found an overall reduction in mortality in the losartan group, although mortality was not its primary endpoint; however, the ELITE-1 findings were not corroborated by the validation study ELITE-2. Recently, the development of hierarchical sequential testing methods (for example in the ARISTOTLE trial investigating apixaban) has allowed the use of multiple endpoints without increasing the rate of false findings at a prespecified significance level alpha, thereby producing results of the same statistical robustness as those based on a single primary endpoint.
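One common form of hierarchical sequential testing is the fixed-sequence procedure, sketched below in Python. This is a generic illustration and not a description of the exact procedure used in ARISTOTLE; the endpoint names and P values are hypothetical. Endpoints are tested in a prespecified order, each at the full alpha, and testing stops at the first non-significant result.

```python
def fixed_sequence_test(ordered_endpoints, alpha=0.05):
    """Return the endpoints that can be declared significant.

    ordered_endpoints: list of (name, p_value) in the prespecified order.
    Each endpoint is tested at the full alpha, but only if every endpoint
    before it was significant; this controls the familywise error at alpha.
    """
    significant = []
    for name, p_value in ordered_endpoints:
        if p_value >= alpha:
            break  # the hierarchy stops here; later endpoints are not tested
        significant.append(name)
    return significant


# Hypothetical hierarchy and P values, for illustration only
hierarchy = [
    ("primary efficacy", 0.010),
    ("key secondary", 0.030),
    ("all-cause mortality", 0.200),
    ("exploratory endpoint", 0.001),
]
claimed = fixed_sequence_test(hierarchy)
print(claimed)
```

Note that the last endpoint cannot be claimed despite its small P value, because the endpoint ranked before it failed; the order must therefore be fixed before the data are unblinded.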
The role of the clinician is to refine the expected therapeutic response and select the patients most likely to benefit from the therapeutic agent, thus increasing the number of potential responders. For instance, both RCTs and registries have shown that 70% of patients with congestive HF and reduced LV ejection fraction are improved by cardiac resynchronization therapy. Identification of the 30% of non-responders is clearly needed to avoid unnecessary implantation and reduce potential morbidity and costs. A similar issue is sudden cardiac death prevention by implantable cardioverter-defibrillators (ICDs) in patients with LV ejection fraction ≤ 35%. At 3–5 years, only 10% of patients with ICDs receive life-saving therapy, while 90% of patients are subjected to the unnecessary procedural and device-related complications (inappropriate shocks, infections, haematomas, lead fracture or displacement, tamponade, development of anxiety, etc.) and costs associated with ICD implantation. Device manufacturers have no incentive to refine the phenotype of the patient who benefits from ICD therapy to curtail the ICD-related financial burden. Of note, the DANISH trial, which randomized 556 patients with symptomatic systolic HF not caused by coronary artery disease, found no benefit from prophylactic ICDs in this patient group. The 2008 European Society of Cardiology recommendations advocated (grade IIa, level of evidence A) indiscriminate use of ACE inhibitors in all post-myocardial infarction patients. The recommendation was based on the EUROPA study, in which 12,218 patients with stable coronary artery disease and no apparent HF were randomized to perindopril 8 mg or matching placebo; the study found an RRR of 20% for the composite primary endpoint of cardiovascular mortality, non-fatal myocardial infarction or successfully resuscitated cardiac arrest, and an NNT of 694 patients per year for cardiovascular mortality.
However, 4 years later, the European Society of Cardiology recommended targeted use of ACE inhibitors solely for patients with LV dysfunction or clinical evidence of HF (grade I, level of evidence A). This tailored approach reduces the occurrence of hypotension, cough and hyperkalaemia in normotensive patients with coronary artery disease and preserved LV function.
The sample populations of RCTs rarely reflect real-world patient populations; elderly patients and those with non-cardiac co-morbidities are consistently under-represented. Differences between RCT and real-world patient populations may have major consequences when findings are applied to routine clinical practice, as occurred after the RALES trial was published. Spironolactone reduced mortality in HF patients with reduced LV ejection fraction and an average age of 65 years: 386 (46%) deaths in the placebo group versus 284 (35%) in the spironolactone group (hazard ratio: 0.70; 95% CI: 0.60 to 0.82; P < 0.001; ARR 11% and NNT 9 over a mean follow-up period of 24 months). However, when translated into clinical practice, spironolactone resulted in an increased incidence of hyperkalaemia in patients who were 13 years older than the RALES patients, with less stringent laboratory monitoring of potassium concentrations than in the RALES population. In terms of the use of non-vitamin K antagonist oral anticoagulants (NOACs), less than half of real-life patients with atrial fibrillation and suspected stroke are eligible for a NOAC when the enrolment criteria of the pivotal NOAC trials are applied. However, when the indications approved by the European Medicines Agency are applied instead, almost three-quarters of these patients become eligible, indicating less stringent rules from regulatory authorities. Therefore, an individualized approach is clearly needed in clinical practice, with a global outlook on the primary disease, co-morbidities, social environment and the patient’s own wishes. Independent research, designed to identify and treat the patients who are most likely to benefit from a therapeutic choice, must be maintained to offset industry-sponsored RCTs that are designed to capture the largest possible population of patients.