Randomized clinical trials (RCTs) are considered the gold standard for evidence-based medicine. However, an accurate estimation of the event rate is crucial for their ability to test clinical hypotheses. Overestimation of event rates reduces the required sample size but can compromise the statistical power of the RCT. Little is known about the prevalence, extent, and impact of overestimation of event rates. The latest RCTs on 10 preselected topics in the field of cardiovascular interventions and devices were selected, and actual primary event rates in the control group were compared with their respective event rate estimations. We also assessed what proportion of the nonsignificant RCTs was truly able to exclude a relevant treatment effect. A total of 27 RCTs randomizing 19,436 patients were included. The primary event rate in the control group was overestimated in 20 of the 27 RCTs (74.1%), resulting in a substantial relative difference between observed and estimated event rates (mean −22.9%, 95% confidence interval −33.5% to −12.2%; median −16.3%, 95% confidence interval −30.3% to −6.5%). Event rates were particularly overestimated in RCTs on biodegradable polymer drug-eluting coronary stents and renal artery stenting. Of the 14 single end point superiority trials with nonsignificant results, only 3 (21.4%) actually resulted in truly negative conclusions. In conclusion, event rates in RCTs evaluating cardiovascular interventions and devices are frequently overestimated. This under-reported phenomenon has a fundamental impact on the design of RCTs and can have an adverse impact on the statistical power of these trials to answer important questions about therapeutic strategies.
Since their introduction in 1948, randomized clinical trials (RCTs) have rapidly evolved into a mainstay of evidence-based clinical medicine. Nonetheless, their reliability, validity, and generalizability strongly depend on the methodologic rigor implemented to obtain results and draw conclusions. Performance of a sample size calculation for an appropriate primary end point is an essential step in this process that should be completed before initiating the trial. This step is probably as challenging as it is important because estimates have to be made regarding the event rate in the control group and the clinically relevant benefit of the experimental therapy that would yield a positive trial outcome. In a recent RCT on renal denervation for resistant hypertension, blood pressure control improved significantly in both the experimental and control groups. This unexpected and intriguing observation suggests that concurrent treatment and follow-up of subjects randomly allocated to experimental and control groups is not only one of the key features that has led us to adopt RCTs as the gold standard but can also be a caveat when patients are doing “too well” and low event rates compromise the assumptions made in the sample size calculation. Although overestimation of event rates does not seem to be uncommon in cardiovascular RCTs, we are unaware of any previous report that has looked into this phenomenon. In the present study, we systematically studied the prevalence, extent, and impact of overestimation of event rates in contemporary RCTs evaluating cardiovascular interventions and devices.
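The dependence of sample size and power on the assumed control event rate can be illustrated with a minimal sketch. It assumes the common normal-approximation formula for comparing two proportions, which is not necessarily the method used by any of the trials studied here. The function names, the 20% and 15% control event rates, and the 25% relative risk reduction are hypothetical values chosen purely for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def sample_size_per_group(p_control, p_experimental, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two proportions
    (normal approximation, equal allocation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance_sum = (p_control * (1 - p_control)
                    + p_experimental * (1 - p_experimental))
    effect = p_control - p_experimental
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


def achieved_power(p_control, p_experimental, n_per_group, alpha=0.05):
    """Approximate power of the same test for a given per-group sample size."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    variance_sum = (p_control * (1 - p_control)
                    + p_experimental * (1 - p_experimental))
    standard_error = math.sqrt(variance_sum / n_per_group)
    return _STD_NORMAL.cdf(abs(p_control - p_experimental) / standard_error - z_alpha)


# Hypothetical planning assumptions (not taken from any included trial).
RELATIVE_RISK_REDUCTION = 0.25
ESTIMATED_CONTROL_RATE = 0.20   # rate assumed at the planning stage
ACTUAL_CONTROL_RATE = 0.15      # lower rate observed during the trial

planned_n = sample_size_per_group(
    ESTIMATED_CONTROL_RATE, ESTIMATED_CONTROL_RATE * (1 - RELATIVE_RISK_REDUCTION))
needed_n = sample_size_per_group(
    ACTUAL_CONTROL_RATE, ACTUAL_CONTROL_RATE * (1 - RELATIVE_RISK_REDUCTION))
power_at_planned_n = achieved_power(
    ACTUAL_CONTROL_RATE, ACTUAL_CONTROL_RATE * (1 - RELATIVE_RISK_REDUCTION), planned_n)

print(f"planned n per group: {planned_n}")
print(f"n per group needed at the actual event rate: {needed_n}")
print(f"power achieved with the planned n: {power_at_planned_n:.2f}")
```

Under these assumptions, the trial planned around the higher event rate enrolls fewer patients than would be needed at the lower actual rate, so its power for the same relative treatment effect falls below the nominal 80%.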
Methods
To include an unbiased sample of trials, we prespecified 10 topics in the field of cardiovascular interventions and devices:
Biodegradable polymer drug-eluting coronary stents (DES) in coronary artery disease;
Distal embolic protection in primary percutaneous coronary intervention;
Embolic protection in carotid artery stenting;
Intra-aortic balloon counterpulsation in acute myocardial infarction complicated by cardiogenic shock;
Left atrial appendage closure for prevention of stroke in nonvalvular atrial fibrillation;
Mitral valve replacement for ischemic mitral regurgitation;
Patent foramen ovale closure for cryptogenic stroke;
Percutaneous mitral valve repair for mitral regurgitation;
Renal artery stenting for renovascular disease; and
Transcatheter aortic valve replacement for aortic stenosis.
These topics were selected based on their contemporary nature and relevance to the field. Subsequently, 2 investigators (KDM and DRH) independently performed a Medline search to identify the latest 5 RCTs for each of these topics up to April 14, 2014. The number of included trials was limited to the latest 5 to prevent imbalance between the topics. Search terms for each topic are listed in the Supplementary Methods. Additionally, references of relevant articles, reviews, and meta-analyses were reviewed. Inclusion criteria for RCTs were full publication in a peer-reviewed journal in the English language, randomization of ≥30 patients into an experimental and a control arm, and presence of a sample size calculation that disclosed the estimated event rate in at least the control group. We allowed the sample size calculation to be reported in the primary trial report, a study design paper, or a publicly accessible study protocol. Any discrepancies were resolved by consensus. As we did not study human subjects ourselves, institutional review board approval was not required.
We only studied the event rate in the control group because the event rate in the experimental group is complicated by the (inherently) uncertain efficacy of the experimental treatment. Several possible causes for overestimation of event rates were considered (Table 1). General trial characteristics and details on event rates and sample size were abstracted from the trial reports, study design papers, and study protocols. Furthermore, we reviewed the original sources on which event rate estimates were based (reference reports) to assess their study design and timeliness. The latter was defined as the time gap between study closure of the most recent reference report and the date of enrollment initiation of the RCT. Target sample size was defined as the initially planned sample size. Three RCTs had 2 primary efficacy end points with separate power calculations. Results for both primary end points were listed separately, but the RCTs were handled as single trials in the aggregate analyses by averaging the results for both end points. To measure the possible impact of fewer than estimated primary end point events on trial results, we also assessed whether nonsignificant superiority RCTs produced a truly negative result.
Deviations of actual event rates and sample sizes from their initial estimates were reported as relative percentages. We considered a >10% lack in events or a >10% smaller sample size to be significant deviations from the initial estimates. Aggregate data were presented as means and medians with 95% confidence intervals (CIs) and ranges. The mean relative difference in event rates was weighted by its inverse variance. Bivariate associations were assessed with Spearman’s correlation. Group differences were tested with the Mann-Whitney U test. Equality of variances was tested using Brown and Forsythe’s F statistic. Differences between paired observations were tested with the Wilcoxon signed-rank test. Wald-based CIs were calculated for nonsignificant superiority RCTs (details in the Supplementary Methods). We declared a study to be “truly negative” if the CI excluded the minimum detectable treatment effect used at the time of study planning. Nonsignificant superiority RCTs that did not meet this criterion were declared “inconclusive” as they were unable to exclude the minimal treatment effect deemed to be relevant by the investigators when designing their RCT. We examined both the absolute risk effect and the relative risk effect because we cannot know which of the two the study planners intended as the true effect measure. Statistical significance was set at p <0.05 (2-tailed). Statistical analyses were performed with Stata 11.0 (StataCorp, College Station, Texas).
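The relative difference, the 10% threshold, and the “truly negative” criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study (which was run in Stata, with the CI details given in the Supplementary Methods): the Wald interval shown is the textbook form for an absolute risk difference, and all function names and the two example trials are hypothetical.

```python
import math
from statistics import NormalDist


def relative_difference(observed, estimated):
    """Relative difference (%) of the observed value versus the estimate."""
    return (observed - estimated) / estimated * 100.0


def is_significant_shortfall(rel_diff_pct, threshold_pct=10.0):
    """True when the observed value falls more than threshold_pct below the estimate."""
    return rel_diff_pct < -threshold_pct


def wald_ci_risk_difference(events_exp, n_exp, events_ctl, n_ctl, level=0.95):
    """Wald CI for the absolute risk difference (experimental minus control)."""
    p_exp = events_exp / n_exp
    p_ctl = events_ctl / n_ctl
    diff = p_exp - p_ctl
    standard_error = math.sqrt(p_exp * (1 - p_exp) / n_exp
                               + p_ctl * (1 - p_ctl) / n_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * standard_error, diff + z * standard_error


def classify_nonsignificant_trial(ci, planned_effect):
    """'truly negative' if the CI excludes the planned effect, else 'inconclusive'."""
    lower, upper = ci
    return "inconclusive" if lower <= planned_effect <= upper else "truly negative"


# Observed 4.2% versus estimated 6.9% (the NEXT row of Table 3).
next_rel_diff = relative_difference(4.2, 6.9)
print(round(next_rel_diff, 1), is_significant_shortfall(next_rel_diff))

# Two hypothetical nonsignificant trials planned to detect a 10-point
# absolute risk reduction (planned effect = -0.10).
large_trial = wald_ci_risk_difference(96, 400, 100, 400)
small_trial = wald_ci_risk_difference(24, 100, 25, 100)
print(classify_nonsignificant_trial(large_trial, -0.10))
print(classify_nonsignificant_trial(small_trial, -0.10))
```

In this sketch both hypothetical trials observe the same small risk difference, but only the larger one has a CI narrow enough to exclude the planned effect; the smaller one remains inconclusive.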
Results
Of 10,382 records, 56 RCTs were identified. Of these, 29 RCTs were excluded because of a study design other than treatment versus control (n = 3), randomization of <30 patients (n = 3), absence of full text in the English language (n = 1), absence of a sample size calculation or estimated event rates not listed in the sample size calculation (n = 15), and not being part of the latest 5 trials on the topic (n = 7). Thus, 27 RCTs randomizing a total of 19,436 patients were included in the present study. This included the latest 5 RCTs on biodegradable polymer DES and all RCTs on the other topics because ≤5 eligible RCTs were published on each of the other 9 topics. Search results for each topic are listed in Supplementary Figures 1 to 10, and further trial details such as trial references, trial acronyms, and published study design reports are listed in the Supplementary Results.
General trial characteristics and details on event rates and sample size are listed in Table 2. The included RCTs enrolled patients from 2000 to 2012. Seventeen RCTs were designed to demonstrate superiority and 10 RCTs had a noninferiority design. Median trial duration was 2.2 years (range 0.5 to 9.0 years), and the median dropout rate was 2.4% (range 0.0% to 22.0%). The operator was blinded in none of the trials, and the patient was blinded only in the NOYA-I trial. Most RCTs were at least partially funded by the industry, except for DEDICATION, RAS-CAD, STACCATO, and the trials by Barbato et al and Acker et al.
Trial | Design | Enrollment years | Enrollment duration, y | Control group | Outcome | Drop out, % |
---|---|---|---|---|---|---|
Biodegradable polymer drug-eluting coronary stents in coronary artery disease | ||||||
NEXT | NI, multicenter | 2011 | 0.5 | Everolimus-eluting stent | Non-inferior | 0.8 |
TARGET-I | NI, multicenter | 2010-2012 | 2.1 | Everolimus-eluting stent | Non-inferior | 12.8 |
COMPARE-II | NI, multicenter | 2009-2011 | 2.1 | Everolimus-eluting stent | Non-inferior | 0.7 |
SORT-OUT V | NI, multicenter | 2009-2011 | 1.6 | Sirolimus-eluting stent | Inferior | 0.04 |
NOYA-I | NI, multicenter | 2009 | 0.7 | Sirolimus-eluting stent | Non-inferior | 15.3 |
Distal embolic protection in primary percutaneous coronary intervention | ||||||
DEDICATION | Sup, multicenter | 2005-2006 | 1.6 | No distal protection | Neutral | 3.7 |
Tahk et al ∗ | Sup, multicenter | 2003-2004 | 0.9 | No distal protection | Superior | 41.4 |
Tahk et al ∗ | Sup, multicenter | 2003-2004 | 0.9 | No distal protection | Superior | 2.6 |
ASPARAGUS | Sup, multicenter | 2002-2003 | 1.5 | No distal protection | Neutral | NR |
EMERALD ∗ | Sup, multicenter | 2002-2003 | 1.5 | No distal protection | Neutral | 4.4 |
EMERALD ∗ | Sup, multicenter | 2002-2003 | 1.5 | No distal protection | Neutral | 12.8 |
PROMISE | Sup, single center | 2005 † | NR | No distal protection | Neutral | 0.0 |
Embolic protection in carotid artery stenting | ||||||
Barbato et al | Sup, single center | 2003-2006 | 2.2 | No embolic protection | Neutral | 0.0 |
IABP in AMI complicated by cardiogenic shock | ||||||
IABP-SHOCK II | Sup, multicenter | 2009-2012 | 2.7 | No IABP | Neutral | 0.3 |
Left atrial appendage closure for prevention of stroke in non-valvular atrial fibrillation | ||||||
PREVAIL ∗ | NI, multicenter | 2010-2012 | 1.7 | Warfarin treatment | Inferior | 0.0 |
PREVAIL ∗ | NI, multicenter | 2010-2012 | 1.7 | Warfarin treatment | Non-inferior | 0.0 |
PROTECT-AF | NI, multicenter | 2005-2008 | 3.4 | Warfarin treatment | Non-inferior | 0.6 |
Mitral valve replacement for ischemic mitral regurgitation | ||||||
Acker et al | Sup, multicenter | 2009-2012 | 4.0 | Mitral valve repair | Neutral | 1.6 |
Patent foramen ovale closure for cryptogenic stroke | ||||||
RESPECT | Sup, multicenter | 2003-2011 | 8.3 | Medical therapy | Neutral | 13.2 |
PC Trial | Sup, multicenter | 2000-2009 | 9.0 | Medical therapy | Neutral | 17.6 |
Closure-I | Sup, multicenter | 2003-2008 | 5.3 | Medical therapy | Neutral | 1.2 |
Percutaneous mitral valve repair for mitral regurgitation | ||||||
EVEREST-II | NI, multicenter | 2005-2008 | 3.2 | Valve surgery | Inferior | 3.2 |
Renal artery stenting for renovascular disease | ||||||
CORAL | Sup, multicenter | 2005-2010 | 4.7 | Medical therapy | Neutral | 1.7 |
RAS-CAD | Sup, single center | 2006-2008 | 2.5 | Medical therapy | Neutral | 13.1 |
ASTRAL | Sup, multicenter | 2000-2007 | 7.2 | Medical therapy | Neutral | 4.7 |
STAR | Sup, multicenter | 2000-2005 | 5.6 | Medical therapy | Neutral | 2.9 |
Transcatheter aortic valve replacement for aortic stenosis | ||||||
US COREVALVE | NI, multicenter | 2011-2012 | 1.7 | Valve surgery | Superior | 4.8 |
STACCATO | Sup, multicenter | 2008-2011 | 2.6 | Valve surgery | Neutral | 2.9 |
PARTNER A | NI, multicenter | 2007-2009 | 2.3 | Valve surgery | Non-inferior | 2.0 |
PARTNER B | Sup, multicenter | 2007-2009 | 1.8 | Medical therapy | Superior | 1.4 |
∗ Two primary endpoints, results listed for each.
Overall, there was clear evidence of overestimation of event rates (Table 3). The primary event rate in the control group was lower than estimated in 20 of 27 RCTs (74.1%), translating into a marked relative difference between observed and estimated event rates (weighted mean −22.9%, 95% CI −33.5% to −12.2%; median −16.3%, 95% CI −30.3% to −6.5%). The extent of overestimation for each of the topics is shown in Figure 1. Control event rates were particularly lower than estimated in RCTs on biodegradable polymer DES (5 RCTs; n = 9,170; weighted mean relative difference −35.6%; 95% CI −53.0% to −18.2%) and renal artery stenting (4 RCTs; n = 1,977; weighted mean relative difference −41.1%; 95% CI −76.1% to −6.1%). A >10% lack of primary events in the control limb was seen in 16 RCTs (59.3%). Of these, 6 of 16 RCTs (37.5%) addressed this as a concern in their reports (COMPARE-II, Acker et al, PC Trial, CORAL, ASTRAL, and STAR). The median time gap between study closure of the most recent reference reports and enrollment initiation of the RCT was 4.4 years (range 0.0 to 11.1 years).
Trial | Primary endpoint | Primary events: observed | Primary events: estimated | Primary events: relative difference, % | Sample size: actual | Sample size: target | Sample size: relative difference, % | Reference paper(s): design | Reference paper(s): time gap with RCT, y |
---|---|---|---|---|---|---|---|---|---|
Biodegradable polymer drug-eluting coronary stents in coronary artery disease | |||||||||
NEXT | 1y target lesion revascularization, % | 4.2 | 6.9 | -39.1 | 3235 | 3200 | 1.1 | Observational study | 4.4 |
TARGET-I | 9mo in-stent late lumen loss, mm ∗ | 0.13 | 0.16 | -18.8 | 460 | 460 | 0.0 | RCT | 4.5 |
COMPARE-II | 1y death/AMI/target lesion revascularization, % | 4.8 | 9.5 | -49.5 | 2707 | 2700 | 0.3 | RCTs | 2.2 |
SORT-OUT V | 9mo death/AMI/stent thrombosis/TVR, % | 3.1 | 3.0 | 3.3 | 2468 | 2400 | 2.8 | RCT | 1.8 |
NOYA-I | 9mo in-stent late lumen loss, mm ∗ | 0.14 | 0.14 | 0.0 | 300 | 300 | 0.0 | Observational study | NR |
Distal embolic protection in primary percutaneous coronary intervention | |||||||||
DEDICATION | Incomplete ST-resolution, % † | 28.0 | 32.0 | -12.5 | 626 | 600 | 4.3 | RCT | 1.4 |
Tahk et al ‡ | Doppler coronary flow (APV), cm/s ∗ , § | 18.0 | 16.3 | -10.4 | 116 | 120 | -3.3 | Prospective study | NR |
Tahk et al ‡ | TIMI myocardial perfusion grade 0-2, % † | 62.3 | 70.0 | -11.1 | 116 | 120 | -3.3 | Observational study | 6.6 |
ASPARAGUS | Angiographic slow-flow/no-reflow, % | 11.4 | 20.0 | -43.0 | 341 | 278 | 22.7 | Pilot study | NR |
EMERALD ‡ | Incomplete ST-resolution, % † | 38.1 | 50.0 | -23.8 | 501 | 400 | 25.3 | Mixed, with RCT | 2.0 |
EMERALD ‡ | Scintigraphic left ventricular infarct size, % ∗ | 16.0 | 15.0 | 6.7 | 501 | 400 | 25.3 | RCTs | 1.2 |
PROMISE | Adenosine doppler coronary flow (APV), cm/s ∗ , § | 36.0 | 41.0 | 12.2 | 200 | 200 | 0.0 | RCT | NR |
Embolic protection in carotid artery stenting | |||||||||
Barbato et al | New ischemic lesions cerebral MRI, % | 44.0 | 29.0 | 51.7 | 36 | 100 | -64.0 | Prospective study | NR |
IABP in AMI complicated by cardiogenic shock | |||||||||
IABP-SHOCK II | 30d death, % † | 41.3 | 44.0 | -6.1 | 600 | 600 | 0.0 | Mixed, with RCT | 4.2 |
Left atrial appendage closure for prevention of stroke in non-valvular atrial fibrillation | |||||||||
PREVAIL ‡ | 18mo stroke/systemic embolism/CV, unexplained death, % | 6.3 | 6.3 ¶ | 0.0 | 407 | N/A ∗∗ | N/A ∗∗ | RCT | 2.3 |
PREVAIL ‡ | 18mo ischemic stroke/systemic embolism after 7 days, % | 2.0 | 2.5 ¶ | -20.0 | 407 | N/A ∗∗ | N/A ∗∗ | RCT | 2.3 |
PROTECT-AF | Stroke/CV-death/systemic embolism, /100 patient years | 4.9 | 6.15 | -20.3 | 707 | N/A ∗∗ | N/A ∗∗ | RCT | 9.3 |
Mitral valve replacement for ischemic mitral regurgitation | |||||||||
Acker et al | 1y left ventricular end-systolic volume index, ml/m² ∗ | 54.6 | 80.0 | -31.8 | 251 | 250 | 0.4 | Mixed, no RCT | 5.0 |
Patent foramen ovale closure for cryptogenic stroke | |||||||||
RESPECT | Early death or 2y stroke, % | 3.0 | 4.3 | -30.2 | 980 | N/A ∗∗ | N/A ∗∗ | Observational studies | 6.8 |
PC Trial | 4.5y death/stroke/TIA/systemic embolism, % | 6.0 | 13.5 | -55.6 | 414 | 410 | 1.0 | Mixed, no RCT | 3.8 |
Closure-I | 30d death or 2y stroke/TIA/neurological death, % | 6.8 | 6.0 | 13.3 | 909 | 1600 | -43.2 | Mixed, with RCT | 3.0 |
Percutaneous mitral valve repair for mitral regurgitation | |||||||||
EVEREST-II | 1y death/mitral valve surgery, dysfunction, regurgitation, % | 27.0 | 10.0 | 170.0 | 279 | 279 | 0.0 | Observational studies | 11.1 †† |
Renal artery stenting for renovascular disease | |||||||||
CORAL | 2y renal death, insufficiency, replacement therapy/CV-death/stroke/AMI/CHF, % | 25.0 | 30.0 | -16.7 | 947 | 1080 | -12.3 | Observational studies | 6.0 ‡‡ |
RAS-CAD | 1y decrease in left ventricular mass index, g/m² ∗ , § | 6.1 | 5.2 | -17.3 | 84 | 168 | -50.0 | RCTs | 9.2 |
ASTRAL | Averaged annual decline in reciprocal serum creatinine, liter/μmol/year ∗ | 0.13 | 1.6 | -91.9 | 806 | 1000 | -19.4 | Prospective study | 4.7 |
STAR | 2y progressive renal failure, % | 22.0 | 50.0 | -56.0 | 140 | 140 | 0.0 | Prospective study | 2.8 |
Transcatheter aortic valve replacement for aortic stenosis | |||||||||
US COREVALVE | 1y death, % | 18.7 | 20.0 | -6.5 | 795 | 790 | 0.6 | Observational studies | 5.1 |
STACCATO | 30d death/major stroke/renal failure, % | 2.8 | 13.5 | -79.3 | 70 | 200 | -65.0 | Observational study | 0.0 §§ |
PARTNER A | 1y death, % | 26.8 | 32.0 | -16.3 | 699 | 650 | 7.5 | Prospective studies | 0.0 ¶¶ |
PARTNER B | 1y death, % | 50.7 | 37.5 | 35.2 | 358 | 350 | 2.3 | Observational study | 9.4 |
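
The unweighted aggregate figures reported in the text can be re-derived from the tabulated relative differences in primary events. The sketch below takes those percentages as listed in the table (including their sign conventions) and averages the two end points of Tahk et al, EMERALD, and PREVAIL, as described in the Methods. The variable names are illustrative. The inverse-variance weighted mean is not reproduced because the per-trial variances are not given in the table.

```python
from statistics import mean, median

# Relative difference (%) between observed and estimated control event rates,
# copied from the table; trials with 2 primary end points list both values.
RELATIVE_DIFFERENCES = {
    "NEXT": [-39.1], "TARGET-I": [-18.8], "COMPARE-II": [-49.5],
    "SORT-OUT V": [3.3], "NOYA-I": [0.0],
    "DEDICATION": [-12.5], "Tahk et al": [-10.4, -11.1],
    "ASPARAGUS": [-43.0], "EMERALD": [-23.8, 6.7], "PROMISE": [12.2],
    "Barbato et al": [51.7], "IABP-SHOCK II": [-6.1],
    "PREVAIL": [0.0, -20.0], "PROTECT-AF": [-20.3],
    "Acker et al": [-31.8],
    "RESPECT": [-30.2], "PC Trial": [-55.6], "Closure-I": [13.3],
    "EVEREST-II": [170.0],
    "CORAL": [-16.7], "RAS-CAD": [-17.3], "ASTRAL": [-91.9], "STAR": [-56.0],
    "US COREVALVE": [-6.5], "STACCATO": [-79.3],
    "PARTNER A": [-16.3], "PARTNER B": [35.2],
}

# One value per trial: average the end points where a trial has two.
per_trial = {trial: mean(values) for trial, values in RELATIVE_DIFFERENCES.items()}

n_trials = len(per_trial)
n_overestimated = sum(1 for value in per_trial.values() if value < 0)
n_shortfall_over_10 = sum(1 for value in per_trial.values() if value < -10.0)
median_rel_diff = median(per_trial.values())

print(f"{n_overestimated} of {n_trials} trials had a lower than estimated event rate")
print(f"{n_shortfall_over_10} trials had a >10% lack of primary events")
print(f"median relative difference: {median_rel_diff}%")
```

With the tabulated values this yields 20 of 27 trials with a lower than estimated event rate, 16 trials with a >10% shortfall, and a median relative difference of −16.3%, in line with the figures reported in the Results.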