Clinical Trial Design
Mary W. Redman
John J. Crowley
For thoracic surgery, as for any other discipline of medicine, the best path toward increased knowledge and better outcomes for patients is through carefully designed and conducted clinical trials. Trials are often categorized by phase. Phase I is typically the first experience in humans with a drug, device, or procedure. An important objective in a phase I trial is the establishment of safety. In cancer clinical trials of new drugs, the objective is often to find the highest dose that does not cause excessive toxicity, on the assumption that this will likely be the most effective dose. The objectives in phase II trials often involve further safety considerations as well as the establishment of sufficient efficacy that the drug, device, procedure, or regimen deserves further study. The usual objective in a phase III trial is to compare one treatment approach with another, often a new experimental approach with a standard regimen.
The most important elements in any clinical trial are a clear statement of objectives; careful definition of eligibility, treatments, and endpoints; statistical considerations consistent with the objectives; careful attention to quality control at all levels including data item definitions, data-collection procedures, training for and review of treatment delivery, data management, and statistical analysis; and a clear reporting of results.
Statistical Background
The essence of the statistical argument is to make inferences from the patients at hand, the sample, to the universe of all patients, the population. The key concept is that outcomes observed in a sample of patients vary from one sample to the next, so that such observations can only be viewed as estimates of what would be observed in the population.
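The idea that a sample yields only an estimate can be illustrated with a small simulation. The sketch below is not part of the original discussion; the true response rate of 30%, the sample size of 40 patients, and the function name are assumptions chosen purely for illustration.

```python
import random
import statistics

def simulate_response_rates(true_rate, n_patients, n_trials, seed=0):
    """Observed response rates from repeated samples of one population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rates = []
    for _ in range(n_trials):
        # Each patient responds independently with probability true_rate.
        responders = sum(rng.random() < true_rate for _ in range(n_patients))
        rates.append(responders / n_patients)
    return rates

# Hypothetical population: 30% true response rate, samples of 40 patients.
rates = simulate_response_rates(true_rate=0.30, n_patients=40, n_trials=1000)
print("lowest observed rate: ", min(rates))
print("highest observed rate:", max(rates))
print("average over samples: ", round(statistics.mean(rates), 3))
```

Individual samples scatter around the population value, while the average over many samples settles close to it, which is why a single trial result is best read as an estimate with uncertainty attached.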
Clinical trials are often designed using the statistical framework of hypothesis testing. In this framework a null hypothesis H0, often the status quo, is posited, and the purpose of the trial is to reject that null hypothesis in favor of an alternative hypothesis H1. After a clinical trial, investigators will recommend staying with the status quo unless the new approach proves to be better (the issue of whether the new approach is worse is not really of interest). This is known as a one-sided test. Alternatively, if there are two competing standards, we might be interested in seeing if one of the two is better, and if so, which one. This is a two-sided test. Since researchers do not have the whole population of patients but only a sample, they do not know the truth. So they design and carry out a trial and decide between the null and alternative hypotheses.
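The distinction between one-sided and two-sided tests can be made concrete with a short calculation. The following is a minimal sketch, assuming a comparison of response rates between two arms using a normal approximation with pooled variance; the counts (30 of 100 versus 20 of 100) and the function names are hypothetical and do not come from any trial discussed here.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x_new, n_new, x_std, n_std):
    """z statistic for (new arm rate - standard arm rate), pooled variance."""
    p_new = x_new / n_new
    p_std = x_std / n_std
    pooled = (x_new + x_std) / (n_new + n_std)
    # The standard error is zero (and z undefined) if pooled is exactly 0 or 1.
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_new + 1.0 / n_std))
    return (p_new - p_std) / se

def p_values(z):
    """Return (one-sided, two-sided) p-values for a z statistic."""
    std_normal = NormalDist()
    # One-sided: only evidence that the new approach is better counts.
    one_sided = 1.0 - std_normal.cdf(z)
    # Two-sided: a difference in either direction counts.
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return one_sided, two_sided

# Hypothetical data: 30/100 responses on the new arm, 20/100 on the standard.
z = two_proportion_z(x_new=30, n_new=100, x_std=20, n_std=100)
one_sided, two_sided = p_values(z)
print(f"z = {z:.3f}, one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

When the observed difference favors the new approach, the two-sided p-value is twice the one-sided value, so the choice between the two must be made in the protocol rather than after the data are seen.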
General Design Considerations
Every trial contains certain key elements of design, which begin with the protocol document and carry through the conduct of the study.
Objectives
There should be a limited number of objectives, each carefully specified in terms of endpoints that are defined with clarity and can thus be measured without confusion. The statistician will often ask that one endpoint be considered the primary endpoint and the others secondary; this helps with the calculation of sample size and lends credibility when the results are reported.
Eligibility, Treatments, Endpoints
For surgical trials, key eligibility criteria will often involve characterizing the patients for whom the surgical approaches are possible and likely to be of benefit. Care must be taken that the criteria can be applied by those screening and entering patients on the trial, which often means that staging criteria, nodal maps, and so forth should be included in the protocol or in an appendix. The criteria should involve factors known (or known in principle) at the time of entry on trial, not determined by events that occur later. There is a trade-off between criteria that are so narrow that the generalizability of the study is compromised, and those that are too broad, in which case the effectiveness of treatment may be masked by the inclusion of inappropriate patients with little chance of benefiting. This is a matter of judgment.
The treatment approaches also need to be carefully spelled out and appropriate to the questions being asked. As with eligibility criteria, specification of treatment should be understandable and unambiguous to those delivering the therapy.
Endpoints should be suitable for the objectives, and should have clear definitions. An endpoint such as operative mortality needs to be defined (e.g., any deaths within 30 days of surgery). Survival is defined as the time from registration on study to time of death due to any cause, or last contact (the latter case yielding censored survival times). Using time to death due to disease is problematic because cause of death information is often unreliable. Progression-free survival (or disease-free survival when treatment removes all known disease) is defined as the time from registration to the first observation of disease progression or death due to any cause, or last contact. If a patient has not progressed or died, progression-free survival is censored at the time of last follow-up. Because this endpoint requires disease to be assessed, the assessment schedule should be the same for all treatment arms.
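The definitions of survival and progression-free survival given above translate directly into a pair of values per patient: a time and an indicator of whether the event was observed or the time is censored. The sketch below assumes times are measured in days from registration; the function names and the dates are hypothetical.

```python
from datetime import date

def survival_observation(registration, last_contact, death=None):
    """Return (days from registration, event_observed) for overall survival."""
    if death is not None:
        # Death from any cause counts as the event.
        return (death - registration).days, True
    # No death recorded: the time is censored at last contact.
    return (last_contact - registration).days, False

def pfs_observation(registration, last_contact, progression=None, death=None):
    """Return (days from registration, event_observed) for progression-free survival."""
    event_dates = [d for d in (progression, death) if d is not None]
    if event_dates:
        # The event is the first of progression or death from any cause.
        return (min(event_dates) - registration).days, True
    # Neither progression nor death: censored at last follow-up.
    return (last_contact - registration).days, False

# Hypothetical patients, all registered on 1 January 2020.
reg = date(2020, 1, 1)
print(survival_observation(reg, date(2020, 12, 31), death=date(2020, 3, 1)))
print(survival_observation(reg, date(2020, 12, 31)))
print(pfs_observation(reg, date(2020, 12, 31),
                      progression=date(2020, 2, 1), death=date(2020, 3, 1)))
```

Keeping the censoring indicator alongside each time is what allows standard survival methods to use the partial information from patients who have not yet had the event.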
A common endpoint for phase II trials in oncology is tumor response, defined in the Response Evaluation Criteria in Solid Tumors (RECIST) as at least a 30% decrease in unidimensionally measurable disease.22 Although this sounds like a simple, dichotomous outcome, there are problems in practice. Additionally, while tumor response may be a reasonable endpoint for phase II trials of efficacy, it is not generally appropriate as the primary outcome in a phase III comparative trial.
Side effects or toxicity of treatment will almost always be an endpoint, though often a secondary one. Patient-reported quality of life is increasingly used as an endpoint in clinical trials, sometimes even as a primary one. It is generally considered important to assess many facets of quality of life. There are many standard instruments in use, as described by Moinpour and colleagues.13 A full discussion of statistical analysis of quality-of-life issues may be found in the Handbook of Statistics in Clinical Oncology.21
Statistical Considerations
The statistical considerations in a protocol should be consistent with the objectives and endpoints. Ordinarily the sample size is driven by the primary objective. If the purpose is estimation, then the precision of estimation needs to be specified. If the aim of the study is comparative, then the significance level and the power for a specified difference need to be defined. Further considerations include rate of accrual of patients, power for comparison of any secondary endpoints, and some characterization of the analysis plan for all endpoints.
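To show how the significance level, power, and specified difference combine into a sample size, the following sketch applies the familiar normal-approximation formula for comparing two proportions, along with the corresponding formula for estimating a single proportion to a given precision. These are textbook approximations offered as an illustration, not the method of any particular protocol; the function names and the example rates (20% versus 40%) are assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_std, p_new, alpha=0.05, power=0.90, two_sided=True):
    """Approximate patients per arm to detect p_new versus p_std."""
    tail = alpha / 2.0 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1.0 - tail)  # significance level
    z_beta = NormalDist().inv_cdf(power)        # power
    variance_sum = p_std * (1.0 - p_std) + p_new * (1.0 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p_new - p_std) ** 2)

def n_for_precision(p_guess, half_width, confidence=0.95):
    """Approximate patients needed so the confidence interval for a
    proportion extends no more than half_width on either side."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return ceil(z ** 2 * p_guess * (1.0 - p_guess) / half_width ** 2)

# Hypothetical comparative aim: 20% versus 40%, 90% power.
print("two-sided 0.05:", n_per_arm(0.20, 0.40))
print("one-sided 0.05:", n_per_arm(0.20, 0.40, two_sided=False))
# Hypothetical estimation aim: rate near 50%, within 10 percentage points.
print("estimation:    ", n_for_precision(0.50, 0.10))
```

The formulas make the trade-offs visible: halving the difference to be detected roughly quadruples the required sample size, and a one-sided test requires fewer patients than a two-sided test at the same nominal level.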
Quality Control
Quality control pertains to all aspects of maintaining quality throughout the conduct of a clinical trial, including a carefully written protocol, data collection forms that will yield the information needed for analysis, data collection and management procedures, training in delivery of treatment and data collection, and central physician review of collected data, including operative and pathology reports. Quality control aspects unique to surgery trials include training on the protocol-specified surgical procedures, requiring experience with the required techniques (either by credentialing or as part of a preprotocol phase), and early centralized review of operative reports.
Reporting of Results
The report of a clinical trial should feature, in the results section, the definitive protocol-specified primary analysis of the primary endpoint. Secondary endpoints (especially multivariate endpoints such as toxicity) should be reported in a more descriptive fashion. Other analyses, such as the comparison of treatment outcomes in subsets, and prognostic factor analyses, should be regarded as exploratory only. The primary analysis should include all eligible patients, without regard to whether there were deviations in the treatment delivered, in keeping with what is known as the intent-to-treat principle (some statistical purists would include all patients, not just all eligible patients, but this would seem to make generalization to the appropriate population difficult).
Phase I and II Trials
Phase I trials have a limited role in thoracic surgery because they are generally used to find a “safe” dose of a single drug or combination regimen. An overview of phase I trials is provided by Storer.19
There are two common types of phase II trials: studies of new agents performed to assess whether there is promise of activity, and pilot studies conducted to assess the activity and feasibility of previously tested treatments in new combinations and schedules. Standard phase II studies of investigational new drugs (INDs) are usually based on tumor response rates, accrue patients in two stages, and are formulated statistically as a test of the null hypothesis H0: p = pA versus the alternative hypothesis H1: p = pB.10 In this setting, p is the probability of response, pA is the response probability that, if true, would mean that the agent is not worth studying further, and pB is the response probability that, if true, would mean the agent is active and worth further study. An alternative endpoint to tumor response that might apply more generally would be 6-month or 1-year survival. Calculations for general pA and pB are available on the web at: www.swogstat.org/stat/public/twostage/2stage1.htm. Various other two-stage (or more) phase II designs have been proposed; the most commonly used is the controlled trial.15
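The operating characteristics of a two-stage design can be computed directly from binomial probabilities. The sketch below assumes a generic two-stage rule: stop after the first stage if the number of responses is at most r1 among n1 patients, otherwise continue to n patients and declare the agent promising only if the total number of responses exceeds r. The rule, the function names, and the example values (pA = 0.10, pB = 0.30, n1 = 10, r1 = 1, n = 29, r = 5) are illustrative assumptions and are not taken from the calculator cited above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of at most k responses among n patients."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))

def two_stage_characteristics(p, n1, r1, n, r):
    """Return (P(declare promising), P(stop after stage 1), expected sample size)
    when the true response probability is p."""
    n2 = n - n1
    prob_early_stop = binom_cdf(r1, n1, p)
    # Continue past stage 1 with x responses, yet finish with r or fewer in total.
    prob_fail_late = sum(
        binom_pmf(x, n1, p) * binom_cdf(r - x, n2, p)
        for x in range(r1 + 1, min(n1, r) + 1)
    )
    prob_promising = 1.0 - prob_early_stop - prob_fail_late
    expected_n = n1 + (1.0 - prob_early_stop) * n2
    return prob_promising, prob_early_stop, expected_n

# Illustrative design, evaluated under H0 (p = pA) and under H1 (p = pB).
alpha, pet_null, en_null = two_stage_characteristics(0.10, n1=10, r1=1, n=29, r=5)
power, pet_alt, en_alt = two_stage_characteristics(0.30, n1=10, r1=1, n=29, r=5)
print(f"under pA: P(promising) = {alpha:.3f}, P(early stop) = {pet_null:.3f}, E[N] = {en_null:.1f}")
print(f"under pB: P(promising) = {power:.3f}, P(early stop) = {pet_alt:.3f}, E[N] = {en_alt:.1f}")
```

Evaluated at pA, the probability of declaring the agent promising is the significance level of the design; evaluated at pB, it is the power. The probability of stopping early under pA shows how often an inactive agent is abandoned after only the first stage.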