One of the most vexing interactions between statisticians and clinicians/scientists is the discussion about power and sample size calculations. We statisticians do not expect everyone to understand the detailed machinery of all calculations; however, in this statistician’s experience, it is clear that understanding of one basic element of this procedure escapes many clinicians and scientists. This brief communication is aimed at helping to bridge this gap so that the clinical/scientific researcher can better communicate with their statistician when discussing power and sample size calculations.
Let us consider the relatively straightforward example of a 2-group randomized controlled trial with a binary (success/failure) outcome. We will randomly assign patients to either control or treatment; we hypothesize that patients receiving treatment are more likely to experience an outcome of success than control patients. We are now faced with the question of “How many patients do we need to determine whether treatment increases the probability of success relative to control?”
Based on historical data, we estimate that the rate of success in our control patients will be ∼50%. However, we are not certain of this, so we may consider the possibility that it is slightly higher or lower. Similarly, we believe that our treatment will improve the likelihood of success, but we are not sure by exactly how much, so we ask our statistician to provide the required sample size for a small range of possible success rates in the treated patients. As requested, the statistician will estimate the sample sizes and detectable differences required for 80% power to show a difference between groups with alpha = 0.05 (Table 1). Please note that the term “power” refers to the probability of rejecting the null hypothesis when the hypothesized treatment effect truly exists (i.e., with 80% power, under the given assumptions about success rates in treatment and control, we would expect to reject the null hypothesis in about 4 of 5 repeated experiments); the term “alpha” refers to the statistical significance threshold at which we will reject the null hypothesis (i.e., with a p value <0.05, we will consider our results “statistically significant” evidence that the null hypothesis is false).
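For readers curious how such a table is produced, the calculation described above can be sketched in a few lines of code. This is a minimal illustration using the standard normal-approximation (two-sided, two-proportion z-test with pooled variance) formula; the function name `sample_size_per_group` is ours, and a statistician’s software may use a different (e.g., exact or continuity-corrected) method that yields slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion z-test
    (normal approximation with pooled variance under the null)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p_control + p_treatment) / 2          # pooled success rate under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_control - p_treatment) ** 2)

# Control success rate ~50%, with a small range of assumed treatment rates:
for p_t in (0.60, 0.65, 0.70):
    print(f"treatment rate {p_t:.2f}: "
          f"{sample_size_per_group(0.50, p_t)} patients per group")
```

Note how sensitive the required sample size is to the assumed treatment success rate: modest changes in the hypothesized difference between groups can change the required number of patients severalfold, which is precisely why the statistician asks the researcher to think carefully about these assumptions.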