Simulation-based sample size

with Type I and Type II error control

Elise Dupuis Lozeron

2024-05-23

Statistical tests

  • The purpose of a statistical test is to assess a null hypothesis \(H_0\) about a parameter of the population

  • The null hypothesis \(H_0\) is the one that we want to reject. It often expresses a lack of effect, for example: the effect of the treatment is the same in the control and in the intervention arm

  • The alternative hypothesis \(H_a\) is the one that we want to accept, for example: the effect of the treatment differs between the two arms

Type I and Type II errors

There are two types of errors that can be made with a statistical test:

  • Type I error: rejecting \(H_0\) when it is true. False-positive result. It has a probability of \(\alpha\) which is the statistical significance level.

  • Type II error: not rejecting \(H_0\) when it is false, i.e. when \(H_a\) is true. False-negative result. It has a probability of \(\beta\).

| Decision              | \(H_0\) true              | \(H_a\) true              |
|-----------------------|---------------------------|---------------------------|
| Reject \(H_0\)        | Type I error (\(\alpha\)) | \(1-\beta\)               |
| Do not reject \(H_0\) | \(1-\alpha\)              | Type II error (\(\beta\)) |

Type I and Type II errors

  • The power of a statistical test is \(1- \beta\). It is the probability of detecting a statistically significant difference when a difference of a given magnitude really exists.

  • Type II error rate is controlled through the power of the test.

  • Type I error rate is controlled through \(\alpha\).

  • These error rates are controlled in the long run, i.e. over many repetitions of the study, as the sketch below illustrates.
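A minimal sketch of this long-run control, assuming a two-sample t-test and illustrative values (these are not the study parameters used later): over many simulated trials in which \(H_0\) is true, the proportion of false positives settles near \(\alpha\).

Code
# Simulate 10,000 trials under H0 (both arms have the same mean)
set.seed(1)
p_h0 <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(p_h0 < 0.05) # long-run Type I error rate, close to alpha = 0.05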

Power and sample size


  • The statistical power of a test allows control of the Type II error rate.

  • It depends on:

    • the difference to be detected (effect size),
    • the \(\alpha\) level,
    • the sample size (see the sketch after this list).
  • The correct sample size is a balance between a study that is too small and one that is too large (resource constraints, ethical considerations).
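These dependencies can be seen directly with base R's power.t.test, here reusing the effect size and SD of the example that follows (the other values are illustrative):

Code
power.t.test(n = 50, delta = 0.1, sd = 0.25, sig.level = 0.05)$power  # ~0.51
power.t.test(n = 100, delta = 0.1, sd = 0.25, sig.level = 0.05)$power # ~0.80
power.t.test(n = 100, delta = 0.2, sd = 0.25, sig.level = 0.05)$power # larger effect, higher power
power.t.test(n = 100, delta = 0.1, sd = 0.25, sig.level = 0.01)$power # stricter alpha, lower power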

Sample size computation

  • Compute the sample size using the formulas which exist for a statistical test with a given \(\alpha\), power and effect size.

  • This approach is limited to study designs and statistical tests where formulas exist.

  • Using simulations (an iterative procedure, sketched after this list):

    1. Fix \(n\) and generate samples from a theoretical model with parameter values set under \(H_a\),
    2. Fit the chosen statistical model to each sample,
    3. Compute the proportion of significant tests \(\leadsto\) power.
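A generic skeleton of these three steps, assuming a two-sample t-test as the analysis model (the helper name sim_power is hypothetical):

Code
# Step 1: fix n and simulate under Ha; step 2: fit and test; step 3: power
sim_power <- function(n, delta, sd, n_sim = 1000, alpha = 0.05) {
  p <- replicate(n_sim, {
    x <- rnorm(n, mean = 0, sd = sd)     # control arm
    y <- rnorm(n, mean = delta, sd = sd) # treatment arm, shifted under Ha
    t.test(x, y)$p.value
  })
  mean(p < alpha) # proportion of significant tests = simulated power
}
# Increase n until the simulated power reaches the target, e.g.
# sim_power(n = 100, delta = 0.1, sd = 0.25)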

Example

  • Randomized double-blind controlled trial carried out to compare inhaled corticosteroids with placebo in school-age children.

  • The primary endpoint was the mean forced expiratory volume in 1 second (FEV1) evaluated at the end of the 6 months of follow-up.

  • Hypothetical difference to be detected: 0.10 \(l\), with an SD of 0.25 \(l\).

  • \(\alpha = 5\%\), \(1 - \beta = 80\%\)

Sample size

Theoretical computation:

Code
power.t.test(power = .80, sig.level = 0.05, delta = 0.1, sd = 0.25) 

     Two-sample t test power calculation 

              n = 99.08057
          delta = 0.1
             sd = 0.25
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Simulation-based computation (with \(n = 100\) per group, rounding up):

Code
library(tidyverse) # tibble, dplyr, purrr, tidyr

# Simulate one trial with n patients per arm under Ha
sim_d <- function(seed, n) {
  mu_t <- 1.64 # mean FEV1 in the treatment arm
  mu_c <- 1.54 # mean FEV1 in the control arm (difference = 0.10 l)

  set.seed(seed)

  tibble(group = rep(c("control", "treatment"), each = n)) |>
    mutate(
      treatment = ifelse(group == "control", 0, 1),
      y = ifelse(group == "control",
        rnorm(n, mean = mu_c, sd = 0.25),
        rnorm(n, mean = mu_t, sd = 0.25)
      )
    )
}

n_sim <- 500

# Simulate n_sim trials and fit a linear model to each of them
s <-
  tibble(seed = 1:n_sim) |>
  mutate(d = map(seed, sim_d, n = 100)) |>
  mutate(fit = map(d, ~ lm(y ~ treatment, data = .x) |>
    broom::tidy()))

# Proportion of significant treatment effects = simulated power
power_obt <-
  s |>
  select(-d) |>
  unnest(fit) |>
  filter(term == "treatment") |>
  mutate(signif = p.value < 0.05) |>
  summarise("simulated power (%)" = mean(signif) * 100)
| n (total) | simulated power (%) |
|-----------|---------------------|
| 200       | 81.2                |

Simulation-based sample size

  • Cluster randomized trial: 10 pediatricians are each asked to recruit 10 patients per group.

  • The model used to generate the data is a linear mixed model with a random intercept for pediatrician, to account for clustering.

  • The theoretical treatment effect is the same.

  • \(\sigma_{clust} = 0.08\), \(\sigma_{res} = 0.24\) (see the quick check below)
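These values keep the total variability of the first example, since \(\sqrt{0.08^2 + 0.24^2} \approx 0.25\); a quick check (the intraclass correlation line is an added illustration, not a quantity from the trial):

Code
sqrt(0.25^2 - 0.08^2) # residual sd ~0.237, rounded to 0.24 above
0.08^2 / 0.25^2       # implied intraclass correlation ~0.10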

Code
library(lme4)    # lmer() for the mixed model fitted below
library(emmeans) # emmeans() for the treatment contrast

# Simulate one cluster trial: 10 pediatricians, n patients per condition each
sim_d_mix <- function(seed, n) {
  Pediatrician <- as.factor(1:10)
  set.seed(seed)
  # Random-intercept sd for pediatrician
  Ped_sd <- 0.08
  Subject <- as.factor(1:n)
  Condition <- c("Treatment", "Control")
  # Create the "design matrix" for the data
  X <- expand.grid(Subject = Subject, Pediatrician = Pediatrician, Condition = Condition)

  X <- as_tibble(X)
  X <- X |>
    mutate(
      Condition_dummy = case_when(
        Condition == "Control" ~ 0,
        Condition == "Treatment" ~ 1
      ),
      # Add the simulated random effect for the cluster
      Pediatrician_re = rep(rep(rnorm(10, mean = 0, sd = Ped_sd), each = n), 2)
    )

  # Fixed intercept and treatment effect defined for the simulation
  b <- c(1.54, 0.1)

  # Residual sd chosen so that the total sd stays 0.25
  sd_res <- sqrt(0.25^2 - 0.08^2)
  resid <- rnorm(nrow(X), 0, sd_res)

  # Create the outcome and return the final simulated data set
  X <- as.data.frame(X)
  FEV1 <- b[1] + b[2] * X$Condition_dummy + X$Pediatrician_re + resid
  cbind(X, FEV1)
}
  

n_sim <- 100
s_lmer <-
  tibble(seed = 1:n_sim) |>
  mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
  # Fit the mixed model with a random intercept per pediatrician
  mutate(fit = map(data_w, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
  # Compute the treatment-effect p-value using Kenward-Roger df
  mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
  # Extract the p-value (column 6 of the contrasts table)
  mutate(p_val_sim = map_dbl(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))

power_obt <-
  s_lmer |>
  summarise("simulated power (%)" = mean(p_val_sim < 0.05) * 100)
| n   | Design      | simulated power (%) |
|-----|-------------|---------------------|
| 200 | Cluster RCT | 88                  |

Simulation-based sample size

  • With the same study design and parameter values, what will the power be if, in the treatment group only, the patient with the best FEV1 in each cluster is lost to follow-up?
Code
# Remove the ltfup patients with the highest FEV1 per cluster, treatment arm only
data_lost <- function(data, ltfup, n) {
  as_tibble(data) |>
    arrange(Condition, Pediatrician, desc(FEV1)) |>
    # Flag the top ltfup patients within each of the 20 pediatrician-by-condition groups
    mutate(remdata = rep(c(rep("Yes", times = ltfup), rep("No", times = n - ltfup)), 20)) |>
    filter(!(Condition == "Treatment" & remdata == "Yes"))
}
 

n_sim <- 100
s_lmer_lost <-
  tibble(seed = 1:n_sim) |>
  mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
  # Remove the lost-to-follow-up patients before fitting
  mutate(data_mod = map(data_w, ~ data_lost(.x, ltfup = 1, n = 10))) |>
  mutate(fit = map(data_mod, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
  mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
  mutate(p_val_sim = map_dbl(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))

power_obt_lost <-
  s_lmer_lost |>
  summarise("simulated power (%)" = mean(p_val_sim < 0.05) * 100)
| n   | Design                             | simulated power (%) |
|-----|------------------------------------|---------------------|
| 200 | Cluster RCT with loss to follow-up | 48                  |

Conclusions

  • Sample size computation allows for control of the Type II error rate.

  • It is an important part of study planning and it should be justified.

  • Depending on the complexity of the study design and of the statistical model, the computation can be done using formulas or simulations.

  • It remains a theoretical computation based on several assumptions.

Thank you for your attention