Simulation-based sample size

with Type I and Type II error control

Elise Dupuis Lozeron

2024-05-23

Statistical tests

  • The purpose of a statistical test is to assess a null hypothesis \(H_0\) about a parameter of the population

  • The null hypothesis \(H_0\) is the one that we want to reject. It often expresses a lack of effect, for example: the effect of the treatment is the same in the control and in the intervention arm

  • The alternative hypothesis \(H_a\) is the one that we want to accept, for example: the effect of the treatment differs between the two arms

Type I and Type II errors

There are two types of errors that can be made with a statistical test:

  • Type I error: rejecting \(H_0\) when it is true. False-positive result. It has a probability of \(\alpha\) which is the statistical significance level.

  • Type II error: not rejecting \(H_0\) when it is false, i.e. when \(H_a\) is true. False-negative result. It has a probability of \(\beta\).

| Decision              | \(H_0\) true              | \(H_a\) true              |
|-----------------------|---------------------------|---------------------------|
| Reject \(H_0\)        | Type I error (\(\alpha\)) | \(1-\beta\)               |
| Do not reject \(H_0\) | \(1-\alpha\)              | Type II error (\(\beta\)) |

Type I and Type II errors

  • The power of a statistical test is \(1- \beta\). It is the probability of detecting a statistically significant difference when a difference of a given magnitude really exists.

  • Type II error rate is controlled through the power of the test.

  • Type I error rate is controlled through \(\alpha\).

  • These error rates are controlled in the long run, i.e. over many repetitions of the study, as the sketch below illustrates.
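A minimal sketch of this long-run control, assuming a two-sample t-test and illustrative values (these are not the study parameters used later): over many simulated trials in which \(H_0\) is true, the proportion of false positives settles near \(\alpha\).

Code
# Simulate 10,000 trials under H0 (both arms have the same mean)
set.seed(1)
p_h0 <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(p_h0 < 0.05) # long-run Type I error rate, close to alpha = 0.05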

Power and sample size


  • The statistical power of a test allows control of the Type II error rate.

  • It depends on:

    • the difference to be detected (effect size),
    • the \(\alpha\) level,
    • the sample size (see the sketch after this list).
  • The correct sample size is a balance between a study that is too small and one that is too large (resource constraints, ethical considerations).
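These dependencies can be seen directly with base R's power.t.test, here reusing the effect size and SD of the example that follows (the other values are illustrative):

Code
power.t.test(n = 50, delta = 0.1, sd = 0.25, sig.level = 0.05)$power  # ~0.51
power.t.test(n = 100, delta = 0.1, sd = 0.25, sig.level = 0.05)$power # ~0.80
power.t.test(n = 100, delta = 0.2, sd = 0.25, sig.level = 0.05)$power # larger effect, higher power
power.t.test(n = 100, delta = 0.1, sd = 0.25, sig.level = 0.01)$power # stricter alpha, lower power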

Sample size computation

  • Compute the sample size using the formulas which exist for a statistical test with a given \(\alpha\), power and effect size.

  • This approach is limited to study designs and statistical tests where formulas exist.

  • Using simulations (an iterative procedure, sketched after this list):

    1. Fix \(n\) and generate samples from a theoretical model with parameter values set under \(H_a\),
    2. Fit the chosen statistical model to each sample,
    3. Compute the proportion of significant tests \(\leadsto\) power.
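A generic skeleton of these three steps, assuming a two-sample t-test as the analysis model (the helper name sim_power is hypothetical):

Code
# Step 1: fix n and simulate under Ha; step 2: fit and test; step 3: power
sim_power <- function(n, delta, sd, n_sim = 1000, alpha = 0.05) {
  p <- replicate(n_sim, {
    x <- rnorm(n, mean = 0, sd = sd)     # control arm
    y <- rnorm(n, mean = delta, sd = sd) # treatment arm, shifted under Ha
    t.test(x, y)$p.value
  })
  mean(p < alpha) # proportion of significant tests = simulated power
}
# Increase n until the simulated power reaches the target, e.g.
# sim_power(n = 100, delta = 0.1, sd = 0.25)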

Example

  • Randomized double-blind controlled trial carried out to compare inhaled corticosteroids with placebo in school-age children.

  • The primary endpoint was the mean forced expiratory volume in 1 second (FEV1) evaluated at the end of the 6 months of follow-up.

  • Hypothetical difference to be detected: 0.10 \(l\), with an SD of 0.25 \(l\).

  • \(\alpha = 5\%\), \(1 - \beta = 80\%\)

Sample size

Theoretical computation:

Code
power.t.test(power = .80, sig.level = 0.05, delta = 0.1, sd = 0.25) 

     Two-sample t test power calculation 

              n = 99.08057
          delta = 0.1
             sd = 0.25
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Simulation-based computation (with \(n = 100\) per group, rounding up):

Code
library(tidyverse) # tibble, dplyr, purrr, tidyr

# Simulate one trial with n patients per arm under Ha
sim_d <- function(seed, n) {
  mu_t <- 1.64 # mean FEV1 in the treatment arm
  mu_c <- 1.54 # mean FEV1 in the control arm (difference = 0.10 l)

  set.seed(seed)

  tibble(group = rep(c("control", "treatment"), each = n)) |>
    mutate(
      treatment = ifelse(group == "control", 0, 1),
      y = ifelse(group == "control",
        rnorm(n, mean = mu_c, sd = 0.25),
        rnorm(n, mean = mu_t, sd = 0.25)
      )
    )
}

n_sim <- 500

# Simulate n_sim trials and fit a linear model to each of them
s <-
  tibble(seed = 1:n_sim) |>
  mutate(d = map(seed, sim_d, n = 100)) |>
  mutate(fit = map(d, ~ lm(y ~ treatment, data = .x) |>
    broom::tidy()))

# Proportion of significant treatment effects = simulated power
power_obt <-
  s |>
  select(-d) |>
  unnest(fit) |>
  filter(term == "treatment") |>
  mutate(signif = p.value < 0.05) |>
  summarise("simulated power (%)" = mean(signif) * 100)
| n (total) | simulated power (%) |
|-----------|---------------------|
| 200       | 81.2                |

Simulation-based sample size

  • Cluster randomized trial: 10 pediatricians are each asked to recruit 10 patients per group.

  • The model used to generate the data is a linear mixed model with a random intercept for pediatrician, to account for clustering.

  • The theoretical treatment effect is the same.

  • \(\sigma_{clust} = 0.08\), \(\sigma_{res} = 0.24\) (see the quick check below)
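These values keep the total variability of the first example, since \(\sqrt{0.08^2 + 0.24^2} \approx 0.25\); a quick check (the intraclass correlation line is an added illustration, not a quantity from the trial):

Code
sqrt(0.25^2 - 0.08^2) # residual sd ~0.237, rounded to 0.24 above
0.08^2 / 0.25^2       # implied intraclass correlation ~0.10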

Code
library(lme4)    # lmer() for the mixed model fitted below
library(emmeans) # emmeans() for the treatment contrast

# Simulate one cluster trial: 10 pediatricians, n patients per condition each
sim_d_mix <- function(seed, n) {
  Pediatrician <- as.factor(1:10)
  set.seed(seed)
  # Random-intercept sd for pediatrician
  Ped_sd <- 0.08
  Subject <- as.factor(1:n)
  Condition <- c("Treatment", "Control")
  # Create the "design matrix" for the data
  X <- expand.grid(Subject = Subject, Pediatrician = Pediatrician, Condition = Condition)

  X <- as_tibble(X)
  X <- X |>
    mutate(
      Condition_dummy = case_when(
        Condition == "Control" ~ 0,
        Condition == "Treatment" ~ 1
      ),
      # Add the simulated random effect for the cluster
      Pediatrician_re = rep(rep(rnorm(10, mean = 0, sd = Ped_sd), each = n), 2)
    )

  # Fixed intercept and treatment effect defined for the simulation
  b <- c(1.54, 0.1)

  # Residual sd chosen so that the total sd stays 0.25
  sd_res <- sqrt(0.25^2 - 0.08^2)
  resid <- rnorm(nrow(X), 0, sd_res)

  # Create the outcome and return the final simulated data set
  X <- as.data.frame(X)
  FEV1 <- b[1] + b[2] * X$Condition_dummy + X$Pediatrician_re + resid
  cbind(X, FEV1)
}
  

n_sim <- 100
s_lmer <-
  tibble(seed = 1:n_sim) |>
  mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
  # Fit the mixed model with a random intercept per pediatrician
  mutate(fit = map(data_w, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
  # Compute the treatment-effect p-value using Kenward-Roger df
  mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
  # Extract the p-value (column 6 of the contrasts table)
  mutate(p_val_sim = map_dbl(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))

power_obt <-
  s_lmer |>
  summarise("simulated power (%)" = mean(p_val_sim < 0.05) * 100)
| n   | Design      | simulated power (%) |
|-----|-------------|---------------------|
| 200 | Cluster RCT | 88                  |

Simulation-based sample size

  • With the same study design and parameter values, what will the power be if, in the treatment group only, the patient with the best FEV1 in each cluster is lost to follow-up?
Code
# Remove the ltfup patients with the highest FEV1 per cluster, treatment arm only
data_lost <- function(data, ltfup, n) {
  as_tibble(data) |>
    arrange(Condition, Pediatrician, desc(FEV1)) |>
    # Flag the top ltfup patients within each of the 20 pediatrician-by-condition groups
    mutate(remdata = rep(c(rep("Yes", times = ltfup), rep("No", times = n - ltfup)), 20)) |>
    filter(!(Condition == "Treatment" & remdata == "Yes"))
}
 

n_sim <- 100
s_lmer_lost <-
  tibble(seed = 1:n_sim) |>
  mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
  # Remove the lost-to-follow-up patients before fitting
  mutate(data_mod = map(data_w, ~ data_lost(.x, ltfup = 1, n = 10))) |>
  mutate(fit = map(data_mod, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
  mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
  mutate(p_val_sim = map_dbl(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))

power_obt_lost <-
  s_lmer_lost |>
  summarise("simulated power (%)" = mean(p_val_sim < 0.05) * 100)
| n   | Design                             | simulated power (%) |
|-----|------------------------------------|---------------------|
| 200 | Cluster RCT with loss to follow-up | 48                  |

Conclusions

  • Sample size computation allows for control of the Type II error rate.

  • It is an important part of study planning and it should be justified.

  • Depending on the complexity of the study design and of the statistical model, the computation can be done using formulas or simulations.

  • It remains a theoretical computation based on several assumptions.

Thank you for your attention