
with Type I and Type II error control
2024-05-23
The purpose of a statistical test is to test a null hypothesis \(H_0\) about a parameter of the population
The null hypothesis \(H_0\) is the one that we want to reject. It often expresses a lack of effect. For example: the effect of the treatment is the same in the control and in the intervention arm
The alternative hypothesis \(H_a\) is the one that we want to accept. For example: the effect of the treatment is different in both arms
There are two types of errors that can be made with a statistical test:

There are two types of errors that can be made with a statistical test:
Type I error: rejecting \(H_0\) when it is true. False-positive result. It has a probability of \(\alpha\) which is the statistical significance level.
Type II error: not rejecting \(H_0\) when it is false, i.e. when \(H_a\) is true. False-negative result. It has a probability of \(\beta\).

There are two types of errors that can be made with a statistical test:
Type I error: rejecting \(H_0\) when it is true. False-positive result. It has a probability of \(\alpha\) which is the statistical significance level.
Type II error: not rejecting \(H_0\) when it is false, i.e. when \(H_a\) is true. False-negative result. It has a probability of \(\beta\).
| Decision | \(H_0\) true | \(H_a\) true |
|---|---|---|
| Reject \(H_0\) | type I error (\(\alpha\)) | 1- \(\beta\) |
| Do not reject \(H_0\) | 1-\(\alpha\) | type II error (\(\beta\)) |
The power of a statistical test is \(1- \beta\). It is the probability of detecting a statistically significant difference when a difference of a given magnitude really exists.
Type II error rate is controlled through the power of the test.
Type I error rate is controlled through \(\alpha\).




The statistical power of a test allows control of the Type II error rate.
It depends on:
The correct sample size is a balance between a study that is too small or too large (resource constraints, ethical considerations).
Compute the sample size using the formulas which exist for a statistical test with a given \(\alpha\), power and effect size.
This approach is limited to study designs and statistical tests where formulas exist.
Using simulations (iterative procedure):
Randomized double-blind controlled trial carried out to compare inhaled corticosteroids with placebo in school-age children.
Primary endpoint was the mean forced expiratory volume in 1 second (FEV1) evaluated at the end of the 6 months of follow-up.
Hypothetical difference to be detected: 0.10\(l\) with a sd = 0.25\(l\).
\(\alpha = 5\%\), \(1 - \beta = 80\%\)
Theoretical computation:
Simulation-based computation:
sim_d <- function(seed, n) {
mu_t <- 1.64
mu_c <- 1.54
set.seed(seed)
tibble(group = rep(c("control", "treatment"), each = n)) |>
mutate(
treatment = ifelse(group == "control", 0, 1),
y = ifelse(group == "control",
rnorm(n, mean = mu_c, sd = 0.25),
rnorm(n, mean = mu_t, sd = 0.25)
)
)
}
n_sim <- 500
s <-
tibble(seed = 1:n_sim) |>
mutate(d = map(seed, sim_d, n = 100)) |>
mutate(fit = map(d, ~ lm(y ~ treatment, data = .x) |>
broom::tidy()))
power_obt <-
s |>
select(-d) |>
unnest_longer(fit) |>
filter(fit$term == "treatment") |>
mutate(signif = fit$p.value < 0.05) |>
summarise("simulated power (%)" = sum(signif / 500 * 100))| n | simulated power (%) |
|---|---|
| 200 | 81.2 |
Randomized cluster trial: 10 pediatricians are asked to recruit 10 patients in each group.
The model used to generate the data is a linear mixed model with a random intercept for the pediatrician’s clustering.
The theoretical treatment effect is the same.
\(\sigma_{clust} = 0.08\), \(\sigma_{res} = 0.24\)
sim_d_mix <- function(seed, n) {
Pediatrician <- as.factor(rep(1:10))
set.seed(seed)
# Random sd intercept for pediatrician
Ped_sd <- 0.08
Subject <- as.factor(rep(1:n))
Condition <- c("Treatment", "Control")
# creates "design matrix" for the data
X <- expand.grid(Subject = Subject, Pediatrician = Pediatrician, Condition = Condition)
X <- as_tibble(X)
X <- X |>
mutate(
Condition_dummy = case_when(
Condition == "Control" ~ 0,
Condition == "Treatment" ~ 1
),
# add simulated random effect for the cluster
Pediatrician_re = rep(rep(rnorm(10, mean = 0, sd = Ped_sd), each = n), 2)
)
# fixed intercept and coeff to be defined for simulation
b <- c(1.54, 0.1)
# Residual sd to be defined for simulation
sd_res <- sqrt((0.25^2 - 0.08^2))
resid <- rep(rnorm(200, 0, sd_res))
# create outcome
X <- as.data.frame(X)
FEV1 <- b[1] + b[2] * X$Condition_dummy + X$Pediatrician_re + resid
# Final simulated data set
data_sim <- cbind(X, FEV1)
}
n_sim <- 100
s_lmer <-
tibble(seed = 1:n_sim) |>
mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
# Apply mixed model
mutate(fit = map(data_w, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
# Compute treatment effect pvalue using Kenward Rodger df
mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
# Extract pvalue
mutate(p_val_sim = map(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))
power_obt <-
s_lmer |>
summarise("simulated power (%)" = sum(p_val_sim < 0.05) / 100 * 100)| n | Design | simulated power (%) |
|---|---|---|
| 200 | Cluster RCT | 88 |
data_lost <- function(data, ltfup, n) {
data <- as_tibble(data) |>
arrange(Condition, Pediatrician, desc(FEV1)) |>
mutate(remdata = rep(c(rep("Yes", times = ltfup), rep("No", times = (n-ltfup))), 20)) |>
filter(!(Condition == "Treatment" & remdata == "Yes"))
}
n_sim <- 100
s_lmer_lost <-
tibble(seed = 1:n_sim) |>
mutate(data_w = map(seed, sim_d_mix, n = 10)) |>
mutate(data_mod = map(data_w, ~ data_lost(.x, ltfup = 1, n = 10))) |>
mutate(fit = map(data_mod, ~ lmer(FEV1 ~ Condition + (1 | Pediatrician), data = .x))) |>
mutate(emm_fit = map(fit, ~ emmeans(.x, pairwise ~ Condition))) |>
mutate(p_val_sim = map(emm_fit, ~ as.data.frame(.x$contrasts)[1, 6]))
power_obt_lost <-
s_lmer_lost |>
summarise("simulated power (%)" = sum(p_val_sim < 0.05) / 100 * 100)| n | Design | simulated power (%) |
|---|---|---|
| 200 | Cluster RCT with lost to follow-up | 48 |
Sample size computation allows for control of Type II error rate.
It is an important part of study planning and it should be justified.
Depending on the complexity of the study design and the applied statistical model it can be done using formulas or simulations.
It remains a theoretical computation based on several assumptions.