Pool predictions across imputed datasets using the method described in the paper by Morisot and colleagues.

pool_morisot(preds_list, by_vars)

Arguments

preds_list

A list with one element per imputed dataset, containing the imputation-specific predictions. Each element should be a data frame with the columns "prob" (probability) and "se" (standard error of the probability), plus any other variables that identify groups of predictions (to be used in by_vars).

by_vars

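Character vector of variable names that identify the groups of predictions. Predictions are pooled across the imputed datasets within each group.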

Examples

set.seed(1234)

# Simulated predictions for patients A and B,
# one data frame per imputed dataset (20 in total)
preds_list <- replicate(
  n = 20, 
  simplify = FALSE,
  expr = {
    cbind.data.frame(
      "prob" = runif(2, min = 0.25, max = 0.5),
      "se" = runif(2, min = 0.01, max = 0.05),
      "patient" = c("A", "B")
    )
  }
)

preds_list[c(1, 2)]
#> [[1]]
#>        prob         se patient
#> 1 0.2784259 0.03437099       A
#> 2 0.4055749 0.03493518       B
#> 
#> [[2]]
#>        prob         se patient
#> 1 0.4652288 0.01037983       A
#> 2 0.4100777 0.01930202       B
#> 

# Pool the probabilities
pool_morisot(preds_list, by_vars = "patient")
#>    patient  p_pooled    CI_low    CI_upp
#> 1:       A 0.3456252 0.2259118 0.5045596
#> 2:       B 0.3737131 0.2421776 0.5459951
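
The pooling step above can be sketched in base R. This page does not spell out the computation, so the following is only an illustration under stated assumptions: that probabilities are transformed to the complementary log-log scale, that standard errors are carried over with the delta method, that Rubin's rules are applied on that scale, and that a normal quantile is used for the interval. The helper name pool_cloglog_sketch is hypothetical and is not part of the package; its results are not guaranteed to match those of pool_morisot.

```r
# Hypothetical helper (not the package's implementation): pools "prob" and "se"
# across imputed datasets with Rubin's rules on the complementary log-log scale.
pool_cloglog_sketch <- function(preds_list, by_vars) {
  stacked <- do.call(rbind, preds_list)

  # Assumed transformation: theta = log(-log(1 - p))
  stacked$theta <- log(-log(1 - stacked$prob))
  # Delta method: d(theta)/dp = 1 / ((1 - p) * -log(1 - p))
  stacked$se_theta <- stacked$se /
    abs((1 - stacked$prob) * log(1 - stacked$prob))

  inv_cloglog <- function(x) 1 - exp(-exp(x))

  groups <- split(stacked, stacked[by_vars], drop = TRUE)
  pooled <- lapply(groups, function(g) {
    m <- nrow(g)                   # number of imputed datasets in this group
    q_bar <- mean(g$theta)         # pooled estimate on the transformed scale
    u_bar <- mean(g$se_theta^2)    # within-imputation variance
    b <- var(g$theta)              # between-imputation variance
    total_var <- u_bar + (1 + 1 / m) * b
    # Assumption: normal quantile rather than a t reference distribution
    ci <- q_bar + c(-1, 1) * qnorm(0.975) * sqrt(total_var)
    data.frame(
      g[1, by_vars, drop = FALSE],
      p_pooled = inv_cloglog(q_bar),
      CI_low = inv_cloglog(ci[1]),
      CI_upp = inv_cloglog(ci[2]),
      row.names = NULL
    )
  })

  out <- do.call(rbind, pooled)
  rownames(out) <- NULL
  out
}

# Usage with simulated predictions shaped like those in the example above
set.seed(1234)
sketch_preds <- replicate(
  n = 20,
  simplify = FALSE,
  expr = data.frame(
    prob = runif(2, min = 0.25, max = 0.5),
    se = runif(2, min = 0.01, max = 0.05),
    patient = c("A", "B")
  )
)
sketch_out <- pool_cloglog_sketch(sketch_preds, by_vars = "patient")
```

Because the interval is built on the transformed scale and then mapped back, its limits stay inside (0, 1) and are generally asymmetric around the pooled probability.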