performance-measures.Rmd
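Recall that the analysis models of interest were the cause-specific Cox proportional hazards models for relapse (REL) and non-relapse mortality (NRM), $h_k(t \mid X, Z) = h_{k0}(t)\exp(\beta_k X + \gamma_k Z)$ for $k \in \{1, 2\}$. We then had two main sets of estimands of interest: the regression coefficients of these models, $\theta_{\text{regr}}$, and the predicted probabilities derived from them, $\theta_{\text{pred}}$.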
Let $j = 1, 2, \ldots, n_{\text{sim}}$ index the simulation replications. Each replication begins by simulating an independent dataset according to the parametrisation defined by a scenario. In our notation, we suppress the index $l = 1, 2, \ldots, L$ over simulation scenarios.
Let $\theta$ represent an element of $\theta_{\text{regr}}$. At each simulation replication, $M$ imputed datasets are created for each of the four imputation-based methods described in section 5.3 of the paper. In each of these $M$ datasets, both cause-specific Cox models are fitted. The regression coefficients and their standard errors are then pooled according to Rubin's rules, yielding a vector $\hat{\boldsymbol{\theta}}_j = [\hat{\theta}_j, \widehat{\text{SE}}(\hat{\theta}_j)]$. For the complete-case analysis, $\hat{\boldsymbol{\theta}}_j$ simply contains the estimated coefficient and standard error from the models fitted on the complete cases (no pooling involved). We then define the performance measures as follows:
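Mean: $\hat{\theta} = \frac{1}{n_{\text{sim}}} \sum_{j=1}^{n_{\text{sim}}} \hat{\theta}_j$
Standard error: $\widehat{\text{SE}}(\hat{\theta}) = \frac{1}{n_{\text{sim}}} \sum_{j=1}^{n_{\text{sim}}} \widehat{\text{SE}}(\hat{\theta}_j)$
Empirical standard error: $\widehat{\text{EmpSE}}(\hat{\theta}) = \sqrt{\frac{1}{n_{\text{sim}} - 1} \sum_{j=1}^{n_{\text{sim}}} (\hat{\theta}_j - \hat{\theta})^2}$
Bias: $\widehat{\text{Bias}}(\hat{\theta}) = \frac{1}{n_{\text{sim}}} \sum_{j=1}^{n_{\text{sim}}} \hat{\theta}_j - \theta$
Coverage: $\widehat{\text{Cov}}(\hat{\theta}) = \frac{1}{n_{\text{sim}}} \sum_{j=1}^{n_{\text{sim}}} 1\{\hat{\theta}_{\text{low},j} < \theta < \hat{\theta}_{\text{upp},j}\}$, where the bounds of the 95% confidence interval, $\hat{\theta}_{\text{low},j}$ and $\hat{\theta}_{\text{upp},j}$, are computed as $\hat{\theta}_j \pm z_{\alpha/2} \times \widehat{\text{SE}}(\hat{\theta}_j)$ for the complete-case analysis, whereas for the imputation methods they are based on the t distribution - see confint.mipo.
Root mean square error: $\widehat{\text{RMSE}}(\hat{\theta}) = \sqrt{\frac{1}{n_{\text{sim}} - 1} \sum_{j=1}^{n_{\text{sim}}} (\hat{\theta}_j - \theta)^2}$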
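The two steps described above (pooling within a replication, then summarising across replications) can be sketched in a few lines. This is an illustrative Python/NumPy sketch, not the code used in the study. The function names `pool_rubin` and `performance_measures` and their argument names are hypothetical. The pooling step assumes the standard form of Rubin's rules for a scalar estimate. The interval bounds are passed in rather than computed, since they are obtained differently for the complete-case and imputation methods.

```python
import numpy as np


def pool_rubin(estimates, std_errors):
    """Pool M per-imputation estimates of one coefficient (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    u_bar = np.mean(se ** 2)          # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, np.sqrt(total_var)


def performance_measures(est, se, lower, upper, theta):
    """Summarise n_sim replications of one estimand with true value theta."""
    est = np.asarray(est, dtype=float)
    se = np.asarray(se, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_sim = est.size
    mean = est.mean()
    return {
        "mean": mean,
        "se": se.mean(),
        "emp_se": np.sqrt(np.sum((est - mean) ** 2) / (n_sim - 1)),
        "bias": mean - theta,
        "coverage": np.mean((lower < theta) & (theta < upper)),
        # Divides by n_sim - 1, following the definition given in the text.
        "rmse": np.sqrt(np.sum((est - theta) ** 2) / (n_sim - 1)),
    }
```

In use, `pool_rubin` would be called once per replication and coefficient, and the resulting estimates and standard errors collected into arrays of length $n_{\text{sim}}$ before calling `performance_measures`.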
Monte Carlo standard errors for all measures except RMSE were computed as per the formulas in the tutorial by Morris, White, and Crowther (2019). The Monte Carlo standard error for the RMSE was computed by using the approximate jackknife estimator implemented in the simhelpers package - see the relevant vignette.
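As a rough illustration, the Monte Carlo standard errors of the bias, empirical standard error and coverage can be written directly from the formulas in Morris, White, and Crowther (2019). The sketch below is in Python/NumPy and all function names are hypothetical. For the RMSE it uses an exact leave-one-out jackknife, which is a simpler stand-in for, and not the same as, the approximate jackknife implemented in simhelpers. The Monte Carlo standard error of the average model-based standard error is omitted.

```python
import numpy as np


def mcse_bias(est):
    """MCSE of the bias: EmpSE / sqrt(n_sim)."""
    est = np.asarray(est, dtype=float)
    return np.std(est, ddof=1) / np.sqrt(est.size)


def mcse_emp_se(est):
    """MCSE of the empirical SE: EmpSE / sqrt(2 * (n_sim - 1))."""
    est = np.asarray(est, dtype=float)
    return np.std(est, ddof=1) / np.sqrt(2 * (est.size - 1))


def mcse_coverage(covered):
    """MCSE of coverage from a boolean vector of interval hits."""
    covered = np.asarray(covered, dtype=float)
    cov = covered.mean()
    return np.sqrt(cov * (1 - cov) / covered.size)


def rmse(est, theta):
    """RMSE with the n_sim - 1 divisor used in the text's definition."""
    est = np.asarray(est, dtype=float)
    return np.sqrt(np.sum((est - theta) ** 2) / (est.size - 1))


def mcse_rmse_jackknife(est, theta):
    """Exact leave-one-out jackknife SE of the RMSE (illustrative only)."""
    est = np.asarray(est, dtype=float)
    n = est.size
    # Recompute the RMSE with each replication left out in turn.
    loo = np.array([rmse(np.delete(est, j), theta) for j in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
```

The exact jackknife recomputes the RMSE $n_{\text{sim}}$ times, which is affordable for a single estimand but may be slow when repeated over many scenarios and estimands.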
To obtain the predicted probabilities when using the imputation methods, the cause-specific models fitted in each imputed dataset are used to create predictions, which are then pooled. For computational reasons, standard errors of the predictions were not recorded, and so pooling with Rubin's rules reduced to averaging the probabilities across imputed datasets. Letting $\theta$ instead represent an element of $\theta_{\text{pred}}$, the pooled probability at replication $j$ of a simulation scenario is defined as
$\hat{\theta}_j = \frac{1}{M} \sum_{m=1}^{M} \hat{\theta}_m$, where $\hat{\theta}_m$ is the predicted probability obtained in the $m$th imputed dataset.
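The averaging step is small enough to show in full. In this Python/NumPy sketch, the function name `pool_predictions` and the array layout (one row per imputed dataset, one column per predicted probability) are assumptions made for illustration.

```python
import numpy as np


def pool_predictions(probs_by_imputation):
    """Average predicted probabilities over the M imputed datasets."""
    probs = np.asarray(probs_by_imputation, dtype=float)
    # Axis 0 indexes the imputed datasets, so each column is averaged over M.
    return probs.mean(axis=0)


# Hypothetical values: M = 3 imputed datasets, two predicted probabilities.
pooled = pool_predictions([[0.20, 0.35],
                           [0.24, 0.31],
                           [0.22, 0.33]])
```

Because only point predictions are averaged, no between-imputation or within-imputation variance is available at this stage.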
The performance measures for the predicted probabilities are the same as those outlined in the previous section, with the exception of the standard error and coverage, which could not be computed because standard errors were not recorded.