mice: combining inferences
This is the fifth exercise in the series.
In this exercise we will walk you through different ways of combining inferences based on multiply imputed data sets.
1. Open R, load the following packages and fix the random seed.
library(mice) # Data imputation
library(dplyr) # Data manipulation
library(magrittr) # Flexible piping in R
library(purrr) # Flexible functional programming
set.seed(123)
We choose seed value 123. This is an arbitrary value; any other value would be an equally good seed. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generators. If you set the seed in your R instance to 123, you will get the exact same results and plots as we present in this document, provided you follow the order and code of the exercises precisely.
2. Impute the boys data properly with passive imputation for bmi, using the following parameters:
- m = 10 for 10 imputed data sets.
- maxit = 6 to give the algorithm 6 iterations to obtain a stable solution.
- print = FALSE to omit printing of the iteration and imputation history.
We will use this data to go through the workflow of data analysis with mids (multiply imputed data set) objects.
We start by creating the method vector and specifying the passive imputation of bmi:
meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
Then we remove bmi as a predictor for hgt and wgt to avoid circularity (bmi feeding back into hgt and wgt):
pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
## age hgt wgt bmi hc gen phb tv reg
## age 0 1 1 1 1 1 1 1 1
## hgt 1 0 1 0 1 1 1 1 1
## wgt 1 1 0 0 1 1 1 1 1
## bmi 1 1 1 0 1 1 1 1 1
## hc 1 1 1 1 0 1 1 1 1
## gen 1 1 1 1 1 0 1 1 1
## phb 1 1 1 1 1 1 0 1 1
## tv 1 1 1 1 1 1 1 0 1
## reg 1 1 1 1 1 1 1 1 0
and we run the mice algorithm again with the new predictor matrix (we still ‘borrow’ the method vector meth from before):
imp <- mice(boys,
            meth = meth,
            pred = pred,
            print = FALSE,
            m = 10,
            maxit = 6)
We use the multiply imputed data set imp
from now on.
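Since we asked for only 6 iterations, it is good practice to verify that the algorithm has indeed reached a stable solution. A quick, optional check is to inspect the trace plots of the mids object:
# Trace plots of the chain means and standard deviations per iteration;
# well-mixed lines without trend suggest convergence
plot(imp)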
3. Calculate a correlation between all continuous variables for the imputed boys
data
There are two ways in which we can calculate the correlation on the imputed data. People often suggest that using the averaged imputed data set - taking the average over the imputed data sets, such that each cell holds the average of the corresponding cells in the \(m\) imputed sets - would be efficient and conform to Rubin’s rules. This is not true: doing so yields invalid inference.
To demonstrate this, let’s create the averaged data set and exclude the non-numerical columns:
ave <- imp %>%
  mice::complete("long") %>%            # stack the m imputed sets
  group_by(.id) %>%                     # group by original case
  summarise_all(.funs = mean) %>%       # average each cell over the m sets
  select(-.id, -.imp, -phb, -gen, -reg) # drop id's and non-numeric columns
head(ave)
## # A tibble: 6 x 6
## age hgt wgt bmi hc tv
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.035 50.1 3.65 14.5 33.7 2.2
## 2 0.038 53.5 3.37 11.8 35 3.3
## 3 0.057 50 3.14 12.6 35.2 2.3
## 4 0.06 54.5 4.27 14.4 36.7 3.7
## 5 0.062 57.5 5.03 15.2 37.3 1.7
## 6 0.068 55.5 4.66 15.1 37 1.9
If we now calculate Pearson’s correlation, rounded to two digits:
cor.wrong <- ave %>%
  cor() %>%
  round(digits = 2)
we obtain:
cor.wrong
## age hgt wgt bmi hc tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.80
## wgt 0.95 0.94 1.00 0.79 0.84 0.86
## bmi 0.63 0.60 0.79 1.00 0.59 0.64
## hc 0.86 0.91 0.84 0.59 1.00 0.67
## tv 0.86 0.80 0.86 0.64 0.67 1.00
It is best to apply a Fisher transformation before pooling the correlation estimates, and a back-transformation afterwards. We therefore define the following two functions to transform and back-transform any value:
fisher.trans <- function(x) 1/2 * log((1 + x) / (1 - x))
fisher.backtrans <- function(x) (exp(2 * x) - 1) / (exp(2 * x) + 1)
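As a quick sanity check, the two functions should be each other’s inverse:
# Back-transforming a transformed correlation returns the original value
fisher.backtrans(fisher.trans(0.5)) # 0.5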
Now, to calculate the correlations on the imputed data:
cor <- imp %>%
  mice::complete("all") %>%          # list of the m imputed data sets
  map(select, -phb, -gen, -reg) %>%  # drop the non-numeric columns
  map(stats::cor) %>%                # correlation matrix per imputed set
  map(fisher.trans)                  # Fisher-transform each matrix
cor
## $`1`
## age hgt wgt bmi hc tv
## age Inf 2.1960239 1.838270 0.7334134 1.2838424 1.1631232
## hgt 2.1960239 Inf 1.772436 0.6824682 1.5317094 1.0253561
## wgt 1.8382697 1.7724360 Inf 1.0689689 1.2150216 1.1748799
## bmi 0.7334134 0.6824682 1.068969 Inf 0.6752152 0.7012375
## hc 1.2838424 1.5317094 1.215022 0.6752152 Inf 0.7728062
## tv 1.1631232 1.0253561 1.174880 0.7012375 0.7728062 Inf
##
## $`2`
## age hgt wgt bmi hc tv
## age Inf 2.1910071 1.836386 0.7403549 1.2825782 1.1872138
## hgt 2.1910071 Inf 1.770646 0.6881484 1.5333194 1.0317803
## wgt 1.8363859 1.7706464 Inf 1.0791743 1.2170311 1.2035503
## bmi 0.7403549 0.6881484 1.079174 Inf 0.6812123 0.7323799
## hc 1.2825782 1.5333194 1.217031 0.6812123 Inf 0.7642128
## tv 1.1872138 1.0317803 1.203550 0.7323799 0.7642128 Inf
##
## $`3`
## age hgt wgt bmi hc tv
## age Inf 2.1944834 1.837795 0.7345437 1.2855709 1.1760091
## hgt 2.1944834 Inf 1.771040 0.6824284 1.5347226 1.0140508
## wgt 1.8377948 1.7710398 Inf 1.0711550 1.2192675 1.2097216
## bmi 0.7345437 0.6824284 1.071155 Inf 0.6766605 0.7465336
## hc 1.2855709 1.5347226 1.219267 0.6766605 Inf 0.7633598
## tv 1.1760091 1.0140508 1.209722 0.7465336 0.7633598 Inf
##
## $`4`
## age hgt wgt bmi hc tv
## age Inf 2.1933569 1.832186 0.7383965 1.2794967 1.1604424
## hgt 2.1933569 Inf 1.771422 0.6888356 1.5325723 1.0080486
## wgt 1.8321856 1.7714224 Inf 1.0814855 1.2127519 1.1513295
## bmi 0.7383965 0.6888356 1.081486 Inf 0.6760717 0.6850460
## hc 1.2794967 1.5325723 1.212752 0.6760717 Inf 0.7622345
## tv 1.1604424 1.0080486 1.151329 0.6850460 0.7622345 Inf
##
## $`5`
## age hgt wgt bmi hc tv
## age Inf 2.1987207 1.836872 0.7075307 1.2800161 1.1363762
## hgt 2.1987207 Inf 1.770040 0.6561143 1.5257563 0.9885557
## wgt 1.8368723 1.7700401 Inf 1.0301078 1.2085744 1.1260021
## bmi 0.7075307 0.6561143 1.030108 Inf 0.6454349 0.6516133
## hc 1.2800161 1.5257563 1.208574 0.6454349 Inf 0.7389882
## tv 1.1363762 0.9885557 1.126002 0.6516133 0.7389882 Inf
##
## $`6`
## age hgt wgt bmi hc tv
## age Inf 2.1997805 1.837028 0.7320565 1.2800198 1.1764657
## hgt 2.1997805 Inf 1.774865 0.6841765 1.5250343 1.0380368
## wgt 1.8370278 1.7748650 Inf 1.0703198 1.2176427 1.2172060
## bmi 0.7320565 0.6841765 1.070320 Inf 0.6848232 0.7315825
## hc 1.2800198 1.5250343 1.217643 0.6848232 Inf 0.7739781
## tv 1.1764657 1.0380368 1.217206 0.7315825 0.7739781 Inf
##
## $`7`
## age hgt wgt bmi hc tv
## age Inf 2.1923815 1.837884 0.7329401 1.2744194 1.1944465
## hgt 2.1923815 Inf 1.772881 0.6812806 1.5277814 1.0401921
## wgt 1.8378837 1.7728807 Inf 1.0660449 1.2148237 1.1894146
## bmi 0.7329401 0.6812806 1.066045 Inf 0.6740203 0.6902522
## hc 1.2744194 1.5277814 1.214824 0.6740203 Inf 0.7813761
## tv 1.1944465 1.0401921 1.189415 0.6902522 0.7813761 Inf
##
## $`8`
## age hgt wgt bmi hc tv
## age Inf 2.1977784 1.838968 0.7357619 1.2732988 1.1827927
## hgt 2.1977784 Inf 1.774272 0.6860998 1.5230193 1.0196310
## wgt 1.8389681 1.7742719 Inf 1.0717699 1.2107916 1.1976843
## bmi 0.7357619 0.6860998 1.071770 Inf 0.6783695 0.7342019
## hc 1.2732988 1.5230193 1.210792 0.6783695 Inf 0.7637315
## tv 1.1827927 1.0196310 1.197684 0.7342019 0.7637315 Inf
##
## $`9`
## age hgt wgt bmi hc tv
## age Inf 2.194798 1.839937 0.7240744 1.2705573 1.1647717
## hgt 2.1947979 Inf 1.773808 0.6719150 1.5229541 1.0309059
## wgt 1.8399373 1.773808 Inf 1.0500938 1.2052134 1.1872160
## bmi 0.7240744 0.671915 1.050094 Inf 0.6549673 0.6854105
## hc 1.2705573 1.522954 1.205213 0.6549673 Inf 0.7652942
## tv 1.1647717 1.030906 1.187216 0.6854105 0.7652942 Inf
##
## $`10`
## age hgt wgt bmi hc tv
## age Inf 2.1924482 1.836024 0.7264512 1.2824257 1.1477097
## hgt 2.1924482 Inf 1.773047 0.6762927 1.5330661 1.0060617
## wgt 1.8360237 1.7730470 Inf 1.0574169 1.2184621 1.1541765
## bmi 0.7264512 0.6762927 1.057417 Inf 0.6761177 0.6954423
## hc 1.2824257 1.5330661 1.218462 0.6761177 Inf 0.7733603
## tv 1.1477097 1.0060617 1.154177 0.6954423 0.7733603 Inf
The object cor is a list over the \(m\) imputations, where each list element is a correlation matrix. To calculate the average over the correlation matrices, we can add the \(m\) list elements and divide by \(m\):
cor.rect <- Reduce("+", cor) / length(cor) # m is equal to the length of the list
cor.rect <- fisher.backtrans(cor.rect)
If we compare the wrong estimates in cor.wrong
cor.wrong
## age hgt wgt bmi hc tv
## age 1.00 0.98 0.95 0.63 0.86 0.86
## hgt 0.98 1.00 0.94 0.60 0.91 0.80
## wgt 0.95 0.94 1.00 0.79 0.84 0.86
## bmi 0.63 0.60 0.79 1.00 0.59 0.64
## hc 0.86 0.91 0.84 0.59 1.00 0.67
## tv 0.86 0.80 0.86 0.64 0.67 1.00
with the correct estimates in cor.rect
round(cor.rect, digits = 2)
## age hgt wgt bmi hc tv
## age NaN 0.98 0.95 0.62 0.86 0.82
## hgt 0.98 NaN 0.94 0.59 0.91 0.77
## wgt 0.95 0.94 NaN 0.79 0.84 0.83
## bmi 0.62 0.59 0.79 NaN 0.59 0.61
## hc 0.86 0.91 0.84 0.59 NaN 0.64
## tv 0.82 0.77 0.83 0.61 0.64 NaN
We see that the wrong estimates in cor.wrong tend to overestimate the correlation coefficients relative to the estimates that are correctly combined following Rubin’s rules.
The correct estimates have a diagonal of NaN’s, because the transformation of a correlation of 1 yields Inf, and the back-transformation of Inf is numerically undefined (it evaluates to Inf/Inf, i.e. NaN). We know the diagonal is supposed to be 1, so we can simply correct this:
diag(cor.rect) <- 1
cor.rect
## age hgt wgt bmi hc tv
## age 1.0000000 0.9755061 0.9505194 0.6234031 0.8562776 0.8239305
## hgt 0.9755061 1.0000000 0.9438769 0.5913737 0.9102522 0.7699732
## wgt 0.9505194 0.9438769 1.0000000 0.7874385 0.8378628 0.8278038
## bmi 0.6234031 0.5913737 0.7874385 1.0000000 0.5864837 0.6077652
## hc 0.8562776 0.9102522 0.8378628 0.5864837 1.0000000 0.6445590
## tv 0.8239305 0.7699732 0.8278038 0.6077652 0.6445590 1.0000000
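Note that this workflow pools only the point estimates of the correlations. If you also want pooled standard errors, confidence intervals and p-values, one option - a sketch assuming the miceadds package is installed - is micombine.cor(), which applies the Fisher transformation and Rubin’s rules for you:
# Assumption: the miceadds package is available (install.packages("miceadds"))
library(miceadds)
# Pool the correlations among the continuous columns of boys
# (columns 1:5 and 8 are age, hgt, wgt, bmi, hc and tv)
micombine.cor(mi.res = imp, variables = c(1:5, 8))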
In FIMD v2, paragraph 5.1.2, Stef mentions the following:
The average workflow is faster and easier than the correct methods, since there is no need to replicate the analyses \(m\) times. In the words of Dempster and Rubin (1983), this workflow is
seductive because it can lull the user into the pleasurable state of believing that the data are complete after all.
The ensuing statistical analysis does not know which data are observed and which are missing, and treats all data values as real, which will underestimate the uncertainty of the parameters. The reported standard errors and p-values after data-averaging are generally too low. The correlations between the variables of the averaged data will be too high. For example, the correlation matrix in the average data are more extreme than the average of the \(m\) correlation matrices, which is an example of ecological fallacy. As researchers tend to like low p-values and high correlations, there is a cynical reward for the analysis of the average data. However, analysis of the average data cannot give a fair representation of the uncertainties associated with the underlying data, and hence is not recommended.
So, please stay away from averaging the imputed data sets. Instead, use the correct workflow of analyzing the imputed sets separately and combining the inferences afterwards.
4. Fit the following linear model on the imputed data:
lm(age ~ wgt + hgt)
fit1.lm <- imp %>%
  with(lm(age ~ wgt + hgt))
est1.lm <- pool(fit1.lm)
est1.lm
## Class: mipo m = 10
## term m estimate ubar b t dfcom
## 1 (Intercept) 10 -7.48484461 5.891627e-02 1.727421e-04 5.910628e-02 745
## 2 wgt 10 0.07227995 3.478153e-05 2.500248e-07 3.505655e-05 745
## 3 hgt 10 0.10645255 1.089061e-05 6.405888e-08 1.096107e-05 745
## df riv lambda fmi
## 1 739.9900 0.003225194 0.003214825 0.005897997
## 2 733.4812 0.007907280 0.007845246 0.010539557
## 3 735.7374 0.006470233 0.006428638 0.009118555
summary(est1.lm)
## term estimate std.error statistic df p.value
## 1 (Intercept) -7.48484461 0.243117839 -30.78690 739.9900 0
## 2 wgt 0.07227995 0.005920857 12.20768 733.4812 0
## 3 hgt 0.10645255 0.003310751 32.15360 735.7374 0
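If we also want pooled confidence intervals, summary() on a mipo object accepts a conf.int argument:
# Pooled estimates with 95% confidence intervals
summary(est1.lm, conf.int = TRUE)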
5. Now expand the linear model from (4) with a squared term for hgt:
lm(age ~ wgt + hgt + I(hgt^2))
fit2.lm <- imp %>%
  with(lm(age ~ wgt + hgt + I(hgt^2)))
est2.lm <- pool(fit2.lm)
est2.lm
## Class: mipo m = 10
## term m estimate ubar b t dfcom
## 1 (Intercept) 10 -4.0401500544 2.481985e-01 4.533153e-04 2.486972e-01 744
## 2 wgt 10 0.0312162871 5.970293e-05 3.323328e-07 6.006850e-05 744
## 3 hgt 10 0.0386114111 8.520263e-05 2.418903e-07 8.546871e-05 744
## 4 I(hgt^2) 10 0.0003604169 2.120409e-09 6.822827e-12 2.127914e-09 744
## df riv lambda fmi
## 1 740.2754 0.002009065 0.002005036 0.004690434
## 2 735.2608 0.006123084 0.006085820 0.008778403
## 3 739.1093 0.003122900 0.003113178 0.005799809
## 4 738.6361 0.003539464 0.003526980 0.006214209
summary(est2.lm)
## term estimate std.error statistic df p.value
## 1 (Intercept) -4.0401500544 4.986955e-01 -8.101437 740.2754 2.220446e-15
## 2 wgt 0.0312162871 7.750387e-03 4.027707 735.2608 6.217697e-05
## 3 hgt 0.0386114111 9.244929e-03 4.176496 739.1093 3.313683e-05
## 4 I(hgt^2) 0.0003604169 4.612932e-05 7.813186 738.6361 1.909584e-14
6. Compare the models from (4) and (5) to see which model would yield the best fit:
We have three choices for evaluation:
D1(fit2.lm, fit1.lm) # multivariate Wald test
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 61.04588 1 739.7552 744 1.909584e-14 0.003539464
The \(D_1\) statistic requires a variance-covariance matrix for the estimates.
D2(fit2.lm, fit1.lm) # combining test statistics
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 60.99329 1 466721.9 NA 5.77316e-15 0.004410659
The \(D_2\) statistic requires only the test statistics (e.g. p-values or \(\chi^2\) values) and hence is more flexible to apply than the \(D_1\) statistic. But this comes at a cost: the resulting inference is less informed by the data than under the \(D_1\) statistic.
D3(fit2.lm, fit1.lm) # likelihood ratio test
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 58.99007 1 268897.9 744 1.587619e-14 0.003368425
For large sample sizes, \(D_3\) is equivalent to \(D_1\); however, \(D_3\) does not require the covariances of the complete-data estimates. It is the preferred method for testing random effects, and connects to global fit statistics in structural equation models. The likelihood ratio test does not require normality. For large riv (i.e. values > 10), the \(D_3\) statistic must be taken with a grain of salt. In general, given what we know today, the \(D_1\) statistic may be slightly more efficient than \(D_3\) for small samples (i.e. \(< 200\) cases); for larger samples (i.e. \(\geq 200\) cases) \(D_1\) and \(D_3\) appear equally good, and a choice between them is mostly a matter of convenience. See also paragraph 5.3.4 in FIMD v2 for a comparison of when to use \(D_1\), \(D_2\) and/or \(D_3\).
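Besides these formal tests, we can informally compare the pooled explained variance of the two models with mice’s pool.r.squared():
# Pooled proportion of explained variance per model; descriptive,
# not a formal test (pool.r.squared() works for lm models)
pool.r.squared(fit1.lm)
pool.r.squared(fit2.lm)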
7. Fit a stepwise linear model to predict hgt separately on each of the imputed data sets.
We can use the step()
function in R
to select formula-based models. We start by specifying the scope of the evaluations:
scope <- list(upper = ~ age + wgt + hc + gen + phb + tv + reg,
              lower = ~ 1)
The scope specifies the upper bound of the model (all variables to run the selection on) and the lower bound of the model, where a 1 indicates an intercept-only model.
We can then specify the expressions to be evaluated:
expr <- expression(f1 <- lm(hgt ~ 1),
                   f2 <- step(f1,
                              scope = scope,
                              direction = "forward",
                              trace = 0))
where f1 is the linear model to be evaluated and f2 the step() call that performs the forward selection starting from f1. Finally, we apply the with() function to evaluate the expression expr on the object imp:
fit <- with(imp, expr)
The fit object contains the model evaluations. To count how often each variable was selected, we can run:
formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
## votes
## age gen hc phb reg tv wgt
## 10 3 10 10 1 6 10
We see that reg is used in only 1 of the 10 models based on the imputed data sets. tv is used in 6 of the models and gen in 3 of the 10 completed-data models. wgt, hc, age and phb are used in all models.
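As a small sketch (the object names below are our own, not part of the exercise), this table can be turned into a majority vote over predictors:
# Keep the predictors selected in more than half of the m = 10 models
majority <- names(which(table(votes) > 5))
# Build the corresponding formula for hgt with base R's reformulate()
form.major <- reformulate(majority, response = "hgt")
form.major
For the votes above, this keeps age, hc, phb, tv and wgt.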
To determine if gen
should be a part of the final model, we may run a multivariate Wald test:
fit.gen <- with(imp, lm(hgt ~ age + hc + phb + wgt + gen))
fit.nogen <- with(imp, lm(hgt ~ age + hc + phb + wgt))
D1(fit.gen, fit.nogen)
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 1.086821 4 168.946 735 0.3647302 0.5946789
With a p-value of .365 we could conclude that gen is not strictly needed in this model. We might also investigate the BIC on the separate imputed sets and compare those across (not within) models. We draw the same conclusion from this evaluation - the BIC is lower for the model without gen - although not by much. But then again, the p-value indicated the same trend.
BIC.gen <- fit.gen$analyses %>%
  sapply(BIC)
BIC.nogen <- fit.nogen$analyses %>%
  sapply(BIC)
To count the model evaluations in favor of BIC.nogen (a better fit means a lower BIC):
BIC.gen
## [1] 5119.745 5096.645 5122.573 5063.054 5123.554 5083.536 5084.813 5102.367
## [9] 5068.139 5109.723
BIC.nogen
## [1] 5099.175 5081.218 5098.136 5057.492 5106.433 5072.734 5060.464 5081.877
## [9] 5047.834 5096.033
sum(BIC.gen > BIC.nogen)
## [1] 10
Please note that we can compare the BIC only over the models, not over the imputed data sets. The realized imputations differ for each set, but for each imputed set the model comparison is based on the same realization. The sum(BIC.gen > BIC.nogen) thus compares each BIC only against its counterpart for the same imputed data set. The resulting outcome can be considered a majority vote: in our case 10 out of 10 model comparisons are in favor of the model without gen as a predictor.
8. Calculate the average mean for bmi
for every region reg
over the imputed data.
We study the means for bmi conditional on reg to get a picture of what the differences are:
imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(.funs = mean)
## # A tibble: 5 x 2
## reg bmi
## <fct> <dbl>
## 1 north 19.3
## 2 east 17.9
## 3 west 17.9
## 4 south 17.7
## 5 city 18.2
To also obtain information about the standard error of the mean, we could extend the summarise_all() evaluation with a custom standard error function:
se <- function(x){
  sd(x) / sqrt(length(x))
}
and then calculate the summary again
imp %>%
  mice::complete("long") %>%
  select(reg, bmi) %>%
  group_by(reg) %>%
  summarise_all(list(~ mean(.), ~ se(.)))
## # A tibble: 5 x 3
## reg mean se
## <fct> <dbl> <dbl>
## 1 north 19.3 0.120
## 2 east 17.9 0.0742
## 3 west 17.9 0.0603
## 4 south 17.7 0.0698
## 5 city 18.2 0.0978
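Beware that these summaries are computed on the stacked (long) data: the se() part treats all \(m \times n\) stacked rows as independent observations and ignores the between-imputation variance. A sketch of a Rubin’s-rules alternative (the name fit.means is our own) is to estimate the region means as coefficients of an intercept-free linear model and pool them:
# The coefficients of bmi ~ 0 + reg are exactly the per-region means;
# pool() then combines estimates and standard errors with Rubin's rules
fit.means <- with(imp, lm(bmi ~ 0 + reg))
summary(pool(fit.means))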
9. Test whether the means differ.
This is best done with an ANOVA. To do this correctly, we can apply the following workflow. First, we fit an intercept-only model:
fit.empty <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1)
and then we fit the model with reg as a predictor:
fit.reg <- imp %>%
  mice::complete("all") %>%
  map(lm, formula = bmi ~ 1 + reg)
We can then calculate separate ANOVAs on each imputed data set:
aov.empty <- lapply(with(imp, lm(age ~ 1))$analyses, aov)
aov.reg <- lapply(with(imp, lm(age ~ 1 + reg))$analyses, aov)
And look at the summaries:
lapply(aov.empty, summary)
## [[1]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[2]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[3]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[4]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[5]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[6]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[7]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[8]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[9]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
##
## [[10]]
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 747 35503 47.53
The summary for the aov.empty object has no p-values. This is expected, as we are fitting an empty model (without any predictors) and hence have no model Mean Squares (MS) with which to calculate and test the ratio
\[F = \frac{\text{Model MS}}{\text{Residual MS}}\]
We do have those components for the model with reg
included as predictor:
lapply(aov.reg, summary)
## [[1]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 745 186.37 3.984 0.00332 **
## Residuals 743 34758 46.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[2]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 691 172.71 3.686 0.00556 **
## Residuals 743 34813 46.85
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[3]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 757 189.21 4.046 0.00298 **
## Residuals 743 34747 46.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[4]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 750 187.61 4.011 0.00316 **
## Residuals 743 34753 46.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[5]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 756 188.94 4.04 0.00301 **
## Residuals 743 34748 46.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[6]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 750 187.49 4.008 0.00318 **
## Residuals 743 34753 46.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[7]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 762 190.60 4.076 0.00282 **
## Residuals 743 34741 46.76
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[8]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 695 173.84 3.711 0.00533 **
## Residuals 743 34808 46.85
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[9]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 757 189.36 4.049 0.00296 **
## Residuals 743 34746 46.76
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## [[10]]
## Df Sum Sq Mean Sq F value Pr(>F)
## reg 4 744 185.99 3.976 0.00337 **
## Residuals 743 34759 46.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find that each of the separate ANOVAs indicates significance (e.g. for the first imputed set, \(F = 186.37 / 46.78 \approx 3.98\)), meaning that there is an overall effect of reg in each imputed data set.
To obtain an overall estimate for the ANOVA we can simply compare the empty model to the model with reg
included:
D1(fit.reg, fit.empty)
## test statistic df1 df2 dfcom p.value riv
## 1 ~~ 2 4.504619 4 735.114 743 0.001334955 0.01786418
And we find that indeed the overall (i.e. pooled) effect over the imputations is also significant, which is not surprising as the F-values for the separate tests show little variation.
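For reference, the same pooled comparison can also be run on mira objects obtained with with() instead of the map()-based workflow; the results should be equivalent, since the underlying model fits are identical:
# Equivalent D1 comparison using with() on the mids object
fit.empty.mira <- with(imp, lm(bmi ~ 1))
fit.reg.mira <- with(imp, lm(bmi ~ 1 + reg))
D1(fit.reg.mira, fit.empty.mira)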
- End of exercise