# How do the number of imputations and the maximum number of iterations affect accuracy in multiple imputation?

The help page for `mice` defines the function as:

``````
mice(data, m = 5, method = vector("character", length = ncol(data)),
     predictorMatrix = (1 - diag(1, ncol(data))),
     visitSequence = (1:ncol(data))[apply(is.na(data), 2, any)],
     form = vector("character", length = ncol(data)),
     post = vector("character", length = ncol(data)),
     defaultMethod = c("pmm", "logreg", "polyreg", "polr"),
     maxit = 5, diagnostics = TRUE, printFlag = TRUE, seed = NA,
     imputationMethod = NULL, defaultImputationMethod = NULL,
     data.init = NULL, ...)
``````

Let's just go through the parameters one by one:

• `data` doesn't require explanation
• `m` is the number of imputations. Generally speaking, the more the better. Originally (following Rubin, 1987), 5 was considered to be enough (hence the default), so from an accuracy point of view 5 may be sufficient. However, this was based on an efficiency argument only; to achieve better estimates of standard errors, more imputations are needed. These days there is a rule of thumb to use as many imputations as the average percentage of missing data, so if 30% of the data are missing on average, use 30 imputations. See Bodner (2008) and White et al. (2011) for further details.
• `method` specifies which imputation method is to be used. This is only necessary when the default method is to be overridden. For example, continuous data are imputed by predictive mean matching by default, and this usually works very well, but Bayesian linear regression and several other methods, including a multilevel model for nested/clustered data, may be specified instead. Hence, expert/clinical/statistical knowledge may be of use in specifying alternatives to the default method(s).
• `predictorMatrix` is a matrix that tells the algorithm which variables are used as predictors when imputing each of the other variables. As the signature above shows, the default uses every other variable as a predictor; `quickpred()` can instead build a matrix based on correlations between variables and the proportion of usable cases. Expert/clinical knowledge may be very useful in specifying the predictor matrix, so the default should be used with care.
• `visitSequence` specifies the order in which variables are imputed. It is not usually needed.
• `form` is used primarily to aid the specification of interaction terms to be used in imputation, and isn't normally needed.
• `post` is for post-imputation processing, for example to ensure that positive values are imputed. This isn't normally needed.
• `defaultMethod` changes the default imputation methods, and is not normally needed.
• `maxit` is the number of iterations for each imputation. `mice` uses an iterative algorithm, and the imputations for all variables must reach convergence; otherwise they will be inaccurate. Convergence can be assessed visually by inspecting the trace plots generated by `plot()`. Unlike other Gibbs sampling methods, far fewer iterations are needed: as a rule of thumb, generally 20-30 or fewer. Convergence has been achieved when the trace lines reach a value and fluctuate slightly around it. The following is an example showing healthy convergence, taken from here:

Here, 3 variables are being imputed with 5 imputations (coloured lines) for 20 iterations (the x-axis on the plots); the y-axis shows the imputed values for each imputation.
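To produce trace plots like this yourself, something along the following lines should work. This is only a sketch: it assumes the `mice` package is installed and uses the small `nhanes` example data set that ships with it; the object names `imp` and `imp_more` and the seed value are arbitrary choices for illustration.

```r
library(mice)

# nhanes is a small example data set shipped with mice.
# Run 5 imputations for 20 iterations each, without progress output.
imp <- mice(nhanes, m = 5, maxit = 20, seed = 123, printFlag = FALSE)

# Trace plots: one coloured line per imputation, iterations on the x-axis.
plot(imp)

# If the lines are still trending, add 10 more iterations to the same
# imputation object rather than starting from scratch.
imp_more <- mice.mids(imp, maxit = 10, printFlag = FALSE)
plot(imp_more)
```

If the lines in the second plot look much the same as at the end of the first, the extra iterations were probably not needed.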

• `diagnostics` produces useful diagnostic information by default.

• `printFlag` outputs the algorithm progress by default, which is useful because the estimated time to completion can easily be ascertained.

• `seed` is a random seed parameter which is useful for reproducibility.

• `imputationMethod` and `defaultImputationMethod` are for backwards compatibility only.
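As a rough illustration of how a few of these arguments fit together, here is a sketch that picks `m` from the average percentage of missing values (the rule of thumb mentioned under `m`), fixes `seed` for reproducibility, and passes a `quickpred()` predictor matrix. It assumes the `mice` package and its bundled `nhanes` data; the analysis model `chl ~ age + bmi`, the seed value, the choice `maxit = 20`, and the floor of 5 imputations are arbitrary choices for illustration, not recommendations.

```r
library(mice)

# Rule of thumb from above: roughly as many imputations as the average
# percentage of missing values. The floor of 5 is an assumption made here
# so that m never drops below the function's default.
pct_missing <- 100 * mean(is.na(nhanes))
m <- max(5, ceiling(pct_missing))

# Predictor matrix chosen from correlations; "age" (fully observed) is
# always included so that every imputed variable has at least one predictor.
pred <- quickpred(nhanes, mincor = 0.1, include = "age")

# A fixed seed makes the imputations reproducible.
imp <- mice(nhanes, m = m, maxit = 20, predictorMatrix = pred,
            seed = 2024, printFlag = FALSE)

# Fit the same model in each completed data set, then pool the results.
fit <- with(imp, lm(chl ~ age + bmi))
summary(pool(fit))
```

Re-running with a larger `m` and comparing the pooled standard errors is a simple way to see whether the number of imputations was large enough for your purposes.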

Bodner, Todd E. (2008) “What improves with increased missing data imputations?” Structural Equation Modeling: A Multidisciplinary Journal 15: 651-675. https://dx.doi.org/10.1080/10705510802339072

Rubin, Donald B. (1987) Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

White, Ian R., Patrick Royston and Angela M. Wood (2011) “Multiple imputation using chained equations: Issues and guidance for practice.” Statistics in Medicine 30: 377-399. https://dx.doi.org/10.1002/sim.4067