
In his 1987 classic book on multiple imputation (MI), Rubin used the fraction of missing information, ^{−1/2}, where ^{−6} to 0.01. As

Donald B. Rubin's 1987 book

Based on this RE, Rubin drew the following conclusion: If
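The RE referred to here is commonly written as RE = (1 + γ/m)^{−1/2} on the standard-error scale, where γ is the fraction of missing information and m is the number of imputations. The sketch below assumes that standard form, because the equation itself is not legible in this copy. The helper name `relative_efficiency` and the grid of γ and m values are illustrative choices, not taken from the paper.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Relative efficiency of m imputations versus infinitely many,
    on the standard-error scale: (1 + gamma / m) ** -0.5.

    Assumes the standard Rubin (1987) form; gamma is the fraction of
    missing information, m the number of imputations.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return (1.0 + gamma / m) ** -0.5


if __name__ == "__main__":
    # Illustrative grid only; these values are not from the paper.
    m_values = (3, 5, 10, 40)
    print("gamma " + " ".join(f"m={m:<5d}" for m in m_values))
    for gamma in (0.1, 0.3, 0.5):
        cells = " ".join(f"{relative_efficiency(gamma, m):.4f} " for m in m_values)
        print(f"{gamma:<5.1f} {cells}")
```

Under this formula, RE approaches 1 quickly as m grows, even for a fairly large γ, which is why a small m looks sufficient when judged by RE alone.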

For a limited number of imputations in MI,

where

where

where the subscript

As m approaches infinity, the following relationship can be deduced from
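Since the combining equations are not legible in this copy, the following is a minimal sketch of Rubin's standard combining rules as they are usually stated: the pooled estimate, the within-imputation variance Ū, the between-imputation variance B, the total variance T = Ū + (1 + 1/m)B, and two common expressions for the fraction of missing information. The function name `pool_estimates` and the example inputs are hypothetical, and the paper's own notation may differ.

```python
from statistics import mean, variance


def pool_estimates(estimates, within_variances):
    """Combine m completed-data results with Rubin's rules (standard form).

    estimates        : point estimates, one per imputed data set
    within_variances : their squared standard errors
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need m >= 2 estimates and matching variances")

    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(within_variances)     # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    t = u_bar + (1.0 + 1.0 / m) * b    # total variance

    if b == 0.0:
        # No between-imputation variation: no information is lost.
        return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "t": t,
                "lambda": 0.0, "gamma": 0.0, "df": float("inf")}

    r = (1.0 + 1.0 / m) * b / u_bar    # relative increase in variance
    df = (m - 1) * (1.0 + 1.0 / r) ** 2
    lam = (1.0 + 1.0 / m) * b / t      # simple ratio; tends to B / (U + B) as m grows
    gamma = (r + 2.0 / (df + 3.0)) / (r + 1.0)  # df-adjusted version
    return {"q_bar": q_bar, "u_bar": u_bar, "b": b, "t": t,
            "lambda": lam, "gamma": gamma, "df": df}


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    result = pool_estimates([10.2, 10.5, 9.9, 10.1, 10.4],
                            [0.40, 0.42, 0.39, 0.41, 0.40])
    for key, value in result.items():
        print(f"{key:>6}: {value:.6f}")
```

In this form the factor (1 + 1/m) tends to 1 as m grows, so the simple ratio reduces to B/(Ū + B) in the large-m limit.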

In recent years, undeniable evidence has accumulated that a much greater number of imputations, e.g., 40 or more, is needed to obtain reliable statistical inferences [

Why would the apparently sufficient m as suggested by the

Fraction of missing information sounds similar to fraction of missing data (

Using the 2012 Physician Workflow Mail Survey (PWS12) of the National Ambulatory Medical Care Survey (NAMCS), the relationship between

Conducted by the National Center for Health Statistics (NCHS), the NAMCS Physician Workflow Mail Survey (PWS) was a nationally representative, 3-year (2011-2013) panel mail survey of office-based physicians, with each year being a complete survey cycle [

Four levels of

Hot deck imputation [

Anal_V had four values. They are CONTROL, REGION, PRIMEMP and DERIVED (

When Rubin deduced the sufficient m using the ^{−6}.

The data used for this paper were from 1440

Judging by the magnitude of the F values, Anal_V was by far the most important factor affecting

Anal_V's dominating effect on

The linear model

From the ^{2} and the variance of the sample means

If all 2567 values were randomly drawn during the MI process, then
^{2} can be approximated by

and

Since

Rubin's conclusion of

Should the

Rubin made the following statement in his 1987 book [_{0} is equal to the expected fraction of observations missing in the simple case of scalar _{i}_{i}_{0}is the same as the _{i} is the value of Y for the

As indicated by

As we see from

Using the real survey data from PWS12, MI was performed at

Anal_V had the dominating effect on

The linear increase of

Rubin stated that

The magnitude of

The authors sincerely thank Dr. Alan H. Dorfman, Office of Research and Methodology (ORM), NCHS, CDC, USA, for his valuable suggestions on the research and critical text editing of the paper.

The views of this paper do not necessarily reflect the views of the National Center for Health Statistics (NCHS) or the Centers for Disease Control and Prevention (CDC) of the United States government.

Effects of Anal_V on the

Effect of different categories of PRIMEMP on the

Effects of Imp_V on the

Characteristics of the imputation variables (Imp_V).

Imp_V | Description | Mean | Value range | Variance |
---|---|---|---|---|
SIZE100 | Practice size as represented by the number of physicians. | 11.41 | 1 - 100 | 483.02 |
SIZE5 | Practice size recoded from SIZE100: 1 = Solo practice; 2 = Two physicians; 3 = 3 to 5 physicians; 4 = 6 - 10 physicians; 5 = 11+ physicians. | 3.06 | 1 - 5 | 1.97 |
SIZE20 | Practice size recoded from SIZE100: 1 - 19 = The actual number of physicians; 20 = 20+ physicians. | 6.47 | 1 - 20 | 38.26 |

Description of the analytic treatments (Anal_V).

Anal_V | Description | Value range |
---|---|---|
CONTROL | No analytic variable | Not applicable |
REGION | Region of the physician interview office | 1 = Northeast, 2 = Midwest, 3 = South, 4 = West |
PRIMEMP | Primary present employment of the physician | 11 = AMA-Self-emp, solo prac; 13 = AMA-Two phy. prac; 20 = AOA-Office prac. solo; 21 = AMA-Oth pat care/AOA-Off prac. partnp; 22 = AOA-Office prac group; 23 = AOA-Offcpracofc employee; 30 = AMA-Grp prac/AOA-Off prac HMO staff; 31 = AOA-Office prac. walk-in clinic; 35 = AMA-HMO; 40 = AMA-Medical school; 64 = AMA-County/Cty/State Govt Other; 97 = AOA-other office or clinic practice; 110 = AMA-No classification; 200 = Sampled CHC |
DERIVED | Derived categories for SIZE5, SIZE20, and SIZE100 | Regrouping the values with random errors added to each group. 1 to 4 for SIZE5, 1 to 9 for SIZE20, 1 to 17 for SIZE100 |

Mean

Anal_V | Imp_V = SIZE5 | Imp_V = SIZE20 | Imp_V = SIZE100 |
---|---|---|---|
CONTROL | 0.000038 | 0.000034 | 0.000033 |
REGION | 0.000171 | 0.000166 | 0.000169 |
PRIMEMP | 0.002212 | 0.001876 | 0.001756 |
DERIVED | 0.000805 | 0.004115 | 0.009253 |

Results of ANOVA, showing the effects of

Source of variation | DF | Variance | F value | P > F |
---|---|---|---|---|
Model | 47 | 0.00042800 | 3387.0 | 0.0001 |
| | 3 | 0.00028103 | 2223.9 | 0.0001 |
Anal_V | 3 | 0.00292646 | 23158.3 | 0.0001 |
Imp_V | 2 | 0.00077419 | 6126.6 | 0.0001 |
| | 9 | 0.00024690 | 1953.8 | 0.0001 |
| | 6 | 0.00005859 | 463.7 | 0.0001 |
Anal_V × Imp_V | 6 | 0.00088711 | 7020.1 | 0.0001 |
| | 18 | 0.00005827 | 461.1 | 0.0001 |
Error | 1392 | 0.00000013 | | |
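In an ANOVA table of this kind, each F value is the mean square (here labelled "Variance") of a source divided by the error mean square. Because the error mean square is printed to only two significant digits, dividing a printed variance by 0.00000013 does not reproduce the printed F exactly. As a rough consistency check, the sketch below goes the other way and recovers the error mean square implied by each labelled row. The dictionary name `ROWS` and the helper `implied_error_ms` are illustrative, and only rows whose source label is legible are included.

```python
# (variance, F value) pairs copied from the labelled rows of the ANOVA table.
ROWS = {
    "Model": (0.00042800, 3387.0),
    "Anal_V": (0.00292646, 23158.3),
    "Imp_V": (0.00077419, 6126.6),
    "Anal_V x Imp_V": (0.00088711, 7020.1),
}


def implied_error_ms(mean_square: float, f_value: float) -> float:
    """Error mean square implied by F = mean_square / error_mean_square."""
    return mean_square / f_value


implied = {name: implied_error_ms(ms, f) for name, (ms, f) in ROWS.items()}

if __name__ == "__main__":
    for name, value in implied.items():
        # Show the unrounded value and its 8-decimal rounding side by side.
        print(f"{name:<16} {value:.4e}  ->  {value:.8f}")
```

Each labelled row implies nearly the same error mean square, and each rounds to the 0.00000013 printed in the Error row.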

Ranges of

Item | | Anal_V | Imp_V |
---|---|---|---|
Minimum | 0.0017 | 0.000043 | 0.0017 |
Maximum | 0.0035 | 0.005822 | 0.0040 |
Difference% = 100 × (max – min)/min | 105 | 13439 | 135 |
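The Difference% row can be recomputed from the printed minima and maxima using the formula given in the table, 100 × (max – min)/min. The short sketch below does this for the three data columns in the order they appear. The function name `percent_difference` is illustrative, and the inputs are the rounded values as printed.

```python
def percent_difference(minimum: float, maximum: float) -> float:
    """Relative spread in percent: 100 * (max - min) / min."""
    if minimum <= 0:
        raise ValueError("minimum must be positive")
    return 100.0 * (maximum - minimum) / minimum


# (minimum, maximum) for the three data columns, left to right, as printed.
COLUMNS = [(0.0017, 0.0035), (0.000043, 0.005822), (0.0017, 0.0040)]

results = [percent_difference(lo, hi) for lo, hi in COLUMNS]

if __name__ == "__main__":
    for index, value in enumerate(results, start=1):
        print(f"column {index}: {value:.2f}%")
```

The integer parts of the three results (105, 13439, 135) agree with the printed row. The middle column's spread is about two orders of magnitude larger than that of the other two.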

The regression coefficient (b) and the corresponding t value for the linear model

Anal_V | Imp_V | b (Regression coefficient) | t value for $H_0$: b = 0 | P for t |
---|---|---|---|---|
CONTROL | SIZE5 | 0.00000206 | 26.88 | <0.0001 |
CONTROL | SIZE20 | 0.00000202 | 29.58 | <0.0001 |
CONTROL | SIZE100 | 0.00000221 | 31.98 | <0.0001 |
REGION | SIZE5 | 0.00001383 | 53.64 | <0.0001 |
REGION | SIZE20 | 0.00001403 | 59.82 | <0.0001 |
REGION | SIZE100 | 0.00001426 | 61.64 | <0.0001 |
PRIMEMP | SIZE5 | 0.00027159 | 23.19 | <0.0001 |
PRIMEMP | SIZE20 | 0.00027602 | 31.38 | <0.0001 |
PRIMEMP | SIZE100 | 0.00025355 | 26.92 | <0.0001 |