Examine the feasibility of performing an item response theory (IRT) analysis on two of the Centers for Disease Control and Prevention health-related quality of life (CDC HRQOL) modules: the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM). Previous principal components analyses confirm that the two scales both assess a mix of mental (CDC-MH) and physical health (CDC-PH). The purpose is to conduct IRT analyses on the CDC-MH and CDC-PH scales separately.
2182 patients with self-reported or physician-diagnosed arthritis completed a cross-sectional survey including HDCM and HDSM items. Apart from the global health item, the other 8 items ask for the number of days on which some statement was true; we chose to recode the data into 8 categories based on observed clustering. The IRT assumptions were assessed using confirmatory factor analysis, and the data could be modeled using a unidimensional IRT model. The graded response model was used for IRT analyses, and the CDC-MH and CDC-PH scales were analyzed separately in flexMIRT.
The IRT parameter estimates for the five-item CDC-PH all appeared reasonable. The three-item CDC-MH did not have reasonable parameter estimates.
The CDC-PH scale is amenable to IRT analysis but the existing CDC-MH scale is not. We suggest either using the 4-item Healthy Days Core Module (HDCM) and the 5-item Healthy Days Symptoms Module (HDSM) as they currently stand or the CDC-PH scale alone if the primary goal is to measure physical health related HRQOL.
Arthritis is one of the leading causes of disability, with an estimated 67 million US adults projected to have doctor-diagnosed arthritis by 2030 [
HRQOL has been measured among people with arthritis in several ways. The Medical Outcomes Survey 36-item and 12-item short-form surveys (SF-36 and SF-12) are the best known HRQOL measures [
In 1995, the CDC added 10 additional items creating two optional modules (including the 5-item Healthy Days Symptom Module (HDSM) and the 5-item Activity Limitation Module [
In 2006, Mielenz et al. reported a classical test theory analysis of the 4-item HDCM and the 5-item HDSM [
The goal of the current research is to conduct item response theory (IRT) analyses of the items assessing the CDC-PH and CDC-MH scales separately. The previous analyses showed that both the HDCM and the HDSM comprise a mix of items assessing physical and mental health. We felt it would be most useful to isolate the items assessing each construct as previously reported and try to create separate IRT scores representing physical and mental health.
The methods for this study have been previously published and they are briefly summarized here [
The NC-FM-RN is a practice-based network devoted to research on chronic diseases in primary care [
A total of 1820 participants completed surveys (1139 from the NC-FM-RN and 681 from the MSK). Details about the non-respondents and this response rate are published; a total of 631 participants were removed from the denominators of each sample due to incorrect addresses (
The HDCM includes: 1) Would you say that in general your health is: [five responses ranging from excellent to poor], 2) Now thinking about your physical health, which includes physical illness and injury, for how many days during the past 30 days was your physical health not good? [the number of days in the past 30], 3) Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good? [the number of days in the past 30], 4) During the past 30 days for about how many days did poor physical or mental health keep you from doing your usual activities, such as self-care, work or recreation? [the number of days in the past 30], [From
The CDC-PH includes these five items described above: HDCM 1), 2) and 4) and HDSM 1) and 5) [
IRT is a set of models that describe the process by which individuals respond to items. Put another way, IRT is analogous to a factor analysis where the relationships between the measured variables and the latent construct are nonlinear [
Some of the advantages of IRT include: 1) detailed item-level information, 2) more accurate estimates of the precision of individual scores, 3) item parameters that are not sample dependent, and 4) a foundation for computerized adaptive testing [
The natural response scale for the HDCM and HDSM is the number of the past 30 days that a statement was true. This leaves 31 possible response categories. An analysis of the observed responses suggests that the bulk of respondents are using far fewer than 31 categories. For example, roughly 72 % of the responses for the second item on the HDCM fell into one of seven categories (0, 5, 10, 15, 20, 25, 30). Similar trends were found in the other items considered here as well. We chose to recode the data into eight categories which, along with the original observed frequencies, are described in the table below.

Table: Observed frequencies and recoding scheme for the HDCM and HDSM

Days       HDCM 2  HDCM 3  HDCM 4  HDSM 1  HDSM 2  HDSM 3  HDSM 4  HDSM 5
0             339     588     761     469     580     456     327     654
1              45      56      43      49      80      66      26      32
2              91     123      82      97     136     131      87      68
3              86      83      55      71      95     102      72      56
4              67      44      41      51      44      52      50      35
5             123     128      98      91     129     147     109     124
6              25      21      15      17      19      24      29      16
7              43      38      33      28      37      35      38      26
8              26      13      18      14      20      12      25      17
9               7       4       6       6       5       2       4       5
10            143     137     106     120     137     142     157     138
11              0       1       3       2       5       1       0       1
12             13      12      10      13      11       9      26       9
13              1       3       4       3       2       2       2       4
14             24      13      17      10       7      11      14       6
15            141     117      99     103     106     118     134     126
16              5       2       3       0       2       2       2       4
17              3       1       3       1       1       3       3       1
18             11       4       5       8       4       5       6       6
19              1       1       1       0       0       0       2       3
20            155     113     105     133      93     106     165      98
21             10       4       2       6       3       1       9       7
22              5       3       3       1       3       8       4       3
23              2       1       1       5       2       1       5       8
24              3       4       5       3       0       1       1       6
25             54      50      54      64      47      45      95      91
26              4       2       5       4       4       2       2      13
27              5       3       2       8       5       2       3      18
28             15      16       6      14       6      14      15      44
29              4       6       3       2       4       4       1      12
30            315     177     185     385     177     264     366     107
Missing        54      52      46      42      56      52      41      82

Cell entries are observed frequencies for each number of days reported. HDCM items 2-4 = 4-item Healthy Days Core Module (HDCM 1: self-rated global health with five responses is not shown here), HDCM 2: Physical health not good, HDCM 3: Mental health not good, HDCM 4: Poor health; HDSM = 5-item Healthy Days Symptoms Module, HDSM 1: Pain limited activities, HDSM 2: Depressed, HDSM 3: Stress, HDSM 4: Not enough rest, HDSM 5: Full of energy (reverse scored).
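The clustering of responses on multiples of five can be checked directly from the observed frequencies. The following is a minimal sketch, not the authors' code: it copies the HDCM 2 column of the frequency table, excludes the 54 missing responses (an assumption about the denominator), and computes the share of responses at 0, 5, 10, 15, 20, 25, and 30 days. The names `hdcm2_counts` and `share_at_values` are hypothetical.

```python
# Observed frequencies for HDCM 2 (physical health not good), indexed by
# number of days 0..30, copied from the frequency table.
# Assumption: the 54 missing responses are excluded from the denominator.
hdcm2_counts = [
    339, 45, 91, 86, 67, 123, 25, 43, 26, 7,
    143, 0, 13, 1, 24, 141, 5, 3, 11, 1,
    155, 10, 5, 2, 3, 54, 4, 5, 15, 4,
    315,
]


def share_at_values(counts, values):
    """Return the fraction of all responses that fall on the given day values."""
    total = sum(counts)
    return sum(counts[v] for v in values) / total


# The seven response values singled out in the text.
preferred_days = [0, 5, 10, 15, 20, 25, 30]
share = share_at_values(hdcm2_counts, preferred_days)
print(f"Share of HDCM 2 responses on multiples of five: {share:.1%}")
```

With the non-missing responses as the denominator, the share rounds to the roughly 72 % quoted above. The same function can be applied to any other column of the table.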
For the kind of IRT analysis described in this paper, a critical assumption is that of unidimensionality. If a scale is unidimensional, then responses to that scale arise from only one underlying trait. A closely related assumption is that of local independence. Local independence implies that, conditional on the latent trait being measured, item responses are independent from one another. The unidimensionality and local independence assumptions can be assessed using confirmatory factor analysis (CFA).
The CFA models were estimated in LISREL using polychoric correlations and diagonally weighted least squares (DWLS) [
After listwise deletion, the sample size for the CFA analyses was
After the factor analyses described above, we were confident that the data could be modeled using a unidimensional IRT model. The CDC-MH and CDC-PH scales were analyzed separately in flexMIRT. Multidimensional calibrations were also conducted in flexMIRT and the impact on the estimated parameters was negligible, which is not surprising given the particular structure of these data. The estimated correlation between the two dimensions was 0.65, consistent with the CFA findings. The estimation procedure used in flexMIRT is able to accommodate missing data without resorting to list-wise (or pair-wise) deletion, so we did not have to remove participants who had missing data. This resulted in a sample size of
The general health item, which is the first item on the CDC-PH, has five response categories. All other items, in both the CDC-PH and CDC-MH, have eight response categories as detailed above. The GRM produces one fewer threshold parameter than the number of categories, which means there are four threshold parameters for the general health item and seven threshold parameters for all remaining items. Each item also has a slope parameter, which in the GRM is allowed to freely vary over items. The parameter estimates for the mental and physical health items can be found in the table below.

Table: Parameter estimates from the GRM for the CDC-Physical and Mental Health

Item      Description                    a      b1      b2      b3      b4      b5      b6      b7
CDC-physical health
HDCM 1    General health              2.24   −2.35   −1.05    0.12    1.51       -       -       -
HDCM 2    Physical health not good    3.52   −0.98   −0.44   −0.06    0.24    0.52    0.86    1.04
HDCM 4    Poor health                 2.77   −0.27    0.08    0.38    0.66    0.92    1.24    1.49
HDSM 1    Pain limited activities     2.97   −0.74   −0.27    0.01    0.26    0.47    0.75    0.94
HDSM 5    Full of energy (a)          1.73    −2.2   −1.34   −0.97   −0.61   −0.27    0.10    0.47
CDC-Mental Health
HDCM 3    Mental health not good      4.10   −0.47    0.01    0.34    0.64    0.89    1.18     1.4
HDSM 2    Sad, blue, depressed        7.71   −0.44    0.05    0.38    0.67    0.89    1.13    1.32
HDSM 3    Worried, tense, anxious     3.45   −0.71   −0.11    0.25    0.54    0.78    1.03    1.19

(a) HDSM 5 Full of energy was reverse scored for all analyses.
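To make the link between the slope and threshold parameters and the response probabilities concrete, the sketch below evaluates the graded response model for the general health item (HDCM 1) using the estimates reported in the parameter table. It assumes the common logistic parameterization, P(X ≥ k | θ) = 1 / (1 + exp(−a(θ − b_k))), with no additional scaling constant; the exact metric used by flexMIRT is not stated in the text. The function name `grm_category_probs` is hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(X = k | theta) under the graded response model.

    Assumes the logistic form P(X >= k) = 1 / (1 + exp(-a * (theta - b_k)))
    with ordered thresholds b_1 < b_2 < ... (an assumed parameterization).
    """
    # Cumulative probabilities, padded so that P(X >= lowest category) = 1
    # and P(X >= one past the highest category) = 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative terms.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# HDCM 1 (general health): slope and four thresholds from the parameter table.
hdcm1_a = 2.24
hdcm1_b = [-2.35, -1.05, 0.12, 1.51]

probs = grm_category_probs(0.0, hdcm1_a, hdcm1_b)
for category, p in enumerate(probs):
    print(f"category {category}: {p:.3f}")
```

Evaluating this function over a grid of theta values gives one curve per response category, which is what a trace line plot displays. Four thresholds yield five category probabilities that sum to one at every theta.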
A primary interest in initially examining the item parameters in Table
Figure Trace line plot for the general health item from the HDCM
In addition to the trace lines, IRT produces several summary measures related to reliability. Two of these measures, information and standard error, are most often presented graphically. Figure Information and standard error curves for the CDC-PH scale
For example, a theta estimate (or IRT scale score) of 0 on the CDC-PH scale would have a standard error of 0.28. This means that a 95 % confidence interval on that participant’s score would range from −0.55 to 0.55. In contrast, consider a participant who received an IRT scale score of 2.4 (i.e., 2.4 standard deviations above the mean) on the CDC-PH scale. The information curve is much lower in this portion of the construct, which is reflected in the standard error for this score being 0.65. The same 95 % confidence interval on this participant’s score stretches over a much wider area, from 1.12 to 3.68. The less information a scale provides at a given level of theta, the less sure we can be about the accuracy of the score, which is reflected in the larger standard errors.
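The interval arithmetic in this example can be reproduced with a few lines of code. This is a minimal sketch assuming a normal-theory interval with z = 1.96 and the usual relation that the standard error is approximately one over the square root of the information; the function names are hypothetical. Because the reported standard errors are rounded, the computed endpoints may differ from the quoted ones in the second decimal place.

```python
def ci95(theta_hat, se, z=1.96):
    """Normal-theory confidence interval around an IRT scale score."""
    half_width = z * se
    return theta_hat - half_width, theta_hat + half_width


def information_from_se(se):
    """Information implied by a standard error, assuming SE = 1 / sqrt(information)."""
    return 1.0 / se ** 2


# The two scores discussed in the text: theta = 0 (SE 0.28) and theta = 2.4 (SE 0.65).
for theta_hat, se in [(0.0, 0.28), (2.4, 0.65)]:
    low, high = ci95(theta_hat, se)
    info = information_from_se(se)
    print(f"theta={theta_hat:+.1f}  SE={se:.2f}  95% CI=({low:.2f}, {high:.2f})  information={info:.2f}")
```

The interval at theta = 2.4 is more than twice as wide as the one at theta = 0, which mirrors the drop in the information curve at that end of the construct.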
The parameter estimates for the five-item CDC-PH scale all appeared reasonable. The CDC-PH items provide reasonably reliable scores for individuals with arthritis from 1.5 standard deviations below the mean to 2 standard deviations above the mean for this latent construct. We cannot recommend using IRT with the CDC-MH scale at this time. The three items on the CDC-MH scale did not have reasonable parameter estimates. In particular, the second item, which asks about depression, has an estimated slope greater than seven; we therefore strongly advise against using these parameters.
To our knowledge, the CDC HRQOL measures have not been analyzed using IRT. Jiang and Hesser (2009) used the 9 items of these two Healthy Days scales (the 4-item HDCM and the 5-item HDSM) as indicators to assess the association between these HRQOL indicators and health risk factors [
Another potential limitation is the recoding scheme used with the healthy days modules. The CDC has previously proposed a recoding scheme using the following cut points: 1) 0 days, 2) 1–2 days, 3) 3–7 days, 4) ≥8 days [
We did not explore sensitivity to change in this cross-sectional study and future longitudinal studies should do this. As we learn more about the properties of individual items and the scales they comprise, it becomes possible to use this information when designing scales. For instance, if we knew
Our arthritis population was quite heterogeneous, ranging from patients with established osteoarthritis or rheumatoid arthritis to those reporting joint symptoms in the previous month. This can be considered both a strength and a limitation of this study. Representation in the tails of a distribution can provide more data to estimate the item parameters related to those tails (e.g., high or low b-values). However, this can also indicate that the normality assumption for the population is not reasonable. IRT-based item parameters are related to the population from which the sample was drawn. Although there are many possible populations that would be of interest, this population has the advantage of generalizing to a broad clinical spectrum, including patients from primary care settings to specialty clinics (both orthopedics and rheumatology) across a fairly diverse state. We also did not consider differential item functioning (DIF), which occurs when the relationship between items and construct(s) varies across some other variable (e.g., disease status, gender, etc.). To the extent that researchers would like to use the physical health scale to compare across different disease populations, it will be important to look for DIF across these groups in future studies, as the presence of DIF can bias group comparisons [
The analyses conducted support the feasibility of performing IRT analyses on the 5-item CDC-PH scale and lend additional support to the notion that the CDC-PH scale is a solid measure of physical HRQOL in arthritis populations. We did not find the 3-item CDC-MH useful by itself. The results suggest that, at least in this population, an IRT approach with this scale is not advised.
HRQOL: health-related quality of life
CDC: Centers for Disease Control and Prevention
HDCM: Healthy Days Core Module
HDSM: Healthy Days Symptom Module
IRT: item response theory
NC-FM-RN: North Carolina Family Medicine Research Network
MSK: musculoskeletal database
GRM: graded response model
CFA: confirmatory factor analysis
DWLS: diagonally weighted least squares
RMSEA: Root Mean Square Error of Approximation
CFI: Comparative Fit Index
CDC-PH: physical health scale
CDC-MH: mental health scale
The authors declare that they have no competing interests.
TM: Conception and design, acquisition of data, analysis and interpretation of data, drafting manuscript, revising manuscript, and final approval of version to be published. LC: Conception and design, acquisition of data, revising manuscript, and final approval of version to be published. ME: Conception and design, analysis and interpretation of data, drafting manuscript, revising manuscript, and final approval of version to be published.
This research was supported by a 2005 North Carolina Chapter’s Arthritis Foundation New Investigator Award. A 2001 Arthritis Foundation New Investigator Award supported the original data collection.
This research was supported in part by Grant 1 R49 CE002096‐01 from the National Center for Injury Prevention and Control, Centers for Disease Control and Prevention to the Center for Injury Epidemiology and Prevention at Columbia University. This research was also supported in part by the Malka Fund.
Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health and Centers for Disease Control and Prevention. This research was supported in part by Contract Number 1IP2PI000797-01 from the Patient-Centered Outcomes Research Institute.
The North Carolina Family Medicine Research Network (NC-FM-RN) is an organization dedicated to fostering practice-based research. The North Carolina Health Project (NCHP) is a practice-based cohort of adult patients who were enrolled by the NC-FM-RN from a sample of Family Practices in North Carolina. Projects are jointly sponsored by the Department of Family Medicine, the Thurston Arthritis Research Center, and the Cecil G. Sheps Center for Health Services Research at the University of North Carolina at Chapel Hill, in collaboration with the North Carolina Academy of Family Physicians. The project co-directors are Leigh Callahan, PhD, and Philip Sloane, MD, MPH. Participating Family Practices have included: Biddle Point Health Center, Charlotte; Bladen Medical Associates, Elizabethtown; Blair Family Medicine, Wallace; Chatham Primary Care, Siler City; Community Family Practice, Asheville; Dayspring Family Medicine, Eden; Goldsboro Family Physicians, Goldsboro; Henderson Family Health Center, Hendersonville; North Park Medical Center, Charlotte; Orange Family Medical Center, Hillsborough (pilot site); Person Family Medical Center, Roxboro; Robbins Family Practice, Robbins; South Cabarrus Family Physicians, Harrisburg, Concord, Mt. Pleasant & Kannapolis; and Summerfield Family Practice, Summerfield.
We would also like to thank the following physicians for encouraging their patients to participate in our database and outcomes studies: H. Vann Austin, Franc Barada, Robert Berger, Mary Anne Dooley, William Gruhn, Robert Harrell, Tatiana Huguenin, Beth Jonas, Joanne Jordan, Fathima Kabir, Elliott Kopp, Andrew Laster, Kara Martin, Gwenesta Melton, Nicholas Patrone, Kate Queen, Westley Reeves, Hanno Richards, Alfredo Rivadeneira, William Rowe, Gordon Senter, Paul Sutej, Claudia Svara, Anne Toohey, William Truslow, John Winfield, and William Yount.
Special thanks go to Robert DeVellis, PhD, Shannon Currey, PhD, Jennifer Milan Polinski, MPH, Britta Schoster, MPH, Katherine Buysse, BA, Matthew Morrison, BA.