Agreement About Identifying Patients Who Change Over Time: Cautionary Results in Cataract and Heart Failure Patients

David Feeny, PhD (The Center for Health Research, Kaiser Permanente Northwest and Health Utilities Incorporated); Karen Spritzer, BA (Department of Medicine, University of California, Los Angeles); Ron D Hays, PhD (Department of Medicine, University of California, Los Angeles); Honghu Liu, PhD (School of Dentistry, University of California, Los Angeles); Theodore G. Ganiats, MD (Department of Family and Preventive Medicine, University of California, San Diego); Robert M. Kaplan, PhD (Department of Health Services Research, University of California, Los Angeles); Mari Palta, PhD (Department of Population Health Sciences, University of Wisconsin-Madison); Dennis G. Fryback, PhD (Department of Population Health Sciences, University of Wisconsin-Madison)

Corresponding author and reprint requests: David Feeny, The Center for Health Research, Kaiser Permanente Northwest, 3800 North Interstate Avenue, Portland, OR 97227-1110 USA. Telephone: (503) 528-3937. FAX: (503) 335-2428. Email: david.feeny@kpchr.org

Medical Decision Making: an international journal of the Society for Medical Decision Making (Med Decis Making). Mar-Apr 2012; 32(2): 273-286. ISSN 0272-989X (print), 1552-681X (electronic). DOI: 10.1177/0272989X11418671. PMID: 22009666. PMCID: 3749910. NIHMS498706.

Background

Preference-based measures of health-related quality of life all use the same dead = 0.00 to perfect health = 1.00 scale, but there are substantial differences among measures.

Objective

The objective is to examine agreement in classifying patients as better, stable, or worse.

Design

The EQ-5D, Health Utilities Index Mark 2 and Mark 3, Quality of Well-Being – Self-Administered, Short-Form 36 (Short-Form 6D), and disease-targeted measures were administered prospectively in two clinical cohorts.

Setting

The study was conducted at academic medical centers: University of California, Los Angeles; University of California, San Diego; University of Wisconsin-Madison; and University of Southern California.

Patients

Patients undergoing cataract extraction surgery with lens replacement completed the 25-item National Eye Institute Visual Function Questionnaire (NEI-VFQ-25). Patients newly referred to congestive heart failure specialty clinics completed the Minnesota Living with Heart Failure Questionnaire (MLHF).

Measurements

In both cohorts, subjects completed surveys at baseline and at one and six months. The NEI-VFQ-25 and MLHF were used as gold standards to assign patients to categories of change. Agreement was assessed using kappa.

Results

376 cataract patients were recruited. Complete data for baseline and the one-month follow-up were available on all measures for 210 cases. Using criteria specified by Altman, agreement was poor for six of nine pairs of comparisons and fair for three pairs. 160 heart failure patients were recruited. Complete data for baseline and the six-month follow-up were available for 86 cases. Agreement was negligible for five pairs and fair for one.

Limitations

The study was conducted on selected patients at a few academic medical centers.

Conclusions

The results underscore the lack of interchangeability among different preference-based measures.

Funding: National Institute on Aging (NIA), grant P01 AG020679
Introduction

Preference-based measures of health-related quality of life (HRQL) are needed for monitoring population health and for program evaluation for comparative effectiveness research. Most importantly, these measures are required for estimating quality-adjusted life years (QALYs). A number of widely used generic preference-based measures are available such as the EQ-5D (1), Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3) (2), the Quality of Well-Being - Self-Administered scale (QWB-SA) (3), and Short-Form 6D (SF-6D) (4;5). Although these measures share a common core (6;7) and all include items on mobility, mental health, and pain, there are also important differences with respect to which attributes (dimensions or domains of health status) are included. HUI and the QWB-SA include vision, hearing, speech, and dexterity; the EQ-5D and SF-6D do not. The QWB-SA is unique in that it includes 58 symptoms or health problems, only some of which are included in the other measures. These measures also differ in the range of function or symptom severity covered in each attribute. The QWB-SA asks respondents if they have or do not have a problem such as pain and stiffness; in contrast, HUI and SF-6D have gradients such as the categories mild, moderate, and severe pain.

These measures also differ with respect to the methods that were used to elicit preference scores with which to estimate their respective multi-attribute scoring functions, the methods for estimating those functions, and their functional forms (8). For instance, the QWB-SA scoring function is based on valuations using the visual analog scale (VAS) and a linear additive scoring function. SF-6D is based on the standard gamble (SG) and an ad hoc modified linear additive functional form. EQ-5D is based on the time trade-off (TTO) and an ad hoc modified linear additive functional form. HUI is based on transformed VAS and SG scores and a multiplicative functional form.

It is therefore not surprising that several investigators who have used two or more measures have concluded that the scores from these measures are not interchangeable (9–14). Further, there is evidence from prospective studies that the estimates of absolute and/or relative change (responsiveness, including effect size (ES) and the standardized response mean (SRM)) (15) often do not agree (12;16–19).

The objective of this paper is to examine agreement among the above measures in classifying patients into the same categories of change: We want to know if the measures agree on which patients get better, remain stable, or get worse. Data from two prospective cohort studies that employed all five of the above measures as well as disease-targeted measures are used to assess agreement among these measures: one study of patients undergoing cataract surgery, the other of patients referred for treatment for congestive heart failure by a specialty clinic.

This paper builds on an earlier paper (19) based on the data from the same study. That paper provided cohort-level estimates of responsiveness (SRM) for each of the five preference-based measures in each of the two cohorts. Responsiveness varied among measures and across cohorts. Results from that paper underscore the lack of interchangeability of scores among these measures.

This paper asks an important follow-up question. Even if overall responsiveness differs among measures, do they agree on who gets better, who gets worse, and who was stable?

Methods

Patients

Subjects for both components of the study had to be at least 35 years of age, able to give informed consent, able to hear and understand instructions in English, and have sufficient vision and ability in reading and writing English to complete questionnaires (19).

Cataract Surgery. Patients were undergoing cataract extraction surgery with lens replacement. Patients were excluded if undergoing simultaneous glaucoma, corneal, or vitreo-retinal procedures, or if they were unable to read large print versions of questionnaires.

Heart Failure. Patients were newly referred to congestive heart failure clinics. Inclusion criteria included evidence of the presence of heart failure for at least three months defined as a left ventricular ejection fraction less than 40%. Patients classified as Class IV in the New York Heart Association system, those with a recent (≤ six months) myocardial infarction, unstable angina, recent (≤ three months) coronary artery bypass graft surgery, those on the heart transplant list, or those with recent (≤ three months) ventricular tachycardia were excluded.

Participants were recruited from four academic medical centers: The University of California, Los Angeles (UCLA), the University of California, San Diego (UCSD), the University of Wisconsin, and the University of Southern California (cataract patients). The study was approved by the Institutional Review Boards at each of these institutions (UCLA IRB #G05-06-096-11; UCSD Project #070435; Wisconsin M-2005-1171; USC #HS-06-00493).

Procedures

At enrollment, patients were given a packet of self-administered questionnaires to complete and mail back to the UCSD Health Services Research Center (HSRC) within seven days. The HSRC mailed out the same packet for the one- and six-month follow-up surveys.

Measures

The study included five of the most commonly used preference-based measures (8). There is substantial evidence on the reliability, cross-sectional construct validity, and responsiveness (longitudinal construct validity) of each of these measures in a wide variety of applications. The study also used a widely used disease-targeted measure for vision (25-Item National Eye Institute Visual Function Questionnaire) (NEI-VFQ-25) (20–22) and a prominent disease-targeted measure for heart failure (Minnesota Living with Heart Failure Questionnaire) (MLHF) (23–26).

EQ-5D-3L (hereafter: EQ-5D)

The health-status classification system of EQ-5D includes five attributes (mobility, self-care, usual activity, pain/discomfort, and anxiety/depression) with three levels (no problem, some problem, extreme problem) per attribute (1). The EQ-5D also includes a visual analog scale (VAS) on which respondents provide a rating of their current overall health; the analyses reported here do not include the VAS scores. Health status at a point in time for a subject is described as a five-element vector, one level for each attribute. Preference-based scores for EQ-5D health states were derived using a scoring function based on TTO preferences elicited from a random sample of community-dwelling residents of the United States and estimated with an ad hoc modified linear additive utility function (27). Scores are defined on the conventional scale in which dead = 0.00 and perfect health = 1.00; EQ-5D scores range from −0.11 (states worse than dead) to 1.00.

Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3) (28;29)

HUI2 includes seven attributes: sensation [vision, hearing, speech], mobility, emotion, cognition, self-care, pain, and fertility. (The item on fertility was not administered in this study; fertility was assumed to be normal, level 1.) There are four or five levels per attribute in HUI2. The multiplicative HUI2 scoring function is based on preference elicitation using the VAS and SG from a random sample of community-dwelling subjects in Canada (28). Single-attribute utility scores are on a scale in which 0.00 is the score of the most disabled level in that attribute and 1.00 is the score for level 1, no problem or disability in that attribute. Overall HUI2 scores vary from −0.03 to 1.00. In addition to the overall HUI2 score, the single-attribute HUI2 sensation score was included in the analyses of data from the cataract cohort because of its relevance as a specific measure of visual function.

HUI3

HUI3 includes eight attributes (vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain and discomfort) with five or six levels per attribute. The multiplicative HUI3 scoring function is based on preference elicitation using the VAS and SG from a random sample of community-dwelling subjects in Canada (29). Overall HUI3 scores vary from −0.36 to 1.00. In addition to the overall HUI3 score, the single-attribute HUI3 vision score was included in the analyses of data from the cataract cohort because of its relevance.

Self-Administered Quality of Well-being Scale (QWB-SA)

The QWB-SA assesses self-reported functioning using a series of questions designed to record limitations in the previous three days, within three separate domains (mobility, physical activity, and social activity). In addition, QWB-SA includes a series of questions that ask about the presence or absence of different symptom/problem complexes. The four domain scores are combined into a total score that provides a numerical point-in-time expression of well-being that ranges from zero (0.00) for dead to one (1.00) for asymptomatic optimum functioning. The original QWB obtained preference ratings of 856 people from the general population (30). The QWB-SA used convenience samples to model preference for case descriptions, and the models were shown to be highly correlated with the population ratings in the original QWB general population preference elicitation survey. Scores range from 0.00 to 1.00; 0.09 is the minimum for a living health state. The self-administered QWB-SA has been shown to be highly correlated with the interviewer-administered QWB and to retain its psychometric properties. Extensive evaluations of reliability and validity have been published (3;30–32).

Self-Rated Health (SRH)

The self-rated health item (33), “In general, would you say that your health is excellent, very good, good, fair, or poor,” is a widely used measure of overall health and was therefore included in the analyses.

Short-Form 6D (SF-6D)

SF-6D is a preference-based measure based on a subset of items from the SF-36 (or SF-12) (4;5;34). SF-6D includes six attributes (physical functioning, role limitations, social functioning, pain, mental health, and vitality) with four to six levels per attribute. The scoring function is based on SG preferences elicited from a random sample of community-dwelling subjects in the United Kingdom and estimated using an ad hoc linear additive functional form (4).

25-Item National Eye Institute Visual Function Questionnaire (NEI-VFQ-25)

The NEI-VFQ-25 was designed to capture the influence of vision on a number of dimensions of HRQL including emotional well-being and social functioning (20–22). The NEI-VFQ-25 includes 25 items covering general health, general vision, near vision, distance vision, driving, peripheral vision, color vision, ocular pain, role limitations, dependency, social function, mental health, and expectations. The total score ranges from 0 to 100 with higher scores signifying better (less impaired) vision.

Visual Function Questionnaire - Utility (VFQ-UI)

Recently a preference-based index scoring system, the VFQ-UI, has been developed for the NEI-VFQ-25 (35) (Kowalski et al. submitted; Rentz et al. submitted). The VFQ-UI includes a single item representing each of six domains of the NEI-VFQ: near vision (see well up close), distance vision (going out for films, sports events), role function (limited work time due to vision), mental health (worry about doing things that may embarrass because of vision), vision dependency (stay at home because of vision) and social function (see people’s reaction to things I say). The items were selected to cover a range of vision-related functioning using Rasch analyses on samples of patients with central vision loss or peripheral vision loss. The VFQ-UI defines eight vision-related health states, ranging from no difficulty to stopped doing work, scored on a 0.00 (dead) to 1.00 (perfect health) scale using preference scores derived with the time trade-off.

Minnesota Living with Heart Failure Questionnaire (MLHF)

The MLHF includes 21 items covering symptoms, mental health, social life, fatigue, appetite, mobility, sleep, sexual activity, work and recreational activities, and side-effects of treatment (23–26). Overall scores range from 0 to 105 with higher scores signifying greater impairment (lower HRQL).

Criteria for clinically important change

It is important to assess a measured change with respect both to its statistical significance and its clinical importance or magnitude. Guyatt et al. (p 377) (36) provide a definition of a clinically important difference: “The MID [minimum important difference] is the smallest difference in score in the domain of interest that patients perceive as important, either beneficial or harmful, and which would lead the clinician to consider a change in the patient’s management.” There are two major methods for determining the clinical importance of a given magnitude of change: anchor-based and distribution-based approaches (36–43). In the anchor-based approach, the change in HRQL score is related to a known anchor. The anchor itself must be an independent measure and be readily interpretable such as the categories of the New York Heart Association functional classification system or ability to climb a flight of stairs. Further, there must be an appreciable association between the anchor and the target measure (36). In the distribution-based approach the magnitude of change is compared to some measure of the variability of scores. Cohen’s guidance on classifying effect sizes is an example: 0.20 small; 0.50 medium; 0.80 large (44). The anchor-based approach provides an estimate of clinically important change while the distribution-based approach provides a basis for translating raw score change into standardized units that can be used for comparisons with estimates from prior studies or existing rules of thumb (40).
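The distribution-based quantities mentioned above can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions (effect size as mean change divided by the standard deviation of baseline scores; standardized response mean as mean change divided by the standard deviation of change scores); the function names, the label for values under 0.20, and the sample scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Mean change divided by the SD of baseline scores (assumed definition)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(
    baseline: Sequence[float], follow_up: Sequence[float]
) -> float:
    """Mean change divided by the SD of the change scores (assumed definition)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cohen_label(value: float) -> str:
    """Label a standardized change using Cohen's 0.20 / 0.50 / 0.80 guidance."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # assumption: Cohen's guidance names no category here


# Hypothetical scores on a 0 to 100 scale, for illustration only.
baseline = [50.0, 60.0, 70.0]
follow_up = [55.0, 70.0, 75.0]
es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"ES = {es:.2f} ({cohen_label(es)}), SRM = {srm:.2f} ({cohen_label(srm)})")
```

Because the two statistics use different denominators, the same mean change can receive different labels, which is one reason responsiveness estimates may differ across measures.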

For this study, a change of 0.03 or more in the overall preference score for each of the preference-based measures is interpreted as a clinically important change (2;8;11;37;45–56). Empirical estimates of clinically important change (differences) for the five preference-based measures vary from 0.01 to 0.08 with 0.03 being well represented in estimates for each of these measures.

For the single-attribute utility scores for HUI2 sensation (which includes vision) and HUI3 vision the guideline for a clinically important difference is 0.05 (2). For the NEI-VFQ-25, a change of 5.0 or more in the composite score on a 0 to 100 scale is regarded as clinically important (57). For the MLHF instrument, a change of 5.0 or more in the total score (0 to 105) is regarded as clinically important (23–25). For self-rated health (SRH: excellent, very good, good, fair, poor), a movement of one or more categories is considered clinically important.
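The thresholds above translate into a three-way classification rule. The following Python sketch is one possible reading, not the authors' code: the function name, the `higher_is_better` flag, and the small floating-point tolerance are assumptions introduced for illustration. The flag is needed because higher MLHF scores signify greater impairment, so a fall in MLHF score counts as improvement.

```python
from typing import Literal

Change = Literal["improved", "stable", "worse"]


def classify_change(
    baseline: float,
    follow_up: float,
    threshold: float,
    higher_is_better: bool = True,
) -> Change:
    """Classify a patient by comparing the change score with a threshold.

    threshold: clinically important change, e.g. 0.03 for overall preference
        scores, 0.05 for HUI2 sensation / HUI3 vision, 5.0 for NEI-VFQ-25
        and MLHF totals.
    higher_is_better: set to False for MLHF, where higher scores are worse.
    """
    delta = follow_up - baseline
    if not higher_is_better:
        delta = -delta
    eps = 1e-9  # assumption: tolerance so that 0.83 - 0.80 counts as 0.03
    if delta >= threshold - eps:
        return "improved"
    if delta <= -threshold + eps:
        return "worse"
    return "stable"


# Hypothetical patients, for illustration only.
print(classify_change(0.80, 0.83, threshold=0.03))                        # overall utility
print(classify_change(40.0, 30.0, threshold=5.0, higher_is_better=False))  # MLHF total
print(classify_change(70.0, 72.0, threshold=5.0))                         # NEI-VFQ-25 total
```

For SRH, the same function could be applied to category codes (for example 1 = poor through 5 = excellent, a hypothetical coding) with a threshold of 1.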

Statistical Analyses

Previous work (19) indicated that patients undergoing cataract surgery changed substantially between baseline and the one-month follow-up survey (after surgery) and were typically then stable in the period between the one- and six-month follow-ups. Analyses for the cataract cohort therefore focus on change between baseline and the one-month follow-up. Improvement was more gradual in the heart failure cohort (19). Analyses for that cohort therefore focus on the change between baseline and the six-month follow-up.

Measures of Agreement

Relative agreement in direction and size among change scores for the 10 measures used in the cataract cohort and seven measures used in the heart failure cohort was assessed using an intra-class correlation coefficient (ICC) based on a two-way mixed analysis of variance model (measures fixed, patients random). Agreement between the disease-targeted measure (NEI-VFQ-25 for cataracts, MLHF for congestive heart failure) and each of the five (EQ-5D, HUI2, HUI3, QWB-SA, and SF-6D) preference-based measures and SRH as to whether patients had improved, were stable, or got worse was assessed using a number of measures including the per cent agreement, kappa (unweighted and weighted), and the Delta statistic, a measure of agreement that is less sensitive than kappa to the marginal distributions (58). The degree of agreement (kappa) was interpreted according to the criteria suggested by Altman (59): <0.20 poor; 0.21 – 0.40 fair; 0.41 – 0.60 moderate; 0.61 – 0.80 good; 0.81 – 1.00 very good. In addition, regarding the two disease-specific measures as gold standards, the sensitivity of each of the six generic measures to change on the disease-targeted instruments was estimated using receiver operating characteristic (ROC) curve analyses. The ROC analyses determine if the results are sensitive to the choice of the threshold for clinically important change (0.03) on the preference-based measures.
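As an illustration of the agreement statistics named above, the following Python sketch computes per cent agreement, simple kappa, and weighted kappa for two three-category classifications, and applies Altman's labels. It is a sketch under stated assumptions: the source does not say which weights were used, so linear weights are assumed; the function names, the treatment of boundary values such as exactly 0.20, and the example data are hypothetical. The Delta statistic is not shown.

```python
from typing import Dict, Sequence

CATEGORIES = ("worse", "stable", "improved")  # ordered, for weighted kappa


def agreement_stats(
    a: Sequence[str], b: Sequence[str], categories: Sequence[str] = CATEGORIES
) -> Dict[str, float]:
    """Per cent agreement, simple kappa, and linearly weighted kappa."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    n, k = len(a), len(categories)
    index = {c: i for i, c in enumerate(categories)}
    table = [[0] * k for _ in range(k)]  # rows: measure a, columns: measure b
    for x, y in zip(a, b):
        table[index[x]][index[y]] += 1
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def kappa(weight) -> float:
        observed = sum(weight(i, j) * table[i][j]
                       for i in range(k) for j in range(k)) / n
        expected = sum(weight(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k)) / n ** 2
        if abs(1.0 - expected) < 1e-12:
            return float("nan")  # undefined when chance agreement is total
        return (observed - expected) / (1.0 - expected)

    return {
        "percent_agreement": 100.0 * sum(table[i][i] for i in range(k)) / n,
        "kappa": kappa(lambda i, j: 1.0 if i == j else 0.0),
        # Assumption: linear weights; the source does not specify the scheme.
        "weighted_kappa": kappa(lambda i, j: 1.0 - abs(i - j) / (k - 1)),
    }


def altman_label(kappa: float) -> str:
    """Altman's criteria as quoted in the text (boundary handling assumed)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical classifications of six patients by two measures.
gold = ["improved", "improved", "stable", "stable", "worse", "worse"]
generic = ["improved", "stable", "stable", "stable", "worse", "improved"]
stats = agreement_stats(gold, generic)
print(stats, altman_label(stats["kappa"]))
```

In this hypothetical example the weighted kappa is lower than the simple kappa because the single two-category disagreement (worse versus improved) receives no partial credit while chance-expected partial credit is subtracted.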

Primary analyses were conducted on a subset of subjects for whom complete data were available at baseline and the one-month follow-up (cataract cohort) or at baseline and the six-month follow-up (heart failure cohort) for all of the measures included in the analyses. Thus, any differences in agreement across measures will not be the result of differences in the subjects excluded due to missing data. Secondary analyses were conducted on the larger sample for which data were available at baseline and the designated follow-up (all available pairs with complete data); in these analyses the sample size varies by pair of measures.

Results

A total of 376 cataract patients and 160 heart failure patients were recruited to the study. The majority of patients were white, cataract patients tended to be female, heart failure patients tended to be male, most cataract patients were 65+, and the heart failure patients tended to be younger with the majority in the 45–64 age group (Table 1).

For the cataract cohort, data for the baseline and one-month follow-up assessments were available for 315 of the 376 cases. Complete data for all pairs for all measures were available for 210 cases. The distribution of demographic variables for those with and without complete data was similar and there were no statistically significant differences between the two groups.

For the heart failure cohort, data for the baseline and six-month follow-up assessments were available for 110 of the 160 cases. Complete data for all pairs for all measures were available for 86 cases. Those with missing data were older than those without missing data and the difference between the two groups was statistically significant.

The overwhelming majority of respondents, 93%, reported that no one helped them to complete the questionnaires; 7% reported receiving help. Among those who received any help, 90% reported that someone read the questions to them; 55% reported that someone wrote the answers on the questionnaire for them; 6% reported that someone answered the questions for them; 4% reported that someone translated the questions into their language for them; and 9% reported some other kind of help. Therefore the overwhelming majority of responses were based on self-completion and self-assessment.

Scores for each of the measures at baseline and one month and the change scores for the cataract cohort are displayed in Table 2. Note that the mean change in the total score for the NEI-VFQ-25 (VFQt) of 9.96 exceeds the guideline for a clinically important difference of 5.00. Similarly, the mean change in overall HUI3 scores exceeds the 0.03 clinically important difference guideline. The mean changes in HUI3 vision and HUI2 sensation scores exceed the 0.05 clinically important difference guideline. The mean changes in scores for EQ-5D, QWB-SA, SF-6D, and SRH are less than the guidelines for a clinically important difference. The distribution of change scores for the VFQt is displayed in Figure 1. Using the change of 5 or more in VFQt score as the criterion, 43% of patients improved, 52% were stable, and 4% got worse.

Scores for each of the measures at baseline and six months and the mean change scores for the heart failure cohort are displayed in Table 3. Note that the mean change of 8.72 in score for the MLHF exceeds the 5.00 guideline for a clinically important difference. The mean changes in QWB-SA, SF-6D, and HUI3 scores exceed the guideline while the mean changes in scores for the EQ-5D, HUI2, and SRH do not. The distribution of change scores for the MLHF is displayed in Figure 2. Using the change of 5 or more in MLHF score as the criterion, 47% of patients improved, 35% were stable, and 19% got worse.

Agreement among change scores

The ICC among the 10 measures of change in the cataract cohort was 0.16 (95% confidence interval: 0.02 – 0.29). The ICC among the seven measures of change in the heart failure cohort was 0.07 (95% confidence interval: −0.14 – 0.28).

Agreement Cataract cohort

The per cent agreement varies between 33% and 57% and is displayed in Table 4 along with simple and weighted kappa statistics for agreement between pairs of measures in classifying patients as improved, stable, or worse. The simple kappa statistics for six of the nine pairs are poor and the kappa statistics for three of the pairs are fair. Fair agreement was obtained between the NEI-VFQ-25 total scores and the vision-targeted measures: HUI2 sensation, HUI3 vision, and the VFQ-UI. The results for weighted kappa are very similar to the results for the simple kappa. Results for the delta statistics are also very similar, ranging from −0.05 (VFQt and SRH) to 0.31 (VFQt and HUI3 vision) to 0.40 (VFQt and VFQ-UI). Area under the curve results for the ROC analyses range from 0.44 (SRH) to 0.67 (HUI3 vision) to 0.72 (VFQ-UI). In many cases in the ROC analyses the area under the curve is less than 0.60, indicating agreement little better than one would expect by chance. These results indicate that the lack of agreement is not sensitive to the choice of cut points for clinically important differences. Finally, results from secondary analyses for n = 315 (subjects for whom observations on any measure were available at baseline and at the one-month follow-up) were very similar to the results reported in Table 4 (data not shown).

Agreement heart failure cohort

The per cent agreement for the heart failure cohort varies between 19% and 49% and is displayed in Table 5 along with simple and weighted kappa statistics for agreement between pairs of measures in classifying patients as improved, stable, or worse. The simple kappa statistics are negative for five pairs, indicating agreement less than that which would occur by chance. Agreement between the MLHF and SRH is fair. Results for the weighted kappa are very similar. The results for the delta statistics also indicate little agreement, ranging from −0.33 (QWB-SA) to 0.26 (SRH). Area under the curve results from the ROC analyses range from 0.31 (QWB-SA) to 0.73 (SRH) and indicate that the results are not sensitive to the choice of cut points. Finally, results from secondary analyses for n = 110 (subjects for whom observations on any measure were available at baseline and at the six-month follow-up) were very similar to the results reported in Table 5 (data not shown).

Agreement among measures on classification of patients as worse, stable, or improved

Results on the extent of agreement among measures in classifying patients as improved, stable, or deteriorated for the cataract cohort are found in Table 6. Analogous results for the heart failure cohort are found in Table 7. The lack of agreement among measures seen in the ICC results reported above is also evident in Tables 6 and 7. Nonetheless, many observations are aligned “on the diagonal”, indicating that there is some agreement between the disease-specific measures, NEI-VFQ-25 and MLHF, and each of the five preference-based measures, on which patients changed and which did not.

Discussion

There is very little pair-wise agreement between the disease-targeted measures and the five preference-based measures about which patients improved, were stable, or deteriorated. In general, agreement for the cataract cohort was poor and for the heart failure cohort negligible. For the cataract cohort, the exceptions were the relevant HUI single-attribute (“disease-targeted”) scores, HUI2 sensation and HUI3 vision, whose agreement with the NEI-VFQ-25 was fair. Agreement was also fair between the utility-scored and conventional versions of the NEI-VFQ-25. Given that both of these measures are based on the same questionnaire, it is perhaps surprising that the agreement is only fair and is not clearly higher than agreement between the NEI-VFQ-25 and HUI2 sensation and HUI3 vision. In the heart failure cohort, fair agreement was observed only for the SRH. On the basis of the ROC analyses, the results reported here appear to be robust to the choice of cut points for a clinically important change.

The agreement analyses treat the disease-targeted measure as the gold standard. Yet even though there is evidence of cross-sectional and longitudinal construct validity for the two disease-targeted measures, neither can be regarded as a true gold standard. Furthermore, that vision-related or heart-related HRQL improved does not necessarily imply that overall HRQL improved. It is possible that the side effects of interventions could more than offset the gains and therefore overall HRQL might not improve. It is also possible that even though vision- or heart-related HRQL improved, overall HRQL did not due to the burdens associated with comorbidities. The NEI-VFQ-25 asks subjects about a wide variety of difficulties that they might experience due to limited vision, including reading, hobbies, navigating, driving, going up and down stairs, interacting with others, dressing, and the amount of assistance the subject needs from others. Similarly, the MLHF asks about limitations in/problems with mobility, sexual activity, interacting with others, fatigue, hobbies, worry, concentration, memory, and depression that the subject experiences due to the subject’s heart condition. Although the breadth of coverage of these disease-targeted measures probably reduces the scope for a discrepancy between trends in vision- or heart-related HRQL and overall HRQL, it does not eliminate the possibility for such discrepancies.

The results on overall change in measures underscore that scores from these five preference-based measures are not interchangeable (Table 2). In the cataract cohort, using published guidelines (2;57) on clinically important differences/changes, clearly clinically important change was detected by the NEI-VFQ-25 and HUI3. Given that vision is included in HUI3 and that the NEI-VFQ-25 is a vision-targeted measure, this result is not surprising. But vision is included in the QWB-SA and the overall score did not reflect the gain in HRQL that was measured by the NEI-VFQ-25 and HUI3. Of course, it should be noted that only the “worst” symptom for that subject in the QWB-SA is used to compute the overall score and, further, that in a relatively elderly cohort it is likely that many subjects were experiencing symptoms that are more burdensome than impaired vision; thus the vision item frequently did not affect the calculation of the QWB-SA score. Others have noted the lack of responsiveness of the EQ-5D to the effects of cataract surgery (60).

For the heart failure cohort, using published guidelines (23–25;46;47;51) on clinically important differences/changes, the MLHF, HUI3, QWB-SA, and SF-6D recorded clinically important change. Fatigue and shortness of breath symptoms on the QWB-SA as well as the mobility, physical, and social activity scales may have captured some of the effects of heart failure on HRQL. Similarly, the physical functioning, vitality, and role attributes on SF-6D may have registered some of the effects. HUI3 ambulation may have performed similarly (61).

It should be noted that reliability is less than perfect for each of the measures used in the study (62), so disagreement between change scores is also influenced by measurement error and short-term fluctuations in health that are unrelated to the conditions of primary interest. Change over time is measured with even less precision than absolute scores at a point in time. Jones and Feeny (63) and Pickard et al. (64) found lower levels of agreement between proxy and self-report for change scores than was evident for cross-sectional comparisons of baseline and follow-up scores. Other investigators have pointed out that due to the size of measurement error typically found in HRQL measures, change must often be quite substantial for measures to agree (62). Our results were not sensitive to the magnitude of change that was considered clinically important, but the magnitude of true underlying changes does influence the agreement that can be expected. Hence, some measures of change may show better agreement than found in our study when reflecting interventions with larger overall effects on HRQL.

Some limitations of the study should be noted. Because the analyses are based on subjects for whom both baseline and the designated follow-up assessments were available, the results are not necessarily representative of the experience of the entire inception cohorts. In the cataract cohort, those with and without complete data were similar. In the heart failure cohort, those with missing data were older than those without missing data. If older patients experienced less improvement in HRQL than younger patients, it is possible that the estimate of change based on subjects for whom we had complete data is biased upwards. As noted in the Results, 7% of subjects had help in completing questionnaires, so responses could have been influenced by others. Another limitation is that while the scoring functions for the QWB-SA and EQ-5D are based on preference scores from random samples of community-dwelling adults in the US, the scoring functions for HUI2 and HUI3 are based on preferences from random samples of the Canadian population and the function for the SF-6D is based on UK preferences. There is evidence of the generalizability of the QWB scoring function (30;65;66), the HUI2 scoring function (28;67), and the HUI3 scoring function (29;68–70). In contrast, there is considerable variability across “national” EQ-5D scoring functions. Nonetheless, the fact that we did not rely exclusively on US-based scoring functions is unlikely to be an important factor influencing the results. Finally, we classified cataract and heart failure patients as changed if the absolute value of their change score was ≥5.0 whether or not the difference was statistically significant. Hays et al. (71) note that changes that are statistically significant at the level of an individual subject will typically exceed the guideline for a clinically important difference (72).
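The threshold rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the example change scores are hypothetical, the labels follow the -, 0, + notation used in the cross-tabulation tables, and the function classifies by the sign of the change score without handling whether higher scores on a given instrument indicate better or worse health.

```python
def classify_change(change_score, threshold=5.0):
    """Label a change score as '-', '0', or '+' using an absolute threshold.

    A subject counts as changed when |change_score| >= threshold,
    regardless of statistical significance.
    """
    if abs(change_score) < threshold:
        return "0"
    return "+" if change_score > 0 else "-"


# Hypothetical change scores on a 0-100 scale
example_changes = [9.96, -3.0, 5.0, -8.72]
labels = [classify_change(c) for c in example_changes]
print(labels)  # ['+', '0', '+', '-']
```

Passing a different `threshold` allows the same rule to be applied with a different guideline for a clinically important difference.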

Conclusions

The results underscore the lack of interchangeability of scores among these five widely used preference-based measures. Not only are the absolute scores not necessarily interchangeable; in these results the change scores were also not interchangeable (12). The results also point to a lack of precision in estimating the magnitude of change in HRQL.

In making choices about which preference-based measure(s) to use in a study, investigators need to consider carefully the coverage of the health-status classification systems and the relevance of those systems to their clinical or population health application, evidence on the cross-sectional construct validity of the measures in that application, and evidence of the responsiveness (longitudinal construct validity) of the measures in that context. Further, users of the results of studies that have employed preference-based measures to assess HRQL need to interpret those results carefully.

Conflict of Interest. David Feeny has a proprietary interest in Health Utilities Incorporated, Dundas, Ontario, Canada. HUInc. distributes copyrighted Health Utilities Index (HUI) materials and provides methodological advice on the use of HUI. None of the other authors declare a conflict of interest.

An earlier version of the paper was presented at the 2010 meeting of Health Technology Assessment International, Dublin, June 6–9, 2010 and at the 17th Annual Meeting of the International Society for Quality of Life Research, London, October 27–30, 2010.

Acknowledgments

Supported by Grant P01AG020679 from the National Institute on Aging. Drs. Kaplan and Hays were also provided support by NIH grants 1 P01 AG020679-01A2, UCLA Claude D. Pepper Older Americans Independence Center, NIH/NIA 5P30AG028748, and CDC Grant U48 DP000056-04. Dr. Hays also received support from the UCLA Resource Center for Minority Aging Research/Center for Health Improvement in Minority Elderly (P30AG021684) and the UCLA/DREW Project EXPORT (P20MD000148 and P20MD000182). The funding agreement ensured the independence of the authors in the design, conduct, interpretation, data, writing, and publishing of the paper. The granting agencies have neither read nor approved of the contents of the paper. The authors acknowledge the contributions of Steven Tally, UCSD, to the work reported here. The authors also appreciate the help of Barbara Brody, MPH, and Denise Herman, MD, from UCSD; Nancy Sweitzer, MD, PhD, and Neal Barney, MD, from UW; and Greg Fonerow, MD, and John Bartlett, MD, from UCLA for their collaboration on subject acquisition.


Reference List

1. Rabin R, de Charro F. EQ-5D: A measure of health status from the EuroQol Group. Annals of Medicine. 2001;33(5):337–343. PMID: 11491192.
2. Horsman J, Furlong W, Feeny D, Torrance G. The Health Utilities Index (HUI®): concepts, measurement properties and applications. Health Qual Life Outcomes. 2003;1:54. PMID: 14613568.
3. Kaplan RM, Anderson JP. The General Health Policy Model: An Integrated Approach. In: Spilker B, editor. Quality of Life and Pharmacoeconomics in Clinical Trials. Second ed. Philadelphia: Lippincott-Raven Press; 1996. p. 309–322.
4. Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ. 2002;21(2):271–292. PMID: 11939242.
5. Brazier JE, Roberts J. The estimation of a preference-based measure of health from the SF-12. Med Care. 2004;42(9):851–859. PMID: 15319610.
6. Fryback DG, Palta M, Cherepanov D, Bolt D, Kim JS. Comparison of 5 Health-Related Quality-of-Life Indexes Using Item Response Theory Analysis. Med Decis Making. 2009 Oct 20.
7. Cherepanov D, Palta M, Fryback DG. Underlying Dimensions of the Five Health-Related Quality-of-Life Measures Used in Utility Assessment: Evidence From the National Health Measurement Study. Med Care. 2010;48(8):718–725. PMID: 20613664.
8. Feeny DH. Preference-based measures: Utility and quality-adjusted life years. In: Fayers P, Hays R, editors. Assessing quality of life in clinical trials. Second ed. Oxford: Oxford University Press; 2005. p. 405–429.
9. Marra CA, Esdaile JM, Guh D, Kopec JA, Brazier JE, Koehler BE. A comparison of four indirect methods of assessing utility values in rheumatoid arthritis. Med Care. 2004;42(11):1125–1131. PMID: 15586840.
10. Marra CA, Marion SA, Guh DP, Najafzadeh M, Wolfe F, Esdaile JM. Not all “quality-adjusted life years” are equal. J Clin Epidemiol. 2007;60(6):616–624. PMID: 17493521.
11. Marra CA, Woolcott JC, Kopec JA, Shojania K, Offer R, Brazier JE. A comparison of generic, indirect utility measures (the HUI2, HUI3, SF-6D, and the EQ-5D) and disease-specific instruments (the RAQoL and the HAQ) in rheumatoid arthritis. Soc Sci Med. 2005;60(7):1571–1582. PMID: 15652688.
12. Feeny DH, Wu L, Eng K. Comparing short form 6D, standard gamble, and Health Utilities Index Mark 2 and Mark 3 utility scores: results from total hip arthroplasty patients. Quality of Life Research. 2004;13(10):1659–1670. PMID: 15651537.
13. Luo N, Johnson JA, Shaw JW, Feeny D, Coons SJ. Self-reported health status of the general adult U.S. population as assessed by the EQ-5D and Health Utilities Index. Medical Care. 2005;43(11):1078–1086. PMID: 16224300.
14. Fryback DG, Dunham NC, Palta M, Hanmer J, Buechner J, Cherepanov D. U.S. Norms for six generic health-related Quality of Life indexes from the National Health Measurement study. Medical Care. 2007;45(12):1162–1170. PMID: 18007166.
15. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Quality of Life Research. 2003;12(4):349–362. PMID: 12797708.
16. Marra CA, Rashidi AA, Guh D, Kopec JA, Abrahamowicz M, Esdaile JM. Are indirect utility measures reliable and responsive in rheumatoid arthritis patients? Qual Life Res. 2005;14(5):1333–1344. PMID: 16047508.
17. Hatoum HT, Brazier JE, Akhras KS. Comparison of the HUI3 with the SF-36 preference based SF-6D in a clinical trial setting. Value Health. 2004;7(5):602–609. PMID: 15367255.
18. McDonough CM, Tosteson TD, Tosteson AN, Jette AM, Grove MR, Weinstein JN. A longitudinal comparison of 5 preference-weighted health state classification systems in persons with intervertebral disk herniation. Med Decis Making. 2011;31(2):270–280. PMID: 21098419.
19. Kaplan RM, Tally S, Hays RD, Feeny D, Ganiats TG, Palta M. Five Preference-Based Indexes in Cataract and Heart-Failure Patients Were Not Equally Responsive to Change. Journal of Clinical Epidemiology. 2011;64(5):497–506. PMID: 20685077.
20. Mangione CM, Lee PP, Gutierrez PR, Spritzer K, Berry S, Hays RD. Development of the 25-item National Eye Institute Visual Function Questionnaire. Arch Ophthalmol. 2001;119(7):1050–1058. PMID: 11448327.
21. Varma R, Wu J, Chong K, Azen SP, Hays RD. Impact of severity and bilaterality of visual impairment on health-related quality of life. Ophthalmology. 2006;113(10):1846–1853. PMID: 16889831.
22. McDonnell PJ, Mangione C, Lee P, Lindblad AS, Spritzer KL, Berry S. Responsiveness of the National Eye Institute Refractive Error Quality of Life instrument to surgical correction of refractive error. Ophthalmology. 2003;110(12):2302–2309. PMID: 14644711.
23. Rector TS. A conceptual model of quality of life in relation to heart failure. J Card Fail. 2005;11(3):173–176. PMID: 15812743.
24. Rector TS, Kubo SH, Cohn JN. Patients’ self-assessment of their Congestive Heart Failure Part II. Heart Failure. 1987 Oct-Nov:198–209.
25. Rector TS, Francis GS, Cohn JN. Patients’ self-assessment of their Congestive Heart Failure. Part 1. Patient perceived dysfunction and its poor correlation with maximal exercise tests. Heart Failure. 1987 Oct-Nov:192–196.
26. Garin O, Ferrer M, Pont A, Rue M, Kotzeva A, Wiklund I. Disease-specific health-related quality of life questionnaires for heart failure: a systematic review with meta-analyses. Quality of Life Research. 2009;18(1):71–85. PMID: 19052916.
27. Shaw JW, Johnson JA, Coons SJ. US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care. 2005;43(3):203–220. PMID: 15725977.
28. Torrance GW, Feeny DH, Furlong WJ, Barr RD, Zhang Y, Wang Q. Multi-attribute utility function for a comprehensive health status classification system. Health Utilities Index Mark 2. Med Care. 1996;34(7):702–722. PMID: 8676608.
29. Feeny DH, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S. Multi-attribute and single-attribute utility functions for the health utilities index mark 3 system. Medical Care. 2002;40(2):113–128. PMID: 11802084.
30. Kaplan RM, Bush JW, Berry CC. Health status: types of validity and the index of well-being. Health Serv Res. 1976;11(4):478–507. PMID: 1030700.
31. Kaplan RM, Frosch DL. Decision making in medicine and health care. Annu Rev Clin Psychol. 2005;1:525–556. PMID: 17716098.
32. Kaplan RM, Anderson JP, Patterson TL, McCutchan JA, Weinrich JD, Heaton RK. Validity of the Quality of Well-Being Scale for persons with human immunodeficiency virus infection. HNRC Group. HIV Neurobehavioral Research Center. Psychosom Med. 1995;57(2):138–147. PMID: 7792372.
33. Idler EL, Benyamini Y. Self-rated health and mortality: A review of twenty-seven community studies. Journal of Health and Social Behavior. 1997;38(1):21–37. PMID: 9097506.
34. Ware J Jr., Kosinski M, Keller SD. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220–233. PMID: 8628042.
35. Revicki D, Rentz AM, Kowalski JW, Chen WH. Assessment of unidimensionality for the visual function questionnaire-utility index (VFQ-UI) items in patients with central vision loss. Quality of Life Research. 2010;19(Supplement 1):49–50. Abstract.
36. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002;77(4):371–383. PMID: 11936935.
37. Harrison MJ, Davies LM, Bansback NJ, McCoy MJ, Verstappen SM, Watson K. The comparative responsiveness of the EQ-5D and SF-6D to change in patients with inflammatory arthritis. Qual Life Res. 2009;18(9):1195–1205. PMID: 19777373.
38. Revicki DA, Osoba D, Fairclough D, Barofsky I, Berzon R, Leidy NK. Recommendations on health-related quality of life research to support labeling and promotional claims in the United States. Quality of Life Research. 2000;9(8):887–900. PMID: 11284208.
39. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. Journal of Clinical Epidemiology. 2008;61(2):102–109. PMID: 18177782.
40. Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005;2(1):63–67. PMID: 17136964.
41. Leidy NK, Wyrwich KW. Bridging the gap: using triangulation methodology to estimate minimal clinically important differences (MCIDs). COPD. 2005;2(1):157–165. PMID: 17136977.
42. Terwee CB, Roorda LD, Dekker J, Bierma-Zeinstra SM, Peat G, Jordan KP. Mind the MIC: large variation among populations and methods. J Clin Epidemiol. 2010;63(5):524–534. PMID: 19926446.
43. Beaton DE, van Eerd D, Smith P, van der Velde G, Cullen K, Kennedy CA. Minimal change is sensitive, less specific to recovery: a diagnostic testing approach to interpretability. J Clin Epidemiol. 2011;64(5):487–496. PMID: 21109396.
44. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, New Jersey: Lawrence Erlbaum Associates; 1988.
45. Drummond M. Introducing economic and quality of life measurements into clinical studies. Annals of Medicine. 2001;33(5):344–349. PMID: 11491193.
46. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes. 2003;1:4. PMID: 12737635.
47. Walters SJ, Brazier JE. Comparison of the minimally important difference for two health state utility measures: EQ-5D and SF-6D. Quality of Life Research. 2005;14(6):1523–1532. PMID: 16110932.
48. Pickard AS, Neary MP, Cella D. Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer. Health Qual Life Outcomes. 2007;5:70. PMID: 18154669.
49. Majumdar SR, Johnson JA, Bowker SL, Booth GL, Dolovich L, Ghali W. A Canadian consensus for the standardized evaluation of quality improvement interventions in Type 2 diabetes. Canadian Journal of Diabetes. 2005;29(3):220–229.
50. Samsa G, Edelman D, Rothman ML, Williams GR, Lipscomb J, Matchar D. Determining clinically important differences in health status measures: a general approach with illustration to the Health Utilities Index Mark II. Pharmacoeconomics. 1999;15(2):141–155. PMID: 10351188.
51. Kaplan RM. The minimally clinically important difference in generic utility-based measures. COPD. 2005;2(1):91–97. PMID: 17136968.
52. Grootendorst P, Feeny D, Furlong W. Health Utilities Index Mark 3: evidence of construct validity for stroke and arthritis in a population health survey. Med Care. 2000;38(3):290–299. PMID: 10718354.
53. Groessl EJ, Kaplan RM, Barrett-Connor E, Ganiats TG. Body mass index and quality of well-being in a community of older adults. Am J Prev Med. 2004;26(2):126–129. PMID: 14751323.
54. Kontodimopoulos N, Pappa E, Papadopoulos AA, Tountas Y, Niakas D. Comparing SF-6D and EQ-5D utilities across groups differing in health status. Quality of Life Research. 2009;18(1):87–97. PMID: 19051058.
55. Sullivan PW, Lawrence WF, Ghushchyan V. A national catalog of preference-based scores for chronic conditions in the United States. Med Care. 2005;43(7):736–749. PMID: 15970790.
56. Khanna D, Furst DE, Wong WK, Tsevat J, Clements PJ, Park GS. Reliability, validity, and minimally important differences of the SF-6D in systemic sclerosis. Qual Life Res. 2007;16(6):1083–1092. PMID: 17404896.
57. Globe DR, Wu J, Azen SP, Varma R. The impact of visual impairment on self-reported visual functioning in Latinos: The Los Angeles Latino Eye Study. Ophthalmology. 2004;111(6):1141–1149. PMID: 15177964.
58. Andres AM, Marzo PF. Chance-corrected measures of reliability and validity in K x K tables. Stat Methods Med Res. 2005;14(5):473–492. PMID: 16248349.
59. Altman DG. Practical statistics for medical research. London: Chapman & Hall; 1991.
60. Browne JP, van der Meulen JH, Lewsey JD, Lamping DL, Black N. Mathematical coupling may account for the association between baseline severity and minimally important difference values. J Clin Epidemiol. 2010;63(8):865–874. PMID: 20172689.
61. Pressler SJ, Eckert GJ, Morrison GC, Murray MD, Oldridge NB. Evaluation of the health utilities index mark-3 in heart failure. J Card Fail. 2011;17(2):143–150. PMID: 21300304.
62. Palta M, Han-Yeng C, Kaplan RM, Feeny D, Cherepanov D, Fryback DG. Standard error of measurement of five health utility indexes across the range of health for use in estimating reliability and responsiveness. Med Decis Making. 2010.
63. Jones CA, Feeny DH. Agreement between patient and proxy responses of health-related quality of life after hip fracture. J Am Geriatr Soc. 2005;53(7):1227–1233. PMID: 16108944.
64. Pickard AS, Johnson JA, Feeny DH, Shuaib A, Carriere KC, Nasser AM. Agreement between patient and proxy assessments of health-related quality of life after stroke using the EQ-5D and Health Utilities Index. Stroke. 2004;35(2):607–612. PMID: 14726549.
65. Balaban DJ, Sagi PC, Goldfarb NI, Nettler S. Weights for scoring the quality of well-being instrument among rheumatoid arthritics. A comparison to general population weights. Med Care. 1986;24(11):973–980. PMID: 3773579.
66. Hector RD Sr., Anderson JP, Paul RC, Weiss RE, Hays RD, Kaplan RM. Health state preferences are equivalent in the United States and Trinidad and Tobago. Qual Life Res. 2010;19(5):729–738. PMID: 20237958.
67. Wang Q, Furlong W, Feeny D, Torrance G, Barr R. How robust is the Health Utilities Index Mark 2 utility function? Med Decis Making. 2002;22(4):350–358. PMID: 12150600.
68. Le Gales C, Buron C, Costet N, Rosman S, Slama PR. Development of a preference-weighted health status classification system in France: the Health Utilities Index 3. Health Care Manag Sci. 2002;5(1):41–51. PMID: 11862978.
69. Raat H, Bonsel GJ, Hoogeveen WC, Essink-Bot ML. Feasibility and reliability of a mailed questionnaire to obtain visual analogue scale valuations for health states defined by the Health Utilities Index Mark 3. Medical Care. 2004;42(1):13–18. PMID: 14713735.
70. Ruiz M, Rejas J, Soto J, Pardo A, Rebollo I. [Adaptation and validation of the Health Utilities Index Mark 3 into Spanish and correction norms for Spanish population]. Med Clin (Barc). 2003;120(3):89–96. PMID: 12605729.
71. Hays RD, Brodsky M, Johnston MF, Spritzer KL, Hui KK. Evaluating the statistical significance of health-related quality-of-life change in individual patients. Eval Health Prof. 2005;28(2):160–171. PMID: 15851771.
72. McLeod LD, Coon CD, Martin SA, Fehnel SE, Hays RD. Interpreting patient-reported outcome results: US FDA guidance and emerging methods. Expert Rev Pharmacoecon Outcomes Res. 2011;11(2):163–169. PMID: 21476818.

Distribution of Change in National Eye Institute Visual Function Questionnaire - 25 Total Scores

Distribution of Change in Minnesota Living With Heart Failure Total Scores

Demographic characteristics of the samples

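| Demographic, n (%) | Cataract: enrolled (n=376) | Cataract: complete data, in sample (n=210) | Cataract: incomplete data, not in sample (n=166) | Cataract: in sample vs not | Heart failure: enrolled (n=160) | Heart failure: complete data, in sample (n=86) | Heart failure: incomplete data, not in sample (n=74) | Heart failure: in sample vs not |
|---|---|---|---|---|---|---|---|---|
| Age: | | | | chi(2)=2.42, Pr=0.2977 | | | | chi(2)=8.96, Pr=0.0113 |
| 35–44 | 5 (1) | 3 (1) | 2 (1) | | 24 (15) | 11 (13) | 13 (18) | |
| 45–64 | 115 (31) | 71 (34) | 44 (27) | | 101 (63) | 63 (73) | 38 (51) | |
| 65–91 | 256 (68) | 136 (65) | 120 (72) | | 35 (22) | 12 (14) | 23 (31) | |
| Race: | | | | chi(3)=1.20, Pr=0.7535 | | | | chi(3)=6.01, Pr=0.1083 |
| white | 328 (87) | 184 (88) | 144 (87) | | 126 (79) | 74 (86) | 52 (70) | |
| black | 12 (3) | 7 (3) | 5 (3) | | 19 (12) | 8 (9) | 11 (15) | |
| Asian | 19 (5) | 13 (6) | 6 (4) | | 5 (3) | 1 (1) | 4 (5) | |
| other | 4 (1) | 2 (1) | 2 (1) | | 2 (1) | 2 (2) | 0 (0) | |
| missing* | 13 (3) | 4 (2) | 9 (5) | | 8 (5) | 1 (1) | 7 (9) | |
| Education: | | | | chi(6)=10.81, Pr=0.0943 | | | | chi(6)=10.55, Pr=0.1034 |
| < HS | 21 (6) | 6 (3) | 15 (9) | | 20 (13) | 7 (8) | 13 (18) | |
| HS graduate | 60 (16) | 32 (15) | 28 (17) | | 45 (28) | 25 (29) | 20 (27) | |
| some college | 78 (21) | 43 (20) | 35 (21) | | 47 (29) | 33 (38) | 14 (19) | |
| 2 year assoc | 27 (7) | 14 (7) | 13 (8) | | 12 (8) | 7 (8) | 5 (7) | |
| 4 yr coll grad | 90 (24) | 56 (27) | 34 (20) | | 16 (10) | 7 (8) | 9 (12) | |
| MA degree | 57 (15) | 38 (18) | 19 (11) | | 9 (6) | 3 (3) | 6 (8) | |
| doctorate/professional | 34 (9) | 19 (9) | 15 (9) | | 6 (4) | 4 (5) | 2 (3) | |
| missing* | 9 (2) | 2 (1) | 7 (4) | | 5 (3) | 0 (0) | 5 (7) | |
| Female | 222 (59) | 124 (59) | 98 (59) | chi(1)=0.00, Pr=0.9982 | 52 (33) | 26 (30) | 26 (35) | chi(1)=0.44, Pr=0.5092 |

* missing not used in tests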

Baseline, One-month, and Change Scores, Cataract Cohort, n = 210

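| Time point | Measure | Mean | Median | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Baseline | VFQt | 76.51 | 80.63 | 15.42 | 17.23 | 98.67 |
| | EQ-5D | 0.83 | 0.83 | 0.17 | 0.08 | 1.00 |
| | HUI2 | 0.79 | 0.82 | 0.17 | 0.08 | 1.00 |
| | HUI2 Sensation | 0.76 | 0.76 | 0.14 | 0.00 | 1.00 |
| | HUI3 | 0.66 | 0.69 | 0.27 | −0.28 | 1.00 |
| | HUI3 Vision | 0.80 | 0.95 | 0.22 | 0.00 | 1.00 |
| | QWB-SA | 0.59 | 0.61 | 0.14 | 0.15 | 1.00 |
| | SF-6D | 0.74 | 0.75 | 0.12 | 0.33 | 1.00 |
| | SRH | 2.50 | 2.00 | 0.93 | 1.00 | 5.00 |
| | VFQui | 0.86 | 0.92 | 0.12 | 0.41 | 0.97 |
| One Month | VFQt | 86.47 | 90.13 | 12.57 | 26.83 | 100.00 |
| | EQ-5D | 0.84 | 0.83 | 0.16 | 0.17 | 1.00 |
| | HUI2 | 0.81 | 0.87 | 0.19 | 0.12 | 1.00 |
| | HUI2 Sensation | 0.84 | 0.87 | 0.17 | 0.00 | 1.00 |
| | HUI3 | 0.72 | 0.80 | 0.28 | −0.32 | 1.00 |
| | HUI3 Vision | 0.91 | 0.95 | 0.15 | 0.00 | 1.00 |
| | QWB-SA | 0.60 | 0.61 | 0.14 | 0.15 | 0.97 |
| | SF-6D | 0.73 | 0.74 | 0.12 | 0.39 | 1.00 |
| | SRH | 2.52 | 2.00 | 0.92 | 1.00 | 5.00 |
| | VFQui | 0.90 | 0.94 | 0.09 | 0.46 | 0.97 |
| Change | VFQt | 9.96 | 7.86 | 13.44 | −25.49 | 71.63 |
| | EQ-5D | 0.02 | 0.00 | 0.13 | −0.51 | 0.54 |
| | HUI2 | 0.02 | 0.03 | 0.14 | −0.61 | 0.50 |
| | HUI2 Sensation | 0.08 | 0.00 | 0.19 | −0.65 | 1.00 |
| | HUI3 | 0.05 | 0.03 | 0.21 | −0.82 | 0.77 |
| | HUI3 Vision | 0.12 | 0.00 | 0.22 | −0.38 | 1.00 |
| | QWB-SA | 0.01 | 0.00 | 0.13 | −0.63 | 0.39 |
| | SF-6D | −0.01 | 0.00 | 0.09 | −0.29 | 0.22 |
| | SRH | 0.02 | 0.00 | 0.69 | −3.00 | 2.00 |
| | VFQui | 0.04 | 0.01 | 0.11 | −0.24 | 0.53 |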

Note: EQ-5D = 5 dimension Euro QOL measure; HUI2 = Health Utilities Index Mark 2; HUI3 = Health Utilities Index Mark 3; QWB-SA = Quality of Well Being – Self Administered Scale; SF-6D = Short-Form 6D; SRH = self-rated health; VFQt = total score of National Eye Institute Visual Function Questionnaire (NEI-VFQ-25); VFQui = Preference-based Score based on NEI-VFQ-25.

Baseline, Six-month, and Change Scores, Heart Failure Cohort, n = 86

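| Time point | Measure | Mean | Median | Std Dev | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Baseline | MLHF | 48.26 | 48.50 | 25.40 | 0.00 | 101.00 |
| | EQ-5D | 0.77 | 0.79 | 0.18 | 0.18 | 1.00 |
| | HUI2 | 0.76 | 0.85 | 0.22 | 0.14 | 1.00 |
| | HUI3 | 0.62 | 0.73 | 0.32 | −0.25 | 1.00 |
| | QWB-SA | 0.54 | 0.55 | 0.14 | 0.22 | 0.87 |
| | SF-6D | 0.63 | 0.63 | 0.11 | 0.39 | 0.93 |
| | SRH | 3.79 | 4.00 | 0.83 | 2.00 | 5.00 |
| Six Months | MLHF | 39.53 | 36.50 | 24.97 | 0.00 | 89.00 |
| | EQ-5D | 0.76 | 0.78 | 0.18 | 0.21 | 1.00 |
| | HUI2 | 0.76 | 0.84 | 0.24 | 0.04 | 1.00 |
| | HUI3 | 0.65 | 0.74 | 0.33 | −0.34 | 1.00 |
| | QWB-SA | 0.58 | 0.59 | 0.15 | 0.21 | 1.00 |
| | SF-6D | 0.66 | 0.64 | 0.13 | 0.41 | 1.00 |
| | SRH | 3.49 | 4.00 | 0.98 | 1.00 | 5.00 |
| Change | MLHF | −8.72 | −7.50 | 22.12 | −69.00 | 60.00 |
| | EQ-5D | −0.01 | 0.00 | 0.18 | −0.63 | 0.52 |
| | HUI2 | 0.00 | 0.00 | 0.17 | −0.68 | 0.43 |
| | HUI3 | 0.03 | 0.00 | 0.23 | −0.69 | 0.87 |
| | QWB-SA | 0.04 | 0.02 | 0.16 | −0.45 | 0.46 |
| | SF-6D | 0.03 | 0.02 | 0.11 | −0.17 | 0.42 |
| | SRH | −0.30 | 0.00 | 0.90 | −3.00 | 1.00 |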

Note: EQ-5D = 5 dimension Euro QOL measure; HUI2 = Health Utilities Index Mark 2; HUI3 = Health Utilities Index Mark 3; MLHF = Minnesota Living with Heart Failure; QWB-SA = Quality of Well Being – Self Administered Scale; SF-6D = Short-Form 6D; SRH = self-rated health.

Agreement Among 10 Measures, Cataract Cohort, n = 210

| Pair | % Agreement | Kappa Statistic | 95% Confidence Interval | Weighted Kappa Statistic | 95% Confidence Interval |
|---|---|---|---|---|---|
| VFQt and EQ-5D | 39 | 0.08 | (−0.01, 0.16) | 0.10 | (0.01, 0.18) |
| VFQt and HUI2 | 48 | 0.11 | (0.01, 0.21) | 0.14 | (0.04, 0.25) |
| VFQt and HUI2 Sensation | 55 | 0.22 | (0.11, 0.32) | 0.22 | (0.12, 0.32) |
| VFQt and HUI3 | 44 | 0.07 | (−0.02, 0.17) | 0.11 | (0.01, 0.21) |
| VFQt and HUI3 Vision | 57 | 0.25 | (0.15, 0.35) | 0.25 | (0.15, 0.36) |
| VFQt and QWB-SA | 40 | 0.06 | (−0.02, 0.15) | 0.12 | (0.03, 0.21) |
| VFQt and SF-6D | 39 | 0.09 | (0.00, 0.17) | 0.09 | (0.00, 0.17) |
| VFQt and SRH | 33 | 0.01 | (−0.06, 0.08) | −0.05 | (−0.12, 0.02) |
| VFQt and VFQui | 56 | 0.26 | (0.17, 0.35) | 0.33 | (0.24, 0.42) |

Note: EQ-5D = 5 dimension Euro QOL measure; HUI2 = Health Utilities Index Mark 2; HUI3 = Health Utilities Index Mark 3; QWB-SA = Quality of Well Being – Self Administered Scale; SF-6D = Short-Form 6D; VFQt = total score of National Eye Institute Visual Function Questionnaire (NEI-VFQ-25); VFQui = Preference-based Score based on NEI-VFQ-25. Self-rated Health (SRH) was scored as 1 = poor; 2 = fair; 3 = good; 4 = very good; and 5 = excellent.

Agreement Among Seven Measures, Heart Failure cohort, n = 86

| Pair | % Agreement | Kappa Statistic | 95% Confidence Interval | Weighted Kappa Statistic | 95% Confidence Interval |
|---|---|---|---|---|---|
| MLHF and EQ-5D | 19 | −0.25 | (−0.37, −0.13) | −0.30 | (−0.45, −0.15) |
| MLHF and HUI2 | 29 | −0.10 | (−0.26, 0.05) | −0.19 | (−0.36, −0.02) |
| MLHF and HUI3 | 26 | −0.17 | (−0.32, −0.02) | −0.23 | (−0.40, −0.06) |
| MLHF and QWB-SA | 22 | −0.22 | (−0.37, −0.07) | −0.30 | (−0.46, −0.14) |
| MLHF and SF-6D | 26 | −0.11 | (−0.25, 0.04) | −0.22 | (−0.38, −0.06) |
| MLHF and SRH | 49 | 0.25 | (0.12, 0.39) | 0.34 | (0.19, 0.49) |

Note: EQ-5D = 5 dimension Euro QOL measure; HUI2 = Health Utilities Index Mark 2; HUI3 = Health Utilities Index Mark 3; MLHF = Minnesota Living with Heart Failure; QWB-SA = Quality of Well Being – Self Administered Scale; SF-6D = Short-Form 6D. Self-rated Health (SRH) was scored as 1 = poor; 2 = fair; 3 = good; 4 = very good; and 5 = excellent.

Comparisons of Change among Measures of Health-Related Quality of Life from Baseline to One Month in Cataract Surgery Cohort

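| Measure | Change category | Got worse (n=15) | Stayed same (n=62) | Showed improvement (n=133) | Row totals |
|---|---|---|---|---|---|
| EQ5D | - | 4 | 13 | 21 | 38 |
| | 0 | 7 | 39 | 73 | 119 |
| | + | 4 | 10 | 39 | 53 |
| HUI2 | - | 8 | 16 | 22 | 46 |
| | 0 | 2 | 20 | 39 | 61 |
| | + | 5 | 26 | 72 | 103 |
| HUI3 | - | 8 | 20 | 30 | 58 |
| | 0 | 3 | 14 | 32 | 49 |
| | + | 4 | 28 | 71 | 103 |
| QWB-SA | - | 11 | 27 | 34 | 72 |
| | 0 | 2 | 12 | 39 | 53 |
| | + | 2 | 23 | 60 | 85 |
| SF-6D | - | 10 | 20 | 43 | 73 |
| | 0 | 3 | 23 | 42 | 68 |
| | + | 2 | 19 | 48 | 69 |
| SRH | - | 1 | 7 | 27 | 35 |
| | 0 | 7 | 46 | 84 | 137 |
| | + | 7 | 9 | 22 | 38 |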

Note: - means Got Worse; 0 means Stayed Same; + means Showed Improvement.

Comparisons of Change among Measures of Health-Related Quality of Life from Baseline to Six Months in Heart Failure Cohort

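| Measure | Change category | Got worse (n=46) | Stayed same (n=13) | Showed improvement (n=27) | Row totals |
|---|---|---|---|---|---|
| EQ5D | - | 11 | 6 | 16 | 33 |
| | 0 | 16 | 2 | 8 | 26 |
| | + | 19 | 5 | 3 | 27 |
| HUI2 | - | 11 | 6 | 13 | 30 |
| | 0 | 8 | 5 | 5 | 18 |
| | + | 27 | 2 | 9 | 38 |
| HUI3 | - | 10 | 8 | 14 | 32 |
| | 0 | 10 | 3 | 4 | 17 |
| | + | 26 | 2 | 9 | 37 |
| QWB-SA | - | 9 | 6 | 15 | 30 |
| | 0 | 7 | 3 | 5 | 15 |
| | + | 30 | 4 | 7 | 41 |
| SF-6D | - | 9 | 2 | 12 | 23 |
| | 0 | 8 | 7 | 9 | 24 |
| | + | 29 | 4 | 6 | 39 |
| SRH | - | 24 | 3 | 4 | 31 |
| | 0 | 20 | 8 | 13 | 41 |
| | + | 2 | 2 | 10 | 14 |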

Note: - means Got Worse; 0 means Stayed Same; + means Showed Improvement.
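As an illustration of how the percent agreement and unweighted kappa in the agreement tables relate to these cross-tabulations, the following minimal Python sketch recomputes both statistics from the EQ5D blocks of the cataract and heart failure tables above. The function name is hypothetical, rows are the EQ-5D change categories (-, 0, +), and columns are the change categories of the disease-targeted measure. The sketch covers only the unweighted kappa; the weighting scheme behind the weighted kappa is not stated here, so it is not reproduced.

```python
def agreement_and_kappa(table):
    """Return (observed agreement, Cohen's unweighted kappa) for a square table."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance, from the products of the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# EQ5D block, cataract cohort (n = 210): rows -, 0, +
cataract_eq5d = [
    [4, 13, 21],
    [7, 39, 73],
    [4, 10, 39],
]

# EQ5D block, heart failure cohort (n = 86): rows -, 0, +
heart_failure_eq5d = [
    [11, 6, 16],
    [16, 2, 8],
    [19, 5, 3],
]

cat_agree, cat_kappa = agreement_and_kappa(cataract_eq5d)
hf_agree, hf_kappa = agreement_and_kappa(heart_failure_eq5d)

print(round(100 * cat_agree), round(cat_kappa, 2))  # 39 0.08
print(round(100 * hf_agree), round(hf_kappa, 2))    # 19 -0.25
```

Both results match the values reported for the VFQt and EQ-5D pair and the MLHF and EQ-5D pair.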