Medscape, LLC is pleased to provide online continuing medical education (CME) for this journal article, allowing clinicians the opportunity to earn CME credit.
This activity has been planned and implemented in accordance with the Essential Areas and policies of the Accreditation Council for Continuing Medical Education through the joint sponsorship of Medscape, LLC and
Medscape, LLC designates this Journal-based CME activity for a maximum of 1
All other clinicians completing this activity will be issued a certificate of participation. To participate in this journal CME activity: (1) review the learning objectives and author disclosures; (2) study the education content; (3) take the post-test with a 75% minimum passing score and complete the evaluation at
Upon completion of this activity, participants will be able to:
Define healthy user bias in health care research and means to reduce it
Assess means to reduce selection bias in health care research
Assess how to overcome confounding factors by indication in health care research
Evaluate social desirability bias and history bias in health care research
Ellen Taratus, Editor,
Camille Martin, Editor,
Jeanne Madden, PhD, Department of Population Medicine, Harvard Medical School, Boston, Massachusetts. Disclosure: Jeanne Madden has disclosed no relevant financial relationships.
Charles P. Vega, MD, Clinical Professor of Family Medicine, University of California, Irvine
Disclosure: Charles P. Vega, MD, has disclosed the following relevant financial relationships: Served as an advisor or consultant for: Lundbeck, Inc; McNeil Pharmaceuticals; Takeda Pharmaceuticals North America, Inc.
Stephen B. Soumerai, ScD, Professor of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute; Co-chair, Evaluative Sciences and Statistics Concentration of Harvard University's PhD Program in Health Policy, Harvard University, Boston, Massachusetts; Douglas Starr, MS, Co-director of Science Journalism Program at Boston University, Boston University, Boston, Massachusetts; Sumit Majumdar, MD, MPH, FRCPC, Professor of Medicine, Endowed Chair in Patient Health Management, Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada
Disclosures: Stephen B. Soumerai, Douglas Starr, and Sumit Majumdar have disclosed no relevant financial relationships.
Evidence is mounting that publication in a peer-reviewed medical journal does not guarantee a study’s validity (
There have been major reversals of study findings in recent years. Consider the risks and benefits of postmenopausal hormone replacement therapy (HRT). In the 1950s, epidemiological studies suggested higher doses of HRT might cause harm, particularly cancer of the uterus (
The reason these studies contradicted each other had less to do with the effects of HRT than the difference in study
Another pattern in the evolution of science is that early studies of new treatments tend to show the most dramatic, positive health effects, and these effects diminish or disappear as more rigorous and larger studies are conducted (
Rigorous design is also essential for studying health policies, which essentially are huge real-world experiments (
This article focuses on a fundamental question: which types of health care studies are most trustworthy? That is, which study designs are most immune to the many biases and alternative explanations that may produce unreliable results (
The case examples in this article describe how some of the most common biases and study designs affect research on important health policies and interventions, such as comparative effectiveness of various medical treatments, cost-containment policies, and health information technology.
The examples include visual illustrations of common biases that compromise a study’s results, weak and strong design alternatives, and the lasting effects of dramatic but flawed early studies. Generally, systematic literature reviews provide more conservative and trustworthy evidence than any single study, and conclusions of such reviews of the broad evidence will also be used to supplement the results of a strongly designed study. Finally, we illustrate the impacts of the studies on the news media, medicine, and policy.
This case example describes healthy user bias in studies attempting to compare healthy users of influenza (flu) vaccines with unhealthy nonusers (eg, frail, severely ill) and attributing the differences to the vaccines. Flawed results of poorly designed experiments have dictated national vaccination policies. More rigorous longitudinal studies suggest that national flu vaccine campaigns have not lowered mortality rates in the elderly.
Selection biases may be the most ubiquitous threat to the trustworthiness of health research. Selection bias occurs when differences between treatment recipients and nonrecipients or control groups (based on such factors as income, race, or health) may be the true cause of an observed health effect rather than the treatment or policy itself.
Healthy user bias is a type of selection bias that occurs when investigators fail to account for the fact that individuals who are more health conscious and actively seek treatment are generally destined to be healthier than those who do not. This difference can make it falsely appear that a drug or policy improves health when it is simply the healthy user who deserves the credit (
One well-known example is the national campaign in the United States to universally vaccinate all elderly people against the flu. The goal is to reduce the most devastating complications of flu, death and hospitalizations for pneumonia (
These cohort studies, however, did not account for healthy user bias. For example, a study of 3,415 people with pneumonia (and at high risk for flu and its complications) illustrated that elderly people who received a flu vaccine were more than 7 times as likely to also receive the pneumococcal vaccine as elderly people who did not receive a flu vaccine (
Healthy user bias, a type of selection bias, is demonstrated in a study of 3,415 patients with pneumonia (and at high risk for flu and its complications), where elderly flu vaccine recipients were already healthier than nonrecipients. Figure is based on data extracted from Eurich et al (
| Characteristic | Vaccinated, % | Not Vaccinated, % |
|---|---|---|
| Physically independent | 95 | 84 |
| Received pneumococcal vaccine | 68 | 9 |
| Former smoker | 42 | 30 |
| Statin user | 35 | 25 |
Healthy user bias is a common threat to research, especially in studies of any intervention where the individual patient can seek out health care and choose to be immunized, screened, or treated (
One of the most common study designs examining the risks and benefits of drugs and other interventions is the epidemiological cohort design, which compares death and disease rates of patients who receive a treatment with the rates of patients who do not. Although seemingly straightforward, this design often fails to account for healthy user bias, especially in studies of health care benefits.
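The arithmetic behind this failure can be sketched with a small, entirely hypothetical cohort. In the sketch below, every count is invented for illustration, and the vaccine is deliberately given no effect on mortality: within each frailty stratum, vaccinated and unvaccinated people die at the same rate. The only difference between the groups is that robust people are more likely to be vaccinated.

```python
# Hypothetical cohort. Every count below is invented for illustration.
# Within each frailty stratum the death risk is identical for vaccinated and
# unvaccinated people, so the vaccine has no true effect on mortality here.
COHORT = {
    # stratum: {group: (people, deaths)}
    "robust": {"vaccinated": (8000, 80), "unvaccinated": (4000, 40)},
    "frail": {"vaccinated": (2000, 100), "unvaccinated": (6000, 300)},
}


def crude_risk(cohort, group):
    """Pooled death risk for one group, ignoring frailty."""
    people = sum(stratum[group][0] for stratum in cohort.values())
    deaths = sum(stratum[group][1] for stratum in cohort.values())
    return deaths / people


def stratum_risk_ratio(stratum):
    """Risk ratio (vaccinated vs unvaccinated) inside a single stratum."""
    vax_people, vax_deaths = stratum["vaccinated"]
    unvax_people, unvax_deaths = stratum["unvaccinated"]
    return (vax_deaths / vax_people) / (unvax_deaths / unvax_people)


crude_rr = crude_risk(COHORT, "vaccinated") / crude_risk(COHORT, "unvaccinated")
stratified_rr = {name: stratum_risk_ratio(s) for name, s in COHORT.items()}

print(f"crude risk ratio: {crude_rr:.2f}")
for name, rr in stratified_rr.items():
    print(f"{name} risk ratio: {rr:.2f}")
```

With these made-up counts, the crude comparison yields a risk ratio of about 0.53, an apparent halving of deaths, even though the risk ratio within each stratum is 1.0. In real data frailty is often unmeasured, so the stratified comparison is not available to the analyst.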
For example, one of many weak cohort studies purported to show that flu vaccines reduce mortality in the elderly (
A weak cohort study comparing the risk of death or hospitalization for pneumonia or flu among vaccinated versus unvaccinated elderly: example of failure to control for healthy users. Figure is based on data extracted from Nichol et al (
The cohort design has long been a staple in studies of treatment outcomes. Because such studies often do not account for people’s pre-existing health practices, they tend to exaggerate the benefits of treatments (eg, the flu vaccine) while downplaying harms (eg, HRT) (
Epidemiological studies that have led to national campaigns have been overturned by subsequent stronger studies. One landmark study (
A strong, particularly creative study published in 2010 (
Healthy user bias: a strong controlled study disproving the effects of the flu vaccine on all-cause mortality in the elderly during the flu “off season” (control period). The cohort study compared vaccinated elderly and unvaccinated elderly. Figure is based on data extracted from Campitelli et al (
The only logical conclusion one can reach from this study is that the benefits during the flu season were simply a result of something other than the effects of flu vaccine — most likely healthy user bias. If fewer vaccinated elders die in the absence of the flu, it is because they are already healthier than unvaccinated elders who may be already too sick to receive a flu vaccination.
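The logic of the off-season comparison can be written as a simple check. The sketch below uses hypothetical death counts (not the figures from the study) and one simple way of summarizing the comparison: dividing the in-season risk ratio by the off-season risk ratio. The function names and the ratio-of-ratios summary are illustrative assumptions, not the study authors' method.

```python
def risk_ratio(deaths_vax, n_vax, deaths_unvax, n_unvax):
    """Risk of death among vaccinated people relative to unvaccinated people."""
    return (deaths_vax / n_vax) / (deaths_unvax / n_unvax)


# Hypothetical counts, invented for illustration only.
in_season_rr = risk_ratio(300, 100_000, 500, 100_000)
off_season_rr = risk_ratio(310, 100_000, 500, 100_000)

# The vaccine cannot prevent flu deaths when no flu is circulating, so an
# off-season "benefit" measures how much healthier vaccinated people already are.
# Dividing it out leaves a rough estimate of the effect net of that bias.
bias_adjusted_rr = in_season_rr / off_season_rr

print(f"in-season RR:     {in_season_rr:.2f}")
print(f"off-season RR:    {off_season_rr:.2f}")
print(f"bias-adjusted RR: {bias_adjusted_rr:.2f}")
```

In this invented example the apparent 40% reduction in deaths during flu season shrinks to a few percent once the nearly identical off-season "benefit" is taken into account.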
Studies with strong research designs that control for selection bias and overturn the exaggerated findings of studies with weak research designs show how weak science in combination with dramatic results can influence the adoption of ineffective health policies. Certainly, greater use of flu vaccine may be reducing the incidence and symptoms of flu. However, the massive national flu vaccination campaign was predicated on reducing the number of flu-related deaths and hospitalizations for pneumonia among the elderly. It could be argued that the funds used for such a campaign could be better spent on developing more effective vaccines or treatments or other methods to reduce the spread of flu.
The news media played a major role in disseminating the misleading results of studies that did not properly take into account the influence of healthy user bias in claims that flu vaccinations could reduce mortality rates and hospitalizations among the elderly. Reuters, for example (
In a study of relatively healthy elderly HMO members, getting a flu shot significantly reduced the odds of being hospitalized with an influenza-related ailment and of dying. . . . “Our study confirms that influenza vaccination is beneficial for reducing hospitalization and death among community-dwelling HMO elderly over a 10-year period,” said the lead author. . . . Flu vaccination reduced the risk of hospitalization for pneumonia or influenza by 27 percent and reduced the risk of death by 48 percent, the report indicates.
(Excerpted from
This case example describes volunteer selection biases created by studies that use “volunteer” hospital adopters of health information technology (IT) and noncomparable “laggard” controls (the common design in the field). Volunteer hospitals already tend to have more experienced physicians and healthier patients, which may influence health outcomes more than the intervention does.
The flawed results of these sorts of experiments led to federal health IT initiatives, resulting in trillions of dollars spent on unproven and premature adoption of the technologies and few demonstrated health benefits. RCTs failed to replicate the findings on cost savings and lives saved suggested in the poorly designed studies.
Researchers often attempt to evaluate the effects of a health technology by comparing the health of patients whose physicians use the technology with the health of patients whose physicians do not. But if the 2 groups of physicians (or hospitals) are different (eg, older vs younger, high volume vs low volume of services), those differences might account for the difference in patient health, not the technology being studied.
Our national investment in health IT is a case in point. Based in part on an influential report from the RAND think tank (
Let’s examine some studies that illustrate how provider selection biases may invalidate studies about the health and cost effects of health IT.
Example of selection bias: underlying differences between groups of medical providers show how they are not comparable in studies designed to compare providers using EHRs with providers not using EHRs. Figure is based on data extracted from Simon et al (
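| Characteristic | Percentage Using Electronic Health Records |
|---|---|
| Large (≥7 physicians) | 52 |
| Small (1–3 physicians) | 29 |
| Teaching hospital | 40 |
| Nonteaching hospital | 14 |
| Physician age ≤45 | 46 |
| Physician age 46–55 | 37 |
| Physician age >55 | 26 |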
The following example illustrates how a weak cross-sectional study (a simple correlation between a health IT program and supposed health effects at one point in time) did not account for selection biases and led to exaggerated conclusions about the benefits of health IT (
Example of weak post-only cross-sectional study that did not control for selection bias: the study observed differences between practices with EHRs and practices with paper records after the introduction of EHRs but did not control for types of providers adopting EHRs. Note the unlikely outcome for nonsmoker. Figure is based on data extracted from Cebul et al (
| Health Outcome | Electronic Health Record–Based Practice, % of Patients Achieving Outcome | Paper-Based Practice, % of Patients Achieving Outcome |
|---|---|---|
| Blood pressure control (<140/80 mm Hg) | 56 | 39 |
| Weight control (body mass index <30) | 33 | 34 |
| Nonsmoker | 82 | 52 |
Systematic reviewers adhering to the standards of the international Cochrane Collaboration would exclude this weak cross-sectional design as inadequate evidence of the effects of medical services and policies (
The questionable findings of this study suggested that EHRs might not only improve blood pressure control but also reduce smoking by 30 percentage points (
The conclusion of the report — that “the meaningful use of EHRs may improve the quality of care” — is not warranted. Large practices, teaching hospitals, and younger physicians (
Differences in patient characteristics between EHR-based practices and paper-based practices in a weak post-only cross-sectional study that did not control for selection bias. Abbreviation: EHR, electronic health record. Figure is based on data extracted from Cebul et al (
| Patient Characteristic | Electronic Health Record–Based Practice, % of Patients | Paper-Based Practice, % of Patients |
|---|---|---|
| Medicaid (poor) | 7 | 23 |
| Nonwhite | 44 | 85 |
| Medicare (elderly) | 37 | 20 |
| Commercial health insurance | 48 | 10 |
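To see why the outcome differences cannot be credited to EHRs, it helps to set them beside the differences in patient mix. The short sketch below takes the percentages reported in the two tables in this section and computes the gap, in percentage points, between EHR-based and paper-based practices. The variable and function names are illustrative.

```python
# (EHR-based practice %, paper-based practice %), copied from the two tables
# in this section.
OUTCOMES = {
    "Blood pressure control": (56, 39),
    "Weight control": (33, 34),
    "Nonsmoker": (82, 52),
}
PATIENT_MIX = {
    "Medicaid (poor)": (7, 23),
    "Nonwhite": (44, 85),
    "Medicare (elderly)": (37, 20),
    "Commercial health insurance": (48, 10),
}


def gaps(table):
    """Percentage-point gap: EHR-based minus paper-based practices."""
    return {name: ehr - paper for name, (ehr, paper) in table.items()}


outcome_gaps = gaps(OUTCOMES)
mix_gaps = gaps(PATIENT_MIX)

for label, result in (("Outcome", outcome_gaps), ("Patient mix", mix_gaps)):
    for name, gap in result.items():
        print(f"{label:12s} {name:30s} {gap:+d} points")
```

The gaps in patient mix (for example, 38 points for commercial insurance and 41 points for nonwhite patients) are as large as or larger than the largest outcome gap (30 points for nonsmokers), so the two sets of practices were serving very different populations before any effect of EHRs could be considered.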
Many other kinds of study design (
A study . . . involving more than 27,000 adults with diabetes found that those in physician practices using EHRs were significantly more likely to have health care and outcomes that align with accepted standards than those where physicians rely on patient records.
(Excerpted from
Given the volunteer selection biases in comparing unlike providers with EHRs and providers without EHRs, what designs can level the playing field and yield more trustworthy results? The “gold standard” of research designs (
Randomized controlled trial: the “gold standard” of research design.
This simple design starts with a population (eg, patients, health centers) and uses chance to allocate some units to the intervention (eg, health IT) and the rest to no intervention (control). The researchers then test whether health in the intervention group improved more than health in the control group. Randomization generally eliminates selection biases, such as those arising from facility size or patient age or income. Such designs reduce bias most reliably when they adhere to methodological safeguards, such as blinding patients to their treatment status and randomizing enough patients or centers.
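Random allocation itself is mechanically simple. The following is a minimal sketch, assuming a list of hypothetical health centers and simple (unstratified) randomization into two equal arms; real trials typically add safeguards such as concealed allocation, and often stratify or block the randomization.

```python
import random


def randomize(units, seed=None):
    """Split units into two arms by chance alone (simple randomization)."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical health centers, named for illustration only.
centers = [f"center_{i}" for i in range(10)]
arms = randomize(centers, seed=42)

print("intervention:", sorted(arms["intervention"]))
print("control:     ", sorted(arms["control"]))
```

Because chance, not the centers themselves, decides who receives the intervention, characteristics such as size or patient mix tend to balance across arms as the number of randomized units grows.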
Consider the following randomized control trial involving a state-of-the-art health IT system with decision support in nursing homes (
A strong randomized controlled trial of the effect of health information technology on the prevention of drug-related injuries among nursing home residents. Intervention participants received computerized warnings about unsafe combinations of drugs. Figure is based on data extracted from Gurwitz et al (
| Type of Injury | Intervention, No. of Injuries per 100 Residents per Month | No Intervention (Control), No. of Injuries per 100 Residents per Month |
|---|---|---|
| Nonpreventable | 10.8 | 10.4 |
| Preventable | 4.0 | 3.9 |
A single study, no matter how rigorous, should never be considered definitive. The best evidence of what works in medical science comes from systematic reviews of the entire body of published research by unbiased evaluators — after eliminating the preponderance of weak studies. Such a review of hundreds of health IT studies cited a lack of rigorous evidence (
[T]here is a lack of robust research on the risks of implementing these technologies and their cost-effectiveness has yet to be demonstrated, despite being frequently promoted by policymakers and “techno-enthusiasts” as if this was a given.
Advancements in health IT may well achieve the promised cost and quality benefits, but proof of these benefits requires more rigorous appraisal of the technologies than research to date has provided.
This case example describes
Landmark studies that failed to control for this bias nevertheless influenced worldwide drug safety programs for decades, despite better controlled longitudinal time-series studies that debunked the early dramatic findings published in major journals.
One of the oldest and most accepted “truths” in the history of medication safety research is that benzodiazepines (popular medications such as Valium and Xanax that are prescribed for sleep and anxiety) may cause hip fractures among the elderly. At first glance, this adverse effect seems plausible because the drugs’ sedating effects might cause falls and fractures, especially in the morning after taking a sleep medication (
RCTs — in which similar patients are randomized to either treatment or no treatment — are generally too small to detect such infrequent but important outcomes as a hip fracture: each year, less than 0.5% to 1% of the elderly population has a hip fracture (
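A rough sample-size calculation shows why such trials are impractical. The sketch below uses the standard normal-approximation formula for comparing two proportions. The 1% annual baseline risk comes from the range quoted above; the assumed 50% relative increase in risk, the 1-year follow-up, the two-sided 5% significance level, and 80% power are all illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to compare two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


baseline_risk = 0.01        # about 1% of elderly people fracture a hip each year
assumed_relative_risk = 1.5  # hypothetical effect size, chosen for illustration
needed = n_per_group(baseline_risk, baseline_risk * assumed_relative_risk)

print(f"participants needed per arm: {needed}")
```

Under these assumptions the formula calls for roughly 7,700 participants per arm followed for a year, far more than a typical drug trial in the elderly enrolls.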
Confounding by indication may be especially problematic in studies of benzodiazepines because physicians prescribe them to elderly patients who are sick and frail. Because sickness and frailty are often unmeasured, their biasing effects are hidden. Compared with elderly people who do not use benzodiazepines, elderly people who start benzodiazepine therapy have a 29% increased risk for hypertension, a 45% increased risk for pain-related joint complaints (an obvious predictor of hip fractures that is rarely measured in research data), a 50% increased risk for self-reporting health as worse than that of peers, and a 36% increased risk for being a current smoker (
Elderly people who begin benzodiazepine therapy (recipients) are already sicker and more prone to fractures than nonrecipients. Figure is based on data extracted from Luijendijk et al (
| Patient Characteristic | Percentage Increase in Risk (Hazard Ratio), Benzodiazepine Recipients vs Nonrecipients |
|---|---|
| Female | 67 |
| Depression | 53 |
| Hypertension | 29 |
| Pain-related joint complaints | 45 |
| Health self-reported as worse than that of peers | 50 |
| Current smoker | 36 |
Almost 30 years ago, a landmark study used Medicaid insurance claims data to show a relationship between benzodiazepine use and hip fractures in the elderly (
One of several results of this weak post-only epidemiological study showed that current users of benzodiazepines were more likely to fracture their hip than previous users (
Weak post-only epidemiological study suggesting that current users of benzodiazepines are more likely than previous users to have hip fractures. Figure is based on data extracted from Ray et al (
The researchers were able to gather little or no data on the sicker, long-term benzodiazepine users from their insurance claims and so could not accurately compare the 2 groups. If they had been able to collect such information, their conclusions may have been different. In short, the researchers could not determine what would have happened if these sicker patients did not receive benzodiazepines.
More than 2 dozen epidemiological studies of hip fractures and benzodiazepines have been published since the original report in 1987 (
The estimated risks of a fracture shrank over time as investigators did a better job of adjusting for the sicker patients who used benzodiazepines. By the time a more rigorous epidemiological study was conducted that controlled more completely for confounding by indication, the proverbial horse was out of the barn; these investigators demonstrated that the excess risk of benzodiazepines and hip fractures was so small that many considered the risk to be negligible or nonexistent (
Case-control studies or “look-back” studies are weak designs for evaluating medical treatments or other interventions because researchers try to draw conclusions when comparing patients whose differences, not treatment, may account for an effect. A stronger research method is the longitudinal natural experiment, in which researchers follow a group over time as their medications or policies that affect them change.
Such natural experiments allow researchers to view multiple points before and after an intervention — to observe a pre-policy trend and a post-policy trend. Rather than comparing different groups of patients at a single point in time, researchers follow patient groups
Several examples of effects that can be detected in interrupted time-series studies. The blue bar represents an intervention.
These examples illustrate the advantages of graphical data, which can show the true nature of trends. That is not to say that time-series studies never lead to erroneous conclusions. They are just less likely to do so than other designs.
In 1989 New York State began to require every prescription of benzodiazepine to be accompanied by a triplicate prescription form, a copy of which went to the New York State Department of Health. State policy makers thought this would limit benzodiazepine use, thereby reducing costs, the prevalence of benzodiazepine abuse, and the risk of hip fracture. (In formulating the policy, policy makers referred to the 1987 landmark study on benzodiazepines and hip fractures [
Benzodiazepine (BZ) use and risk of hip fracture among women with Medicaid before and after regulatory surveillance restricting BZ use in New York State. A BZ user was defined as a person who had received at least 1 dispensed BZ in the year before the policy. Figure was adapted from Wagner et al (
The researchers found that rather than a decrease in the incidence of hip fractures, the incidence continued to rise among New York women throughout the post-policy period; in fact, the incidence was slightly higher in New York than in New Jersey, where benzodiazepine use was constant (
Even today, many policies to control benzodiazepine use are based on the early dramatic findings and decades of research that did not control for confounding by indication. Like every other drug or device, benzodiazepines have both benefits and risks, but they probably have no effect on the risk of hip fracture.
The findings of these early and widely cited studies were magnified by the news media, which had a resounding impact on the public, clinicians, and policy makers. Rather than challenging the studies, many reporters simply accepted their conclusions. For example, on the day the 1987 study was published (
Sedative drugs called benzodiazepines (such as Valium) don’t increase the risk of hip fractures in the elderly, a Harvard Medical School study says. The finding suggests that US federal and state policies that restrict access to these drugs among the elderly need to be re-examined, the study authors added. . . . The policy drastically decreased use of benzodiazepines in New York, and we did not see any decline in hip fracture rates compared to New Jersey.
(Excerpted from
We have cited several examples of contradictory findings on the association between benzodiazepines and hip fractures among the elderly published several years after misleading observational research was first reported. As it did with the studies on the risks and benefits of HRT, it took many years to debunk the earlier studies that were flawed to begin with and given credence by the news media.
This case example describes bias caused by self-reports of socially desirable behavior (mothers reporting that their children watch less television than they actually watch) that became exaggerated after a controlled trial of a 1-year program to educate mothers to reduce such sedentary activity. Comparing the reports of these mothers with the reports of a control group (not participating in the program) further biased the widely reported findings. The use of unobtrusive computer observations instead of self-reports was a more valid approach.
There is a widespread bias in health research studies that leads to exaggerated conclusions and could be curtailed through the application of common sense. Researchers often use self-reports of health behaviors by study participants. But if the participants in such a study believe that one outcome is more socially desirable than another (such as avoiding fatty foods or exercising regularly), they will be more likely to state the socially desirable response, in effect telling researchers what they want to hear.
Some of the more interesting examples of this bias involve studies of obesity and nutrition. A 1995 study showed that both men and women tended to understate their true calorie and fat consumption by as much as 68% in comparison to more objective methods (
Underreporting of calories and fat consumption due to social desirability among women and men. Figure is based on data extracted from Hebert et al (
| Measure | Underreporting, Women | Underreporting, Men |
|---|---|---|
| Caloric intake, kcal | −68.0 | −38.9 |
| Fat intake, percentage | −33 | −13 |
These women were not lying. They were unconsciously seeing their behavior as conforming to positive societal norms. The principle applies to physicians as well. For example, when asked about their compliance with national quality of care guidelines, physicians overstated how well they did by about 30% in comparison to more objective auditing of their clinical practices. Just like those men and women self-reporting calorie and fat intake, these physicians were not lying or deliberately misleading — they knew what they should be doing and were pretty sure that they were doing it almost all the time (
Even very strong research designs like RCTs can be compromised if the investigators unwittingly tip off the study group to the desired outcomes.
The following example is one of many published studies that created selection bias due to social desirability. The study was an RCT of a 1-year primary care education program, High Five for Kids, which attempted to motivate mothers to influence their children to watch less television and follow more healthful diets to lose weight (
Study that contaminated intervention group by unwittingly tipping parents off to the socially desired outcome: fewer hours of television time per day for children. Figure is based on data extracted from Taveras et al (
| Timing | Intervention, No. of Self-Reported Hours of Television per Day | No Intervention, No. of Self-Reported Hours of Television per Day |
|---|---|---|
| Before intervention | 2.67 | 2.44 |
| After intervention | 2.13 | 2.36 |
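The reported effect is the difference between the change in the intervention group and the change in the control group. The sketch below computes it from the self-reported hours in the table above; the variable names are illustrative.

```python
# Self-reported hours of television per day, copied from the table above.
HOURS = {
    "intervention": {"before": 2.67, "after": 2.13},
    "control": {"before": 2.44, "after": 2.36},
}


def change(arm):
    """Change in self-reported viewing from before to after the program."""
    return HOURS[arm]["after"] - HOURS[arm]["before"]


difference_in_changes = change("intervention") - change("control")

print(f"intervention change: {change('intervention'):+.2f} h/day")
print(f"control change:      {change('control'):+.2f} h/day")
print(f"difference:          {difference_in_changes:+.2f} h/day")
```

The arithmetic gives a relative reduction of about 0.46 hours per day, but it is only as trustworthy as its inputs. If mothers in the intervention group learned which answer was desired, the subtraction carries that reporting bias straight through to the result.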
Studies with important limitations in design nevertheless can have significant policy implications. On the basis of this study, the High Five for Kids program was declared a success and was a model for an obesity prevention research program in Mexico.
In childhood obesity research, it is difficult to design studies that eliminate social desirability bias. In a comprehensive review of measures of television watching, most studies used self-report (
In 2008, researchers published a randomized controlled study of an intervention to reduce childhood television and computer use to decrease weight (
Strong randomized controlled trial design using an electronic device that caused an involuntary reduction in television and computer use. The difference in decline in viewing between the intervention group and control group was significant. Figure is based on data extracted from Epstein et al (
This case example describes history bias: uncontrolled pre-existing or co-occurring downward trends in mortality that investigators mistakenly attributed to their national patient safety initiatives. Flawed results from their experiments led to worldwide movements to adopt and entrench ineffective initiatives. In studies of health care and policies, it is essential to graph and display time trends before and
A common threat to the credibility of health research is history bias. History bias can occur when events that take place before or during the intervention may have a greater effect than the intervention itself. An example of this kind of bias took place in a study of an intervention using medical opinion leaders to recommend appropriate drugs to their colleagues for patients with acute myocardial infarction (
Control hospitals (ie, those that did not receive the intervention) still had the desirable changes (
Percentage of acute myocardial infarction patients who received essential life-saving drugs (β blockers and thrombolytics) and a drug linked with increased mortality (lidocaine) in control hospitals before and after an intervention. Figure is based on data extracted from Soumerai et al (
What could cause such historical biases? This intervention took place during an explosion of research and news media reporting on treatments for acute myocardial infarction that could have influenced the prescribing behavior of physicians. At the same time, the US Department of Health and Human Services launched a national program targeting the drugs in the study, and the American College of Cardiology and the American Hospital Association jointly released new guidelines for the early management of patients with acute myocardial infarction. In the complex environment of health care, policies, and behavior, hundreds of historical events, if not controlled for, could easily account for the “effects” of policies and interventions. Fortunately, the use of a randomized control group in this example accounted for changes that had nothing to do with the study intervention.
In 1999, the Institute of Medicine issued a landmark report on how the misuse of technologies and drugs may be causing illnesses and deaths in hospitals throughout the nation (
Example of a weak post-only study of a hospital safety program and mortality that did not control for history. Narrow bar shows start of quality of care program. There is no evidence that data are available for the years leading up to the program. The study did not define the intervention period other than to state that planning occurred in 2003. Figure is based on data extracted from Pryor et al (45). Abbreviation: FY, fiscal year.
| Fiscal Year | Deaths per 100 Discharges |
|---|---|
| 1999 | Unknown |
| 2000 | Unknown |
| 2001 | Unknown |
| 2002 | Unknown |
| 2003 | Unknown |
| 2004 | 2.2 |
| 2005 | 2.1 |
| 2006 | 2.0 |
| 2007 | 1.9 |
| 2008 | 1.9 |
| 2009 | 1.9 |
| 2010 | 1.8 |
No data are available for the years before the hospitals put their program in place. Without that baseline data, such post-only designs cannot provide any realistic assessment of a program’s success (
“The Quality ‘Journey’ At Ascension Health: How We’ve Prevented At Least 1,500 Avoidable Deaths A Year — And Aim To Do Even Better” (
Equally common, many pre–post studies have only one measurement before the intervention and one measurement afterward. Such a design is not much different from the weak design of the study illustrated in Figure 17: without the pre-existing trend in mortality, we have no idea what would have happened anyway.
Another example of weak design is a study (
A campaign to reduce lethal errors and unnecessary deaths in U.S. hospitals has saved an estimated 122,300 lives in the last 18 months, the campaign’s leader said Wednesday. . . . “We in health care have never seen or experienced anything like this,” said Dr. Dennis O’Leary, president of the Joint Commission on Accreditation of Healthcare Organizations.
(Excerpted from
Does more rigorous evidence support the notion that the 100,000 Lives Campaign actually reduced mortality rates? To investigate that question, we obtained 12 years of national statistics on hospital mortality, longitudinal data from
Example of a strong time-series design that controlled for history bias in the Institute for Healthcare Improvement’s 100,000 Lives Campaign. Figure is based on data from the Agency for Healthcare Research and Quality (
| Year | Deaths per 100 Discharges |
|---|---|
| 1993 | 2.72 |
| 1994 | 2.63 |
| 1995 | 2.58 |
| 1996 | 2.54 |
| 1997 | 2.46 |
| 1998 | 2.50 |
| 1999 | 2.46 |
| 2000 | 2.37 |
| 2001 | 2.32 |
| 2002 | 2.24 |
| 2003 | 2.22 |
| 2004 | 2.13 |
| 2005 (Quality of care program began in January 2005) | 2.09 |
| 2006 | 2.04 |
| 2007 | 1.94 |
| 2008 | 2.03 |
| 2009 | 1.92 |
| 2010 | 1.90 |
| 2011 | 1.91 |
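To make the logic of the time-series comparison concrete, the following is a minimal sketch, not the authors' analysis. It assumes a simple straight-line fit to the 1993–2004 rates in the table above, projects that baseline trend into the program years, and compares the projection with the observed rates. The names `fit_line` and `gaps_from_baseline_trend` are hypothetical, and a real interrupted time-series analysis would also need to address autocorrelation and uncertainty in the estimates.

```python
# Deaths per 100 discharges by year, transcribed from the table above.
RATES = {
    1993: 2.72, 1994: 2.63, 1995: 2.58, 1996: 2.54, 1997: 2.46, 1998: 2.50,
    1999: 2.46, 2000: 2.37, 2001: 2.32, 2002: 2.24, 2003: 2.22, 2004: 2.13,
    2005: 2.09, 2006: 2.04, 2007: 1.94, 2008: 2.03, 2009: 1.92, 2010: 1.90,
    2011: 1.91,
}
PROGRAM_START = 2005  # the program began in January 2005


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gaps_from_baseline_trend(rates, start):
    """Fit the pre-program trend, project it forward, and return
    (slope, {year: observed - projected}) for the program years."""
    pre_years = [y for y in sorted(rates) if y < start]
    post_years = [y for y in sorted(rates) if y >= start]
    slope, intercept = fit_line(pre_years, [rates[y] for y in pre_years])
    gaps = {y: rates[y] - (slope * y + intercept) for y in post_years}
    return slope, gaps


slope, gaps = gaps_from_baseline_trend(RATES, PROGRAM_START)
mean_gap = sum(gaps.values()) / len(gaps)

# A one-before, one-after comparison ignores the baseline trend entirely.
naive_change = RATES[2010] - RATES[2004]

print(f"Baseline trend, 1993-2004: {slope:+.3f} deaths per 100 discharges per year")
print(f"Naive pre-post change, 2004 vs 2010: {naive_change:+.2f}")
for year, gap in gaps.items():
    print(f"{year}: observed minus projected = {gap:+.3f}")
print(f"Mean gap, 2005-2011: {mean_gap:+.3f}")
```

If the program had accelerated the decline, the observed rates would fall clearly below the projected line, giving consistently negative gaps. With these figures the gaps stay small and change sign from year to year, which is what a continuation of the pre-existing trend looks like. The naive 2004-versus-2010 difference, by contrast, counts the whole ongoing decline as if it were a program effect.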
Subsequently, several large RCTs demonstrated that many components of the 100,000 Lives Campaign were not particularly effective (
Scientists, journalists, policy makers, and members of the public often do not realize the extent to which bias affects the trustworthiness of research. We hope this article helps to elucidate the most common designs that either fall prey to biases or fail to control for their effects. Because much of this evidence is easily displayed and interpreted, we encourage the use of visual data sets in presenting health-related information. To further clarify our message, here (
| Hierarchy of Design | |
|---|---|
| Multiple randomized controlled trials | The “gold standard” of evidence |
| Randomized controlled trials | A strong design, but sometimes not feasible |
| Interrupted time series with a control series | Baseline trends often allow visible effects and controls for biases |
| | |
| Single interrupted time series | Controls for trends, but has no comparison group |
| Before and after with comparison group (single observations, sometimes called “difference in difference” design) | Comparability of baseline trend often unknown |
| | |
| Uncontrolled before and after (pre–post) | Simple observations before and after, no baseline trends |
| Cross-sectional designs | Simple correlation, no baseline, no measure of change |
Further guidance on research design hierarchy is available (
These design principles have implications for the tens of billions of dollars spent on medical research in the United States each year. Systematic reviews of health care intervention studies show that half or more of published studies use weak designs and are untrustworthy. The results of weak study design are flawed science, misconstrued policies, and potentially billions or trillions of wasted dollars.
This article and these case reports barely break the surface of what can go wrong in studies of health care. If we do not learn and apply the basics of research design, scientists will continue to generate flip-flopping studies that emphasize drama over reality, and policy makers, journalists, and the public will continue to be perplexed. Adherence to the principles outlined in this article will help users of research discriminate between biased findings and credible findings of health care studies.
This project was supported by a Thomas O. Pyle Fellowship (Dr Soumerai) from the Department of Population Medicine, Harvard Medical School, and Harvard Pilgrim Health Care Institute, Boston; and a grant from the Commonwealth Fund (no. 20120504). Dr Soumerai received grant support from the Centers for Disease Control and Prevention’s Natural Experiments for Translation in Diabetes (NEXT-D). Dr Majumdar receives salary support as a Health Scholar (Alberta Heritage Foundation for Medical Research and Alberta Innovates – Health Solutions) and holds the Endowed Chair in Patient Health Management (Faculties of Medicine and Dentistry and Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Alberta, Canada). We are grateful to Dr Jeanne Madden and Wendy Drobnyk for editorial assistance, Ellen Taratus for outstanding editing of this article, and Caitlin Lupton for her careful analysis of numerous articles and graphic design. The Commonwealth Fund is a national, private foundation in New York City that supports independent research on health care issues and makes grants to improve health care practice and policy. The views presented here are those of the author and not necessarily those of The Commonwealth Fund, its directors, officers or staff.
The opinions expressed by authors contributing to this journal do not necessarily reflect the opinions of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions.
To obtain credit, you should first read the journal article. After reading the article, you should be able to answer the following, related, multiple-choice questions. To complete the questions (with a minimum 75% passing score) and earn continuing medical education (CME) credit, please go to
Which of the following statements regarding healthy user bias in clinical research is
It refers to the fact that healthy people are often excluded from clinical trials
It refers to the fact that healthy people actively seek treatment, including that available in clinical trials
The best means to reduce healthy user bias is to use an epidemiologic cohort study design
It is not possible to mitigate against healthy user bias in cohort studies
All of the following factors led to an overestimation of the efficacy of electronic health records (EHRs) in improving patient outcomes
Selection bias in larger health centers using more EHRs
Selection bias in more academic centers using more EHRs
Selection bias in younger physicians using more EHRs
Testing of EHRs in randomized controlled trials
Which of the following methods is the
An interrupted time-series study design
Confirmation of study end points by patients themselves
Confirmation of study end points by regional or national databases
Elimination of a random half of the study population during study analysis
Which of the following statements regarding other types of bias in clinical research is
Social desirability bias occurs when researchers alter their outcomes to reflect preferred results
Social desirability bias occurs in studies of the general public, but not in research of physicians' behavior
History bias refers to trends preceding an intervention that can affect a study outcome more than the intervention itself
Evaluating trends in the projected outcome before the intervention does not reduce history bias