  • Expert Commentary
  • August 31, 2009

On Quality Measurement, Babies, and Bathwater

Note from the National Quality Measures and National Guideline Clearinghouses: The following Commentary was derived from the author's blog and revised with his permission.

Quality Measurement mavens are reeling these days, as a result of the air being let out of high-profile measures such as tight glucose control (1), door-to-antibiotic time (2), and beta-blockers (3). Some critics have even suggested that we put a moratorium on new quality measures until the science improves. I hope we don't.

I think we're seeing a natural, and fairly predictable, ebb and flow, and our reaction — even to these significant setbacks — should be thoughtful and measured. Here's why:

The publication of the Institute of Medicine's Quality Chasm report (4) (and McGlynn's findings [5] that we adhere to evidence-based practice about half the time) generated intense pressure to launch transparency and pay-for-performance initiatives. Finding virtually no outcome measures ready for prime time (the data collection burden was too large and the science of case-mix adjustment too immature), policymakers and payers logically looked to process measures (aspirin, angiotensin-converting enzyme [ACE] inhibitors, pneumovax) for common diseases (myocardial infarction, congestive heart failure, pneumonia), delivered in settings (hospitals) that could be held accountable. And they sought levels of evidence that were, if not perfect, at least good enough.

The National Quality Forum (NQF) was created to vet this evidence. But the NQF has a problem not unlike that of the U.S. Food and Drug Administration (FDA): too low an evidence bar and bad measures become "law"; too high a bar and the ravenous hunger for quality measures goes unsated. Unsurprisingly, the demand for measures won out and the bar was set relatively low — not so much in terms of study design, but rather in terms of the degree to which initial promising studies had their findings confirmed by subsequent research.

With that as prelude, we shouldn't be shocked by what we're seeing now: a mini-pattern in which one or two tightly managed, single-site studies that showed great benefit are followed by studies done in more diverse and real-world settings whose results are disappointing. It has always been thus. The difference is that now, by the time the later studies are published, the quality measures have long since been disseminated.

I won't belabor the point since I've covered this ground previously in my discussions of the individual measures (6-8). But the fascinating trend to watch now is the beginnings of a Quality Measurement Backlash — it's not a full-fledged, "spontaneous" tea party just yet, but the night is young. Consider, for example, the Jerry Groopman/Pamela Hartzband article (9) in the Wall Street Journal which stated:

"In too many cases, the quality measures have been hastily adopted, only to be proven wrong and even potentially dangerous to patients... Yet too often quality metrics coerce doctors into rigid and ill-advised procedures. Orwell could have written about how the word 'quality' became zealously defined by regulators, and then redefined with each change in consensus guidelines..."

The solution, say the authors, is to stop the presses:

"Before a surgeon begins an operation, he must stop and call a 'time-out' to verify that he has all the correct information and instruments to safely proceed. We need a national time-out in the rush to mandate what policymakers term quality care to prevent doing more harm than good."

If that wasn't enough fun for the quality measurers, the article (10) by Rachel Werner and Bob McNutt in a recent Journal of the American Medical Association (JAMA) surely was. After critiquing today's measures for the usual reasons, the authors suggest a "new approach":

"First, the focus of quality improvement initiatives should be on improving rather than measuring quality of care... Second, quality improvement initiatives should be tied to local actions and local results rather than national norms. This acknowledges that quality improvement efforts are not generalizable and one solution does not fit all."

"...Quality improvement incentives can be restructured based on these principles. Current incentives are based on measured performance and are benchmarked to national norms. An alternative approach is to tie incentives to the local process of improving quality of care rather than the results of quality measures. This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement... A logical next step is to tie current quality improvement incentives to this approach: pay based on participation in quality improvement efforts rather than simply comparing each other on measures that do not reflect the learning that is required to really improve care."

The small, elite group of nationally recognized measure vetters is feeling increasingly besieged. You may have seen the comment on this blog by one of them, Dale Bratzler of the Oklahoma Foundation for Medical Quality (creator of many national measures, including those used in the Surgical Care Improvement Program [SCIP]), written in response to my post on tight glucose control. Bratzler wrote:

"...I am tiring of some of the criticisms related to quality initiatives because the authors of those criticisms often fall victim to the same practices that they criticize. It seems to be increasingly common for opinion pieces, editorials, anecdotal reports, underpowered studies, and single-institution studies to be used to suggest that quality initiatives are resulting in widespread patient harm. Frankly, I have not seen systematic evidence of that for most national quality initiatives and in some cases, have data to suggest that for many of the conditions targeted in those initiatives, patient outcomes are slowly but progressively improving."

Bratzler goes on to state, correctly, that the glucose standard in SCIP was not the brutally tight 80-110 mg/dL, but rather a more generous (and less dangerous) <200 mg/dL. He then acknowledges that

"...some hospitals undoubtedly go beyond the requirements of the SCIP measure and that could result in harm... But on a national basis, surgical outcomes actually are improving over time and there is no national requirement to implement programs of intensive blood sugar control."

That last point is technically accurate, but other national campaigns, or influential national organizations, have promoted tighter control than that recommended in SCIP. For example, the Surviving Sepsis Campaign targets a glucose level of <150 mg/dL (11), and the Institute for Healthcare Improvement's (IHI) target (12) is the Van den Berghe standard of 80-110 mg/dL (13) (although, to be fair, IHI stuck to the SCIP standards in their recently completed 5 Million Lives campaign [14]).

But before we get too distracted by all these angels dancing atop an insulin pen, let's take a step back and consider the big picture. We've seen that some widely disseminated and promoted performance measures haven't worked out as intended — usually because evidence emerged that was less impressive than the initial salvo.

And now we have Groopman and Hartzband arguing that we should take a "time out" on quality measures, leaving it to doctors to make their own choices since only they truly know their patients. Do we really believe that the world would be a better place if we went back to every doctor deciding by him or herself what treatment to offer, when we have irrefutable data demonstrating huge gaps between evidence-based and actual practice? Even when we KNOW the right thing to do (as in hand washing), we fail to do it nearly half the time! Do the authors really believe that the strategy should remain "Doctor Knows Best"; just stay out of our collective hair?

And if we agree that we need some measurement to catalyze improvement efforts, do we really want measures that can be met through elaborate dog-and-pony shows, with no demonstration of improved processes or outcomes? Sure, the Joint Commission should check to be sure that hospitals use strong QI methods, but there has to be more, much more. A close colleague, one of the world's foremost quality experts, wrote me about the Werner/McNutt article, finding it unhelpful:

"...because you still have the measurement problem — how are you going to know whether or not any of these actions are actually happening, and whether or not they are actually improving anything?"

The sentence in the JAMA piece, "This could take the form of requiring local teams of quality improvement personnel to identify problems through investigation, identify solutions to these problems, implement solutions, and document local improvement" reminded this colleague of the Total Quality Management (TQM) fad twenty years ago, when some accreditors and insurers began requiring documentation of "storyboards" of hospitals' Plan-Do-Study-Act (PDSA) cycles (15). He recalled visiting some hospitals preparing for inspections and,

"...I would see the storyboards on the wards and they were just laughable in terms of what they presented and their supposed relation to cause-and-effect. It was all a charade, done to get through the inspection and nothing — or very close to nothing — meaningful was really being accomplished."

So, to the Dale Bratzlers of the world, I say, Courage! Keep it up. And don't let the bums (including this one) get you (too) down. The bottom line is that we need quality measures, and we need rigorous research to create good ones. When we do create a flawed measure (an inevitability), let's admit it and fix it. If hospitals have been pushed toward tight glucose control based on now partly discredited evidence, let's say so, improve the measure, and resolve to learn something from the experience, not just when we're thinking about glucose measurement but also when we're considering the strength of the evidence supporting other difficult-to-implement and potentially dangerous practices. Ditto for door-to-antibiotic timing.

As for me, I'll keep critiquing bad measures and pointing out when new science emerges that changes how we should think about existing measures. But I'll continue to support thoughtful implementation of transparency programs and experiments using Pay-for-Performance (P4P). From where I sit, of all our options to meet the mandates to improve quality and safety, tenaciously clinging to a Marcus Welbian (and demonstrably low quality) status quo or creating tests that can be passed by appearing to be working on improvement seem like two of the worst.


Robert M. Wachter, MD
University of California, San Francisco, CA


The views and opinions expressed are those of the author and do not necessarily state or reflect those of the National Guideline Clearinghouse (NGC), the Agency for Healthcare Research and Quality (AHRQ), or its contractor, ECRI Institute.

Potential Conflicts of Interest

Dr. Wachter is the Project Director and Lead Contractor of AHRQ WebM&M and AHRQ Patient Safety Network, for which he receives compensation. He is also a paid member of Google's Healthcare Advisory Board and a member of the Board of Directors of the American Board of Internal Medicine.


  1. Finfer S, et al. Intensive versus conventional glucose control in critically ill patients. N Engl J Med 2009 Mar 26;360(13):1283-97. Epub 2009 Mar 24.
  2. Wachter R, et al. Public reporting of antibiotic timing in patients with pneumonia: lessons from a flawed performance measure. Ann Intern Med 2008 Jul 1;149(1):29-32.
  3. Devereaux PJ, et al. The Perioperative Ischemic Evaluation (POISE) trial: a randomized controlled trial of metoprolol versus placebo in patients undergoing noncardiac surgery. AHA meeting 2007; Abstract LBCT-20825.
  4. Crossing the quality chasm: a new health system for the 21st century. Committee on Quality of Health Care in America, Institute of Medicine. The National Academies Press, 2001.
  5. McGlynn EA, Asch SM, Adams J, et al. The quality of health care delivered to adults in the United States. N Engl J Med 2003;348:2635-45.
  6. Wachter R. ICU glycemic control: another can't miss quality measure bites the dust. March 30, 2009.
  7. Wachter R. Perioperative beta blockers, redux.
  8. Wachter R. Door to antibiotics time in pneumonia: lessons from a flawed quality measure.
  9. Groopman J. and Hartzband P. Why 'quality' care is dangerous. Wall Street Journal 2009 Apr 8.
  10. Werner RM and McNutt R. A new strategy to improve quality: rewarding actions rather than measures. JAMA 2009;301(13):1375-77.
  11. Surviving sepsis campaign. Maintain adequate glycemic control.
  12. Establish a glycemic control policy in your ICU. The Institute for Healthcare Improvement.
  13. Van den Berghe G, et al. Intensive insulin therapy in critically ill patients. N Engl J Med 2001 Nov 8;345(19):1359-67.
  14. Prevent surgical site infections. The Institute for Healthcare Improvement.
  15. Plan-Do-Study-Act (PDSA) worksheet. The Institute for Healthcare Improvement.
