  • Expert Commentary
  • November 22, 2010

Managing Numerous Outcomes: Experience from a Systematic Review of Abnormal Uterine Bleeding Trials

A common and major limitation in comparative effectiveness research is the large number of heterogeneous outcomes reported across studies. This poses a particular challenge for conditions in which the perception of symptoms and burden varies widely among individual patients despite similar objective disease measures. In these cases, a variety of non-standardized patient-reported outcomes are often used. The problem of numerous outcomes may also arise for conditions with diverse treatment alternatives, each of which has its own distinct effects and adverse events. The management of abnormal uterine bleeding (AUB), discussed here, is an example of this situation. Other examples include gastroesophageal reflux disease, renal artery stenosis, sleep apnea, and other conditions with medical, surgical, and device treatment options.

In this commentary, we report our experience conducting a systematic review of treatments for AUB for the purpose of guideline development. During this review, we encountered a large number of outcomes examined in treatment trials (1). Presented here is the process our workgroup followed to address and partially overcome this abundance of outcomes. Researchers and clinicians in other fields may find this approach instructive when faced with a similar issue.

Outcome Chaos

AUB, an alteration in the volume, pattern, or duration of menstrual blood flow (2), is the single most common reason for gynecologic referral (3). The large societal and personal burden of AUB lies in its considerable impact on quality of life, productivity, healthcare utilization, and costs (4, 5). Interestingly, nearly half the women who seek treatment for AUB lose less than the 80 mL of blood per cycle that strictly defines "heavy menstrual bleeding" (6, 7). In fact, many patients are more bothered by unpredictable bleeding or by pain associated with the passage of large clots. Because the presenting symptoms are variable, episodic, and subjectively perceived, trials have increasingly used outcomes that capture the symptoms patients themselves experience, often referred to as patient-reported outcome measures (2). Currently, however, many of these measures are poorly standardized and have not been validated (2). In addition, the use of numerous non-standardized outcomes, combined with the testing of diverse therapeutic options, makes direct comparison of the effectiveness of different surgical and medical interventions difficult. Nonetheless, given the prevalence of AUB, high-quality research on treatments for this burdensome condition is imperative for the advancement of women's health.

For the Society of Gynecologic Surgeons (SGS) guideline, the Systematic Review Group (SRG) undertook a systematic review of randomized controlled trials of AUB treatments. Initially, we searched Medline from 1950 through June 2008 for English-language trials of interventions for AUB with at least 10 participants per arm and more than one month of follow-up. Across 113 eligible articles, representing 79 trials, we found 114 distinct outcomes. In response, the SGS SRG developed a stepwise process for evaluating the outcomes, both to allow evidence synthesis and to inform recommendations for future research.
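The inclusion criteria above can be expressed as a simple screening predicate. This is a minimal sketch, not the SRG's actual screening tool: the `CandidateTrial` record and its field names are hypothetical, every candidate is assumed to already be an AUB intervention report, and applying the search window to the publication date is an assumption.

```python
from dataclasses import dataclass
from datetime import date

# Search window stated in the text; applying it to publication date is an assumption.
SEARCH_START = date(1950, 1, 1)
SEARCH_END = date(2008, 6, 30)


@dataclass
class CandidateTrial:
    """Hypothetical screening record for one candidate AUB trial report."""
    randomized: bool
    language: str
    publication_date: date
    smallest_arm_size: int      # participants in the smallest arm
    follow_up_months: float


def is_eligible(trial: CandidateTrial) -> bool:
    """Return True if the record meets the stated inclusion criteria."""
    return (
        trial.randomized
        and trial.language == "English"
        and SEARCH_START <= trial.publication_date <= SEARCH_END
        and trial.smallest_arm_size >= 10   # at least 10 participants per arm
        and trial.follow_up_months > 1      # more than one month of follow-up
    )
```

Checking the smallest arm covers the "per arm" requirement in one comparison, and the strict `> 1` mirrors "more than one month."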

Outcome Categories

The SRG created an inventory of all unique outcomes, including adverse events, reported in the trials that met the inclusion criteria. Each outcome was documented with its specific definition, the instruments or tests used to measure it, and whether those instruments or tests had been validated.
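One way to hold such an inventory is a mapping from outcome name to a record of its definition, instruments, and validation status. The sketch below is illustrative only: the field names are hypothetical, and merging repeated reports by exact outcome name is a simplifying assumption, since deciding which outcomes are truly "unique" was a reviewer judgment.

```python
from dataclasses import dataclass, field


@dataclass
class OutcomeEntry:
    """One unique outcome in the inventory (field names are hypothetical)."""
    name: str
    definition: str
    # instrument or test name -> whether it had been validated
    instruments: dict = field(default_factory=dict)
    is_adverse_event: bool = False
    trial_ids: set = field(default_factory=set)


def add_to_inventory(inventory: dict, trial_id: str, entry: OutcomeEntry) -> None:
    """Record an outcome reported by one trial, merging repeats by outcome name."""
    existing = inventory.get(entry.name)
    if existing is None:
        entry.trial_ids.add(trial_id)
        inventory[entry.name] = entry
    else:
        # Same outcome seen in another trial: note the trial and any new instruments.
        existing.trial_ids.add(trial_id)
        existing.instruments.update(entry.instruments)
```

Keeping validation status per instrument, rather than per outcome, reflects that the same outcome may be measured with validated and non-validated tools in different trials.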

After reviewing the inventory, the SRG designated eight overarching outcome categories by consensus, based on 1) their importance for informing patient choices and 2) their ability to capture the effects of the various interventions for AUB. The categories were: bleeding, quality of life, pain, sexual health, bulk-related symptoms (for patients with anatomic or fibroid-related AUB), patient satisfaction, need for additional treatments, and adverse events. Outcomes judged to have limited relevance for assessing clinical effectiveness, as well as those related to cost and resource utilization, were excluded from categorization and further analysis; all other outcomes in the inventory were grouped into these categories.
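The grouping step amounts to assigning each outcome either to one of the eight categories or to "excluded." A minimal sketch, assuming the reviewers' assignments have already been recorded as a dictionary (the outcome names used with it are hypothetical); it rejects any category not among the eight so that typos surface immediately.

```python
from typing import Dict, List, Optional

# The eight overarching categories named in the text.
CATEGORIES = (
    "bleeding",
    "quality of life",
    "pain",
    "sexual health",
    "bulk-related symptoms",
    "patient satisfaction",
    "need for additional treatments",
    "adverse events",
)


def group_outcomes(assignments: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group outcome names by category; a category of None marks an excluded outcome."""
    grouped: Dict[str, List[str]] = {category: [] for category in CATEGORIES}
    for outcome, category in assignments.items():
        if category is None:
            continue  # e.g. limited clinical relevance, or cost and resource use
        if category not in grouped:
            raise ValueError(f"unknown category for {outcome!r}: {category!r}")
        grouped[category].append(outcome)
    return grouped
```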

Grading the Importance of Outcomes

We then rated the importance of each outcome. This allowed the reviewers to weigh the results of a particular outcome when synthesizing benefits and harms across all relevant outcomes. The SRG followed an approach suggested by the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group (8) and used a 9-point scale with outcomes scored as "critical" for decision-making (score 7-9), "important but not critical" for decision-making (score 4-6), and "not important" for decision-making (score 1-3).
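The three bands of the 9-point scale map directly to labels. The function name below is hypothetical; the cut points are the ones stated in the text.

```python
def grade_importance(score: int) -> str:
    """Map a score on the 9-point GRADE importance scale to its label."""
    if not 1 <= score <= 9:
        raise ValueError("score must be an integer from 1 to 9")
    if score >= 7:
        return "critical"                     # 7-9
    if score >= 4:
        return "important but not critical"   # 4-6
    return "not important"                    # 1-3
```

For example, `[grade_importance(s) for s in (8, 5, 2)]` yields `['critical', 'important but not critical', 'not important']`.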

In preparation for this rating process, the SRG reviewed reports on the subjective experience of patients with AUB (9-14) and evaluations of the quality of patient-based outcome measures used for AUB (2, 15). When rating an outcome, the group considered both the magnitude of the outcome's impact on a patient's well-being and the quality of the tool used to measure it, in terms of what was known about the tool's validity, reliability, and feasibility. The overall impact of the outcome on a patient's health and well-being took precedence over the assessment of the measurement instrument. Measurement tools were classified as multi- or uni-dimensional, as generic or condition-specific, and as psychometrically tested or not. An outcome received a higher rating if it was measured with a multi-dimensional, condition-specific, or psychometrically tested instrument. This reflected the SRG's consensus that such instruments can capture a broader range of symptoms that matter from the patient's perspective (multi-dimensional or condition-specific instruments) or provide higher-quality measurement (psychometrically tested instruments).

After extensive discussion, the SRG members individually scored each outcome on a ballot. Mean scores were calculated and rounded to the nearest whole number. When ratings or comments diverged, members used an iterative process of discussion and debate to achieve consensus. Members were also asked to report any potential conflicts of interest for this vote related to professional practice, research activities, or financial investments and, if present, to abstain from voting. Adverse events were not scored, as the group decided a priori to follow the CONSORT statement on reporting of harms (16) for appraising the relevance and importance of adverse events.
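The ballot arithmetic can be sketched as follows, assuming each member's ballot is stored as a score or `None` for an abstention due to a conflict of interest. The text does not say how ties such as 6.5 were rounded; this sketch assumes rounding half up, which differs from Python's built-in `round()` (that would give 6). The spread-based `needs_discussion` check and its threshold of 3 are hypothetical, since the SRG identified divergence through discussion rather than a fixed rule.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, Optional


def mean_rounded_score(ballots: Dict[str, Optional[int]]) -> int:
    """Average the members' 1-9 scores for one outcome, skipping abstentions (None)."""
    votes = [score for score in ballots.values() if score is not None]
    if not votes:
        raise ValueError("no votes cast for this outcome")
    mean = Decimal(sum(votes)) / Decimal(len(votes))
    # Round half up to the nearest whole number (assumed tie-breaking rule).
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def needs_discussion(ballots: Dict[str, Optional[int]], max_spread: int = 3) -> bool:
    """Hypothetical flag: True if the cast votes span max_spread points or more."""
    votes = [score for score in ballots.values() if score is not None]
    return bool(votes) and max(votes) - min(votes) >= max_spread
```

With ballots `{"A": 8, "B": None, "C": 6}`, member B abstains and the outcome's rounded mean is 7, placing it in the "critical" band.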

Application to Evidence Synthesis and Recommendations for Future Research

The SGS SRG applied this framework to complete a systematic review and an accompanying clinical practice guideline on AUB treatment. Using the mean scores, the SRG focused on the "important" and "critical" outcomes within the eight categories. Following the GRADE system, the outcome rankings were used to weight results across outcomes when summarizing benefits and harms and assessing the overall quality of evidence. This process also provided insight into how future trials of AUB treatments could be designed to improve their clinical utility (1). In particular, to fully ascertain the effects of an AUB intervention, we recommend that all studies assess outcomes in all eight categories. Bleeding, quality of life, pain related to heavy bleeding, sexual health, and bulk-related complaints (specifically in patients with leiomyomas) should be assessed both before and after treatment, and each patient's change in the outcome measure should be used as the analytic metric. In addition, the SRG recommends documenting patient satisfaction, the need for additional intervention, and adverse events.
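Two computations in this paragraph are easy to make concrete: keeping only outcomes whose rounded mean score is 4 or higher, and using each patient's before-to-after change as the analytic metric. The sketch below assumes scores and measurements are stored in dictionaries keyed by hypothetical outcome names and patient IDs; restricting the change analysis to patients measured at both time points is an assumption, not something the text specifies.

```python
from statistics import mean
from typing import Dict, List


def outcomes_to_synthesize(scores: Dict[str, int]) -> List[str]:
    """Keep outcomes whose rounded mean score is 4 or higher (important or critical)."""
    return [name for name, score in scores.items() if score >= 4]


def change_scores(baseline: Dict[str, float], follow_up: Dict[str, float]) -> Dict[str, float]:
    """Per-patient change (follow-up minus baseline) for patients measured at both times."""
    return {pid: follow_up[pid] - value for pid, value in baseline.items() if pid in follow_up}


def mean_change(baseline: Dict[str, float], follow_up: Dict[str, float]) -> float:
    """Average the per-patient change scores within one trial arm."""
    changes = change_scores(baseline, follow_up)
    if not changes:
        raise ValueError("no patient has both baseline and follow-up measurements")
    return mean(list(changes.values()))
```

Whether a negative change means improvement depends on the instrument (for a bleeding score, lower is usually better), so the sign must be interpreted per outcome.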

The eight proposed outcome categories are not necessarily independent, but they capture different aspects of a condition whose varied symptoms can harm a woman's health. The SRG could not provide a definitive list of the exact data and outcome measures that future trials should collect, because no single standardized and validated tool currently captures outcomes in all categories. Addressing the problem of non-standardized outcome measurement in AUB requires a concerted effort by clinicians, patients, and researchers to narrow the number of outcomes measured. Ideally, they would agree on a limited core set of valid, discriminatory, and feasible tools spanning all outcome categories, which would allow consistency among future trials and better comparability across them. In the meantime, researchers should consider combining available validated instruments to measure the entire spectrum of AUB-related symptoms.
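Choosing instruments to combine can be framed as a coverage problem over the eight categories. The sketch below uses a standard greedy set-cover heuristic, which is our illustration and not a method the SRG used; the instrument names and the categories each one covers are hypothetical inputs that would have to come from the measurement literature.

```python
from typing import Dict, List, Set

EIGHT_CATEGORIES = frozenset({
    "bleeding", "quality of life", "pain", "sexual health",
    "bulk-related symptoms", "patient satisfaction",
    "need for additional treatments", "adverse events",
})


def uncovered_categories(selected: Dict[str, Set[str]]) -> Set[str]:
    """Categories not measured by any of the selected instruments."""
    return set(EIGHT_CATEGORIES) - set().union(*selected.values())


def greedy_instrument_set(instruments: Dict[str, Set[str]]) -> List[str]:
    """Repeatedly pick the instrument covering the most still-uncovered categories."""
    remaining = set(EIGHT_CATEGORIES)
    chosen: List[str] = []
    while remaining:
        best = max(instruments, key=lambda name: len(instruments[name] & remaining), default=None)
        if best is None or not instruments[best] & remaining:
            break  # no available instrument covers what is left
        chosen.append(best)
        remaining -= instruments[best]
    return chosen
```

Greedy selection is not guaranteed to find the smallest combination, and any categories left uncovered show where no validated instrument is available.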


Conclusion

The large number and variability of outcomes reported in trials hinder comparison of different interventions, limiting the trials' usefulness for informing clinicians, patients, and policy makers about the benefits and harms of alternative treatments. The SRG followed a transparent and explicit approach to organizing and ranking the diverse outcomes from AUB trials. The approach started with an outcomes inventory, used a defined process based on expert consensus to identify clinically relevant overarching outcome categories for the condition and interventions of interest, and applied a grading scheme to rate outcomes. It provided new insight into how future research could be improved by focusing on a defined set of clinically meaningful and comparable outcomes, and it may be applicable to other fields where heterogeneous outcomes hamper comparison of results across treatments.


David D. Rahn, MD
University of Texas Southwestern Medical Center, Dallas, TX

Ethan M. Balk, MD, MPH
Tufts Medical Center, Boston, MA

Vivian Sung, MD, MPH
Women's and Infants Hospital, Alpert Medical School of Brown University, Providence, RI

Katrin Uhlig, MD, MS
Tufts Medical Center, Boston, MA


The views and opinions expressed are those of the authors and do not necessarily state or reflect those of the National Guideline Clearinghouse (NGC), the Agency for Healthcare Research and Quality (AHRQ), or its contractor ECRI Institute.

Potential Financial Conflicts of Interest

Dr. Rahn declared no potential conflicts of interest with respect to this expert commentary.

Dr. Balk notes personal financial interests in QuantRx and Echo Therapeutics. He has worked on the following clinical practice guideline or quality measure development projects: Kidney Disease: Improving Global Outcomes guidelines (various on chronic kidney disease); Kidney Disease Outcome Quality Initiative (various on chronic kidney disease); Society for Gynecological Surgeons (vaginal prolapse; dysfunctional uterine bleeding); American Academy of Orthopaedic Surgeons (anticoagulation for hip and knee replacement). He is also a member of the Core Editorial Board for the National Guideline Clearinghouse and the National Quality Measures Clearinghouse.

Dr. Sung declared no potential conflicts of interest with respect to this expert commentary.

Dr. Uhlig disclosed that she is paid by the National Kidney Foundation for her work in the development of Kidney Disease Improving Global Outcomes (KDIGO) clinical practice guidelines. She also serves as a paid consultant to the Society for Gynecological Surgeons Systematic Review Group in the development of systematic reviews and clinical practice guidelines.


  1. Rahn DD, Abed H, Sung VW, Matteson KA, Rogers RG, Morrill MY, et al. Systematic review highlights difficulty interpreting diverse clinical outcomes in abnormal uterine bleeding trials. J Clin Epidemiol 2010 Aug 11. [Epub ahead of print]
  2. Matteson KA, Boardman LA, Munro MG, Clark MA. Abnormal uterine bleeding: a review of patient-based outcome measures. Fertil Steril 2009;92(1):205-16.
  3. van Dongen H, van de Merwe AG, de Kroon CD, Jansen FW. Diagnostic hysteroscopy in abnormal uterine bleeding: a systematic review and meta-analysis. J Minim Invasive Gynecol 2009;16(1):47-51.
  4. Côté I, Jacobs P, Cumming DC. Use of health services associated with increased menstrual loss in the United States. Am J Obstet Gynecol 2003;188(2):343-8.
  5. Liu Z, Doan QV, Blumenthal P, Dubois RW. A systematic review evaluating health-related quality of life, work impairment, and health-care costs and utilization in abnormal uterine bleeding. Value Health 2007;10(3):183-94.
  6. Haynes P, Hodgson H, Anderson AB, Turnbull AC. Measurement of menstrual blood loss in patients complaining of menorrhagia. Br J Obstet Gynaecol 1977;84:763-8.
  7. Warner PE, Critchley HO, Lumsden MA, Campbell-Brown M, Douglas A, Murray GD. Menorrhagia I: measured blood loss, clinical features, and outcome in women with heavy periods: a survey with follow-up data. Am J Obstet Gynecol 2004;190(5):1216-23.
  8. Guyatt GH, Oxman AD, Kunz R, et al. What is "quality of evidence" and why is it important to clinicians? BMJ 2008;336(7651):995-8.
  9. O'Flynn N, Britten N. Menorrhagia in general practice--disease or illness. Soc Sci Med 2000;50:651-61.
  10. O'Flynn N. Menstrual symptoms: the importance of social factors in women's experiences. Br J Gen Pract 2006;56:950-7.
  11. Santer M, Wyke S, Warner P. What aspects of periods are most bothersome for women reporting heavy menstrual bleeding? Community survey and qualitative study. BMC Womens Health 2007;7:8.
  12. Santer M, Wyke S, Warner P. Women's management of menstrual symptoms: findings from a postal survey and qualitative interviews. Soc Sci Med 2008;66:276-88.
  13. Warner P, Critchley HOD, Lumsden MA, Campbell-Brown M, Douglas A, Murray G. Referral for menstrual problems: cross-sectional survey of symptoms, reasons for referral, and management. BMJ 2001;323:24-8.
  14. Garside R, Britten N, Stein K. The experience of heavy menstrual bleeding: a systematic review and meta-ethnography of qualitative studies. J Adv Nurs 2008;63:550-62.
  15. Clark TJ, Khan KS, Foon R, Pattison H, Bryan S, Gupta JK. Quality of life instruments in studies of menorrhagia: a systematic review. Eur J Obstet Gynecol Reprod Biol 2002;104(2):96-104.
  16. Ioannidis JP, Evans SJ, Gøtzsche PC, et al. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med 2004;141(10):781-8.
