Note from the National Guideline Clearinghouse (NGC): This guideline was developed by the National Clinical Guideline Centre (NCGC) on behalf of the National Institute for Health and Clinical Excellence (NICE) (see the "Availability of Companion Documents" field for the full version of this guidance).
Methods of Combining Clinical Studies
Data Synthesis for Intervention Reviews
Where possible, meta-analyses were conducted to combine the results of studies for each review question using Cochrane Review Manager (RevMan5) software. Fixed-effects (Mantel-Haenszel) techniques were used to calculate risk ratios (relative risks) for binary outcomes. Continuous outcomes were analysed using an inverse variance method for pooling weighted mean differences; where the studies used different scales, standardised mean differences were used. Where reported, time-to-event data were presented as hazard ratios.
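The Mantel-Haenszel fixed-effect pooling for binary outcomes described above can be sketched in a few lines. This is an illustrative Python sketch only (the guideline used RevMan5 for these calculations); the function name and tuple layout are assumptions for the example:

```python
def mh_pooled_rr(studies):
    """Mantel-Haenszel fixed-effect pooled risk ratio.

    studies: list of (a, n1, c, n2) tuples, where a and c are event
    counts and n1 and n2 the group sizes in the intervention and
    control arms respectively.
    """
    # Numerator and denominator of the standard M-H risk ratio estimator:
    # sum(a_i * n2_i / N_i) / sum(c_i * n1_i / N_i), with N_i = n1_i + n2_i.
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in studies)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in studies)
    return num / den
```

With a single study the M-H estimate reduces to the crude risk ratio, which is a quick sanity check on the formula.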
Statistical heterogeneity between individual study results in a meta-analysis was assessed using the Chi-squared test (significance at p<0.1) or an I-squared inconsistency statistic of >50% to indicate significant heterogeneity. Where significant heterogeneity was present, reviewers carried out predefined subgroup analyses for length of follow-up, severity of cirrhosis (for groups of patients with variceal bleeding) and severity of illness in intensive care/high dependency patients (question 15). Intravenous versus oral administration of proton pump inhibitors (question 2) and type of combination treatment (question 18) were a priori subgroups because of the specific nature of those questions.
Assessments of potential differences in effect between subgroups were based on Chi-squared tests for heterogeneity between subgroups. If no subgroup analysis completely resolved the statistical heterogeneity, a random-effects (DerSimonian and Laird) model was employed to provide a more conservative estimate of the effect.
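The heterogeneity statistics (Cochran's Q, I-squared) and the DerSimonian-Laird random-effects estimate can be sketched as follows. This is an illustrative Python sketch, not the guideline's toolchain (RevMan5 performed these calculations); all names are assumptions:

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect
    estimates (e.g. log risk ratios) with within-study variances.
    Returns (pooled effect, its SE, Q, I-squared %, tau-squared)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I-squared: proportion of variability beyond chance, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Method-of-moments between-study variance (tau-squared).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau-squared to each within-study variance,
    # widening the interval -- the "more conservative estimate" noted above.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, q, i2, tau2
```

When Q is no larger than its degrees of freedom, tau-squared is zero and the model collapses to the fixed-effect estimate.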
The means and standard deviations of continuous outcomes were required for meta-analysis. Where standard deviations were not reported, the standard error was calculated from reported p-values or 95% confidence intervals, and meta-analysis was undertaken with the mean and standard error using the generic inverse variance method in Cochrane Review Manager (RevMan5) software. Where p-values were reported as "less than", a conservative approach was taken: for example, if a p-value was reported as "p ≤0.001", the standard deviation was calculated from a p-value of exactly 0.001.
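Back-calculating the standard error from a confidence interval or p-value can be sketched as below. This is an illustrative Python sketch assuming normal (z-based) statistics, which is an approximation when the original analysis used a t-test; the function names are assumptions:

```python
from statistics import NormalDist

def se_from_ci(lower, upper, ci_level=0.95):
    """Standard error recovered from a reported confidence interval,
    assuming a normal (z-based) interval: SE = width / (2 * z)."""
    z = NormalDist().inv_cdf(1 - (1 - ci_level) / 2)
    return (upper - lower) / (2 * z)

def se_from_p(mean_diff, p_value):
    """Standard error recovered from a two-sided p-value, assuming a
    z-test. If only 'p < x' is reported, pass x itself -- the
    conservative choice described above (e.g. 0.001 for 'p <= 0.001')."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(mean_diff) / z
```

The recovered mean and standard error can then be entered into the generic inverse variance method directly.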
For binary outcomes, absolute event rates were also calculated in GRADEpro software, using the event rate in the control arm of the pooled results.
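The absolute effect shown in GRADE summary tables is typically the control-arm event rate multiplied by the pooled risk ratio. A minimal illustrative sketch (the function name and the per-1000 presentation are assumptions, not a description of GRADEpro's internals):

```python
def absolute_effect_per_1000(control_events, control_n, risk_ratio):
    """Absolute effect per 1000 patients: the difference between the
    control event rate and that rate multiplied by the pooled RR."""
    cer = control_events / control_n          # control event rate
    risk_intervention = cer * risk_ratio      # assumed intervention risk
    return round((risk_intervention - cer) * 1000)
```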
Data and Outcomes
In studies for the risk assessment review, all patients received a formal risk assessment, which was then scored according to the particular system(s) under investigation. Patients could then be categorised into those who scored above or below a clinically specified cut-off point (as described in more detail in chapter 5 of the full version of the original guideline). This allowed reviewers to extract the proportion of those above and those below the cut-off who experienced a particular outcome. From this, reviewers derived the components of "2x2 tables" (true positives, false positives, true negatives and false negatives) and then calculated accuracy parameters: sensitivity, specificity, positive/negative predictive value and positive/negative likelihood ratios. For some studies the area under the receiver operating characteristic (ROC) curve (AUC, another accuracy measure) was also extracted. When data were only presented graphically (with sufficient levels of detail), frequencies were extracted from the figures to create 2x2 tables (this is noted in the extraction tables in section 2 of Appendix F of the full version of the original guideline document).
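The accuracy parameters derived from a 2x2 table follow directly from the cell counts. A short illustrative Python sketch (names are assumptions; the definitions themselves are the standard ones used above):

```python
def accuracy_stats(tp, fp, fn, tn):
    """Diagnostic accuracy parameters from a 2x2 table:
    tp/fp/fn/tn = true/false positives and negatives."""
    sens = tp / (tp + fn)          # sensitivity
    spec = tn / (tn + fp)          # specificity
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    lr_pos = sens / (1 - spec)     # positive likelihood ratio
    lr_neg = (1 - sens) / spec     # negative likelihood ratio
    return {"sensitivity": sens, "specificity": spec,
            "ppv": ppv, "npv": npv, "lr+": lr_pos, "lr-": lr_neg}
```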
Data Synthesis for Risk Assessment Data
When data from 5 or more studies were available, a diagnostic meta-analysis was carried out. Graphs of point estimates of sensitivity and specificity with 95% confidence intervals were presented side by side for individual studies using Cochrane Review Manager (RevMan5) software. To show differences between study results in ROC space, pairs of sensitivity and specificity were plotted for each study on a single receiver operating characteristic (ROC) plot in Microsoft Excel (for RevMan5 and Excel plots see Appendix L of the full version of the original guideline document). Study results were pooled using the bivariate method for the direct estimation of summary sensitivity and specificity under a random-effects approach (in WinBUGS® software; for the program code see Appendix L of the full version of the original guideline document). This model accounts for between-study variability by incorporating the precision with which sensitivity and specificity were measured in each study. A confidence ellipse is shown in the graph, indicating the confidence region around the summary sensitivity/specificity point. A summary ROC curve is also presented. From the WinBUGS® output, the summary estimates of sensitivity and specificity (plus their standard deviations) are reported in the graphical presentation of the meta-analysis results. The bivariate meta-analysis method is described in more detail in Appendix L of the full version of the original guideline document.
Type of Analysis
Estimates of effect from individual studies were based on intention-to-treat (ITT) analysis, with the exception of the adverse events outcome, for which available case analysis (ACA) was used. In an ITT analysis, all participants included in the randomisation process are considered in the final analysis according to the intervention and control groups to which they were originally assigned. It was assumed that participants lost to follow-up did not experience the outcome of interest (for categorical outcomes) and would not considerably change the average scores of their assigned groups (for quantitative outcomes).
It is important to note that ITT analyses tend to bias the results towards no difference. ITT analysis is a conservative approach, so the estimated effect may be smaller than the true effect. However, the majority of the outcomes reviewed were continuous, very few participants dropped out, and most of the studies reported data on an ITT basis.
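The ITT imputation assumption for categorical outcomes can be made concrete with a small sketch: everyone randomised stays in the denominator, and dropouts are counted as non-events. Illustrative Python only; the function names are assumptions:

```python
def itt_event_rate(events, randomised):
    """ITT event rate: all randomised participants form the denominator,
    and those lost to follow-up are assumed not to have had the event."""
    return events / randomised

def available_case_rate(events, randomised, lost):
    """Available-case (ACA) rate: only participants with a known
    outcome are counted in the denominator."""
    return events / (randomised - lost)
```

With 10 events among 100 randomised and 20 lost to follow-up, ITT gives a lower rate (0.10) than the available-case analysis (0.125), showing how ITT pulls estimates towards no difference.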
Appraising the Quality of Evidence by Outcomes
The evidence for outcomes from the included RCTs and observational studies was evaluated and presented using an adaptation of the 'Grading of Recommendations Assessment, Development and Evaluation (GRADE) toolbox' developed by the international GRADE working group (http://www.gradeworkinggroup.org/). The GRADEpro software developed by the GRADE working group was used to assess the quality of each outcome, taking into account individual study quality and the meta-analysis results. The summary of findings is presented in landscape tables in the full version of the original guideline. The GRADE summary table includes details of the quality assessment as well as pooled outcome data, where appropriate an absolute measure of intervention effect, and a summary of the quality of evidence for that outcome. For binary outcomes, such as the number of patients with an adverse event, the event rates (n/N: number of patients with events divided by the total number of patients) are shown with percentages. Reporting or publication bias was only taken into consideration in the quality assessment, and included in the Clinical Study Characteristics table, if it was apparent. Each outcome was examined separately for the quality elements listed and defined in Table 1 of the full version of the guideline, and each was graded using the quality levels listed in Table 2 of the full guideline. The main criteria considered in the rating of these elements are discussed below. Footnotes were used to describe reasons for grading a quality element as having serious or very serious problems. The ratings for each element were summed to obtain an overall assessment for each outcome.
Grading the Quality of Clinical Evidence
After results were pooled, the overall quality of evidence for each outcome was considered (see the "Rating Scheme for the Strength of the Evidence" field). The following procedure was adopted when using Grading of Recommendations Assessment, Development and Evaluation (GRADE):
- A quality rating was assigned based on the study design: RCTs start as HIGH, observational studies as LOW, and uncontrolled case series as LOW or VERY LOW.
- The rating was then downgraded for the specified criteria: study limitations, inconsistency, indirectness, imprecision and reporting bias. These criteria are detailed in the full version of the original guideline. Observational studies were upgraded if there was a large magnitude of effect or a dose-response gradient, or if all plausible confounding would reduce a demonstrated effect or would suggest a spurious effect when results showed no effect. Each quality element considered to have "serious" or "very serious" risk of bias was rated down by 1 or 2 points respectively.
- The downgraded/upgraded marks were then summed and the overall quality rating was revised. For example, all RCTs started as HIGH and the overall quality became MODERATE, LOW or VERY LOW if 1, 2 or 3 points were deducted respectively.
- The reasons or criteria used for downgrading were specified in the footnotes.
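The summing procedure in the steps above can be sketched as a small lookup over the four quality levels. An illustrative Python sketch only (the function name and clamping behaviour are assumptions; the GRADE levels and the worked example are from the text above):

```python
def overall_grade(start, downgrades, upgrades=0):
    """Overall GRADE quality after summing points.

    start: 'HIGH' for RCTs, 'LOW' for observational studies.
    downgrades: total points deducted across study limitations,
    inconsistency, indirectness, imprecision and reporting bias
    (1 per 'serious', 2 per 'very serious' element).
    upgrades: points added for observational studies (large effect,
    dose-response gradient, plausible confounding).
    """
    levels = ["VERY LOW", "LOW", "MODERATE", "HIGH"]
    idx = levels.index(start) - downgrades + upgrades
    # Clamp to the available range: quality cannot go below VERY LOW
    # or above HIGH.
    return levels[max(0, min(idx, len(levels) - 1))]
```

For example, an RCT outcome with 1, 2 or 3 points deducted becomes MODERATE, LOW or VERY LOW respectively, matching the procedure described above.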
Adaptation of GRADE for Risk Scoring Outcomes
GRADE rating tables were adapted for this review. In the first section they were presented for each risk assessment system and each outcome. Another adapted GRADE table is presented for the results of the diagnostic meta-analyses for which outcomes of some of the pre-endoscopy scoring systems were combined.
Compared with intervention studies, different study designs and statistics are appropriate in risk scoring assessment studies, so the intervention GRADE table was adapted for this review to reflect these differences. For each risk outcome (mortality, rebleeding and need for intervention), results were summarised across studies; for each, a range of sensitivity, specificity, positive/negative predictive value, negative likelihood ratio and area under the curve was reported. The aspects of GRADE were then assessed across studies. No standard risk of bias checklist is currently used at the NCGC for these types of studies. Study limitations were assessed by considering: retrospective study design, representativeness of the study population, study population size, whether all patients received the assessment, how much loss to follow-up was reported, and whether (and what type of) validation sample was used in the development of the scoring system. Imprecision was downgraded whenever the range of reported diagnostic statistics spanned ≥10%.
For data in the diagnostic meta-analyses, study limitations were assessed according to the same criteria. Inconsistency was assessed by inspecting the sensitivity/specificity plots, and imprecision was rated according to the confidence region of the summary plots (see Appendix L of the full guideline document).
Additional information related to factors that affect quality such as study limitations, inconsistency, indirectness and imprecision are detailed in the full guideline document.
Evidence of Cost-Effectiveness
The Health Economist:
- Identified potentially relevant studies for each review question from the economic search results by reviewing titles and abstracts – full papers were then obtained.
- Reviewed full papers against pre-specified inclusion/exclusion criteria to identify relevant studies.
- Critically appraised relevant studies using the economic evaluations checklist as specified in The Guidelines Manual.
- Extracted key information about the study's methods and results into evidence tables (evidence tables are included in Appendix G of the full guideline document).
- Generated summaries of the evidence in NICE economic evidence profiles (included in the relevant chapter write-ups in the full guideline document).
NICE Economic Evidence Profiles
The NICE economic evidence profile has been used to summarise cost and cost-effectiveness estimates. The economic evidence profile shows, for each economic study, an assessment of applicability and methodological quality, with footnotes indicating the reasons for the assessment. These assessments were made by the health economist using the economic evaluation checklist from The Guidelines Manual, Appendix H6. It also shows incremental costs, incremental outcomes (for example, quality-adjusted life-years [QALYs]) and the incremental cost-effectiveness ratio from the primary analysis, as well as information about the assessment of uncertainty in the analysis. If a non-UK study was included in the profile, the results were converted into pounds sterling using the appropriate purchasing power parity.
Where economic studies compare multiple strategies, results are generally presented in the economic evidence profiles as an incremental analysis where possible. This is where an intervention is compared with the next most expensive non-dominated option – a clinical strategy is said to 'dominate' the alternatives when it is both more effective and less costly. Otherwise results were presented for the pair-wise comparison specified in the review question.
Undertaking New Health Economic Analysis
As well as reviewing the published economic literature for each review question, as described above, new economic analysis was undertaken by the Health Economist in priority areas. Priority areas for new health economic analysis were agreed by the GDG after formation of the review questions and consideration of the available health economic evidence.
Additional data for the analysis were identified as required through additional literature searches undertaken by the Health Economist and through discussion with the GDG. Model structure, inputs and assumptions were explained to and agreed by the GDG members during meetings, and they commented on subsequent revisions.
See Appendices I and J of the full version of the original guideline for details of the health economic analyses undertaken for the guideline.
NICE's report 'Social value judgements: principles for the development of NICE guidance' sets out the principles that GDGs should consider when judging whether an intervention offers good value for money.
In general, an intervention was considered to be cost effective if either of the following criteria applied (given that the estimate was considered plausible):
- The intervention dominated other relevant strategies (that is, it was both less costly in terms of resource use and more clinically effective compared with all the other relevant alternative strategies), or
- The intervention cost less than £20,000 per quality-adjusted life-year (QALY) gained compared with the next best strategy.
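The two decision criteria above can be sketched as a single check against one comparator. This is an illustrative Python sketch, not NICE's decision procedure (which also weighs plausibility and social value judgements); the function name and signature are assumptions:

```python
def is_cost_effective(cost_new, qaly_new, cost_cmp, qaly_cmp,
                      threshold=20_000):
    """Apply the two criteria above against a single comparator:
    dominance, or an ICER below the threshold (GBP per QALY gained)."""
    d_cost = cost_new - cost_cmp
    d_qaly = qaly_new - qaly_cmp
    # Criterion 1: dominance -- both less costly and more effective.
    if d_cost < 0 and d_qaly > 0:
        return True
    # Criterion 2: incremental cost-effectiveness ratio below threshold.
    if d_qaly > 0:
        return d_cost / d_qaly < threshold
    # Less effective and not cost-saving: not cost effective here.
    return False
```

For example, an option costing £1,000 more while gaining 0.5 QALYs has an ICER of £2,000 per QALY and passes the threshold; the same cost difference for 0.04 QALYs (£25,000 per QALY) does not.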
If the GDG recommended an intervention that was estimated to cost more than £20,000 per QALY gained, or did not recommend one that was estimated to cost less than £20,000 per QALY gained, the reasons for this decision are discussed explicitly in the 'from evidence to recommendations' section of the relevant chapter with reference to issues regarding the plausibility of the estimate or to the factors set out in the 'Social value judgements: principles for the development of NICE guidance.'