Note from the National Guideline Clearinghouse (NGC): This guideline was developed by the National Clinical Guideline Centre (NCGC) on behalf of the National Institute for Health and Clinical Excellence (NICE). See the "Availability of Companion Documents" field for the full version of this guidance.
Methods of Combining Clinical Studies
Data Synthesis for Intervention Reviews
Where possible, meta-analyses were conducted to combine the results of studies for each review question using Cochrane Review Manager (RevMan5) software. Fixed-effects (Mantel-Haenszel) techniques were used to calculate risk ratios (relative risk) for the binary outcomes: mortality, amputation-free survival, cardiovascular events, adverse events, re-intervention rates, and withdrawal rates. The continuous outcomes (quality of life, walking distance, exercise level at follow-up, change in ankle brachial pressure index [ABPI], pain measures, duration of pain control, and patient satisfaction) were analysed using an inverse variance method to pool weighted mean differences; where studies used different scales, standardised mean differences were used. Where reported, time-to-event data were presented as hazard ratios.
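The fixed-effects pooling described above can be sketched as follows. This is a minimal illustration of the Mantel-Haenszel risk ratio estimator, not the actual analysis (which used RevMan5); the trial data are hypothetical.

```python
def mantel_haenszel_rr(studies):
    """Pool risk ratios across studies with the Mantel-Haenszel
    fixed-effects method. Each study is a tuple:
    (events_treatment, n_treatment, events_control, n_control)."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2
        numerator += a * n2 / n    # treatment-arm events, MH-weighted
        denominator += c * n1 / n  # control-arm events, MH-weighted
    return numerator / denominator

# Two hypothetical trials: (events_trt, n_trt, events_ctrl, n_ctrl)
trials = [(10, 100, 20, 100), (5, 50, 12, 50)]
print(round(mantel_haenszel_rr(trials), 3))  # pooled RR below 1 favours treatment
```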
Three network meta-analyses were considered for the guideline, for the outcomes of walking distance in the intermittent claudication (IC) population, mortality in the critical limb ischaemia (CLI) population, and amputation-free survival in the CLI population. None could be conducted, because there was insufficient evidence to build complete networks for the proposed outcomes.
Statistical heterogeneity was assessed using the chi-squared test (significant at probability [p] <0.1) and the I-squared inconsistency statistic (significant at >50%). Where significant heterogeneity was present and studies differed in quality, a sensitivity analysis based on study quality was carried out, with particular attention paid to allocation concealment, blinding, and loss to follow-up (missing data). Inadequate allocation concealment, unclear blinding, more than 50% missing data, or differential missing data were examined in a sensitivity analysis; for missing data, the duration of follow-up was also taken into consideration prior to inclusion in a sensitivity analysis.
Assessments of potential differences in effect between subgroups were based on chi-squared tests for heterogeneity between subgroups. If no sensitivity analysis completely resolved the statistical heterogeneity, a random-effects (DerSimonian and Laird) model was employed to provide a more conservative estimate of the effect.
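The heterogeneity statistics used above can be sketched as follows: Cochran's Q, the I-squared statistic derived from it, and the DerSimonian-Laird between-study variance that drives the random-effects model. This is an illustrative implementation under a normal approximation, not the RevMan5 code; the input numbers are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I-squared (%), and the DerSimonian-Laird tau-squared,
    from per-study effect estimates (e.g. log risk ratios) and variances."""
    w = [1.0 / v for v in variances]              # inverse-variance weights
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    # I-squared: proportion of total variation due to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird estimator of the between-study variance
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2

q, i2, tau2 = heterogeneity([0.2, -0.1, 0.5], [0.04, 0.05, 0.06])
```

An I-squared above 50% (or Q significant at p < 0.1) would trigger the sensitivity analyses and, failing those, the random-effects model described above.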
For continuous outcomes, means and standard deviations were required for meta-analysis. Where standard deviations were not reported but p-values or 95% confidence intervals were, the standard error was calculated and meta-analysis was undertaken with the mean and standard error using the generic inverse variance method in Cochrane Review Manager (RevMan5) software. When the only available studies summarised results by presenting means alone, this information was included in the GRADE tables without calculating relative and absolute effects.
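Recovering the standard error (and standard deviation) from a reported 95% confidence interval can be sketched as below, assuming a normal approximation; the example values are hypothetical.

```python
import math

def se_from_ci(lower, upper):
    """Standard error of a mean from its 95% confidence interval,
    assuming normality: CI width = 2 * 1.96 * SE."""
    return (upper - lower) / (2 * 1.96)

def sd_from_ci(lower, upper, n):
    """Standard deviation recovered from the CI of a mean: SD = SE * sqrt(n)."""
    return se_from_ci(lower, upper) * math.sqrt(n)

# Hypothetical study: mean 10, 95% CI (8, 12), n = 25
print(round(sd_from_ci(8, 12, 25), 2))
```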
For binary outcomes, absolute event rates were also calculated in GRADEpro software, based on the event rate in the control arm of the pooled results.
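The absolute effect derivation can be sketched as follows: GRADE evidence profiles conventionally express it as events per 1,000 patients, obtained by applying the pooled relative risk to the control-arm event rate. The numbers here are hypothetical.

```python
def absolute_effect_per_1000(control_events, control_n, risk_ratio):
    """Absolute risk difference per 1,000 patients, derived from the
    control-arm event rate and the pooled relative risk."""
    control_risk = control_events / control_n
    intervention_risk = control_risk * risk_ratio
    # Negative = fewer events per 1,000 with the intervention
    return round((intervention_risk - control_risk) * 1000)

# Hypothetical pooled result: control rate 200/1000, RR 0.8
print(absolute_effect_per_1000(200, 1000, 0.8))  # 40 fewer per 1,000
```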
Data Synthesis for Diagnostic Test Accuracy Review
Evidence for diagnostic data was evaluated by study, using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklists.
For diagnostic test accuracy studies, the following data were extracted, either directly from the study report or calculated from other study data: the components of the "2x2 table" (true positives, false positives, false negatives, and true negatives) and the test accuracy parameters: sensitivity, specificity, positive/negative predictive values, and positive/negative likelihood ratios (other outcomes, such as area under the curve [AUC] for receiver operating characteristic [ROC] curves, reproducibility, applicability, and inter- and intra-operator reliability, can also be included). Where these outcomes were not reported, 2x2 tables were constructed from raw data to allow calculation of the accuracy measures.
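The accuracy parameters derived from a 2x2 table can be sketched as follows; the cell counts in the example are hypothetical.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard test-accuracy parameters from 2x2 table cells:
    true/false positives and false/true negatives."""
    sens = tp / (tp + fn)          # sensitivity
    spec = tn / (tn + fp)          # specificity
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": tp / (tp + fp),     # positive predictive value
        "NPV": tn / (tn + fn),     # negative predictive value
        "LR+": sens / (1 - spec),  # positive likelihood ratio
        "LR-": (1 - sens) / spec,  # negative likelihood ratio
    }

# Hypothetical study: 90 TP, 10 FP, 10 FN, 90 TN
m = accuracy_measures(90, 10, 10, 90)
```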
Forest plots of sensitivity and specificity with their 95% confidence intervals were presented side-by-side for individual studies using Cochrane Review Manager (RevMan5) software (for RevMan see Appendix J of the full version of the original guideline document).
When data from five or more studies were available, a diagnostic meta-analysis was carried out. To show the differences between study results, pairs of sensitivity and specificity were plotted for each study on one ROC plot in Microsoft Excel (for Excel plots, see Appendix J of the full version of the original guideline document). A ROC plot shows the true positive rate (i.e., sensitivity) as a function of the false positive rate (i.e., 1 – specificity). Study results were pooled using the bivariate method for the direct estimation of summary sensitivity and specificity under a random-effects approach (in WinBUGS® software; for the program code, see Appendix J of the full version of the original guideline document). This model also assesses variability by incorporating the precision with which sensitivity and specificity were measured in each study. A confidence ellipse is shown on the graph, indicating the confidence region around the summary sensitivity/specificity point, and a summary ROC curve is also presented. From the WinBUGS® output, the summary estimates of sensitivity and specificity (with their 95% confidence intervals) are reported, along with the between-study variation (measured as logit sensitivity and specificity) and the correlation between the two measures of variation. The summary diagnostic odds ratio with its 95% confidence interval is also reported.
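The bivariate model itself was fitted in WinBUGS®, but the diagnostic odds ratio reported alongside it can be illustrated for a single study as follows. This is a sketch using the standard log-scale confidence interval with a 0.5 continuity correction for zero cells; the cell counts are hypothetical.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Diagnostic odds ratio with a 95% CI computed on the log scale.
    Applies a 0.5 continuity correction if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - 1.96 * se_log)
    upper = math.exp(math.log(dor) + 1.96 * se_log)
    return dor, lower, upper

# Hypothetical 2x2 table: 90 TP, 10 FP, 10 FN, 90 TN
dor, lo, hi = diagnostic_odds_ratio(90, 10, 10, 90)
```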
Appraising the Quality of Evidence by Outcomes
The evidence for outcomes from the included RCTs and observational studies was evaluated and presented using an adaptation of the 'Grading of Recommendations Assessment, Development, and Evaluation (GRADE) toolbox' developed by the international GRADE working group (http://www.gradeworkinggroup.org/ ). The software (GRADEpro) developed by the GRADE working group was used to assess the quality of each outcome, taking into account individual study quality and the meta-analysis results. The summary of findings was presented as one table in the full version of the original guideline document (called clinical evidence profiles). This includes the details of the quality assessment, pooled outcome data, and, where appropriate, an absolute measure of intervention effect and the summary of quality of evidence for that outcome. In this table, the columns for intervention and control indicate the sum of the sample sizes for continuous outcomes. For binary outcomes, such as number of patients with an adverse event, the event rates (n/N: number of patients with events divided by the total number of patients) are shown with percentages. Reporting or publication bias was taken into consideration only in the quality assessment.
Each outcome was examined separately for the quality elements listed and defined in Table 4 of the full version of the original guideline document and each graded using the quality levels listed in Table 5 of the full version of the original guideline document and the table shown in the "Rating Scheme for the Strength of the Evidence" field. The main criteria considered in the rating of these elements are discussed in section 3.3.6 "Grading of Evidence" in the full version of the original guideline document. Footnotes were used to describe reasons for grading a quality element as having serious or very serious problems. The ratings for each component were summed to obtain an overall assessment for each outcome.
The GRADE toolbox is currently designed only for RCTs and observational studies; however, for the purposes of this guideline, the quality assessment elements and outcome presentation were adapted for diagnostic accuracy and qualitative studies.
After results were pooled, the overall quality of evidence for each outcome was considered. See section 3.3.6 in the full version of the original guideline document and the "Rating Scheme for the Strength of the Evidence" field for additional detail.
Grading the Quality of Clinical Evidence
After results were pooled, the overall quality of evidence for each outcome was considered (see the "Rating Scheme for the Strength of the Evidence" field). The following procedure was adopted when using Grading of Recommendations Assessment, Development and Evaluation (GRADE):
- A quality rating was assigned based on the study design: RCTs start as HIGH, observational studies as LOW, and uncontrolled case series as LOW or VERY LOW.
- The rating was then downgraded for the specified criteria: study limitations, inconsistency, indirectness, imprecision, and reporting bias. These criteria are detailed in the full version of the original guideline. Observational studies were upgraded if there was a large magnitude of effect or a dose-response gradient, or if all plausible confounding would reduce a demonstrated effect or suggest a spurious effect when results showed no effect. Each quality element considered to have "serious" or "very serious" risk of bias was rated down by 1 or 2 points respectively.
- The downgraded/upgraded marks were then summed and the overall quality rating was revised. For example, all RCTs started as HIGH and the overall quality became MODERATE, LOW or VERY LOW if 1, 2 or 3 points were deducted respectively.
- The reasons or criteria used for downgrading were specified in the footnotes.
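The summing procedure above can be sketched as a simple mapping from points deducted (or, for observational studies, added) to the four GRADE quality levels:

```python
def grade_rating(start, points_down, points_up=0):
    """Map GRADE downgrade/upgrade points onto the four quality levels.
    `start` is the starting level for the study design (RCTs: "HIGH",
    observational studies: "LOW"). Ratings are clamped to the scale ends."""
    levels = ["VERY LOW", "LOW", "MODERATE", "HIGH"]
    idx = levels.index(start) - points_down + points_up
    return levels[max(0, min(idx, len(levels) - 1))]

# An RCT downgraded 2 points (e.g. serious risk of bias + serious imprecision)
print(grade_rating("HIGH", 2))  # MODERATE -> LOW after two deductions: "LOW"
```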
Evidence was also appraised for study limitations, inconsistency, indirectness, and imprecision. See sections 3.3.7-3.3.10 in the full version of the original guideline document for detail.
NICE Economic Evidence Profiles
The NICE economic evidence profile has been used to summarise cost and cost-effectiveness estimates (see Table 10 in the full version of the original guideline document). The economic evidence profile includes an assessment of applicability and methodological quality for each economic study, with footnotes indicating the reasons for each assessment. These assessments were made by the health economist using the economic evaluation checklist from the Guidelines Manual 2009. It also shows incremental costs, incremental outcomes (for example, quality-adjusted life years [QALYs]), and the incremental cost-effectiveness ratio, as well as information about the assessment of uncertainty in the analysis.
Several of the pairwise clinical comparisons conducted in the IC population concerned the same decision question. Due to the nature of the question and the difficulty of considering multiple-comparator evaluations in a pairwise context, the clinical and economic evidence for these questions were presented in separate sections in the full version of the original guideline document.
All costs were converted into 2009/10 pounds sterling using the appropriate purchasing power parity.
Undertaking New Health Economic Analysis
As well as reviewing the published economic literature for each review question, as described above, new economic analysis was undertaken by the health economist in priority selected areas. Priority areas for new health economic analysis were agreed by the GDG after formation of the review questions and consideration of the available health economic evidence.
The GDG identified the treatment of IC using exercise and endovascular interventions as the highest priority areas for original economic modelling. Specifically, these areas include the cost effectiveness of supervised compared to unsupervised exercise, and exercise compared to angioplasty for the treatment of IC.
The following general principles were adhered to in developing the cost-effectiveness analysis:
- Methods were consistent with the NICE reference case
- The GDG was involved in the design of the model, selection of inputs and interpretation of the results
- Model inputs were based on the systematic review of the clinical literature supplemented with other published data sources where possible
- When published data were not available, GDG expert opinion was used to populate the model
- Model inputs and assumptions were reported fully and transparently
- The results were subject to sensitivity analysis and limitations were discussed
- The model was peer-reviewed by another health economist at the NCGC
Additional data for the analysis was identified as required through additional literature searches undertaken by the health economist and in discussion with the GDG. Model structure, inputs, and assumptions were explained to and agreed by the GDG members during meetings, and they commented on subsequent revisions.
Full methods for the original health economic analyses undertaken for this guideline are described in Appendices K and L of the full version of the original guideline document.
NICE's report 'Social value judgements: principles for the development of NICE guidance' sets out the principles that GDGs should consider when judging whether an intervention offers good value for money.
In general, an intervention was considered to be cost effective if either of the following criteria applied (given that the estimate was considered plausible):
- The intervention dominated other relevant strategies (that is, it was both less costly in terms of resource use and more clinically effective compared with all the other relevant alternative strategies), or
- The intervention cost less than £20,000 per QALY gained compared with the next best strategy.
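The two criteria above amount to a simple decision rule: dominance, or an incremental cost-effectiveness ratio (ICER) below the £20,000 per QALY threshold. A sketch, with hypothetical costs and QALYs:

```python
def cost_effective(cost_new, qaly_new, cost_cmp, qaly_cmp, threshold=20000):
    """Decision-rule sketch: the new strategy is cost effective if it
    dominates the comparator, or if its ICER is below the threshold
    (GBP per QALY gained). Assumes the estimates are considered plausible."""
    d_cost = cost_new - cost_cmp
    d_qaly = qaly_new - qaly_cmp
    if d_cost <= 0 and d_qaly >= 0 and (d_cost < 0 or d_qaly > 0):
        return True   # dominates: no more costly and at least as effective
    if d_qaly <= 0:
        return False  # dominated: more costly with no health gain
    return d_cost / d_qaly < threshold  # ICER test

# Hypothetical: GBP 4,000 extra for 0.3 extra QALYs -> ICER ~ GBP 13,333
print(cost_effective(12000, 1.5, 8000, 1.2))
```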
If the GDG recommended an intervention that was estimated to cost more than £20,000 per QALY gained, or did not recommend one that was estimated to cost less than £20,000 per QALY gained, the reasons for this decision are discussed explicitly in the 'recommendations and link to evidence' section of the relevant chapter of the full version of the original guideline document with reference to issues regarding the plausibility of the estimate or to the factors set out in the 'Social value judgements: principles for the development of NICE guidance'.
In the Absence of Cost-Effectiveness Evidence
When no relevant published studies were found, and a new analysis was not prioritised, the GDG made a qualitative judgement about cost effectiveness by considering expected differences in resource use between comparators and relevant United Kingdom National Health Service unit costs alongside the results of the clinical review of effectiveness evidence.
Health-Related Quality of Life
Early in the guideline development process, the GDG decided that they wished to inform the economic analyses with health-related quality of life data obtained directly from the included clinical studies. Changes in disease-specific functional disability would be captured by including walking distance as an outcome. The NICE reference case specifies that the EQ-5D is the preferred method of QALY measurement; therefore, only EQ-5D values, or health state descriptions that could be mapped to the EQ-5D, were included as measures of health-related quality of life. Disease-specific questionnaires and other generic health profiles were not included as outcomes in the review.