Canadian Medical Association Journal 1995; 153: 1575-1581
Paper reprints of the full text may be obtained from: Dr. Antoni S.H. Basinski, Institute for Clinical Evaluative Sciences in Ontario, Rm. G-2, 2075 Bayview Ave., North York ON M4N 3M5; fax 416 480-6048
When applied to patient care, CPGs fall under the broad definition of medical technologies: "techniques . . . and procedures used by health-care professionals in delivering medical care to individuals, and the systems within which such care is delivered."(8) Unbridled diffusion of unproven medical technologies has led to the demand that they be evaluated. The edict "no evaluation -- no technology" has taken hold when managers and policymakers consider the adoption of technologies.(9) Yet despite the proliferation of guidelines and the enthusiasm with which they are promoted, they are seldom evaluated systematically.
Increasingly, the call is made for guidelines that are evidence-based -- that is, founded on medical practice of demonstrated efficacy and effectiveness. The evidence for guidelines themselves, however, does not provide a basis for more than speculative generalizations about their performance and effects.
CPGs are embedded in the clinical milieu, intercalated among a myriad of differing structures and processes. Thus, the evaluation of a health care program within which CPGs play a central role (a guidelines program) must include consideration of the constituent participants -- organizations, patients and payers -- and of the clinical domain and history. The effects of CPGs are more likely to depend on or be influenced by these cofactors than are those of a drug or a new imaging technology. Because of this variability and the difficulty in manipulating and controlling the entire stock of cofactors, traditional clinical investigative strategies are difficult to apply, and it may not be possible to evaluate CPGs in a generalizable manner.(10,11) Program evaluation is one strategy that is available when what is to be evaluated is embedded in a naturally occurring setting.(12)
Background
The most common evaluations of CPGs are uncontrolled before-after studies in individual settings.(13-17) These have corroborated the potential impact of CPGs in particular settings and have served as a check for unexpected results. The detection of changes in practice patterns after the promulgation of national guidelines has also been attempted with time-series methods.(18-20)
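To make the time-series approach concrete, the following sketch fits a simple segmented regression in which monthly procedure rates are allowed a change in level and in trend at the month a national guideline is promulgated. The data are entirely simulated, and the rates, dates and effect sizes are illustrative assumptions rather than figures from any of the cited studies.

```python
# Hypothetical sketch (not from the article): segmented regression
# ("interrupted time series") on monthly procedure rates before and after
# promulgation of a national guideline.  All data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_months = 24
promulgation = 12                                  # guideline released at month 12

month = np.arange(n_months)
after = (month >= promulgation).astype(float)      # indicator: post-guideline period
months_after = after * (month - promulgation)      # months elapsed since promulgation

# Simulated monthly rate per 1000 patients: baseline trend plus an assumed
# drop of 8 per 1000 at promulgation, with random noise.
rate = 50 + 0.2 * month - 8 * after + rng.normal(0, 2, n_months)

X = sm.add_constant(np.column_stack([month, after, months_after]))
fit = sm.OLS(rate, X).fit()

# Coefficients: intercept, pre-existing trend, level change at promulgation,
# and change in trend afterward.
print(fit.params)
print(fit.conf_int())
```

The level-change coefficient estimates the immediate shift over and above the pre-existing secular trend; its confidence interval indicates whether the apparent change could plausibly be noise.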
Ultimately, to separate the effects of CPGs from those of other factors influencing change, controlled evaluations are necessary. Few such trials of guidelines exist;(21-27) more are needed. For the thousands of CPGs developed internationally, Grimshaw and Russell(21) identified only 42 published evaluations that controlled for other factors in some fashion and that involved randomization of patients, practitioners or groups. Of these, only 11 studies measured patient outcomes.
In Canada formal evaluation of CPGs is rare. In the first article of this guidelines workshop series,(28) which reported on the survey of 107 organizations with a stake in CPGs, only 7 of the 55 responding organizations had formally evaluated guidelines or were doing so, all since 1992. All the evaluations were before-after assessments, and all were based, at least in part, on administrative data (e.g., laboratory test volume).
Evaluation types and strategies
Evaluation of CPGs must match the evolutionary stage of the guidelines. Three basic stages of evaluation are identified: inception evaluation, guidelines-program evaluation and scientific evaluation.
Inception evaluation should establish the guidelines' face and content validity.(29) Basic guidelines characteristics, such as acceptability to practitioners and the public, relevance, clarity, applicability to frequently encountered clinical situations and practicality, may be evaluated. For guidelines with planned wide dissemination (e.g., those sponsored by national organizations) these factors may be best evaluated with preliminary implementation in test sites. Inception evaluation may incorporate either informal, qualitative assessments or more structured, quantitative analyses of response.
The goals of guidelines programs should be stated explicitly. The program may, for instance, seek to improve the health status of the target population or the efficiency of health care delivery. By defining the program goals in measurable terms, evaluations of guidelines programs can ascertain whether stated benchmarks have been achieved.
Evaluations must, at a minimum, address the stated program goals. Other evaluation goals may be identified; for instance, in a program seeking to improve the quality of health care, one such goal might be to determine the cost and resource utilization associated with the guidelines program, even if these facets had not been primary considerations in founding the program.
Evaluations of guidelines programs may focus on process, outcome or efficiency.(30) Process evaluations document the extent to which the program was implemented as designed and is serving the target population. Outcome evaluations focus on the extent to which anticipated health outcomes are achieved for the population served. Efficiency evaluations examine cost and resource issues associated with health programs.
Furthermore, an organization may wish not only to verify that desired changes in processes or outcomes of care have occurred, but also to ensure accountability for service delivery(31) or to link adherence to guidelines with reimbursement decisions.(5-7) Guidelines may be attended by incentives if there is consensus on the need for change.(6,32) For example, linking financial incentives to performance in accordance with guidelines led to marked changes in practice behaviour two decades ago.(33) A recent study(7) confirmed dramatic changes in psychotropic drug prescribing, with no adverse effects, after legislated linkage of reimbursement to guidelines adherence. In health care systems in which purchasers buy services only from providers that operate within specified guidelines,(6) continual evaluation of adherence to guidelines is obviously a sine qua non for operation of the program.
Empiric verification that the guidelines program meets organizational objectives without untoward results is necessary. Here, scientific rules of evidence are less important than reliable, routine evaluation strategies that are timely and responsive. Less rigorous but more workable study designs (e.g., uncontrolled before-after, cross-sectional or time-series designs) applied to individual guidelines programs in natural clinical settings may meet evaluation needs.
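As an illustration of the simplest of these workable designs, the sketch below compares laboratory test volumes for the months before and after a guidelines program is introduced. The monthly volumes are hypothetical. Such an analysis answers the program's operational question (did volumes fall?) but, as the next paragraph notes, it cannot by itself attribute the change to the guidelines.

```python
# Minimal sketch of an uncontrolled before-after comparison, with simulated
# monthly laboratory test volumes.  A two-sample t test compares the 6 months
# before with the 6 months after introduction of the guidelines program.
import numpy as np
from scipy import stats

before = np.array([412, 398, 430, 405, 441, 418])   # tests per month, pre-guideline
after = np.array([371, 355, 388, 362, 349, 377])    # tests per month, post-guideline

t, p = stats.ttest_ind(before, after)
change = after.mean() - before.mean()
print(f"mean change: {change:+.1f} tests/month (t = {t:.2f}, p = {p:.3f})")
```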
One limitation of guidelines-program evaluation is that rigorous inferences about the effects of the guidelines themselves are not possible. For example, an uncontrolled before-after evaluation of guidelines designed to decrease length of hospital stay demonstrated a coincident decrease in length of stay of 0.9 days for patients initially admitted to the intensive care unit.(14) Were these results to be reported in the current environment, in which, for example, length of hospital stay in Ontario has decreased by approximately half that amount in 1 year,(34) it would be difficult to attribute the observed change solely to the influence of CPGs.
The need for scientific evaluation is evident when the results of uncontrolled studies are compared with those of adequately controlled comparative studies. For example, reports from individual hospitals in which physicians participated in guidelines development have described the desired process changes (e.g., decreased rates of cesarean section(13) and laboratory test ordering(15)). However, scientific evaluations of the impact of participation in guidelines development on processes and outcomes suggest that it would be unwarranted to infer that adoption of the guidelines in other settings would result in similar effects. The North of England study demonstrated that physicians who developed prescribing guidelines were more compliant with the guidelines than the physicians who merely received them.(35) Similarly, in a randomized trial comparing the effects on general practitioners who codeveloped guidelines for the referral, investigation and treatment of dyspepsia with the effects on those who did not, no difference was found in rates of referral and appropriate investigation, but the cost of prescriptions was higher in the group that developed the guidelines.(26) In another randomized controlled trial, guidelines developed in a university-based setting were not as successful in hastening return to work after an uncomplicated myocardial infarction when tested in a community practice setting.(24) Controlled scientific evaluations are necessary to elucidate general findings about guidelines, such as the difficulties encountered in implementing them in different settings.
Framework for evaluating CPGs
Evaluation of CPGs requires a partnership between the research community and organizations championing guidelines. The need for guidelines-program and scientific evaluations may coincide and can be more easily met by a pooling of resources and expertise.
Table 1 displays a framework for evaluating CPGs. A sequential checklist for working through the framework is useful:
Barriers and supports to evaluation
In the survey described in the first article of the series,(28) organizations were asked to identify barriers and supports to the evaluation of CPGs. The most commonly identified were resource issues, time constraints, methodologic complexity, the lack of expertise in conducting evaluations, data limitations and the lack of consensus across stakeholder groups about goals and objectives for guidelines. These are similar to obstacles identified in a limited survey of eight US organizations.(1)
The creation of CPGs is only the initial step in the policy-iteration cycle(3) and may be the least expensive. Funding only the creation of guidelines orphans them. Successful evaluation requires organizational commitment and the support of clinicians for the guidelines program and its goals. If the commitment and participation are lacking or goals are diffuse, guidelines programs and their evaluation will likely fail.
Systems for routine collection of valid data from which to assess the impact of guidelines programs (e.g., comorbidity and health status information) are often lacking. When adequate routine data are not available or do not meet evaluation needs, specialized data collection may be prohibitively expensive and obtrusive. In addition, such one-time data collection is time specific and hence does not fit into a continuous evaluation-review cycle. Finally, as noted earlier, a well-developed evaluation system specific to guidelines does not exist, and there is not a cadre of well-trained evaluators.
The main arguments for evaluation include the need to demonstrate professional accountability to consumers and payers. Also identified are the high priority of demonstrating quality of care and the need for continuing education in evidence-based practice. National medical organizations, the academic community and hospitals are all identified as supporting the evaluation of CPGs. In some organizations, evaluation has become an obligatory aspect of guidelines development and promulgation.
Conclusion
Unless the evaluation of CPGs is ranked on a level with development and implementation, the lessons learned about the need to establish the safety and efficacy of unproven medical technologies are bound to be lost. Routine planning for the evaluation of guidelines must be part of any guidelines-development program. As the culture of accountability to the public and payers develops, the need for a culture of routine evaluation of health services increases. Some organizations have already identified this necessity. The Minnesota Clinical Comparison and Assessment Project uses a cycle of review and revision to compare performance with guidelines.(36) The American Medical Association has identified evaluation and revision as the last two steps of an eight-step strategy to incorporate practice parameters into quality assessment, assurance and improvement.(2) In Alberta and British Columbia, current proposals for funding guidelines development identify evaluation steps.(28) For example, the Clinical Practice Guidelines Program Proposal of the Alberta Medical Association plans the evaluation of the impact of guidelines 2 and 3 years after implementation. Each guideline is to be accompanied by an estimate of its health and economic impacts. The proposal requires that the supports necessary to perform evaluations be identified and that mechanisms and processes to obtain these supports be specified. To date, no published results are available.
Although the primary goal for guidelines development or implementation may be restricted, it would be preferable if the potential impact of CPGs were estimated explicitly at the time of their development, before dissemination and implementation. This would provide justification for widespread support of the CPGs as well as benchmarks for evaluating guidelines programs. Areas in which the impact could be estimated include the following:
To date, scientific evaluations have compared "guideline" with "no-guideline" or "usual-care" groups. These are akin to the early trials of thrombolysis used to test the efficacy of a drug in the treatment of acute myocardial infarction.(37,38) After the relative benefit of this drug class over placebo had been established, subsequent trials compared different thrombolytic agents and different routes of administration.(39,40) The next generation of guidelines studies must compare different guidelines factors -- for example, varying development, incorporation and implementation strategies.
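If different guidelines factors are to be compared, randomization will typically occur at the level of the practice or hospital rather than the individual patient, and the required sample size grows accordingly. The arithmetic below sketches this inflation using the standard design effect for cluster randomization; the cluster size, intracluster correlation and individually randomized sample size are illustrative assumptions only, not figures from the article.

```python
# Illustrative sketch: sample-size inflation for a cluster-randomized
# comparison of two guideline implementation strategies.  The design effect
# is 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the
# intracluster correlation.  All values are assumed for illustration.
m = 50               # average patients per practice
icc = 0.05           # assumed intracluster correlation
n_individual = 400   # patients per arm if patients were randomized individually

design_effect = 1 + (m - 1) * icc
n_cluster_trial = n_individual * design_effect
practices_per_arm = n_cluster_trial / m

print(f"design effect: {design_effect:.2f}")
print(f"patients per arm: {n_cluster_trial:.0f} (~{practices_per_arm:.0f} practices per arm)")
```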
Finally, all incarnations of CPGs must have a limited life span. Evaluation and the evolution of new clinical knowledge provide the necessary impetus for the continual improvement of guidelines programs.