Evaluation of clinical practice guidelines

Antoni S.H. Basinski, MD, PhD, CCFP

Canadian Medical Association Journal 1995; 153: 1575-1581


This article is the fifth in a series of six that appeared in the October, November and December issues of CMAJ.

Paper reprints of the full text may be obtained from: Dr. Antoni S.H. Basinski, Institute for Clinical Evaluative Sciences in Ontario, Rm. G-2, 2075 Bayview Ave., North York ON M4N 3M5; fax 416 480-6048



Abstract

Compared with the current focus on the development of clinical practice guidelines, the effort devoted to their evaluation is meagre. Yet the ultimate success of guidelines depends on routine evaluation. Three types of evaluation are identified: evaluation of guidelines under development and before dissemination and implementation; evaluation of health care programs in which guidelines play a central role; and scientific evaluation, through studies that provide the scientific knowledge base for further evolution of guidelines. Identification of evaluation and program goals, evaluation design and a framework for evaluation planning are discussed.


Introduction

Clinical practice guidelines (CPGs) have been widely produced and disseminated.(1-3) They are touted as vehicles for improving the quality of health care and decreasing costs and utilization. Increasingly, they form the basis for assessing accountability in the delivery of health care services.(4-7)

When applied to patient care, CPGs fall under the broad definition of medical technologies: "techniques . . . and procedures used by health-care professionals in delivering medical care to individuals, and the systems within which such care is delivered."(8) Unbridled diffusion of unproven medical technologies has led to the demand that they be evaluated. The edict "no evaluation -- no technology" has taken hold when managers and policymakers consider the adoption of technologies.(9) Yet despite the proliferation of guidelines and the enthusiasm with which they are promoted, they are rarely evaluated systematically.

Increasingly, the call is made for guidelines that are evidence-based -- that is, founded on medical practice of demonstrated efficacy and effectiveness. The evidence for guidelines themselves, however, does not provide a basis for more than speculative generalizations about their performance and effects.

CPGs are embedded in the clinical milieu, intercalated among a myriad of differing structures and processes. Thus, the evaluation of a health care program within which CPGs play a central role (a guidelines program) must include consideration of the constituent participants -- organizations, patients and payers -- and of the clinical domain and history. The effects of CPGs are more likely to depend on or be influenced by these cofactors than are those of a drug or a new imaging technology. Because of this variability and the difficulty in manipulating and controlling the entire stock of cofactors, traditional clinical investigative strategies are difficult to apply, and it may not be possible to evaluate CPGs in a generalizable manner.(10,11) Program evaluation is one strategy that is available when what is to be evaluated is embedded in a naturally occurring setting.(12)


Background

The most common evaluations of CPGs are uncontrolled before-after studies in individual settings.(13-17) These have corroborated the potential impact of CPGs in particular settings and have served as a check for unexpected results. Time-series methods have also been used in attempts to detect changes in practice patterns after the promulgation of national guidelines.(18-20)
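
The time-series approach can be illustrated with a segmented (interrupted) time-series regression. The sketch below is illustrative only: the monthly data are simulated, and the variable names, the promulgation date and the effect sizes are assumptions rather than figures from the cited studies.

    # A minimal sketch of a segmented (interrupted) time-series regression for detecting
    # a change in a practice indicator after guideline promulgation. All data are simulated.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    month = np.arange(1, 25)                       # 24 months of observation (hypothetical)
    post = (month > 12).astype(int)                # 1 after promulgation at month 12
    months_since = np.clip(month - 12, 0, None)    # elapsed time since promulgation
    # Simulated test-ordering rate: stable baseline, step drop and new slope after month 12.
    rate = 50 - 5 * post - 0.3 * months_since + rng.normal(0, 1.5, month.size)

    df = pd.DataFrame({"month": month, "post": post, "months_since": months_since, "rate": rate})
    fit = smf.ols("rate ~ month + post + months_since", data=df).fit()
    print(fit.params)  # 'post' estimates the level change; 'months_since' the slope change

Because the baseline trend is modelled explicitly, a design of this kind can at least distinguish an abrupt level or slope change from a pre-existing secular decline, something a simple before-after comparison cannot do.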

Ultimately, to separate the effects of CPGs from those of other factors influencing change, controlled evaluations are necessary. Few such trials of guidelines exist;(21-27) more are needed. Among the thousands of CPGs developed internationally, Grimshaw and Russell(21) identified only 42 published evaluations that controlled for other factors in some fashion and that involved randomization of patients, practitioners or groups. Of these, only 11 studies measured patient outcome.
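
When practitioners or groups, rather than individual patients, are the units of randomization, responses within a cluster tend to be correlated and the trial must be sized accordingly. The following back-of-envelope calculation is a minimal sketch using the standard design-effect formula; the cluster size, intracluster correlation and baseline sample size are hypothetical.

    # Minimal sketch: inflating a patient-level sample size for cluster (practice-level)
    # randomization using the design effect DEFF = 1 + (m - 1) * ICC. Parameters are hypothetical.
    patients_per_practice = 30        # average cluster size (assumption)
    icc = 0.05                        # intracluster correlation coefficient (assumption)
    n_if_patients_randomized = 400    # sample size for an individually randomized trial (assumption)

    deff = 1 + (patients_per_practice - 1) * icc
    n_if_practices_randomized = n_if_patients_randomized * deff
    print(f"Design effect: {deff:.2f}")
    print(f"Patients required when practices are randomized: {n_if_practices_randomized:.0f}")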

In Canada, formal evaluation of CPGs is rare. The first article of this guidelines workshop series,(28) which reported on a survey of 107 organizations with a stake in CPGs, found that only 7 of the 55 responding organizations had formally evaluated guidelines or were doing so, all since 1992. All the evaluations were before-after assessments, and all were based, at least in part, on administrative data (e.g., laboratory test volume).


Evaluation types and strategies

Evaluation of CPGs must match the evolutionary stage of the guidelines. Three basic types of evaluation are identified: inception evaluation, guidelines-program evaluation and scientific evaluation.

Inception evaluation

Inception evaluation is geared toward identifying the need for the development of CPGs, assessing the characteristics of the guidelines and fitting them into guidelines programs. Although few businesses would risk the failure of a full-scale product release without prior testing and test marketing, these preliminary activities have been largely ignored in most guidelines development endeavours. Inception evaluation is a necessary precursor to dissemination and implementation of CPGs. When guidelines are developed internally, this task starts with assessment of the need for new guidelines and continues to the stage of full dissemination and implementation. For externally developed guidelines, inception evaluation may be a necessary first step in establishing a guidelines program, as is the case when guidelines have not undergone prior evaluation or when the clinical milieu or implementation strategy differs widely from that of the original guidelines.

Inception evaluation should establish the guidelines' face and content validity.(29) Basic guidelines characteristics, such as acceptability to practitioners and the public, relevance, clarity, applicability to frequently encountered clinical situations and practicality, may be evaluated. For guidelines with planned wide dissemination (e.g., those sponsored by national organizations) these factors may best be evaluated through preliminary implementation in test sites. Inception evaluation may incorporate either informal, qualitative assessments or more structured, quantitative analyses of response.

Guidelines-program evaluation

The principal stage of evaluation occurs after dissemination and implementation of CPGs. Guidelines are not merely abstract distillations of clinical information: they are intended to be used to achieve the goals of health care service programs. Hence, the evaluation plan must consider the operation and products of the health service delivery system into which the guidelines have been incorporated rather than simply the guidelines themselves. Health care programs that explicitly use guidelines to achieve program goals are called guidelines programs.

The goals of guidelines programs should be stated explicitly. The program may, for instance, seek to improve the health status of the target population or the efficiency of health care delivery. By defining the program goals in measurable terms, evaluations of guidelines programs can ascertain whether stated benchmarks have been achieved.

At a minimum, the evaluation must assess whether the program goals have been achieved. Other evaluation goals may be identified; for instance, in a program seeking to improve the quality of health care, an additional goal might be to determine the cost and resource utilization associated with the guidelines program, even if these facets had not been primary considerations in founding the program.

Evaluations of guidelines programs may focus on process, outcome or efficiency.(30) Process evaluations document the extent to which the program was implemented as designed and is serving the target population. Outcome evaluations focus on the extent to which anticipated health outcomes are achieved for the population served. Efficiency evaluations examine cost and resource issues associated with health programs.

Furthermore, an organization may wish not only to verify that desired changes in processes or outcomes of care have occurred, but also to ensure accountability for service delivery(31) or to link adherence to guidelines with reimbursement decisions.(5-7) Guidelines may be attended by incentives if there is consensus on the need for change.(6,32) For example, linking financial incentives to performance in accordance with guidelines led to marked changes in practice behaviour two decades ago.(33) A recent study(7) confirmed dramatic changes in psychotropic drug prescribing, with no adverse effects, after legislated linkage of reimbursement to guidelines adherence. In health care systems in which purchasers buy services only from providers that operate within specified guidelines,(6) continual evaluation of adherence to guidelines is obviously a sine qua non for operation of the program.

Empiric verification that the guidelines program meets organizational objectives without untoward results is necessary. Here, scientific rules of evidence are less important than reliable, routine evaluation strategies that are timely and responsive. Less rigorous but more workable study designs (e.g., uncontrolled before-after, cross-sectional or time-series designs) applied to individual guidelines programs in natural clinical settings may meet evaluation needs.
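
As an illustration of how such a workable design might be operationalized from routinely collected data, the sketch below compares the proportion of guideline-concordant orders before and after implementation. The counts are hypothetical, and the simple two-proportion test makes no attempt to account for secular trends or other cofactors.

    # Minimal sketch of an uncontrolled before-after comparison using administrative counts.
    # All counts are hypothetical; this design cannot separate guideline effects from secular change.
    from statsmodels.stats.proportion import proportions_ztest

    concordant = [412, 498]   # guideline-concordant orders before and after implementation
    audited = [800, 820]      # total orders audited in each period

    z_stat, p_value = proportions_ztest(concordant, audited)
    print(f"Before: {concordant[0] / audited[0]:.1%}  After: {concordant[1] / audited[1]:.1%}")
    print(f"Two-proportion z test: z = {z_stat:.2f}, p = {p_value:.3f}")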

One limitation of guidelines-program evaluation is that rigorous inferences about the effects of the guidelines themselves are not possible. For example, an uncontrolled before-after evaluation of guidelines designed to decrease length of hospital stay demonstrated a coincident decrease in length of stay of 0.9 days for patients initially admitted to the intensive care unit.(14) Were these results to be reported in the current environment, in which, for example, length of hospital stay in Ontario has decreased by approximately half that amount in a single year,(34) it would be difficult to attribute the observed change solely to the influence of CPGs.
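
A simple difference-in-differences calculation shows why a concurrent comparison group helps with attribution. The length-of-stay figures below are hypothetical, chosen only to reflect the 0.9-day observed decrease and a background decline of roughly half that amount, as described above.

    # Minimal difference-in-differences sketch with hypothetical mean lengths of stay (days).
    los_guideline_before, los_guideline_after = 4.90, 4.00   # hospital with the guidelines program
    los_control_before, los_control_after = 5.10, 4.65       # comparison hospital, no program

    change_guideline = los_guideline_after - los_guideline_before   # -0.90 days (observed)
    change_control = los_control_after - los_control_before         # -0.45 days (secular trend)
    guideline_effect = change_guideline - change_control            # effect net of the shared trend
    print(f"Estimated effect attributable to the guidelines: {guideline_effect:+.2f} days")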

Scientific evaluation

There is a need for studies designed to evaluate general aspects of guidelines development, format, dissemination and implementation. Such scientific evaluations aim to test hypotheses and provide a scientific base for further guidelines development. These studies must strike a balance between external and internal validity. External validity refers to the ability to draw generalizable conclusions about guidelines technology (e.g., Are practitioners more likely to follow evidence-based guidelines than guidelines based on expert opinion?). Studies that place a high priority on internal validity (establishing a causal link between guidelines factors and their effects) may require more rigorous control of guidelines implementation, often at the expense of generalizability.

The need for scientific evaluation is evident when the results of uncontrolled studies are compared with those of adequately controlled comparative designs. For example, reports from individual hospitals in which physicians participated in guidelines development have described the desired process changes (e.g., decreased rates of cesarean section(13) and laboratory test ordering(15)). However, scientific evaluations of the impact of participation in guidelines development on processes and outcomes suggest that it would be unwarranted to infer that adoption of the guidelines in other settings would result in similar effects. The North of England study demonstrated that physicians who developed prescribing guidelines were more compliant with the guidelines than the physicians who received them.(35) Similarly, in a randomized trial comparing the effects on general practitioners who codeveloped guidelines for the referral, investigation and treatment of dyspepsia with the effects on those who did not, no difference was found in rates of referral and appropriate investigation, but the cost of prescriptions was higher in the group that developed the guidelines.(26) In another randomized controlled trial, guidelines developed in a university-based setting were not as successful in hastening return to work after an uncomplicated myocardial infarction when tested in a community practice setting.(24) Controlled scientific evaluations are necessary to elucidate general findings about guidelines, such as the difficulties encountered in implementing them in different settings.


Framework for evaluating CPGs

Evaluation of CPGs requires a partnership between the research community and organizations championing guidelines. The need for guidelines-program and scientific evaluations may coincide and can be more easily met by a pooling of resources and expertise.

Table 1 displays a framework for evaluating CPGs. A sequential checklist for working through the framework is useful:

  1. Identify the type of evaluation required.
    • Inception evaluation: The guidelines have not been developed or have not been implemented, or both.
    • Guidelines-program evaluation: Guidelines play a central role in a functioning health care program. When possible, set objective specifications for the processes and outcomes of care that are expected to be influenced or to change. Identify measurable components of the program.
    • Scientific evaluation: Obtain generalizable evidence about aspects of guidelines technology.
  2. Identify the goals for the guidelines. For guidelines development identify the need for guidelines. For guidelines programs identify the organizational goals for the program (e.g., the primary goal may be to improve certain quality parameters, to contain costs or to improve the appropriateness of health care delivery).
  3. Identify goals for the evaluation.
    • Inception evaluation: Goals may be needs assessment of guidelines, face and content validity, and pilot testing of feasibility of implementation.
    • Guidelines-program evaluation: Goals may be the achievement of guidelines-program primary goals (assessing effects on patient outcomes or on spending) and, possibly, other facets.
    • Scientific evaluation: Goals may be independent of those of the guidelines program in which testing occurs.
  4. Choose an evaluation design congruent with the evaluation type and goals.
    • Inception evaluation: Exploratory qualitative evaluations may be in order for innovative or new guidelines programs in which a search for unknown or unexpected effects is a priority.
    • Guidelines-program evaluation: Before-after or uncontrolled designs may be adequate.
    • Scientific evaluation: A controlled trial or other valid design is necessary. Consider the need to control or account for cofactors.
    • Specify the evaluation components: study subjects, measurements (e.g., test volumes, delays in referral), data sources (e.g., routinely collected administrative data, patient charts, surveys) and analytic strategies (e.g., control for cofactors through the use of regression analysis; a minimal sketch of such a regression follows this list).
    • Choose a time frame for the evaluation. Are the results necessary immediately or continuously as input into program management? Is the evolution of clinical knowledge rapid, necessitating frequent changes to guidelines content?
    • Specify response to the evaluation. Anticipation of the possible results of evaluation may facilitate a timely response to it. For example, what will be done if the evaluation demonstrates that program goals are not met? Some responses will depend on the particular findings of the evaluation, whereas others may be prespecified (e.g., adoption of incentive systems if no improvement is noted).
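
As a concrete, and entirely hypothetical, example of the analytic strategy named in step 4, the sketch below fits a logistic regression in which the association between exposure to the guidelines program and a guideline-concordant referral is estimated while controlling for age and comorbidity. The variable names and simulated data are assumptions for illustration only; only the analytic form is of interest.

    # Minimal sketch: regression control for cofactors in a guidelines-program evaluation.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 500
    df = pd.DataFrame({
        "post_guideline": rng.integers(0, 2, n),     # care delivered under the program (0/1)
        "age": rng.normal(65, 10, n),                # cofactor
        "comorbidity_index": rng.poisson(2, n),      # cofactor
    })
    # Simulated outcome: probability of a guideline-concordant referral.
    logit = -1.0 + 0.8 * df["post_guideline"] - 0.02 * (df["age"] - 65) - 0.1 * df["comorbidity_index"]
    df["concordant"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    fit = smf.logit("concordant ~ post_guideline + age + comorbidity_index", data=df).fit(disp=0)
    print(np.exp(fit.params))  # odds ratios; 'post_guideline' is the adjusted program association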


Barriers and supports to evaluation

In the survey described in the first article of the series,(28) organizations were asked to identify barriers and supports to the evaluation of CPGs. The most commonly identified barriers were resource issues, time constraints, methodologic complexity, a lack of expertise in conducting evaluations, data limitations and a lack of consensus across stakeholder groups about goals and objectives for guidelines. These are similar to the obstacles identified in a limited survey of eight US organizations.(1)

The creation of CPGs is only the initial step in the policy-iteration cycle(3) and may be the least expensive. Funding only the creation of guidelines orphans them. Successful evaluation requires organizational commitment and the support of clinicians for the guidelines program and its goals. If the commitment and participation are lacking or goals are diffuse, guidelines programs and their evaluation will likely fail.

Systems for routine collection of valid data from which to assess the impact of guidelines programs (e.g., comorbidity and health status information) are often lacking. When adequate routine data are not available or do not meet evaluation needs, specialized data collection may be prohibitively expensive and obtrusive. In addition, unique data collection is time specific and hence does not fit into a continuous evaluation-review cycle. Finally, as noted earlier, a well-developed evaluation system specific to guidelines does not exist, and there is not a cadre of well-trained evaluators.

The main arguments for evaluation include the need to demonstrate professional accountability to consumers and payers. The high priority placed on demonstrating quality of care and the need for continuing education in evidence-based practice were also identified. National medical organizations, the academic community and hospitals were all identified as supporting the evaluation of CPGs. In some organizations, evaluation has become an obligatory aspect of guidelines development and promulgation.


Conclusion

Unless the evaluation of CPGs is ranked on a level with development and implementation, the lessons learned about the need to establish the safety and efficacy of unproven medical technologies are bound to be lost. Routine planning for the evaluation of guidelines must be part of any guidelines-development program. As the culture of accountability to the public and payers develops, the need for a culture of routine evaluation of health services increases. Some organizations have already identified this necessity. The Minnesota Clinical Comparison and Assessment Project uses a cycle of review and revision to compare performance with guidelines.(36) The American Medical Association has identified evaluation and revision as the last two steps of an eight-step strategy to incorporate practice parameters into quality assessment, assurance and improvement.(2) In Alberta and British Columbia, current proposals for funding guidelines development identify evaluation steps.(28) For example, the Clinical Practice Guidelines Program Proposal of the Alberta Medical Association plans the evaluation of the impact of guidelines 2 and 3 years after implementation. Each guideline is to be accompanied by an estimate of its health and economic impacts. The proposal requires that the supports necessary to perform evaluations be identified and that mechanisms and processes to obtain these supports be specified. To date, no published results are available.

Although the primary goal for guidelines development or implementation may be restricted, it would be preferable if the potential impact of CPGs were estimated explicitly at the time of their development, before dissemination and implementation. This would provide justification for widespread support of the CPGs as well as benchmarks for evaluating guidelines programs. Areas in which the impact could be estimated include the following:

The nature of CPGs makes them not just another medical "magic bullet"; their impact also depends on the prior behaviour of physicians(19) and the structures and processes of the clinical environments in which the guidelines are embedded. Thus, although routine evaluations are mandatory, evaluation methods not commonly encountered in clinical investigation may be the norm.

To date, scientific evaluations have compared "guideline" with "no-guideline" or "usual-care" groups. These are akin to the early trials of thrombolysis, which tested the efficacy of a drug against placebo in the treatment of acute myocardial infarction.(37,38) After the relative benefit of this drug class over placebo had been established, subsequent trials compared different thrombolytic agents and different routes of administration.(39,40) The next generation of guidelines studies must compare different guidelines factors -- for example, varying development, incorporation and implementation strategies.

Finally, all incarnations of CPGs must have a limited life span. Evaluation and the evolution of new clinical knowledge provide the necessary impetus for the continual improvement of guidelines programs.


References

  1. Audet AM, Greenfield S, Field M: Medical practice guidelines: current activities and future directions. Ann Intern Med 1990; 113: 709-714
  2. Kelly JT, Toepp MC: Practice parameters: development, evaluation, dissemination, and implementation. QRB Qual Rev Bull 1992; 18: 405-409
  3. Basinski ASH, Naylor CD, Cohen MM et al: Standards, guidelines and clinical policies. CMAJ 1992; 146: 833-837
  4. Leape LL: Practice guidelines and standards. An overview. QRB Qual Rev Bull 1990; 16: 42-49
  5. Shapiro DW, Lasker RD, Bindman AB et al: Containing costs while improving quality of care: the role of profiling and practice guidelines. Annu Rev Public Health 1993; 14: 219-241
  6. Brook RH: Appropriateness: the next frontier. BMJ 1994; 308: 218-219
  7. Shorr RI, Fought RL, Ray WA: Changes in antipsychotic drug use in nursing homes during implementation of the OBRA-87 regulations. JAMA 1994; 271: 358-362
  8. Institute of Medicine (US): Assessing Medical Technologies, National Academy Press, Washington, 1985
  9. Jennett B: Health technology assessment. BMJ 1992; 305: 67-68
  10. Mittman BS, Siu AL: Changing provider behavior: applying research on outcomes and effectiveness in health care. In Shortell SM, Reinhardt UE (eds): Improving Health Policy and Management: Nine Critical Research Issues for the 1990s, Health Administration Press, Ann Arbor, Mich, 1992: 195-226
  11. Moses LE: Framework for considering the role of data bases in technology assessment. Int J Technol Assess Health Care 1990; 6: 183-193
  12. Ferris LE, Naylor CD, Basinski ASH et al: Program evaluation in health care. CMAJ 1992; 146: 1301-1304
  13. Myers SA, Gleicher N: A successful program to lower cesarean-section rates. N Engl J Med 1988; 319: 1511-1516
  14. Eagle KA, Mulley AG, Skates SJ et al: Length of stay in the intensive care unit. Effects of practice guidelines and feedback. JAMA 1990; 264: 992-997
  15. Wachtel TJ, O'Sullivan P: Practice guidelines to reduce testing in the hospital. J Gen Intern Med 1990; 5: 335-341
  16. Schectman JM, Elinsky EG, Bartman BA: Primary care clinician compliance with cholesterol treatment guidelines. J Gen Intern Med 1991; 6: 121-125
  17. Weingarten S, Agocs L, Tankel N et al: Reducing lengths of stay for patients hospitalized with chest pain using medical practice guidelines and opinion leaders. Am J Cardiol 1993; 71: 259-262
  18. Kosecoff J, Kanouse DE, Rogers WH et al: Effects of the National Institutes of Health Consensus Development Program on physician practice. JAMA 1987; 258: 2708-2713
  19. Hill MN, Weisman CS: Physicians' perceptions of consensus reports. Int J Technol Assess Health Care 1991; 7: 30-41
  20. Sherman CR, Potosky AL, Weis KA et al: The consensus development program. Detecting changes in medical practice following a consensus conference on the treatment of prostate cancer. Int J Technol Assess Health Care 1992; 8: 683-693
  21. Grimshaw JM, Russell IT: Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993; 342: 1317-1322
  22. Schectman JM, Elinsky EG, Pawlson LG: Effect of education and feedback on thyroid function testing strategies of primary care clinicians. Arch Intern Med 1991; 151: 2163-2166
  23. Lomas J, Enkin M, Anderson GM et al: Opinion leaders vs. audit and feedback to implement practice guidelines. JAMA 1991; 265: 2202-2207
  24. Pilote L, Thomas RJ, Dennis C et al: Return to work after uncomplicated myocardial infarction: a trial of practice guidelines in the community. Ann Intern Med 1992; 117: 383-389
  25. Emslie C, Grimshaw J, Templeton A: Do clinical guidelines improve general practice management and referral of infertile couples? BMJ 1993; 306: 1728-1731
  26. Jones RH, Lydeard S, Dunleavey J: Problems with implementing guidelines: a randomised controlled trial of consensus management of dyspepsia. Qual Health Care 1993; 2: 217-221
  27. Stiell IG, McKnight RD, Greenberg GH et al: Implementation of the Ottawa Ankle Rules. JAMA 1994; 271: 827-832
  28. Carter AO, Battista RN, Hodge MJ et al: Report on activities and attitudes of organizations active in the clinical practice guidelines field. CMAJ 1995; 153: 901-907
  29. Feinstein A: Clinimetrics, Yale University Press, New Haven, Conn, 1987: 145
  30. Posavac EJ, Carey RG: Program Evaluation: Methods and Case Studies, 3rd ed, Prentice-Hall, Englewood Cliffs, NJ, 1989: 11
  31. Green LW, Lewis FM: Measurement and Evaluation in Health Education and Health Promotion, Mayfield, Palo Alto, Calif, 1986
  32. Kane RL, Garrard J: Changing physician prescribing practices: regulation vs. education. JAMA 1994; 271: 393-394
  33. Brook RH, Williams KN: Effect of medical care review on the use of injections: a study of the New Mexico Experimental Medical Care Review Organization. Ann Intern Med 1976; 85: 509-515
  34. Basinski ASH: Hospital utilization. In Naylor CD, Anderson GM, Goel V (eds): Practice Patterns in Ontario, Institute for Clinical Evaluative Sciences in Ontario, Toronto, 1994
  35. North of England Study of Standards and Performance in General Practice: medical audit in general practice

