Evaluation of clinical practice guidelines

Antoni S.H. Basinski, MD, PhD, CCFP

Canadian Medical Association Journal 1995; 153: 1575-1581

Paper reprints of the full text may be obtained from: Dr. Antoni S.H. Basinski, Institute for Clinical Evaluative Sciences in Ontario, Rm. G-2, 2075 Bayview Ave., North York ON M4N 3M5; fax 416 480-6048

Abstract

Compared with the current focus on the development of clinical practice guidelines, the effort devoted to their evaluation is meagre. Yet the ultimate success of guidelines depends on routine evaluation. Three types of evaluation are identified: evaluation of guidelines under development and before dissemination and implementation; evaluation of health care programs in which guidelines play a central role; and scientific evaluation, through studies that provide the scientific knowledge base for further evolution of guidelines. Identification of evaluation and program goals, evaluation design and a framework for evaluation planning are discussed.

Introduction

Clinical practice guidelines (CPGs) have been widely produced and disseminated.(1-3) They are touted as vehicles for improving the quality of health care and decreasing costs and utilization. Increasingly, they form the basis for assessing accountability in the delivery of health care services.(4-7)

When applied to patient care, CPGs fall under the broad definition of medical technologies: "techniques . . . and procedures used by health-care professionals in delivering medical care to individuals, and the systems within which such care is delivered."(8) Unbridled diffusion of unproven medical technologies has led to the demand that they be evaluated. The edict "no evaluation -- no technology" has taken hold when managers and policymakers consider the adoption of technologies.(9) Yet despite the proliferation of guidelines and the enthusiasm with which they are promoted, they are rarely evaluated systematically.

Increasingly, the call is made for guidelines that are evidence-based -- that is, founded on medical practice of demonstrated efficacy and effectiveness. The evidence for guidelines themselves, however, does not provide a basis for more than speculative generalizations about their performance and effects.

CPGs are embedded in the clinical milieu, intercalated among a myriad of differing structures and processes. Thus, the evaluation of a health care program within which CPGs play a central role (a guidelines program) must include consideration of the constituent participants -- organizations, patients and payers -- and of the clinical domain and history. The effects of CPGs are more likely to depend on or be influenced by these cofactors than are the effects of a drug or a new imaging technology. Because of this variability and the difficulty in manipulating and controlling the entire stock of cofactors, traditional clinical investigative strategies are difficult to apply, and it may not be possible to evaluate CPGs in a generalizable manner.(10,11) Program evaluation is one strategy that is available when what is to be evaluated is embedded in a naturally occurring setting.(12)

Background

The most common evaluations of CPGs are uncontrolled before-after studies in individual settings.(13-17) These have corroborated the potential impact of CPGs in particular settings and have served as a check for unexpected results. Time-series methods have also been used in attempts to detect changes in practice patterns after the promulgation of national guidelines.(18-20)
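
As an illustration of the time-series approach, the sketch below fits a segmented (interrupted) time-series regression to simulated monthly test volumes, with terms for the pre-existing trend, an immediate level change at guideline promulgation and a change in trend afterward. The data, effect sizes and variable names are invented for illustration and are not drawn from the cited studies.

```python
# A minimal sketch of a segmented (interrupted) time-series analysis on
# simulated data: monthly test volumes before and after a guideline.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

months = np.arange(36)                 # 24 months pre, 12 months post
post = (months >= 24).astype(float)    # indicator for the post-guideline period
time_since = np.where(post == 1, months - 24, 0.0)

# Simulated volumes: a baseline downward trend, then a level drop and a
# steeper decline after the guideline, plus random noise.
volume = (500 - 1.0 * months - 40 * post - 2.0 * time_since
          + rng.normal(0, 10, months.size))

# Regressors: pre-existing trend, immediate level change, change in trend.
X = sm.add_constant(np.column_stack([months, post, time_since]))
fit = sm.OLS(volume, X).fit()

print(fit.params)      # intercept, baseline trend, level change, trend change
print(fit.conf_int())  # confidence intervals for each coefficient
```

The pre-existing-trend term is what distinguishes this design from a simple before-after comparison: a guideline effect would appear as a level or trend change not explained by the baseline trend.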

Ultimately, to separate the effects of CPGs from those of other factors influencing change, controlled evaluations are necessary. Few such trials of guidelines exist;(21-27) more are needed. Among the thousands of CPGs developed internationally, Grimshaw and Russell(21) identified only 42 published evaluations that controlled for other factors in some fashion and that involved randomization of patients, practitioners or groups. Of these, only 11 studies measured patient outcomes.

In Canada formal evaluation of CPGs is rare. The first article of this guidelines workshop series,(28) which reported on a survey of 107 organizations with a stake in CPGs, found that only 7 of the 55 responding organizations had formally evaluated guidelines or were doing so, all since 1992. All the evaluations were before-after assessments, and all were based, at least in part, on administrative data (e.g., laboratory test volume).

Evaluation types and strategies

Evaluation of CPGs must match the evolutionary stage of the guidelines. Three basic types of evaluation are described below.

Inception evaluation

Inception evaluation is geared toward identifying the need for the development of CPGs, assessing the characteristics of the guidelines and fitting them into guidelines programs. Although few businesses would risk a full-scale product release without prior testing and test marketing, these preliminary activities have been largely ignored in most guidelines-development endeavours. Inception evaluation is a necessary precursor to dissemination and implementation of CPGs. When guidelines are internally developed, this task starts with assessment of the need for new guidelines and continues to the stage of full dissemination and implementation. For externally developed guidelines, inception evaluation may be a necessary first step in establishing a guidelines program, as is the case when guidelines have not undergone prior evaluation or when the clinical milieu or implementation strategy differs widely from the setting for which the guidelines were originally developed.

Inception evaluation should establish the guidelines' face and content validity.(29) Basic guidelines characteristics, such as acceptability by practitioners and the public, relevance, clarity, applicability to frequently encountered clinical situations and practicality, may be evaluated. For guidelines with planned wide dissemination (e.g., those sponsored by national organizations) these factors may be best evaluated with preliminary implementation in test sites. Inception evaluation may incorporate either informal, qualitative assessments or more structured, quantitative analyses of response.

Guidelines-program evaluation

The principal stage of evaluation occurs after dissemination and implementation of CPGs. Guidelines are not merely abstract distillations of clinical information: they are intended to be used to achieve the goals of health care service programs. Hence, the evaluation plan must consider the operation and products of the health service delivery system into which the guidelines have been incorporated rather than simply the guidelines themselves. Health care programs that explicitly use guidelines to achieve program goals are called guidelines programs.

The goals of guidelines programs should be stated explicitly. The program may, for instance, seek to improve the health status of the target population or the efficiency of health care delivery. By defining the program goals in measurable terms, evaluations of guidelines programs can ascertain whether stated benchmarks have been achieved.

The evaluation must include, at a minimum, assessment of the program goals. Other evaluation goals may be identified; for instance, in a program seeking to improve the quality of health care, an additional goal might be to determine the cost and resource utilization associated with the guidelines program, even if these facets had not been primary considerations in founding the program.

Evaluations of guidelines programs may focus on process, outcome or efficiency.(30) Process evaluations document the extent to which the program was implemented as designed and is serving the target population. Outcome evaluations focus on the extent to which anticipated health outcomes are achieved for the population served. Efficiency evaluations examine cost and resource issues associated with health programs.

Furthermore, an organization may wish not only to verify that desired changes in processes or outcomes of care have occurred, but also to ensure accountability for service delivery(31) or to link adherence to guidelines with reimbursement decisions.(5-7) Guidelines may be attended by incentives if there is consensus on the need for change.(6,32) For example, linking financial incentives to performance in accordance with guidelines led to marked changes in practice behaviour two decades ago.(33) A recent study(7) confirmed dramatic changes in psychotropic drug prescribing, with no adverse effects, after legislated linkage of reimbursement to guidelines adherence. In health care systems in which purchasers buy services only from providers that operate within specified guidelines,(6) continual evaluation of adherence to guidelines is obviously a sine qua non for operation of the program.

Empiric verification that the guidelines program meets organizational objectives without untoward results is necessary. Here, scientific rules of evidence are less important than reliable, routine evaluation strategies that are timely and responsive. Less rigorous but more workable study designs (e.g., uncontrolled before-after, cross-sectional or time-series designs) applied to individual guidelines programs in natural clinical settings may meet evaluation needs.
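
As a concrete illustration of the simplest of these designs, the sketch below compares a routinely collected measure (here, simulated weekly laboratory test counts) before and after guideline implementation. The data and the choice of statistical test are hypothetical; an uncontrolled comparison of this kind can flag change but cannot attribute it to the guidelines.

```python
# A minimal sketch of an uncontrolled before-after comparison on simulated
# weekly laboratory test counts around guideline implementation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
before = rng.poisson(lam=120, size=26)   # 26 weeks before implementation
after = rng.poisson(lam=105, size=26)    # 26 weeks after implementation

t_stat, p_value = stats.ttest_ind(before, after)
print(f"mean before: {before.mean():.1f}, mean after: {after.mean():.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```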

One limitation of guidelines-program evaluation is that rigorous inferences about the effects of the guidelines themselves are not possible. For example, an uncontrolled before-after evaluation of guidelines designed to decrease length of hospital stay demonstrated a coincident decrease in length of stay of 0.9 days for patients initially admitted to the intensive care unit.(14) Were these results to be reported in the current environment, in which, for example, length of hospital stay in Ontario has decreased by approximately half that amount in a single year,(34) it would be difficult to attribute the observed change solely to the influence of CPGs.

Scientific evaluation

There is a need for studies designed to evaluate general aspects of guidelines development, format, dissemination and implementation. Such scientific evaluations aim to test hypotheses and provide a scientific base for further guidelines development. These studies must strike a balance between external and internal validity. External validity refers to the ability to draw generalizable conclusions about guidelines technology (e.g., Are practitioners more likely to follow evidence-based guidelines than guidelines based on expert opinion?). Studies that place a high priority on internal validity (establishing a causal link between guidelines factors and their effects) may require more rigorous control of guidelines implementation, often at the expense of generalizability.

The need for scientific evaluation is evident when comparing the results of uncontrolled studies with those of studies with adequately controlled comparative designs. For example, reports from individual hospitals in which physicians participated in guidelines development have described the desired process changes (e.g., decreased rates of cesarean section(13) and laboratory test ordering(15)). However, scientific evaluations of the impact of participation in guidelines development on processes and outcomes suggest that it would be unwarranted to infer that adoption of the guidelines in other settings would result in similar effects. The North of England study demonstrated that physicians who developed prescribing guidelines were more compliant with the guidelines than the physicians who received them.(35) Similarly, in a randomized trial comparing the effects on general practitioners who codeveloped guidelines for the referral, investigation and treatment of dyspepsia cases with the effects on those who did not, no difference was found in rates of referral and appropriate investigation, but the cost of prescriptions was higher in the group that developed the guidelines.(26) In another randomized controlled trial, guidelines developed in a university-based setting were not as successful in hastening return to work after an uncomplicated myocardial infarction when tested in a community practice setting.(24) Controlled scientific evaluations are necessary to elucidate general findings about guidelines, such as the difficulties encountered in implementing them in different settings.

Framework for evaluating CPGs

Evaluation of CPGs requires a partnership between the research community and organizations championing guidelines. The need for guidelines-program and scientific evaluations may coincide and can be more easily met by a pooling of resources and expertise.

Table 1 displays a framework for evaluating CPGs. A sequential checklist for working through the framework is useful:

  1. Identify the type of evaluation required.
  2. Identify the goals for the guidelines. For guidelines development identify the need for guidelines. For guidelines programs identify the organizational goals for the program (e.g., the primary goal may be to improve certain quality parameters, to contain costs or to improve the appropriateness of health care delivery).
  3. Identify goals for the evaluation.
  4. Choose an evaluation design congruent with the evaluation type and goals.
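
As a rough, schematic illustration only (Table 1 is not reproduced here, and the field names and example values below are hypothetical), the checklist can be thought of as producing an explicit evaluation plan recorded before the evaluation begins:

```python
# A hypothetical, schematic record of an evaluation plan produced by working
# through the four checklist steps above; field names and values are invented.
from dataclasses import dataclass

@dataclass
class EvaluationPlan:
    evaluation_type: str        # step 1: inception, guidelines-program or scientific
    guideline_goals: list[str]  # step 2: goals for the guidelines or the program
    evaluation_goals: list[str] # step 3: goals for the evaluation itself
    design: str                 # step 4: design congruent with the type and goals

plan = EvaluationPlan(
    evaluation_type="guidelines-program",
    guideline_goals=["reduce unnecessary preoperative laboratory testing"],
    evaluation_goals=["document change in test volume",
                      "monitor for unexpected adverse outcomes"],
    design="uncontrolled before-after comparison using administrative data",
)
print(plan)
```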

Barriers and supports to evaluation

In the survey described in the first article of the series,(28) organizations were asked to identify barriers and supports to the evaluation of CPGs. The most commonly identified were resource issues, time constraints, methodologic complexity, the lack of expertise in conducting evaluations, data limitations and the lack of consensus across stakeholder groups about goals and objectives for guidelines. These are similar to obstacles identified in a limited survey of eight US organizations.(1)

The creation of CPGs is only the initial step in the policy-iteration cycle(3) and may be the least expensive. Funding only the creation of guidelines orphans them. Successful evaluation requires organizational commitment and the support of clinicians for the guidelines program and its goals. If the commitment and participation are lacking or goals are diffuse, guidelines programs and their evaluation will likely fail.

Systems for routine collection of valid data from which to assess the impact of guidelines programs (e.g., comorbidity and health status information) are often lacking. When adequate routine data are not available or do not meet evaluation needs, specialized data collection may be prohibitively expensive and obtrusive. In addition, unique data collection is time specific and hence does not fit into a continuous evaluation-review cycle. Finally, as noted earlier, a well-developed evaluation system specific to guidelines does not exist, and there is not a cadre of well-trained evaluators.

The main arguments for evaluation include the need to demonstrate professional accountability to consumers and payers. The high priority for demonstrating quality of care as well as the need for continuing education in evidence-based practice are also identified. National medical organizations, the academic community and hospitals are all identified as supporting the evaluation of CPGs. In some organizations, evaluation has become an obligatory aspect of guidelines development and promulgation.

Conclusion

Unless the evaluation of CPGs is ranked on a level with development and implementation, the lessons learned about the need to establish the safety and efficacy of unproven medical technologies are bound to be lost. Routine planning for the evaluation of guidelines must be part of any guidelines-development program. As the culture of accountability to the public and payers develops, the need for a culture of routine evaluation of health services increases.

Some organizations have already identified this necessity. The Minnesota Clinical Comparison and Assessment Project uses a cycle of review and revision to compare performance with guidelines.(36) The American Medical Association has identified evaluation and revision as the last two steps of an eight-step strategy to incorporate practice parameters into quality assessment, assurance and improvement.(2) In Alberta and British Columbia current proposals for funding guidelines development identify evaluation steps.(28) For example, the Clinical Practice Guidelines Program Proposal of the Alberta Medical Association plans the evaluation of the impact of guidelines 2 and 3 years after implementation. Each guideline is to be accompanied by an estimate of its health and economic impacts. The proposal requires that the supports necessary to perform evaluations be identified and that mechanisms and processes to obtain these supports be specified. To date, no published results are available.

Although the primary goal for guidelines development or implementation may be restricted, it would be preferable if the potential impact of CPGs were estimated explicitly at the time of their development, before dissemination and implementation. This would provide justification for widespread support of the CPGs as well as benchmarks for evaluating guidelines programs. Areas in which the impact could be estimated include the following:

The nature of CPGs makes them not just another medical "magic bullet"; their impact also depends on the prior behaviour of physicians(19) and the structures and processes of the clinical environments in which the guidelines are embedded. Thus, although routine evaluations are mandatory, evaluation methods not commonly encountered in clinical investigation may be the norm.

To date, scientific evaluations have compared "guideline" with "no-guideline" or "usual-care" groups. These are akin to the early trials of thrombolysis used to test the efficacy of a drug in the treatment of acute myocardial infarction.(37,38) After the relative benefit of this drug class over placebo had been established, subsequent trials compared different thrombolytic agents and different routes of administration.(39,40) The next generation of guidelines studies must compare different guidelines factors -- for example, varying development, incorporation and implementation strategies.

Finally, all incarnations of CPGs must have a limited life span. Evaluation and the evolution of new clinical knowledge provide the necessary impetus for the continual improvement of guidelines programs.

References

  1. Audet AM, Greenfield S, Field M: Medical practice guidelines: current activities and future directions. Ann Intern Med 1990; 113: 709-714
  2. Kelly JT, Toepp MC: Practice parameters: development, evaluation, dissemination, and implementation. QRB Qual Rev Bull 1992; 18: 405-409
  3. Basinski ASH, Naylor CD, Cohen MM et al: Standards, guidelines and clinical policies. Can Med Assoc J 1992; 146: 833-837
  4. Leape LL: Practice guidelines and standards. An overview. QRB Qual Rev Bull 1990; 16: 42-49
  5. Shapiro DW, Lasker RD, Bindman AB et al: Containing costs while improving quality of care: the role of profiling and practice guidelines. Annu Rev Public Health 1993; 14: 219-241
  6. Brook RH: Appropriateness: the next frontier. BMJ 1994; 308: 218-219
  7. Shorr RI, Fought RL, Ray WA: Changes in antipsychotic drug use in nursing homes during implementation of the OBRA-87 regulations. JAMA 1994; 271: 358-362
  8. Institute of Medicine (US): Assessing Medical Technologies, National Academy Press, Washington, 1985
  9. Jennett B: Health technology assessment. BMJ 1992; 305: 67-68
  10. Mittman BS, Siu AL: Changing provider behavior: applying research on outcomes and effectiveness in health care. In Shortell SM, Reinhardt UE (eds): Improving Health Policy and Management: Nine Critical Research Issues for the 1990s, Health Administration Press, Ann Arbor, Mich, 1992: 195-226
  11. Moses LE: Framework for considering the role of data bases in technology assessment. Int J Technol Assess Health Care 1990; 6: 183-193
  12. Ferris LE, Naylor CD, Basinski ASH et al: Program evaluation in health care. Can Med Assoc J 1992; 146: 1301-1304
  13. Myers SA, Gleicher N: A successful program to lower cesarean-section rates. N Engl J Med 1988; 319: 1511-1516
  14. Eagle KA, Mulley AG, Skates SJ et al: Length of stay in the intensive care unit. Effects of practice guidelines and feedback. JAMA 1990; 264: 992-997
  15. Wachtel TJ, O'Sullivan P: Practice guidelines to reduce testing in the hospital. J Gen Intern Med 1990; 5: 335-341
  16. Schectman JM, Elinsky EG, Bartman BA: Primary care clinician compliance with cholesterol treatment guidelines. J Gen Intern Med 1991; 6: 121-125
  17. Weingarten S, Agocs L, Tankel N et al: Reducing lengths of stay for patients hospitalized with chest pain using medical practice guidelines and opinion leaders. Am J Cardiol 1993; 71: 259-262
  18. Kosecoff J, Kanouse DE, Rogers WH et al: Effects of the National Institutes of Health Consensus Development Program on physician practice. JAMA 1987; 258: 2708-2713
  19. Hill MN, Weisman CS: Physicians' perceptions of consensus reports. Int J Technol Assess Health Care 1991; 7: 30-41
  20. Sherman CR, Potosky AL, Weis KA et al: The consensus development program. Detecting changes in medical practice following a consensus conference on the treatment of prostate cancer. Int J Technol Assess Health Care 1992; 8: 683-693
  21. Grimshaw JM, Russell IT: Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet 1993; 342: 1317-1322
  22. Schectman JM, Elinsky EG, Pawlson LG: Effect of education and feedback on thyroid function testing strategies of primary care clinicians. Arch Intern Med 1991; 151: 2163-2166
  23. Lomas J, Enkin M, Anderson GM et al: Opinion leaders vs. audit and feedback to implement practice guidelines. JAMA 1991; 265: 2202-2207
  24. Pilote L, Thomas RJ, Dennis C et al: Return to work after uncomplicated myocardial infarction: a trial of practice guidelines in the community. Ann Intern Med 1992; 117: 383-389
  25. Emslie C, Grimshaw J, Templeton A: Do clinical guidelines improve general practice management and referral of infertile couples? BMJ 1993; 306: 1728-1731
  26. Jones RH, Lydeard S, Dunleavey J: Problems with implementing guidelines: a randomised controlled trial of consensus management of dyspepsia. Qual Health Care 1993; 2: 217-221
  27. Stiell IG, McKnight RD, Greenberg GH et al: Implementation of the Ottawa Ankle Rules. JAMA 1994; 271: 827-832
  28. Carter AO, Battista RN, Hodge MJ et al: Report on activities and attitudes of organizations active in the clinical practice guidelines field. Can Med Assoc J 1995; 153: 901-907
  29. Feinstein A: Clinimetrics, Yale University Press, New Haven, Conn, 1987: 145
  30. Posovac EJ, Carey RG: Program Evaluation: Methods and Case Studies, 3rd ed, Prentice-Hall, Englewood Cliffs, NJ, 1989: 11
  31. Green LW, Lewis FM: Measurement and Evaluation in Health Education and Health Promotion, Mayfield, Palo Alto, Calif, 1986
  32. Kane RL, Garrard J: Changing physician prescribing practices: regulation vs. education. JAMA 1994; 271: 393-394
  33. Brook RH, Williams KN: Effect of medical care review on the use of injections: a study of the New Mexico Experimental Medical Care Review Organization. Ann Intern Med 1976; 85: 509-515
  34. Basinski ASH: Hospital utilization. In Naylor CD, Anderson GM, Goel V (eds): Practice Patterns in Ontario, Institute for Clinical Evaluative Sciences in Ontario, Toronto, 1994
  35. North of England Study of Standards and Performance in General Practice: medical audit in general practice: effects on doctors' clinical behaviour and the health of patients with common childhood conditions. BMJ 1992; 304: 1480-1488
  36. Borbas C, Stump MA, Dedeker K et al: The Minnesota Clinical Comparison and Assessment Project. QRB Qual Rev Bull 1990; 16: 87-92
  37. Gruppo Italiano per lo Studio della Sopravvivenza nell'Infarto Miocardico: GISSI-2: a factorial randomized trial of alteplase versus streptokinase and heparin versus no heparin among 12 490 patients with acute myocardial infarction. Lancet 1990; 336: 65-71
  38. The International Study Group: In-hospital mortality and clinical course of 20 891 patients with suspected acute myocardial infarction randomised between alteplase and streptokinase with or without heparin. Lancet 1990; 336: 71-75
  39. ISIS-3 (Third International Study of Infarct Survival) Collaborative Group: ISIS-3: a randomised comparison of streptokinase vs tissue plasminogen activator vs anistreplase and of aspirin plus heparin vs aspirin alone among 41,299 cases of suspected acute myocardial infarction. Lancet 1992; 339: 753-770
  40. The GUSTO investigators: An international randomized trial comparing four thrombolytic strategies for acute myocardial infarction. N Engl J Med 1993; 329: 673-682
