How valid are utilization review tools in assessing appropriate use of acute care beds?

Norman Kalant,* Marc Berlinguet, Jean G. Diodati,* Leonidas Dragatakis,* François Marcotte*

CMAJ 2000;162(13):1809-13
Abstract

Background: Despite their widespread acceptance, utilization review tools, which were designed to assess the appropriateness of care in acute care hospitals, have not been well validated in Canada. The aim of this study was to assess the validity of 3 such tools: the ISD (Intensity of service, Severity of illness, Discharge screens), the AEP (Appropriateness Evaluation Protocol) and the MCAP (Managed Care Appropriateness Protocol), as determined by their agreement with the clinical judgement of a panel of experts.

Methods: The cases of 75 patients admitted to an acute cardiology service were reviewed retrospectively. The criteria of each utilization review tool were applied by trained reviewers to each day the patients spent in hospital. An abstract of each case, prepared in a day-by-day format, was evaluated independently by 3 cardiologists, who used clinical judgement to decide the appropriateness of each day spent in hospital.

Results: The panel considered 92% of the admissions and 67% of the subsequent hospital days to be appropriate. The ISD underestimated the appropriateness rates of admission and subsequent days; the AEP and MCAP overestimated the appropriateness rate of subsequent days in hospital. The kappa statistic of overall agreement between tool and panel was 0.45 for the ISD, 0.24 for the MCAP and 0.25 for the AEP, indicating poor to fair validity of the tools.

Interpretation: Published validation studies had average kappa values of 0.32–0.44 (i.e., poor to fair) for admission days and for subsequent days in hospital for the 3 tools. The tools have only a low level of validity when compared with a panel of experts, which raises serious doubts about their usefulness for utilization review.

Reducing the time that patients spend in hospital is seen as one way to control health care costs. One procedure for demonstrating inefficiency in the use of acute care hospitals is utilization review, in which a tool designed to assess the appropriateness of hospital admission and of subsequent days spent in hospital is applied retrospectively using objective criteria. Three tools widely used for this procedure are the ISD criteria set (Intensity of service, Severity of illness, Discharge screens1), the AEP (Appropriateness Evaluation Protocol2) and the MCAP (Managed Care Appropriateness Protocol3). The need for hospital admission is determined daily by the application of a set of explicit criteria; if these are met for a given day, then that day spent in an acute care hospital is considered appropriate. Although designed for managing care during time spent in hospital, the tools have also been used retrospectively to assess resource utilization.

The ISD consists of sets of diagnosis-independent criteria applicable to specified levels of care (e.g., critical, acute, subacute) and to different body systems. Each set consists of 3 sections: severity of illness, intensity of service and discharge screens. The AEP and the MCAP, which was derived from the AEP, each consist of a set of admission criteria and a set of day-of-care criteria related to patient severity of illness and the clinical services required. The criteria are independent of diagnosis or body system and are applicable to all patients.
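To illustrate the mechanism, the following minimal sketch (in Python) shows how a day-of-care review of this kind can be applied: a day is judged appropriate for acute care if at least one explicit criterion is met. The clinical facts and criteria shown are invented placeholders for illustration only; they are not the actual ISD, AEP or MCAP criteria, which are far more detailed.

    # Minimal sketch (hypothetical): applying explicit day-of-care criteria.
    # The criteria below are invented placeholders, not the real tool criteria.

    day = {
        "iv_medication": True,          # invented clinical facts for one day
        "continuous_monitoring": False,
        "new_chest_pain": False,
    }

    # Each criterion is a predicate over the day's clinical facts.
    day_of_care_criteria = [
        lambda d: d["iv_medication"],
        lambda d: d["continuous_monitoring"],
        lambda d: d["new_chest_pain"],
    ]

    # The day is judged appropriate if any one criterion is satisfied.
    appropriate = any(criterion(day) for criterion in day_of_care_criteria)
    print("day judged", "appropriate" if appropriate else "inappropriate")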
These tools have been used for utilization review in most provinces. Butler and associates4 studied the reliability and validity of the AEP. The MCAP was used in 2 institutional studies5,6 and the ISD was used in a number of provincial surveys7–13 to determine the rate of inappropriate use of acute care beds. In several cases the tool criteria were reviewed before the study to ensure that they reflected local practice, but in none of the studies with the ISD and the MCAP was the tool validated by comparison with decisions of appropriateness obtained by a "gold standard," namely a panel of physicians. The purpose of our study was to assess the validity of the 3 tools as determined by the level of agreement with the judgement of a panel of experts. We used the 1997 versions of the ISD and the MCAP and the latest version of the AEP (the AEP criteria are restated with no substantive changes, but with notes and clarifications of each criterion, in an unpublished manual supplied by Dr. Restuccia, dated May 1998).14

Methods

Seventy-five consecutive patients admitted during March 1998 to the coronary care unit (CCU) with a provisional diagnosis of acute myocardial infarction or unstable angina were considered. A health record analyst trained as a primary reviewer for the ISD, and with prior experience in its application to cardiac patients, assessed the admission day and each subsequent day spent in hospital according to the criteria for care at the level at which the patient was being treated (CCU, telemetry unit or acute cardiac unit). A senior nurse with extensive experience in the cardiac unit was trained as a primary reviewer in the use of the MCAP and determined the appropriateness of each day in hospital using that tool. Although the reviews were carried out during each patient's stay in hospital, there was no contact with the attending physicians and no influence on patient care. Daily reviews were ended when the patient was discharged or transferred from the acute care nursing unit. A physician reviewer verified the application of the criteria but did not overrule any decisions on the basis of other information or clinical judgement. Several months later the primary reviewer of the MCAP reassessed the hospital stays using the AEP criteria.

A physician who was unaware of the appropriateness ratings obtained with the tools prepared a summary of each patient's stay in a standardized, day-by-day format; the summary recorded information on the day on which it became available to the attending physician and residents, as well as information contained in the progress notes from the patient's record (copied verbatim when feasible). The summaries were reviewed independently by 3 cardiologists, none of whom was involved in the care of these patients. The cardiologists used their clinical judgement to assess the appropriateness of the admission and of each day in hospital. They then met to discuss their assessments and to reach a consensus. Complete agreement was not required; where consensus could not be reached, the majority opinion was accepted as the panel's judgement.

Each tool was assessed by the level of overall agreement between the tool and the panel as determined by the kappa statistic15 and by the specific inappropriate agreement (the number of days for which care at an acute care level was judged inappropriate by both the tool and the panel, divided by the number of days judged inappropriate by one or both of them).16 Kappa has a value of zero if agreement is due to chance alone and a value of 1 if agreement is perfect. Thus, 0–0.4 indicates poor agreement, 0.4–0.75 indicates fair to good agreement and 0.75–1.0 indicates good to perfect agreement.17
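To make these two measures concrete, the following minimal sketch (in Python, with invented day-level ratings; not the study's data or software) computes Cohen's kappa, defined as (observed agreement - chance agreement) / (1 - chance agreement), and the specific inappropriate agreement for a tool-versus-panel comparison.

    # Minimal sketch (hypothetical): day-level agreement between a utilization
    # review tool and an expert panel. 1 = day judged appropriate, 0 = inappropriate.
    # The ratings are invented for illustration; they are not study data.

    def cohen_kappa(tool, panel):
        # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
        n = len(tool)
        observed = sum(t == p for t, p in zip(tool, panel)) / n
        # Chance agreement is computed from each rater's marginal rates.
        p_tool, p_panel = sum(tool) / n, sum(panel) / n
        chance = p_tool * p_panel + (1 - p_tool) * (1 - p_panel)
        return (observed - chance) / (1 - chance)

    def specific_inappropriate_agreement(tool, panel):
        # Days rated inappropriate by both, divided by days rated
        # inappropriate by at least one of the two.
        both = sum(t == 0 and p == 0 for t, p in zip(tool, panel))
        either = sum(t == 0 or p == 0 for t, p in zip(tool, panel))
        return both / either

    tool = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]   # tool's ratings for 10 hospital days
    panel = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # panel's ratings for the same days

    print(f"kappa = {cohen_kappa(tool, panel):.2f}")  # 0.58
    sia = specific_inappropriate_agreement(tool, panel)
    print(f"specific inappropriate agreement = {sia:.2f}")  # 0.60

For these invented ratings kappa is 0.58 (fair to good on the scale above) and the specific inappropriate agreement is 0.60; a kappa near zero would mean that the tool agrees with the panel no more often than chance.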
Sensitivity and specificity were calculated for each tool, using the panel's judgement as the "gold standard," with "inappropriate" as the equivalent of "positive." Thus, sensitivity is the "true inappropriate" rate of a tool (the proportion of the days considered inappropriate by the panel that were also judged inappropriate by the tool), and specificity is its "true appropriate" rate (the proportion of the days considered appropriate by the panel that were also judged appropriate by the tool). Other statistical procedures included the χ2 test for differences between proportions and 2-way analysis of variance (unequal subgroup numbers).

Results

The numbers of days judged appropriate by the tools and the panel are shown in Table 1. Compared with the panel's findings, the number of appropriate days was underestimated by the ISD and overestimated by the MCAP and the AEP, for both admission days and subsequent days. The rates of appropriateness of hospital admission differed significantly between the ISD and the AEP and MCAP (p < 0.02), and the rates of appropriateness were lower for subsequent days in hospital than for admission days (p < 0.01). For both the panel and the tools, the proportion of days judged inappropriate increased with increasing length of stay for about 6 days and then became stable (Fig. 1). For the 33 patients with a final diagnosis of myocardial infarction, the panel and each tool judged essentially all admissions to be appropriate; the assessments of the appropriateness of subsequent days were virtually identical to those for the whole group.

The level of agreement between the tools and the panel, as shown by the kappa values, was low, except for the ISD in its assessment of admission days (Table 2). The agreement between the MCAP and the AEP was good; this result was expected because the MCAP was derived from the AEP. With the panel's judgements taken to be "correct," all tools showed a sensitivity (true inappropriate rate) of 1.00 and a specificity (true appropriate rate) of 0.87 or higher for admission days. For subsequent days the ISD showed a sensitivity of 0.94 and a specificity of 0.57; the corresponding values were 0.33 and 0.87 for the MCAP and 0.29 and 0.92 for the AEP. Before the consensus meeting, the kappa value for agreement between individual physicians and each of the tools was 0.11 on average, and that between pairs of physicians was 0.32 on average.
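Sensitivity and specificity values such as those above follow directly from a 2 × 2 cross-tabulation of tool ratings against panel ratings. The following minimal sketch (in Python, with invented counts; not the study data) shows the calculation and illustrates how a tool can combine high specificity with low sensitivity, as the MCAP and the AEP did for subsequent days.

    # Minimal sketch (hypothetical): sensitivity and specificity of a review tool
    # against the panel's "gold standard", with "inappropriate" as "positive".
    # The counts are invented for illustration; they are not study data.

    both_inappropriate = 20   # tool inappropriate, panel inappropriate (true positive)
    tool_only = 10            # tool inappropriate, panel appropriate (false positive)
    panel_only = 40           # tool appropriate, panel inappropriate (false negative)
    both_appropriate = 130    # tool appropriate, panel appropriate (true negative)

    # Sensitivity: proportion of panel-inappropriate days also flagged by the tool.
    sensitivity = both_inappropriate / (both_inappropriate + panel_only)
    # Specificity: proportion of panel-appropriate days also accepted by the tool.
    specificity = both_appropriate / (both_appropriate + tool_only)

    print(f"sensitivity (true inappropriate rate) = {sensitivity:.2f}")  # 0.33
    print(f"specificity (true appropriate rate) = {specificity:.2f}")    # 0.93

With counts such as these, the tool accepts most of the days the panel considers appropriate but detects only a third of the days the panel considers inappropriate.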
Interpretation

In previous Canadian studies4–13 an average of 39% (range 8%–73%) of admissions and 49% (range 25%–82%) of hospital days were judged inappropriate by one of the utilization review tools. Although our inappropriateness rates were well within these ranges, comparison is of questionable value because of differences in the patient populations studied. However, one report10 included a subgroup of patients with myocardial infarction, assessed by the ISD: 96% of admissions but only 36% of subsequent days were deemed appropriate. Our results for such patients were very similar (Table 1).

The importance of these rates depends on the validity of the utilization review tools, which is assessed by comparing the "decisions" of the tools with the judgement of a panel of experts. We searched the MEDLINE and HEALTHSTAR databases for independent validation studies providing kappa estimates of the agreement between tool and panel judgements for adult patients. Because the criteria have been modified over the years, the search was limited to the past 10 years. All of the reports found were based on the ratings of a primary reviewer. For both the ISD and the AEP and MCAP (considered together) the kappa values for overall agreement with the expert panels varied considerably but on average showed only poor to fair validity (Table 3); specific inappropriate agreement was similar. These tools were initially designed for managing care and included review by a physician of days rated inappropriate on primary review, with the option of overriding such ratings. Because all of the validation studies were based on the primary review alone, the results support the conclusion of Strumwasser and coworkers16 that the tools used in this way have too low a level of validity to justify their use as the sole basis for deciding the appropriateness of days in hospital. Because the past5–13 and current use of the tools for utilization review is also based on primary review alone, the estimated rates of inappropriate use of acute care hospitals should be viewed with some degree of scepticism.

There are 2 potential sources of bias in the results of our study. First, preparation of the patient abstracts involved selection of information that may have affected the panel's decisions. Although we do not believe that any important data were omitted, the problem, common to all retrospective utilization review studies, remains. Second, the time lag before the application of the AEP tool may have introduced a systematic bias in the reviewer; we believe that this bias is negligible because a second reviewer monitored the application of the criteria.

Incorporation of a secondary review into the utilization review process might raise the level of agreement with the expert panel and so make the tool more acceptable. However, it was the frequent divergence of clinical opinion among individual physicians that led to the development of the tools: the goal was to replace the subjective use of implicit criteria by a physician reviewer with the objective application of the explicit criteria of the tool.33 Reported kappa values (including our results) for agreement between individual physician reviewers are on average 0.29 (standard error 0.06).16,20,22,34 This indicates that the addition of a secondary review by a single physician would reintroduce the source of variability that the tool was intended to eliminate and would add a major source of inconsistency in ratings. In contrast, agreement between panels of physicians35–37 is much stronger than between individual physicians, so secondary review of inappropriate days by an expert panel would increase the validity of the tool. However, the tools are relatively inefficient for assessing subsequent days in hospital because of low sensitivity or low specificity. A focused review process38 by a panel of physicians using a minimal representative patient sample39 may be more efficient.

An additional concern is the lack of information about the predictive validity of the tools. The concept of an inappropriate day at an acute care level implies that care could be provided at a subacute level without reducing its quality. The accuracy of this assumption must be established by showing that patients discharged to a lower level of care on the basis of the tool's criteria have outcomes at least as good as those of patients "managed" by the attending physician using clinical judgement. There have been no reports of such trials.
Although utilization review tools are widely accepted, these considerations, taken together, raise serious questions about the value of the tools as they are currently used and about whether they should be used at all.
We thank Hélène Jean-Baptiste for acting as primary reviewer for the ISD tool, for record keeping and for tabulating results. We also thank Jennifer Eastmond for serving as primary reviewer for the MCAP and AEP tools and Frédéric Abergel for computer program development.
This project was supported by a grant from the Ministère de la Santé et des Services Sociaux du Québec.
Competing interests: None declared.
From *the Department of Medicine and the Division of Cardiology, Sir Mortimer B. Davis Jewish General Hospital, Montreal, Que., and the Ministère de la Santé et des Services Sociaux du Québec, Quebec City, Que.

This article has been peer reviewed.

Reprint requests to: Dr. Norman Kalant, Department of Medicine, Sir Mortimer B. Davis Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montreal QC H3T 1E2; fax 514 340-7508; nkalant@mtd.jgh.mcgill.ca
© 2000 Canadian Medical Association or its licensors