Validity of utilization review tools
In response to: Y. Robens-Paradise et al; C.J. Wright, K. Cardiff; A. Mariotto; P. Dodek et al; D. Zitner et al

We based our conclusions on our results together with those of previously published studies (Table 3) of general medical, surgical and psychiatric patients.1 Our sample size was 75 for admissions and 461 for subsequent days (because each day is rated independently); the average kappa scores were based on much larger samples (e.g., 759 admissions and 3142 days of hospitalization for the ISD). We therefore believe that the conclusions are well grounded. With any project of this kind, techniques that are current at the outset may be nominally outdated by its completion. Because the ISD and the MCAP are proprietary, we are unable to determine whether there have been substantive changes in the tool criteria, but a time series of published kappa values does not show an increase in validity over the past decade. The comment about the age of the tools implies that current versions have higher validity than earlier ones; is there evidence of this?

We did not use subacute care criteria because we were focusing on acute care; as noted, we stopped case evaluation when the patient was moved to a different level of care. We omitted the secondary review for reasons already given, including a theoretical concern that it would probably decrease validity. Yoel Robens-Paradise and colleagues assert that a secondary review improves tool validity; what is their evidence for that? The validity of an expert panel cannot be assessed, because there is nothing accepted as more accurate to which it can be compared.

Charles Wright and Karen Cardiff's comments on utilization management are undoubtedly correct, but they are irrelevant to an assessment of tools for utilization review. These writers state that we have not recognized the value of these tools for system planning; we would put it differently: they do not recognize that if the tools fail to accomplish what they were designed for, then they are not valuable. If, for example, a tool misidentifies a significant number of days as inappropriate, a reviewer searching for the reasons for the inappropriate stay (when in fact there are none) may be led to form erroneous conclusions about the relative importance of the various reasons for such days and then to make inappropriate changes in the system. Nobody would trust a new laboratory test with as low a level of accuracy as that exhibited by these tools; why have so many hospitals accepted them without first validating them? We believe that the onus is on those who choose to use them to show that they do what they are supposed to do.

Aldo Mariotto claims that the AEP measures operating efficiency (undefined), not appropriateness. However, one of the developers of the AEP2 stated clearly that its purpose is to assess the appropriateness of hospitalization; furthermore, Coast3 commented on its inability to measure efficiency. We agree that the quality of the clinical record is a critical factor in applying these tools, whether concurrently or retrospectively; any deficiencies were constant for the tools and the panel in this study. As to structuring the panel review, the panel's task was to arrive at a clinical judgement, normally an idiosyncratic process; structuring the review would have defeated its purpose.
Peter Dodek and colleagues argue that a major source of disagreement between a review tool (specifically the ISD) and the panelists is the fact that Canadian hospitals generally do not have separate subacute care units; the panel may therefore consider a day in acute care appropriate because there is no alternative. This was not a concern in our study because we have a subacute cardiac unit adjacent to the coronary care unit. The original manuscript stated that "the panel was also asked to recommend, for those days not requiring care at an acute level, a more suitable level of care; for this purpose, it was assumed that all levels of care were available." Unfortunately, this sentence was deleted to meet space limitations. Dodek and colleagues criticize us for claiming to be the first to carry out a validation study of these tools; indeed, all of the data in Table 3 are taken from such validation studies, including the 2 to which they refer (see our references 16 and 20).1 We did, however, point out that the ISD and the MCAP had not previously been validated in Canadian studies.

Dodek and colleagues raise an interesting question concerning the application of the kappa statistic to the pool of all days in hospital (other than the admission day): they suggest that the days of a given patient are not fully independent of each other (e.g., if day a is inappropriate, then it is possible that day a + 1 will also be inappropriate) and that this "may amplify disagreement as measured by ... kappa." However, it seems equally likely that nonindependence may amplify agreement and lead to a false elevation of kappa. To settle this question, we have calculated kappa scores separately for each of hospital days 2 to 6; in this way, each kappa value is based on only 1 observation per patient (a computational sketch of this day-by-day calculation is given below). The values of kappa are as follows: day 2, 0.40 (n = 72); day 3, 0.123 (n = 64); day 4, 0.42 (n = 59); day 5, 0.287 (n = 55); day 6, 0.181 (n = 48). Thus, removal of any hypothetical dependence effects does not raise the kappa values as predicted by Dodek and colleagues; we conclude that nonindependence of consecutive observations is not responsible for the low kappa values found in validation studies.

David Zitner and colleagues describe a hypothetical utilization review process that approaches the ideal but might be very time consuming. We agree with Tu4 that more research is needed to develop a useful utilization review tool.

Acknowledgements: We are grateful to Dr. Ian Shrier and Aude Dufresne for valuable discussions of the statistical analyses.
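The following is a minimal sketch of how such a day-by-day kappa calculation could be carried out. The data layout, the field names (patient, day, tool, panel) and the restriction to binary appropriate/inappropriate ratings are assumptions made for illustration only; they are not a description of the software or data files used in the study.

```python
from collections import defaultdict


def cohen_kappa(pairs):
    """Unweighted Cohen's kappa for paired binary ratings
    (True = day judged appropriate, False = inappropriate)."""
    n = len(pairs)
    if n == 0:
        return float("nan")
    # Observed proportion of agreement between the two raters
    p_o = sum(1 for a, b in pairs if a == b) / n
    # Chance-expected agreement from each rater's marginal rate
    p_tool = sum(1 for a, _ in pairs if a) / n
    p_panel = sum(1 for _, b in pairs if b) / n
    p_e = p_tool * p_panel + (1 - p_tool) * (1 - p_panel)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else float("nan")


def kappa_by_day(ratings, days=range(2, 7)):
    """Compute kappa separately for each hospital day (here days 2-6),
    so each kappa uses only one observation per patient.

    ratings: one record per patient-day, e.g.
    {"patient": 17, "day": 3, "tool": True, "panel": False}
    Returns {day: (kappa, number of patients rated on that day)}.
    """
    by_day = defaultdict(list)
    for r in ratings:
        by_day[r["day"]].append((r["tool"], r["panel"]))
    return {d: (cohen_kappa(by_day[d]), len(by_day[d])) for d in days}
```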
Norman Kalant

References