Detection and classification of inappropriate hospital stay

John S. Butler, MSc
Brendan J. Barrett, MB
Gloria Kent, RN
Jackie McDonald, RN
Rosalie Haire, RN
Patrick S. Parfrey, MD

Clin Invest Med 1996; 19 (4): 251-258


Dr. Barrett, Dr. Parfrey and Ms. Haire are with the General Hospital Corporation, and Dr. Barrett, Dr. Parfrey, Ms. Kent, Ms. McDonald and Mr. Butler are with the Clinical Epidemiology Unit, Faculty of Medicine, Memorial University of Newfoundland, St. John's, Nfld.

This study was funded by a grant from the General Hospital Corporation.

(Original manuscript submitted Aug. 10, 1995; received in revised form Jan. 1, 1996; accepted Mar. 13, 1996)

Copyright 1996, Canadian Medical Association


Abstract

Objective: To study the reliability and validity of concurrent review of hospital-bed utilization carried out by a trained nurse.

Design: Analysis of interrater reliability and validity of utilization review.

Setting: Tertiary care hospital associated with a university.

Patients: Eighty patients randomly selected from 203 patients admitted to the hospital.

Interventions: Appropriateness of days of stay in hospital was classified prospectively, on the basis of clinical judgement, by two nurses working independently, by a third nurse working with the Appropriateness Evaluation Protocol (AEP) and by a multidisciplinary review panel of nurses and physicians working retrospectively with the use of data gathered by the first nurse.

Main outcome measures: Agreement between different raters on the number of and reason for inappropriate admission days, total number of inappropriate days and of inappropriate days due to delayed discharge, to diagnostic procedures or to inefficient medical management.

Results: Agreement between the two nurses who used clinical judgement was substantial (kappa or the intraclass correlation coefficient [RI] 0.77 to 0.98) on the number of and reason for inappropriate admission days, on the total number of inappropriate hospital days and on days due to delayed discharge, diagnostic procedures or inefficient medical management. Agreement was moderate (RI 0.47) on the number of inappropriate days' stay awaiting surgery. Agreement was substantial (kappa or RI 0.69 to 0.94) between the two nurses who used clinical judgement and the panel, except on the total number of inappropriate days; however, for this variable, exclusion of one case increased the RI from 0.35 to 0.80. Agreement was substantial between the two nurses who used clinical judgement and the nurse who used the AEP on appropriateness of admission days and the number of inappropriate days. Agreement between the panel and the nurse who used the AEP on the number of inappropriate days rose from 0.36 to 0.88 when the one outlying case was excluded. Some admissions were classified as premature when the AEP was used, whereas other raters considered the admissions unnecessary. There was poor agreement between the nurse who used the AEP and the other raters on the number of inappropriate days' stay awaiting surgery or diagnostic tests.

Conclusions: Data collection and judgement of appropriateness of hospital stay by a trained nurse are feasible and reliable. A nurse working prospectively and a panel working retrospectively sometimes disagree. The AEP provides a similar estimate of the number of inappropriate days but may be insensitive to patient factors that influence the timing of admission.


Résumé

Objectif : Étudier la fiabilité et la validité de l'évaluation concomitante, par une infirmière spécialement formée, de l'utilisation de lits d'hôpitaux.

Conception : Analyse de la fiabilité et de la validité inter-observateur dans l'évaluation de l'utilisation des ressources.

Contexte : Hôpital universitaire de soins tertiaires.

Sujets : Quatre-vingts patients choisis au hasard parmi 203 sujets hospitalisés.

Interventions : L'à-propos de la durée du séjour hospitalier a été évalué prospectivement d'après le jugement clinique par deux infirmières travaillant de façon indépendante, de même que par une troisième infirmière utilisant le protocole d'évaluation de l'à-propos (PEA), et par un groupe multidisciplinaire d'infirmières et de médecins qui ont analysé rétrospectivement les données recueillies par la première infirmière.

Variables mesurées : Accord entre les observateurs quant aux variables suivantes : nombre et justification de jours non appropriés d'hospitalisation, nombre total de jours non appropriés et de jours non appropriés attribuables au retard du congé, aux épreuves diagnostiques ou à la gestion médicale inefficiente.

Résultats : Un accord substantiel a été observé entre les deux infirmières utilisant le jugement clinique pour évaluer le nombre et la justification de jours non appropriés d'hospitalisation, le nombre total de jours non appropriés et de jours non appropriés attribuables au retard du congé, aux épreuves diagnostiques ou à la gestion médicale inefficiente (kappa ou coefficient de corrélation intra-classe [RI] 0,77 à 0,98). Un accord modéré (RI 0,47) a été noté quant au temps d'attente non approprié pour une chirurgie. Un accord substantiel (kappa ou RI 0,69 à 0,94) a été noté entre les deux infirmières utilisant le jugement clinique et le groupe multidisciplinaire, à l'exception de l'évaluation du nombre total de jours non appropriés; pour cette dernière variable, cependant, l'exclusion d'un cas fait passer RI de 0,35 à 0,80. Un accord substantiel a été noté entre les deux infirmières utilisant le jugement clinique et l'infirmière utilisant le PEA quant à l'à-propos de la durée du séjour et du nombre de jours non appropriés. Pour ce qui est du nombre de jours non appropriés, l'exclusion du cas excentrique fait cependant passer l'accord entre le groupe multidisciplinaire et l'infirmière utilisant le PEA de 0,36 à 0,88. Certaines admissions étaient prématurées d'après le PEA, alors qu'elles étaient non justifiées d'après d'autres évaluateurs. Il y avait désaccord entre l'infirmière utilisant le PEA et les autres évaluateurs quant au nombre de jours non appropriés en attente de chirurgie ou d'épreuves diagnostiques.

Conclusions : L'extraction de données et l'évaluation de l'à-propos de la durée du séjour hospitalier par une infirmière spécialement formée sont réalisables et fiables. Des désaccords peuvent survenir entre l'évaluation prospective par l'infirmière et l'évaluation rétrospective par un groupe multidisciplinaire. Le PEA fournit un estimé similaire du nombre de jours non appropriés mais semble un outil peu sensible aux facteurs chez le patient influençant le moment de l'admission.



Introduction

Bed-utilization review assesses how efficiently hospital beds are used, in order to identify areas of potential cost savings while continuing to meet patient needs.

Three of us (B.J.B., J.M. and P.S.P.) previously studied bed utilization at the General Hospital in St. John's, Nfld.[1,2] In these studies, appropriateness of hospital stay was assessed by a multidisciplinary panel of social workers, nurses and physicians. This panel review process was burdensome and costly; it also led to delays in obtaining results. In addition, the subjective decision making of the panel could allow biased decisions. An objective instrument, the Appropriateness Evaluation Protocol (AEP), was developed elsewhere to permit criteria-based identification and classification of inappropriate bed utilization.[3] This instrument avoids some of the subjectivity of the methods employed in our earlier studies. However, it was not clear how applicable or flexible it would be in evaluating appropriateness in our patient population.

In the context of a subsequent bed-utilization review, we studied the reliability and validity of other methods to identify and classify inappropriate hospital-bed utilization. The specific objectives of this study were (1) to examine the feasibility of training a nurse to gather prospectively, from medical records and charge nurses, the information needed to determine the appropriateness of bed utilization, and (2) to test the reliability and validity of the data collection and decision making of the trained nurse through comparison with those of a second nurse experienced in bed-utilization review who used the same methods, a panel that independently reviewed the data collected and a third nurse who used a modified version of the AEP.[3]


Methods

Patients and data collection

This study was conducted as a component of a bed-utilization review involving 203 patients admitted for services provided by the departments of general surgery, urology, plastic surgery/burns, ophthalmology, hematology, radiotherapy, gynecology/oncology and infectious disease of the General Hospital, a tertiary care institution associated with Memorial University of Newfoundland, in St. John's, Nfld. All admissions to general surgery and urology, which have a large number of admissions, were studied for 2½ weeks, and all admissions to other services, which have a smaller number of admissions, were studied for 5 weeks.

A nurse (nurse A) with considerable clinical experience was given some training in utilization review. She followed all 203 patients, except for 4 who were transferred to a service not included in the study, daily until discharge. To test the reliability of the data-gathering process, 80 of the cases were randomly selected to be representative of the population of patients during the study period. These patients were also prospectively followed by a second nurse (nurse B), who used the same methods as nurse A. These 80 cases were further prospectively studied by a third nurse (nurse C), who used a modified version of the AEP to identify and classify inappropriate admissions and hospital days.[3] Finally, a panel of nurses and physicians retrospectively evaluated the same 80 patients on the basis of the data collected by nurse A.

Nurses A and B used a standardized form for recording demographic data and a daily profile of the patients' medical conditions, medical and nursing interventions and plan of care from admission to discharge. Unit charge nurses were consulted about the patients' conditions and plan of care.

The study was approved by the Human Investigations Committee at Memorial University of Newfoundland.

Identification and classification of inappropriate days

Nurses A and B independently identified and classified inappropriate hospital days with the use of a standardized classification system (Appendix 1). With the use of the same classification system, a panel of three physicians and three charge nurses from the services included in the study also identified and classified inappropriate days from the data gathered by nurse A. In making its decisions, the panel followed some agreed guidelines: elective surgical cases not needing an extensive preoperative work-up should go through a preadmission clinic; consultations, tests and operations should be done within 24 hours of request; patients should be discharged as soon as their medical and social circumstances permit; early discharge planning should be undertaken for identifiably difficult cases; a thorough admission assessment should identify all problems and result in a coordinated medical care plan. The panel was blind to the identity of the patients and the decisions of the nurses.

Nurse C performed the same task independently with the use of a modified version of the AEP. The modifications allowed the use of a preadmission clinic, when suitable, and eliminated a criterion that allowed a postoperative day for several invasive diagnostic procedures, such as angiography and myelography. An additional new criterion required discharge with home care when only dressing changes or intramuscular or subcutaneous injections were needed.

Analysis

We analysed the degree of agreement between nurses A and B to test the interobserver reliability of the judgement-based, rather than criteria-driven, method of assessing inappropriateness of hospital stay. The degree of agreement between nurse A and the panel was studied to see whether we could avoid panel review in future studies. The strength of agreement between nurses A and C provided a validity check on the data-collection ability and judgement of nurse A. Agreement between the AEP and each of nurse A, nurse B and the panel provided some construct validation of our partially subjective judgement methods. Decisions about appropriateness cannot be tested for criterion validity because there is no gold standard method for judging appropriateness.

Agreement between raters on the following variables was assessed: (1) appropriateness of admission, (2) reasons for inappropriate admission days, (3) the total number of inappropriate days and (4) the number of days that were inappropriate owing to any given reason. The kappa statistic was used to assess agreement on nominal data such as appropriateness of admission days and the reason for inappropriate admission days.[4,5] The intraclass correlation coefficient (RI) was used to assess agreement on continuous data, including the total number of inappropriate days and the number of inappropriate days due to a specific reason (e.g., delayed surgery).[4,5]
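As an illustration of the kappa statistic described above, the following is a minimal sketch in Python. It assumes the unweighted two-rater (Cohen) form; the paper does not state which variant was computed. The function name and the ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical classifications of ten admission days (not study data).
nurse_a = ["appropriate"] * 6 + ["inappropriate"] * 4
nurse_b = ["appropriate"] * 5 + ["inappropriate"] * 5
kappa = cohen_kappa(nurse_a, nurse_b)
print(round(kappa, 2))
```

In this made-up example the raters agree on 9 of 10 days, but half of that agreement would be expected by chance alone, which is why kappa is lower than the raw 90% agreement.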

Baseline characteristics of groups were compared with the use of Student's t-test and the chi-squared test, with Yates' correction for continuity, if applicable.
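For readers unfamiliar with Yates' correction, the sketch below shows the continuity-corrected chi-squared statistic for a 2 x 2 table in Python. The function name and the counts are hypothetical (they are not taken from Table 1); the value 3.841 is the usual critical value for 1 degree of freedom at the 0.05 level.

```python
def yates_chi_squared(a, b, c, d):
    """Chi-squared statistic with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Yates' correction subtracts n/2 from |ad - bc|, floored at zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts of men and women in two groups (not study data).
statistic = yates_chi_squared(90, 113, 38, 42)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
significant = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(round(statistic, 3), significant)
```

A statistic below the critical value, as with these illustrative counts, would correspond to no significant difference between the groups on that characteristic.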


Results

There were no significant differences at baseline between the 203 patients followed during the bed-utilization review and the 80 patients included in the sample (Table 1). Table 2 shows the characteristics of and inappropriate bed utilization by the 203 patients, as determined by nurse A.

Table 3 shows the pattern of agreement between raters for appropriateness of admission days and the reason for inappropriate admission (for days that raters agreed were inappropriate). As reflected by the almost perfect agreement between nurses A and B, the interrater reliability of the nurses' data collection and judgement was very high. The validity of nurse A's assessment is supported by the substantial agreement between nurse A and the panel and between nurse A and nurse C, who used the AEP. There was less agreement between the panel and nurse C on the appropriateness of admission days; 86% of the disagreements involved the panel declaring the admission day appropriate whereas the AEP declared it inappropriate. The panel was influenced by patient status and the distance the patient had travelled to reach the hospital, leading it to judge more preoperative admission days to be appropriate. There was also less agreement between all raters and nurse C on the reasons for inappropriate admission days in cases in which all of the raters agreed that the day was inappropriate. This lack of agreement was mainly due to cases classified as premature admissions according to the AEP, whereas, according to most other raters, the scheduled procedure could have been done in an ambulatory setting.

There was substantial disagreement between the panel and all other raters about one case (case 7). Following a lengthy debate, the panel, which was aware of the ultimately fatal outcome for the patient, decided that all 27 days of care were appropriate. Nurses A, B and C, who made decisions prospectively, each classified all of the stay as inappropriate on the basis that care could have been provided outside of an acute care institution. Because this case had a major effect on the results, we analysed agreement between raters excluding and including case 7 (Table 4). When case 7 was excluded, there was excellent agreement between all pairs of raters. When case 7 was included, the high level of agreement persisted in all comparisons not involving the panel.
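The sensitivity of the intraclass correlation coefficient to a single discordant case can be sketched as follows. This is an illustration only: it assumes a one-way random-effects form of the coefficient, whereas the paper does not state which form was used, and the function name and all day counts are hypothetical. The last row merely mimics a case like case 7, in which one rater counts 27 inappropriate days and the other counts none.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC; each row holds one patient's value per rater."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 patients and 2 raters per patient")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-patient and within-patient sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical inappropriate-day counts (rater 1, rater 2); not study data.
typical_cases = [(0, 0), (2, 2), (1, 0), (5, 4), (0, 1), (3, 3), (8, 7), (0, 0)]
outlying_case = (27, 0)

icc_excluding = icc_oneway(typical_cases)
icc_including = icc_oneway(typical_cases + [outlying_case])
print(round(icc_excluding, 2), round(icc_including, 2))
```

With these made-up numbers the coefficient is high when the discordant case is left out and falls to about zero when it is included, because one large disagreement dominates the within-patient variance. The magnitudes are not meant to reproduce those in Table 4.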

Decisions of nurses A, B and the panel were compared with regard to the number of inappropriate days due to specific reasons (Table 5). This analysis included case 7, since all raters agreed that there were no inappropriate days due to these reasons in this case. There was substantial to excellent agreement between raters on the number of inappropriate days of stay while awaiting diagnostic tests or discharge or due to inefficient medical management. However, there was only moderate agreement on the number of inappropriate days of stay while awaiting surgery, and the lack of agreement on this variable was not due to any single case.

Inappropriate days are classified differently by the AEP than by other raters, except for days of stay while awaiting surgery or diagnostic tests. There was very little agreement, beyond the amount due to chance, between the nurse who used the AEP and all of the other raters on the number of days spent while awaiting diagnostic tests. All raters other than the nurse who used the AEP felt that case 29 involved 7 or 8 inappropriate days of stay awaiting a diagnostic test that could have been performed as an outpatient service; however, the AEP did not detect any such inappropriate days in this case. If this case were ignored, the RI between all of the raters and the nurse who used the AEP for days of stay awaiting diagnostic tests would rise to about 0.25, reflecting modest agreement. A similar level of agreement was observed between each rater and the nurse who used the AEP on the number of days of stay awaiting surgery.


Discussion

This study indicates the reliability of having a nurse collect data and judge the appropriateness of bed utilization on the basis of partially implicit criteria. There was excellent agreement between nurses A and B on almost all decisions, except on the number of inappropriate days of stay awaiting su