Public Health Agency of Canada / Agence de santé publique du Canada



Volume 20, No.3 - 2000


 


Development of Record Linkage of Hospital Discharge Data for the Study of Neonatal Readmission

Shiliang Liu and Shi Wu Wen


Abstract

Computerized record linkage has been used increasingly in epidemiologic studies. We developed a multi-stage, deterministic matching algorithm using various combinations of key variables. Then, from the records for March 1, 1993, to March 31, 1996, contained in the discharge abstract database of the Canadian Institute for Health Information (CIHI), we examined the relation between length of hospital stay at birth and neonatal readmission. A combined use of province/territory of occurrence, 6-digit postal code of residence, date of birth and sex (step 1) matched 88.5% of 26,629 eligible neonatal readmission records with their birth records. Additional use of institution code and chart number or health card number combined with date of birth and sex (step 2 and step 3) increased the matching rate to 93.0%. Compared with the gold standard, step 1 correctly matched 94.4% of the records. We conclude that this deterministic matching algorithm is a feasible and convenient approach to data linkage for the study of neonatal readmission. The linkage strategy may also be helpful in epidemiologic studies of other short-term events.

Key words: epidemiologic method; hospital discharge abstract; medical record linkage; neonatal readmission

 


Introduction

Studies of existing databases are attractive to epidemiologists and other health researchers because they can be done efficiently at the level of large populations. For example, it is possible to examine the relation between birth weight, gestational age, maternal age and infant mortality or morbidity at the country level by analyzing existing data, as the information is routinely recorded in vital and hospital statistics. However, the lack of comprehensive information in a single database often impedes researchers in this effort. In recent years, the development of computerized record linkage has made it possible to overcome such obstacles in existing database studies.1-17

Record linkage methods fall into three broad categories: manual, deterministic and probabilistic. Manual matching is the oldest, most time-consuming and most costly method, but it remains the standard. It is not a feasible option, however, when large databases are involved. Probabilistic linkage identifies and links records from one data set to corresponding records in another data set (or two records from different locations in a single data set) on the basis of a calculated statistical probability for a set of relevant variables (e.g. name, sex, date of birth). Deterministic linkage matches records from two data sets (or two records from different locations in a single data set) either by a unique variable (e.g. social insurance number or hospital chart number) or by full agreement of a set of common variables (e.g. name, sex, birth date).

Probabilistic linkage is considered the preferred method, because the calculation of the probability can be refined in various respects to accommodate weights associated with identifier values and coding errors, thus maximizing the available information in the data.1-3,16,17 However, probabilistic linkage requires detailed prior knowledge of various measures of the relative importance of specific identifier values (for example, their frequency) in both files that are to be linked. Investigators often do not have this degree of prior knowledge.6
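To make the role of this prior knowledge concrete, the following is a minimal sketch of one common way of weighting identifiers in probabilistic linkage, in which each field contributes a log-ratio weight. It is not taken from this paper or from the systems it cites. The function names and the m and u values (the assumed probabilities that a field agrees among true matches and among non-matches, respectively) are hypothetical and chosen only for illustration.

```python
import math

def agreement_weight(m, u):
    """Weight added when a field agrees: log2 of m over u."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight added (usually negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))

def pair_score(agreements, params):
    """Sum the field weights for one candidate pair of records.

    agreements: field name -> True/False (did the field agree?)
    params:     field name -> (m, u), both assumed known in advance
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]
        if agrees:
            total += agreement_weight(m, u)
        else:
            total += disagreement_weight(m, u)
    return total

# Hypothetical values: sex agrees by chance about half the time,
# a date of birth agrees by chance far less often.
PARAMS = {"sex": (0.99, 0.5), "date_of_birth": (0.98, 1 / 365)}

full = pair_score({"sex": True, "date_of_birth": True}, PARAMS)
partial = pair_score({"sex": True, "date_of_birth": False}, PARAMS)
```

Because u depends on how common each identifier value is in the files, the weights cannot be set without the kind of frequency information described above.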

This paper aims to illustrate the use of deterministic linkage of hospital discharge records in the hospital discharge database of the Canadian Institute for Health Information (CIHI), taking neonatal readmission as an example. One of our previous studies revealed a substantial recent reduction in length of newborn hospital stay at birth.18 We hypothesized that this reduction might increase rates of neonatal readmission. To examine the relation between length of newborn hospital stay at birth and subsequent neonatal readmission, each readmission record must be linked with the infant's own birth record.


Methods

Three years of CIHI data (fiscal years 1993/94 to 1995/96) were used. Data for Nova Scotia, Quebec and Manitoba were excluded because CIHI collected only a small proportion of hospital discharge records in these provinces.19 Live infants were identified by a code of "NB" in the "age unit" field. Infants weighing less than 1500 g, those discharged from hospital more than 21 days after birth and those who subsequently died in their hospital of birth were excluded. A neonatal readmission was defined as admission of an infant to any hospital within 28 days of birth. Infants who were transferred from another institution were not counted as readmission cases. Multiple births were excluded from both birth and readmission records because siblings share the values of the matching variables and therefore cannot be distinguished from one another.
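The selection criteria above can be sketched as two small predicates. This is an illustrative sketch only: the parameter names are hypothetical, and reading "within 28 days" as 0 to 28 days inclusive is an assumption, since the exact boundary handling is not stated.

```python
from datetime import date

def is_eligible_birth(weight_g, birth_date, discharge_date,
                      died_in_birth_hospital, multiple_birth):
    """Apply the birth-record exclusions described in the Methods."""
    if weight_g < 1500:
        return False
    # Excluded if discharged more than 21 days after birth.
    if (discharge_date - birth_date).days > 21:
        return False
    if died_in_birth_hospital or multiple_birth:
        return False
    return True

def is_neonatal_readmission(birth_date, admission_date, transferred_in):
    """Admission to any hospital within 28 days of birth, transfers excluded."""
    if transferred_in:
        return False
    age_days = (admission_date - birth_date).days
    # Assumption: the 28-day limit is inclusive.
    return 0 <= age_days <= 28
```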

Both birth and readmission records have information on province/territory and institution of occurrence, institution chart number, date of birth, sex, provincial health card number, 6-digit postal code, admission date, discharge date and diagnostic codes. Institution code, institution chart number and provincial health card number are scrambled for confidentiality considerations (Table 1).

 



TABLE 1

Availability of proposed matching variables for record linkage in birth file and in neonatal readmission file

Variable                       Birth file     Readmission file

Number of records              788,480        27,405
Province (%)                   100.0          100.0
Institute number (%)           100.0          100.0
Chart number (%) [a]           97.4           98.2
Health card number (%) [b]     86.8           80.6
Postal code (%) [c]            97.9           98.0
Residence code (%)             70.1           71.0
Date of birth (%)              100.0          100.0
Sex (%)                        100.0          100.0
Admission date (%)             100.0          100.0
Discharge date (%)             100.0          100.0

[a] Different institutions have different chart number series. Only an infant who is readmitted to the same hospital of birth is assigned the identical chart number.

[b] It was found that a majority of infants were assigned their mothers' health card number at birth or at readmission.

[c] About 1% of records showed no information on postal code; another 1% contained incomplete 6-digit postal codes in both files.


   

Theoretically, the health card number and/or institution chart number, although scrambled, can be used as a unique variable for record linkage because the same number is used for each individual once it has been assigned by the provincial/territorial authority or hospital. However, because of delay in obtaining the health card number, infants are usually assigned their mother's number or that field is left blank at birth. We were concerned that using the health card number alone might lead to confusion or error if infants were subsequently given their own number or shared a number with their siblings. The institution chart number is effective only when an infant is readmitted to the hospital where he or she was born; only a small proportion of cases were readmitted to the hospital of their birth, however.

Accordingly, we considered it appropriate to use a set of variables for multi-stage deterministic linkage. Based on our assessment of the availability and appropriateness of the variables on CIHI discharge records, a computer matching algorithm was designed. As described in Figure 1, records of birth and readmission were matched first by full agreement of province/territory of occurrence, 6-digit postal code of residence, date of birth and sex (step 1); second, by full agreement of institution code, institution chart number, sex and date of birth (step 2); and third, by full agreement of provincial/territorial health card number, sex and date of birth (step 3); finally, matching was supplemented by a logic check of the matched cases (step 4). The logic check involved determining whether there were conflicts or contradictions between birth date, discharge date, readmission date and age at readmission.
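The four steps described above can be sketched as follows. The original work was done in SAS; this Python version is only an illustration under stated assumptions. All field names are hypothetical, keys with a missing value are skipped, keys shared by more than one birth record are left unmatched at that step, and the logic check is a simplified guess at the date consistency rules.

```python
from collections import defaultdict
from datetime import date

# Matching variables for steps 1 to 3 (field names are hypothetical).
STEP_KEYS = {
    1: ("province", "postal_code", "birth_date", "sex"),
    2: ("institution", "chart_number", "birth_date", "sex"),
    3: ("health_card", "birth_date", "sex"),
}

def make_key(record, fields):
    """Return the composite key, or None if any component is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v in (None, "") for v in values) else values

def logic_check(birth, readmission):
    """Step 4 (assumed form): dates in order, age at readmission 0 to 28 days."""
    age_days = (readmission["admission_date"] - birth["birth_date"]).days
    in_order = (birth["birth_date"] <= birth["discharge_date"]
                <= readmission["admission_date"])
    return in_order and 0 <= age_days <= 28

def link_records(births, readmissions):
    """Return {readmission id: (birth id, step that produced the match)}."""
    births_by_id = {b["id"]: b for b in births}
    readm_by_id = {r["id"]: r for r in readmissions}
    matched = {}
    for step, fields in STEP_KEYS.items():
        index = defaultdict(list)
        for b in births:
            key = make_key(b, fields)
            if key is not None:
                index[key].append(b["id"])
        for r in readmissions:
            if r["id"] in matched:
                continue  # already linked at an earlier step
            key = make_key(r, fields)
            candidates = index.get(key, []) if key is not None else []
            if len(candidates) == 1:  # require full, unambiguous agreement
                matched[r["id"]] = (candidates[0], step)
    # Step 4: drop matched pairs whose dates contradict each other.
    return {rid: (bid, step) for rid, (bid, step) in matched.items()
            if logic_check(births_by_id[bid], readm_by_id[rid])}

births = [
    {"id": "B1", "province": "ON", "postal_code": "K1A0L2", "sex": "F",
     "birth_date": date(1995, 6, 1), "discharge_date": date(1995, 6, 3),
     "institution": "H01", "chart_number": "C100", "health_card": "HC1"},
    {"id": "B2", "province": "BC", "postal_code": "V5K0A1", "sex": "M",
     "birth_date": date(1995, 6, 2), "discharge_date": date(1995, 6, 4),
     "institution": "H02", "chart_number": "C200", "health_card": None},
]
readmissions = [
    {"id": "R1", "province": "ON", "postal_code": "K1A0L2", "sex": "F",
     "birth_date": date(1995, 6, 1), "admission_date": date(1995, 6, 10),
     "institution": "H09", "chart_number": "C900", "health_card": "HC1"},
    {"id": "R2", "province": "BC", "postal_code": None, "sex": "M",
     "birth_date": date(1995, 6, 2), "admission_date": date(1995, 6, 9),
     "institution": "H02", "chart_number": "C200", "health_card": None},
]
links = link_records(births, readmissions)
```

In this toy data, R1 is linked at step 1, while R2 has no postal code and is picked up at step 2 through the institution code and chart number.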

To evaluate the accuracy of the record linkage carried out in step 1, on which the majority of the successful matches were based, we created a linked file by using step 2 alone to identify the infants who were readmitted to the hospital of birth. We considered this linked file as the gold standard, because the institution chart number is unique in these records. We then separated the linked birth and readmission records, and performed step 1 to link them again in order to assess its matching accuracy as compared with that of the gold standard.
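The comparison with the gold standard amounts to classifying each step 1 result as correct, false, duplicate or unmatched. A minimal sketch is given below; the function name and the input data structures are hypothetical.

```python
def evaluate_against_gold(gold_links, step1_links):
    """Compare step 1 output with the gold-standard links.

    gold_links:  readmission id -> true birth id (from step 2 alone)
    step1_links: readmission id -> list of birth ids proposed by step 1
    """
    counts = {"correct": 0, "false_match": 0, "duplicate": 0, "unmatched": 0}
    for readm_id, true_birth_id in gold_links.items():
        proposed = step1_links.get(readm_id, [])
        if not proposed:
            counts["unmatched"] += 1
        elif len(proposed) > 1:
            counts["duplicate"] += 1      # more than one birth record fits
        elif proposed[0] == true_birth_id:
            counts["correct"] += 1
        else:
            counts["false_match"] += 1    # linked to the wrong birth record
    counts["correct_rate"] = counts["correct"] / len(gold_links)
    return counts
```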

Finally, we assessed the potential bias caused by exclusions and unsuccessful linkage by comparing the distributions of variables of interest, such as birth weight, length of hospital stay and main diagnostic categories for readmission, between the linked and unlinked cases. In this comparison, the unlinked cases included those who were excluded according to the selection criteria before the linkage procedure was performed. SAS software for Unix, version 6.12 (SAS Institute Inc., Cary, North Carolina), was used in all data abstraction and linkage processing.


FIGURE 1

Matching algorithm for record linkage of hospital discharge data


 


   

Results

A total of 817,351 live infants were born in hospitals in the nine Canadian provinces and territories studied and were recorded by CIHI during the period of March 1, 1993, to March 31, 1996. After excluding infants who weighed less than 1500 g, who were discharged from hospital after 21 days from birth, who subsequently died in hospital or who were part of multiple births, we found 798,840 live birth records that met the inclusion criteria. During the corresponding period, a total of 27,405 infants in the same nine Canadian provinces and territories were readmitted to hospitals within 28 days of birth. According to the selection criteria, 26,629 of these readmissions were eligible to be linked with birth records.

Step 1 successfully matched 23,571 readmitted infants (after excluding 26 duplicates) to their birth records, accounting for 88.5% of the 26,629 eligible readmission cases. Implementation of steps 2 and 3 increased the successful matches to 24,766 readmission cases, representing 93.0% of eligible readmission cases, after two pairs were excluded by step 4 (logic check). Details of the matching process are given in Figure 1.

Among the 7430 cases in the linked file used as the gold standard, 7023 (94.5%) cases were matched by implementation of step 1 as described in Figure 1. Of these 7023 cases, 2 were falsely matched and 7 were duplicates, because the step 1 matching variables did not uniquely identify them. Therefore, the correct matching rate was 94.4% using step 1, i.e. full agreement of province of occurrence, 6-digit postal code of residence, sex and date of birth.
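The rates reported in this section follow directly from the counts given in the text. The short check below recomputes them; the helper name pct is hypothetical.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

eligible_readmissions = 26_629
matched_step1 = 23_571
matched_all_steps = 24_766

gold_standard_cases = 7_430
gold_matched_by_step1 = 7_023
false_matches = 2
duplicates = 7

step1_rate = pct(matched_step1, eligible_readmissions)        # 88.5
overall_rate = pct(matched_all_steps, eligible_readmissions)  # 93.0
gold_matched_rate = pct(gold_matched_by_step1, gold_standard_cases)  # 94.5
gold_correct_rate = pct(
    gold_matched_by_step1 - false_matches - duplicates,
    gold_standard_cases,
)  # 94.4
```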

Comparison of linked and unlinked cases showed that they were quite similar in main characteristics and diagnoses of interest (Table 2). However, unlinked cases had a statistically significantly higher proportion of infants of low birth weight (6.4% versus 5.6%) and a statistically significantly lower proportion of readmissions with a diagnosis of jaundice (38.6% versus 40.9%) than linked cases. There was also an increase in the rate of successful record linkage from fiscal year 1993/94 to 1995/96 (Table 2).

 

 



TABLE 2

Comparison of main characteristics of linked and unlinked cases in a study of neonatal readmission

Characteristic                               Linked cases   Unlinked cases [a]   p value

Number                                       24,766         2,639
% of fiscal year 1993/94                     30.8           35.2                 <0.01
% of fiscal year 1994/95                     33.4           33.2                 NS
% of fiscal year 1995/96                     35.8           31.6                 <0.01
% of males                                   57.0           56.3                 NS
% of birth weight <2500 g                    5.6            6.4                  <0.01
Mean age at readmission (days)               10.8           10.7                 NS
% of length of stay <2 days at birth         25.6           25.8                 NS
% of infants with jaundice                   40.9           38.6                 <0.05
% of infants with dehydration                5.9            6.1                  NS
% of infants with inadequate weight gain     2.8            2.4                  NS
% of infants with feeding problems           9.8            10.2                 NS
% of infants with sepsis                     5.4            5.3                  NS

[a] The number includes the cases that were excluded prior to the linkage procedure by subject selection criteria.

NS = Not significant

 


   

Discussion

Probabilistic matching is a recommended strategy for computerized record linkage. It is considered the preferred method because the calculation of the probability can be refined in various respects to accommodate weights associated with identifier values and coding errors, thus maximizing the available information in the data.1-3,16,17

If there is a common unique identifier (e.g. social insurance number) in both files to be linked, and if the common unique identifier is quite accurately recorded in the data, deterministic linkage can be performed conveniently by using routine statistical software such as SAS. However, such a common unique identifier is often not available. For example, social insurance numbers or other personal identifiers are often issued to adults only, so that they cannot be used in studies involving infants and children. For confidentiality considerations, the data collector is often prohibited from releasing the subject's name. Even if the subject's name can be released to investigators, spelling mistakes in names are frequent.15

Postal code is a well-developed system of Canada Post Corporation. This information is often recorded completely, and the chance of a mistake is relatively low, as the code tends to be shorter and simpler than name and address. In addition, because it does not reveal an individual's identity, it can be fully released to investigators without confidentiality concerns. We performed a frequency procedure on our raw data, and found that the chances of two individuals sharing the same sex, date of birth and 6-digit postal code were very low (data not shown). With combined use of sex, date of birth and other information, this variable can play a key role in identifying the same individual. This procedure (i.e. step 1) accounted for the majority of the linked records (88.5%) in our study of neonatal readmission; as well, it was quite accurate (94.4% as compared with the gold standard). In Canada, a small proportion of births occur outside of hospitals. If we had access to data on out-of-hospital births, the matching rate would be even higher.
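The frequency check mentioned above can be approximated by counting how many records share a composite key with at least one other record. This is a minimal sketch with hypothetical field names, not the SAS frequency procedure actually used.

```python
from collections import Counter

def shared_key_rate(records, fields=("sex", "birth_date", "postal_code")):
    """Fraction of records whose composite key also occurs in another record."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(records)
```

A rate close to zero suggests that the combination of variables is nearly unique and therefore suitable as a matching key.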

As with other linkage methods, the success of deterministic linkage depends largely on the completeness and accuracy of the information in the files to be linked and on an appropriate combination of matching variables. In our linkage procedures, failure in matching was largely caused by missing or incomplete information on the variables used, such as postal code. However, as suggested by the rising matching rates from fiscal year 1993/94 to 1995/96 (Table 2), the quality of CIHI hospital discharge data is improving, which is promising for future studies using deterministic linkage.

When failure in linkage occurs, it is important to assess its potential impact on the study results. One consequence is that the sample size available for analysis will be reduced. However, because sample size is usually not an issue in existing database studies, the real concern of incomplete record linkage is the potential bias introduced by unsuccessful linkage. Our comparison of linked and unlinked cases showed no substantial differences in main characteristics and diagnostic categories of interest (despite statistically significant differences in low birth weight and jaundice rates), suggesting that no major bias was introduced by this record linkage.

One limitation of record linkage using postal code as a key matching variable should be emphasized. In modern society, people relocate quite frequently. As a result, deterministic record linkage involving postal code may be less reliable in studies of long-term events. In our case, the chance of relocation within 28 days after birth was low, unless the patients gave different addresses at different hospital admissions (e.g. gave parents' address at birth but grandparents' address at readmission). In addition, relying on full agreement of a set of matching variables often excludes some true matching pairs and thereby reduces sensitivity.

The deterministic matching algorithm provided a feasible and convenient approach to data linkage for our study of neonatal readmission. Although it was developed for a specific purpose, it may also be used for epidemiologic studies of other short-term events, such as epidemic outbreaks, rehospitalization, adverse drug or vaccine reactions and familial aggregation of disease or risk factors. For example, with some modifications of the linkage program, it may be used to study maternal readmission or the relation between maternal characteristics and infant outcomes.


Acknowledgements

This study was carried out under the auspices of the Canadian Perinatal Surveillance System. The authors thank Dr Catherine McCourt for her comments on the manuscript.


References

    1. Newcombe HB, Kennedy JM, Axford SJ. Automatic linkage of vital records. Science 1959;130:954-9.

    2. Howe GR, Lindsay J. A generalized iterative record linkage computer system for use in medical follow-up studies. Comput Biomed Res 1981;14:327-40.

    3. Newcombe HB. Handbook of record linkage: methods for health and statistical studies, administration, and business. Oxford, England: Oxford University Press, 1988.

    4. Miller AB, Howe GR, Sherman GJ. Mortality from breast cancer after irradiation during fluoroscopic examinations in patients being treated for tuberculosis. N Engl J Med 1989;321:1285-9.

    5. Henderson J, Goldacre MJ, Graveney MJ, Simmons HM. Use of medical record linkage to study readmission rates. Br Med J 1989;299:709-13.

    6. Van Den Brandt PA, Schouten LJ, Goldbohm RA, Dorant E, Hunen PMH. Development of a record linkage protocol for use in the Dutch Cancer Registry for epidemiological research. Int J Epidemiol 1990;19:553-8.

    7. Roos LL, Wajda A. Record linkage strategies. Part I: estimating information and evaluating approaches. Meth Inform Med 1991;30:117-23.

    8. Goldberg MS, Carpenter M, Thériault G, Fair M. The accuracy of ascertaining vital status in a historical cohort study of synthetic textiles workers using computerized record linkage to the Canadian Mortality Data Base. Can J Public Health 1993;84:201-4.

    9. Nash JQ, Chandrakumar M, Farrington CP, Williamson S, Miller E. Feasibility study for identifying adverse events attributable to vaccination by record linkage. Epidemiol Infect 1995;114:475-80.

    10. The West of Scotland Coronary Prevention Study Group. Computerised record linkage: compared with traditional patient follow-up methods in clinical trials and illustrated in a prospective epidemiological study. J Clin Epidemiol 1995;48:1441-52.

    11. Jamieson E, Roberts J, Browne G. The feasibility and accuracy of anonymized record linkage to estimate shared clientele among three health and social service agencies. Meth Inform Med 1995;34:371-7.

    12. Howe GR. Lung cancer mortality between 1950 and 1987 after exposure to fractionated moderate-dose-rate ionizing radiation in the Canadian fluoroscopy cohort study and a comparison with lung cancer mortality in the atomic bomb survivors study. Radiat Res 1995;142:295-304.

    13. Howe GR, McLaughlin J. Breast cancer mortality between 1950 and 1987 after exposure to fractionated moderate-dose-rate ionizing radiation in the Canadian fluoroscopy cohort study and a comparison with breast cancer mortality in the atomic bomb survivors study. Radiat Res 1996;145:694-707.

    14. Herrchen B, Gould JB, Nesbitt TS. Vital statistics linked birth/infant death and hospital discharge record linkage for epidemiological studies. Comput Biomed Res 1997;30:290-305.

    15. Adams MM, Wilson HG, Casto DL, Berg CJ, McDermott JM, Gaudino JA, McCarthy BJ. Constructing reproductive histories by linking vital records. Am J Epidemiol 1997;145:339-48.

    16. Waien SA. Linking large administrative databases: a method for conducting emergency medical services cohort studies using existing data. Acad Emerg Med 1997;4:1087-95.

    17. Howe GR. Use of computerized record linkage in cohort studies. Epidemiol Rev 1998;20:112-21.

    18. Wen SW, Liu S, Fowler D. Trends and variations in neonatal length of in-hospital stay in Canada. Can J Public Health 1998;89:115-9.

    19. Wen SW, Liu S, Marcoux S, Fowler D. Uses and limitations of routine hospital admission/separation records for perinatal surveillance. Chronic Dis Can 1997;18(3):113-9.

 


Author References

Shiliang Liu and Shi Wu Wen, Bureau of Reproductive and Child Health, Laboratory Centre for Disease Control, Health Canada, Tunney's Pasture, Address Locator: 0601E2, Ottawa, Ontario  K1A 0L2


Last Updated: 2002-10-20