Clin Invest Med - August 1997 Vol. 20, No 4 / Readying the SF-36 for use in Canada

Readying a US measure of health status, the SF-36, for use in Canada
Sharon Wood Dauphinee, PhD, PT
Louise Gauthier, MSc, OT
Barbara Gandek, MSc
Lise Magnan, MSc, PT
Uriel Pierre, MSc, PT
Clin Invest Med 1997;20(4):224-38.
Dr. Wood Dauphinee is Professor and Director of the School of Physical and Occupational Therapy and Associate Dean of Rehabilitation Science, and Ms. Gauthier is former Associate Professor, School of Physical and Occupational Therapy, Faculty of Medicine, McGill University, Montreal, Que.; Ms. Gandek is Project Director, International Quality of Life Assessment Project, at the Health Institute, New England Medical Center, Boston, Mass.; Ms. Magnan was with the School of Physical and Occupational Therapy, Faculty of Medicine, McGill University, Montreal, Que., and now works as a physiotherapist in the Rehabilitation Therapy Department, Whakatane Hospital, Whakatane, New Zealand; and Mr. Pierre is a physiotherapist with Vigi Santé, a company that administers private nursing homes, Montreal, Que.
(Original manuscript submitted Aug. 8, 1996; received in revised form Mar. 26, 1997; accepted Apr. 22, 1997)
Reprint requests to: Dr. Sharon Wood Dauphinee, School of Physical and Occupational Therapy, McGill University, 3654 Drummond St., Montreal QC H3G 1Y5; fax 514 398-6360

Contents

Abstract
Résumé
Introduction
Methods
Results
Discussion
Acknowledgements
References

Abstract

Objective: To culturally adapt and translate for use in French- and English-speaking areas of Canada the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), a carefully developed and standardized measure that is useful for assessing the outcomes of care.
Design: For the Canadian French version, the methods involved forward and backward translations, quality ratings of the translated product, a scaling exercise and pilot tests. A process of cultural adaptation, along with the scaling exercise and pilot tests, was used to create a form in Canadian English.
Results: The authors produced acceptable versions of the SF-36 in Canadian French and English.
Conclusions: Although further psychometric testing of the Canadian versions of the SF-36 is desirable, they are now available for use in clinical practice and research in Canada.
Résumé
Objectif : Adapter et traduire, pour l'utiliser dans les régions francophones et anglophones du Canada, le Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), moyen de mesure mis au point avec soin et normalisé qui aide à évaluer les résultats des soins.
Conception : Pour la version en français du Canada, on a procédé à une traduction et à une retraduction, à une évaluation de la qualité du rendu, à un exercice d'échelonnage et à des essais pilotes. Pour créer un questionnaire en anglais du Canada, on a utilisé un processus d'adaptation culturelle, de même que l'exercice d'échelonnage et des essais pilotes.
Résultats : Les auteurs ont produit des versions acceptables du SF-36 en français et en anglais du Canada.
Conclusion : Même s'il est souhaitable de soumettre la version canadienne du SF-36 à d'autres tests psychométriques, ces documents sont maintenant disponibles pour la pratique clinique et la recherche au Canada.

Introduction
Health care researchers and providers are increasingly accepting the need to evaluate treatments in terms of outcomes that reflect patients' values about their health.^1-4 There are a variety of scales and indices specifically designed to rate a patient's perception of his or her well-being, state of health or quality of life.^4,5
One such measure is the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36).⁶ This questionnaire has a 20-year history of conceptual development, testing and refinement.^7,8 During the past 5 years it has been translated and culturally adapted with the use of a standard protocol prepared by the International Quality of Life Assessment (IQOLA) group for use in countries around the world.^9,10
The SF-36 assesses 8 health concepts: limitations in (1) physical functioning or (2) social activities due to health problems, limitations in usual role activities due to (3) physical health problems ("role physical") or (4) emotional problems ("role emotional"), (5) pain, (6) vitality, (7) mental health and (8) perception of health in general. Each concept is assessed on a multi-item scale. There is also 1 item that assesses change in health since the previous year. Respondents select standardized response choices on either a dichotomous (yes/no) or an ordinal (e.g., "excellent," "very good," "good," "fair," "poor") scale. Both "standard" and "acute" forms of the survey are available. The "standard" form asks respondents to consider the past 4 weeks when selecting their responses, whereas the "acute" form uses a 1-week time frame. The questionnaire is normally self-completed by respondents aged 14 years and older in 5 to 10 minutes, but it may also be administered by an interviewer, in person or over the telephone.¹¹ The questions and responses contained in the US version are given in Appendix 1. Eight scale scores are created with the use of summated rating techniques;¹² each ranges from 0 to 100, with higher numbers indicating better health.¹³ Physical and mental summary measures have also been constructed from the 8 multi-item scales.¹⁴ These summary measures are based on factor analyses of the correlations among the SF-36 scales, using data from the general US population as well as from specific patient populations. Users' manuals are available.^13,15,16
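As a rough illustration of how a 0-to-100 scale score can be obtained, the sketch below sums a respondent's item scores for one scale (the method of summated ratings, with no item weights) and rescales the raw sum linearly so that the lowest possible raw score maps to 0 and the highest to 100. It assumes that every item in the scale shares the same possible range and has already been recoded so that higher values mean better health; item recoding, missing-data rules and the official scoring values are described in the SF-36 users' manuals and are not reproduced here. The function name `scale_score_0_100` is hypothetical.

```python
from typing import Sequence


def scale_score_0_100(item_scores: Sequence[float], item_min: float, item_max: float) -> float:
    """Summated-rating score for one scale, rescaled linearly to 0-100.

    Hypothetical helper. Assumes all items share the range [item_min, item_max]
    and are already coded so that a higher value means better health.
    """
    k = len(item_scores)
    if k == 0:
        raise ValueError("a scale needs at least one item")
    raw = sum(item_scores)                  # unweighted sum of the items
    lowest, highest = k * item_min, k * item_max
    return (raw - lowest) * 100 / (highest - lowest)


# Example: a 10-item scale with items coded 1-3.
# Eight items in the top category and two in the middle give raw = 28.
print(scale_score_0_100([3] * 8 + [2] * 2, item_min=1, item_max=3))  # 90.0
```

With 10 items coded 1 to 3 the raw range is 10 to 30, so a one-category change on a single item moves the scale score by 100/20 = 5 points.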
The psychometric properties of the SF-36 have been extensively evaluated. During the development of the survey, items were carefully selected from other widely used surveys to reflect the multidimensional nature of health and the range of health states.⁶ This process supported content validity. The survey's criterion validity has been established through comparisons with other well-known measures,^17-20 and its construct validity has been extensively tested and supported in a variety of general populations and patient samples.^18,21-24 Reliability studies have demonstrated good internal consistency, with Cronbach's alpha generally greater than 0.8 for all scales except social functioning, for which it is sometimes lower, possibly because of the small number of items in that scale.^13,18,23,25-28 Test-retest reliability evaluations have also found that the scores are generally reproducible,^18,29 although some of the subjective scale scores are difficult to replicate.²⁶ Finally, responsiveness to change in clinical status has been examined, and the SF-36 appears to be sensitive in this regard when compared with other generic measures of health.^22,29-31 The measure has been used in general health surveys in several countries^13,32,33 to create age- and sex-specific norms. It has also been employed in patient-based studies of a variety of medical, surgical and psychiatric conditions.
This article describes the process used in translating, modifying and adapting the US version of the SF-36 for use in both French- and English-speaking areas of Canada.
Methods
The methods employed for translation and cultural adaptation of the SF-36 for use in French Canada followed the process set out in the IQOLA Project protocol.¹⁰ The translation procedure algorithm is presented in Fig. 1, and the steps are detailed below. A much abbreviated version of the protocol was used to prepare the original US form for use among English-speaking Canadians.

Translation process for the Canadian French version

The items and response choices in the US version of the SF-36 were translated by 2 independent translators native to Quebec. These individuals were selected on the basis of professional qualifications, experience in the health field and the fact that they had never studied in France. The latter criterion was applied because of known differences between the French spoken in Quebec and that spoken in France. Because conceptual rather than literal equivalence was emphasized, the specific words and phrases chosen were intended to reflect the French language spoken in Canada. Each translator was also asked to rate the difficulty of the translation process on a prepared form with rating scales from 0 (not at all difficult) to 100 (extremely difficult).
When the translations were ready, the 2 translators met with 2 native francophone health care professionals (L.G. and U.P.) and the scientific leader for Canada (S.W.D.) to obtain consensus on a forward translation of the instructions, items and response choices, as well as to determine the difficulty in reaching consensus. The francophone health care professionals ensured that the health terms were used appropriately. Each individual rated the difficulty of achieving consensus on a scale from 0 (not at all difficult) to 100 (extremely difficult).
Two additional native francophone professional translators were recruited to rate the quality of the forward translation. These translators had trained as health care professionals, but neither had prior knowledge of the SF-36. The ratings of quality were made according to defined criteria: (1) clarity of the translation (ease of reading and understanding), (2) use of common language (i.e., use of colloquial language and avoidance of technical terms), and (3) conceptual equivalence (whether the translation captured the original meaning). Feedback from the quality ratings was used as the basis for a second consensus meeting. The same individuals participated. Each item or response that did not have a perfect rating was examined and discussed until a satisfactory version was found.
The reconciled Canadian French version was sent to 2 Canadian English translators. These individuals translated it into English independently, creating 2 backward translations. These translations were sent to the US project team at the Health Institute in Boston, who rated their equivalence to the original US version. The following criteria were judged: (1) conceptual equivalence (the degree to which the concept and intent of the original version were captured) and (2) equivalence in measuring severity or frequency of the concept. Any items or responses not found to be equivalent were re-entered into the adaptation process until a suitable translation was found.
The forward-translated Canadian French version was pretested on native francophone Canadians who had no university education. These participants completed the questionnaire in terms of their own health, after being assured that we were not interested in the state of their health but in how successfully we had translated the questionnaire. Each participant was debriefed on each item and its responses by trained interviewers who were native francophone Canadians. Responses and queries were addressed by the study group and the translators.
In addition to this formal process, the Canadian French SF-36 was given to other investigators for use in their studies. Each investigator was requested to call if problems were found or if he or she had suggestions. When this happened, a telephone conference call was held with the independent investigator and the original consensus group. For minor issues, decisions were made by a francophone health care professional and the head translator, in collaboration with the scientific leader.

Psychometric testing of the Canadian French version
The forward-translated Canadian French version was also given to 50 francophone outpatients attending the Royal Victoria Hospital in Montreal. The purposes of this pilot test were to examine the feasibility of administering the questionnaire, receive feedback on the translation and obtain preliminary information about the psychometric performance of the Canadian French SF-36. We included outpatients who met the following criteria: (1) they were outpatients attending the Royal Victoria Hospital, (2) they were 14 years of age or older, (3) French was their mother tongue, (4) they were living in the community, (5) they were able to read French, and (6) they agreed to participate. Consenting participants were asked to complete the Canadian French version under the supervision of a trained interviewer and to answer standard questions about the form. We used a modification of the debriefing questionnaire of the European Organization for Research and Treatment of Cancer.³⁴ Minor adjustments were made on the basis of these patients' comments.
The final version was subsequently used in a cross-sectional study of seniors living in Montreal's inner core. Subjects were identified with the use of a random telephone selection method. Subjects 65 years of age or older, with no cognitive deficits, who spoke French and were able and willing to complete a 25-minute interview were administered the Canadian French version of the SF-36 by a trained interviewer. Sociodemographic and behavioural data were also collected. These 2 samples (Montreal seniors and outpatients at the Royal Victoria Hospital) were combined to examine the scaling assumptions and psychometric properties of the Canadian French version.
Adaptation process for the Canadian English version
A consensus panel composed of native English-speaking Canadians from different provinces was established. After the exercise had been explained to them, each member read the US version of the SF-36 and made notes about the items and the response choices in terms of their appropriateness for use in Canada. Specifically, the reading level of the language, the choice and use of idiomatic expressions, and the meaning of the item or response were considered. The group then convened and discussed the questionnaire in general and its use of language for each item and response choice.
Psychometric testing of the Canadian English version
The Canadian English version was also administered to English-speaking outpatients at the Royal Victoria Hospital. The process mirrored that used to administer the survey to French-speaking patients. In addition, elderly subjects participating in the cross-sectional telephone survey who spoke English were administered the Canadian English version. These 2 samples were used to examine the scaling assumptions and psychometric properties of the Canadian English version.
Evaluation of responses
We conducted a scaling exercise with the use of the US English and the Canadian French response choices.³⁵ The objective was to determine whether the rank ordering of responses and the intervals between responses for both Canadian French and English were similar to those found in data on the US survey. Students in physical and occupational therapy were recruited. Consenting students were informed that the purpose of the exercise was to measure the meaning of certain expressions in French and in English. They were asked to complete either the French or English form, according to their mother tongue. Students with another mother tongue (i.e., Italian or Chinese) selected the language with which they felt most comfortable. The forms were made up of a series of 100-mm visual analogue scales. Fig. 2 provides an example of a visual analogue scale and a response of "good," chosen from the possible responses "poor," "fair," "good," "very good" and "excellent." Response choices from a given response set were presented in random order on a single page. Seven response sets with a total of 23 response choices were presented. Extreme descriptors (i.e., poor and excellent) served as anchors. Students were asked to mark each response choice on the line according to the degree of intensity or frequency expressed by the adjective being considered.
Statistical analyses
Difficulty, quality and equivalence ratings (French only): An item-by-item evaluation was undertaken to identify items for which the Canadian French translation was problematic. Mean difficulty and quality ratings per item and per response were calculated. In addition, to determine the difficulty in reaching consensus, we calculated the mean ratings among the 4 francophone members of the consensus group.
Response choices (English and French): Both ordinal and interval characteristics of the response scales are prerequisites for scaling the SF-36. Thus, after distances on the visual analogue scales were measured and respondents who probably misunderstood the task (as indicated by a disordinal response pattern or the nature of the response) were identified, analyses were conducted at the Health Institute in Boston. To determine ordinality, the means of the response choices were calculated and tested for compliance with the ordinality criterion. Compliance was determined by dividing the number of correctly ordered (ordinal) pairs of responses observed by the total number of possible pairs for that set. The average percentage of discordant response pairs in the French and English versions was then calculated for each response set. In addition, the means of each response within a response set were examined to determine whether they were roughly interval. Comparisons among Canadian French, Canadian English and US English responses were made.
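The ordinality criterion described above can be sketched as follows. For each respondent, the visual analogue placements (in millimetres) of the response choices in one set are listed in their hypothesized order, and the function counts the pairs that are correctly ordered out of all possible pairs; averaging over respondents gives a mean percentage of ordinal pairs for that response set. Treating tied placements as not correctly ordered is an assumption, as are the function names; the placements in the example are made up and are not study data.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def percent_ordinal_pairs(placements: Sequence[float]) -> float:
    """Percentage of response-choice pairs placed in the hypothesized order.

    `placements` are one respondent's VAS positions (mm), listed from the
    lowest to the highest hypothesized intensity. Ties count as not ordered.
    """
    pairs = list(combinations(range(len(placements)), 2))
    if not pairs:
        raise ValueError("need at least two response choices")
    ordered = sum(1 for i, j in pairs if placements[i] < placements[j])
    return 100.0 * ordered / len(pairs)


def mean_percent_ordinal(respondents: Sequence[Sequence[float]]) -> float:
    """Average the per-respondent percentages for one response set."""
    return mean(percent_ordinal_pairs(p) for p in respondents)


# Hypothetical placements for three response choices (not study data).
consistent = [22.0, 48.0, 79.0]   # all 3 pairs in order: 100%
one_swap = [22.0, 81.0, 76.0]     # 2 of 3 pairs in order: about 66.7%
print(mean_percent_ordinal([consistent, one_swap]))  # about 83.3
```

The percentage of discordant pairs for a respondent is simply 100 minus the value returned by `percent_ordinal_pairs`.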
Preliminary psychometric testing (English and French): Psychometric tests of the scaling properties of the French and English SF-36 data sets were conducted at the Health Institute with the use of the Multitrait Analysis Program-Revised (MAP-R).^25,36 This program provides the information needed to evaluate whether the assumptions underlying the SF-36 scales are met. SF-36 scales are constructed with the use of the method of summated ratings, which assumes that items within a scale can be aggregated without weighting or standardization of items.¹² Items should have roughly equal relationships to their underlying health concept (i.e., similar item-scale correlations) to allow for item aggregation without weighting. Items should also have roughly equal means and standard deviations to allow for item aggregation without standardization. In addition, items should have substantial linear correlations (r ≥ 0.40, corrected for overlap)³⁷ with their hypothesized scales, to substantiate their aggregation into a scale (test of item internal consistency). Each item should also have higher correlations with the scale that it is hypothesized to represent than with other scales; for example, a physical functioning item should have a higher correlation with the physical functioning scale than with the mental health scale (test of item discriminant validity). Because the standard error of a correlation coefficient (approximately 1/√n) is determined by the sample size n, and the sample size is relatively small for this analysis, we report the number of correlations between items and hypothesized scales that were higher than correlations with other scales, both with and without regard to statistical significance. Finally, scales should demonstrate internal-consistency reliability; this was measured with Cronbach's alpha.³⁸ A minimum reliability of alpha = 0.70 has been recommended for scales used in group-level analyses.³⁹
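The scaling checks listed above can be approximated from scratch as in the sketch below. It is not MAP-R and does not reproduce that program's output; it assumes complete data stored as a dictionary that maps each (hypothetical) item name to a list of scores, one per respondent. `corrected_item_scale_r` correlates an item with the sum of the other items in its scale (the correction for overlap), `discriminant_successes` counts how often that corrected correlation exceeds the item's correlation with each other scale, without any significance test, and `cronbach_alpha` uses the usual formula alpha = k/(k-1) x (1 - sum of item variances / variance of the scale total). All function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple

Data = Dict[str, List[float]]  # item name -> scores, one per respondent


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_total(data: Data, items: Sequence[str]) -> List[float]:
    """Unweighted (summated) total of the listed items for each respondent."""
    return [sum(row) for row in zip(*(data[it] for it in items))]


def corrected_item_scale_r(data: Data, item: str, scale_items: Sequence[str]) -> float:
    """Correlation of an item with its own scale, the item itself removed."""
    rest = [it for it in scale_items if it != item]
    return pearson(data[item], scale_total(data, rest))


def discriminant_successes(data: Data, scales: Dict[str, List[str]]) -> Tuple[int, int]:
    """Count comparisons in which an item's corrected own-scale correlation
    exceeds its correlation with another scale (no significance test)."""
    successes = comparisons = 0
    for name, items in scales.items():
        for item in items:
            own = corrected_item_scale_r(data, item, items)
            for other, other_items in scales.items():
                if other == name:
                    continue
                comparisons += 1
                if own > pearson(data[item], scale_total(data, other_items)):
                    successes += 1
    return successes, comparisons


def cronbach_alpha(data: Data, items: Sequence[str]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = len(items)
    item_var = sum(variance(data[it]) for it in items)
    return k / (k - 1) * (1 - item_var / variance(scale_total(data, items)))


def approx_se_r(n: int) -> float:
    """Rough standard error of a correlation coefficient: 1/sqrt(n)."""
    return 1 / sqrt(n)


# Hypothetical data: 5 respondents, two 2-item scales (not study data).
data = {
    "a1": [1, 2, 3, 4, 5], "a2": [1, 2, 3, 5, 4],
    "b1": [5, 1, 4, 2, 3], "b2": [4, 1, 5, 3, 2],
}
scales = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
print(round(corrected_item_scale_r(data, "a1", scales["A"]), 2))  # 0.9
print(discriminant_successes(data, scales))                        # (4, 4)
print(round(cronbach_alpha(data, scales["A"]), 3))                 # 0.947
print(approx_se_r(100))                                            # 0.1
```

With n = 100 the approximate standard error is 0.1, so two correlations that differ by only a few hundredths cannot be distinguished reliably; this is why counts are reported both with and without regard to statistical significance.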
Results

Difficulty of translation (Canadian French)
Unfortunately, only 1 of the 2 translators had completed the form rating the difficulty of translating the survey by the time of the first consensus meeting. Having the other translator complete the form while the group waited, or after the consensus meeting (during which each item and response choice was discussed), was not considered acceptable. On the scale from 0 (not at all difficult) to 100 (extremely difficult), the difficulty ratings made by the 1 translator ranged from 0 to 60. Twelve of the 41 item ratings and 7 of the 47 response choices were scored higher than 10. The "stems" (introductory statements) to questions 4, 5, 9 and 11 were noted to be particularly difficult and received ratings of 30, 30, 60 and 60, respectively. Response choices containing the words "limited," "moderately" and "mostly" were also perceived as difficult to translate.
In general terms, these findings were mirrored by the difficulties experienced by the first consensus group in trying to reach agreement on the French wording of the items and response choices. Mean ratings of difficulty ranged from 0 to 60 for the items and from 0 to 27 for the response choices.
Quality of the Canadian French translation
The quality of translation of the items and responses was rated on a scale from 0 (not at all perfect) to 100 (perfect). In terms of clarity, scores ranged from 78 to 100, with a mean of 95.1; "common language" scores ranged from 70 to 100, with a mean of 98.2; and "conceptual equivalence" scores ranged from 45 to 100, with a mean of 92.1. For the response choices, the mean quality rating fell below 90 for 1 choice on "clarity" and for 2 choices on "common language." "Conceptual equivalence" was rated above 90 for every response choice. When a perfect score was not awarded, the raters sometimes wrote comments, for example, "activités énergétiques are not 'vigorous activities'" and "soulever is better than lever." The comments were discussed and necessary modifications were made at the second consensus meeting.
Comments from francophone patients and from the francophone subjects with low educational levels who tested the form were infrequent but were incorporated if appropriate.
Comparability of the backward translations
Both backward translations were judged to be acceptable by the US project team. However, 1 translation did not appear to be as close to the original US version as the other. In the less faithful backward translation, there were occasional problems: specific words (such as "strenuous" in referring to sports or "physical" in relation to health) were missing; an example was omitted; or an item was incorrectly phrased as a question rather than a statement. Nonetheless, both translations captured the intent of the US version and were generally equivalent in measuring the severity or frequency of the concept.
Canadian English version
All of the consensus panel members were in complete agreement that the words and idiomatic phrases in the US version of the SF-36 were clear and easily understandable to English-speaking Canadians. There were no conceptual problems with the use of the US version; even the examples were reflective of Canadian life. Therefore, we made no alterations to the US version, with the exception of changing "mile" to "kilometre" in question 3 ("Walking more than one mile"). This change was agreed upon after considerable discussion. Although the actual distances (1 mile v. 1 km) are different, the concept -- being able to walk quite a long way -- is the same.
Evaluation of responses
One hundred and six English-speaking and 99 French-speaking physical and occupational therapy students completed the survey form. Of the English-speaking students, 88.7% were female and 11.3% male; on average they were 22.2 (standard deviation [SD] 3.2) years old and had completed 15.98 (SD 1.79) years of schooling. The French-speaking students were 76.3% female and 23.7% male; on average they were 21.6 (SD 2.1) years old and had finished 15.76 (SD 1.44) years of schooling. Of the students who completed the French form, 98% listed their mother tongue as French. Of those who answered the English survey, 73% stated that English was their native language; 25% listed their mother tongue as "other." Fifteen students (9 English-speaking and 6 French-speaking) were judged as probably not having understood the task on the basis of a pattern of disordinal or outlying responses; these students' responses were excluded from further analyses.
The mean response choice ratings are given in Table 1. Parallel information for a US sample is also provided. Comparison of the means by response choice allows differences due to language to become apparent. In the Canadian data, the English and French response choice means, with the exception of "some of the time," were all within half of a scale point of each other. The larger difference (3.46 English v. 2.91 French) for "some of the time" (translated as parfois) suggested that the translation was not equivalent. It was changed to quelquefois. There also was a trend in the French responses toward higher mean ratings in the "not at all" to "extremely" response set.
Comparisons of the US responses with the Canadian English and French responses showed that the Canadian English and US values were generally within 0.20 scale points of each other. US and Canadian French values were within half of a scale point of each other, with the exception of moyennement (moderately), which was 0.65 points higher in the Canadian French responses than in the US responses.
Table 1 also demonstrates the ordinality of response choices within a response continuum for both the French and English responses, and allows the interval properties of the Canadian and US data sets to be evaluated and compared with the SF-36 scoring values. If the mean ratings are rounded to the nearest integer, according to convention, the distance between rounded values is generally 1, with some exceptions (including souvent/la plupart du temps and un peu/moyennement in the French responses). Results for the "excellent" to "poor" response continuum were similar among all 3 data sets: the distance between "very good" and "excellent" was less than 1, and the distance between "poor" and "fair" was greater than 1. Results for the pain severity response continuum ("none" to "very severe") were also similar among all 3 data sets. For both response continuums, the ratings given by these 3 samples differed from the SF-36 scoring values.
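The comparisons of mean ratings described in the preceding paragraphs reduce to simple arithmetic on the values in Table 1, which is not reproduced here. The sketch below flags response choices whose means differ between two samples by more than a tolerance (half a scale point was the threshold used when comparing Canadian English and French), and rounds ordered means to the nearest integer to check whether adjacent rounded values are 1 apart. Rounding halves upward is an assumption (Python's built-in `round` rounds halves to even). Only the "some of the time" means (3.46 English, 2.91 French) come from the text; the function names and the other values are hypothetical.

```python
from math import floor
from typing import Dict, List


def flag_differences(a: Dict[str, float], b: Dict[str, float], tol: float = 0.5) -> List[str]:
    """Response choices whose mean ratings differ by more than `tol` scale points."""
    return [choice for choice in a if abs(a[choice] - b[choice]) > tol]


def rounded_gaps(ordered_means: List[float]) -> List[int]:
    """Distances between adjacent means after rounding halves up to an integer."""
    rounded = [floor(m + 0.5) for m in ordered_means]
    return [hi - lo for lo, hi in zip(rounded, rounded[1:])]


english = {"some of the time": 3.46}
french = {"some of the time": 2.91}   # translated as "parfois"
print(flag_differences(english, french))          # ['some of the time']
print(rounded_gaps([1.0, 2.1, 3.4, 4.6, 4.9]))   # [1, 1, 2, 0] (hypothetical means)
```

A gap of 2 marks a step wider than one unit and a gap of 0 marks two choices that collapse to the same rounded value; both depart from roughly interval spacing.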
Table 2 provides the mean percentage of responses for which the order was correct. In the Canadian French responses, the mean percentage of ordinal pairs ranged from a low of 95.9% to a high of 99.5%. The range in the Canadian English responses was 98.4% to 99.7%. Canadian subjects seemed to have difficulty with 2 response choice pairs. Nine percent of English subjects noted that "some of the time" was more than "a good bit of the time," and 39% of French subjects marked "a good bit of the time" (souvent) as being more than "most of the time" (la plupart du temps).
Psychometric pilot studies
Hospital outpatients and community-dwelling elderly people: Two groups of subjects took part in the pilot studies. The first group consisted of 50 French-speaking and 51 English-speaking outpatients attending the rheumatology clinic, the metabolic day centre or the geriatric day centre of the Royal Victoria Hospital. The French- and English-speaking patients were visiting the hospital for similar medical reasons and had comparable medical histories and current problems. The second group consisted of 92 French-speaking and 69 English-speaking elderly people living in the community. The English- and French-speaking elderly people reported similar patterns of independent living. When these samples were combined, there were 142 individuals who had completed the Canadian French SF-36 and 120 who had completed the Canadian English version. Among those speaking French, 73 (51.4%) were male and 69 (48.6%) female. The average age was 67.4 (SD 14.1) years and about one-third reported that they had formal education beyond high school. In the English-speaking group, 59 (49.2%) were male and 61 (50.8%) were female; the subjects' average age was 70.1 (SD 13.5) years. Almost 45% reported having postsecondary education. Thus, the individuals responding in English were slightly older and better educated, on average, than those who completed the Canadian French survey.
Psychometric tests confirmed that assumptions underlying the construction and scoring of the summated rating scales were met in both the English and French versions. Among the 8 scales, median correlations of items to their hypothesized scales ranged from 0.59 to 0.87 for the English version and from 0.62 to 0.88 for the French version (Table 3 and Table 4). All correlations were greater than 0.40, with the exception of 2 general health items in the English sample ("I seem to get sick a little easier than other people" and "I expect my health to get worse"), which had correlations of 0.38 and 0.32, respectively. In the French sample, all items had higher correlations with their hypothesized scale than with other scales. In the English sample, items had higher correlations with their hypothesized scale than with other scales in all but 8 out of 280 comparisons. Within each scale, items had roughly equal means and variances, and the correlations of items to their hypothesized scale were roughly equal (data not reported). Cronbach's alpha ranged from 0.76 to 0.93 in the English sample and 0.80 to 0.94 in the French sample, and exceeded the 0.70 level recommended for group comparisons for all scales. In fact, 3 scales -- physical functioning, bodily pain and role emotional -- in both English and French met the 0.90 level suggested for use of a scale with individuals.³⁹
In addition to contributing data for the psychometric analyses, the outpatients provided information about the time it took to complete the SF-36 and their reactions to it. The time to complete the form in English or French was virtually identical: 12.8 (SD 7.2) and 12.5 (SD 5.5) minutes, respectively. Ten of 51 patients completing the English version reported that 1 or more questions were confusing. The most common comment from patients was that questions 3 and 9 had long "stems" which were difficult to "keep in mind." Three patients found the questionnaire mildly upsetting. Among those completing the Canadian French version, 8 patients reported that 1 or more questions were confusing. More than half reported a problem with question 4d. The original question began, "Have you had difficulty . . ." This had been translated as Avez-vous eu du mal, which created confusion; it was changed to Avez-vous eu de la difficulté. No patient completing the Canadian French version found it upsetting.
Discussion
Our experience suggests that, through the careful use of qualitative and quantitative approaches, a measure developed and tested in one country can be adapted to create acceptable versions for use in another country, language and culture.
On balance, the Canadian French version appears to have benefitted from the rigorous process of forward and backward translation, which involved the input of professional translators and health care professionals as well as an iterative procedure in which feedback from patients was evaluated rapidly by the study group and incorporated if appropriate. Although the quality ratings were, on average, above 90%, specific comments about items provided information for the use of the study group. Similarly, the ratings of difficulty made us aware of phrases or concepts that were problematic during translation, and we were alert to feedback about these words in particular. This awareness was reflected in the successful backward translations.
The pilot test among the hospital outpatients demonstrated that the Canadian French version was easy to administer and acceptable to the patients. Although a few patients had difficulty with individual items, the only systematic problem was one of translation -- Avez-vous eu de la difficulté rather than Avez-vous eu du mal -- and was easily corrected. The tests of scaling assumptions and the preliminary psychometric results were well within adequate limits, given the size of the sample.
After considerable thought and discussion, we decided to make only 1 change to the US form to create the Canadian English version. Although it was somewhat surprising that so few modifications were needed, it should be noted that the Australian and United Kingdom IQOLA groups made only minor changes to the survey.¹⁶ In the psychometric testing, difficulties were noted with the general health scale. Question 11 contributes 4 of its 5 items, and question 1 completes it. In this multidimensional scale, 3 items (1, 11b, 11d) measure current health, item 11a assesses resistance to illness and item 11c taps health outlook. Although the psychometric tests confirmed the scaling and scoring assumptions, items 11a and 11c did not perform as well as the other 3 items. The developers²⁵ and Swedish investigators³² also experienced difficulties with these items. Given the diverse concepts contained in the scale, it is not surprising that the items measuring concepts other than current health are not strongly correlated with the other items in the scale. Moreover, investigators studying older adults in Britain extensively discussed the item "I expect my health to get worse" (11c) and noted that it was seen as very negative.⁴⁰ Elderly patients in our sample also appeared to have had a difficult time responding to this item, as well as to the item about getting sick more easily than other people (11a). Whether this difficulty is specific to elderly patients will have to be evaluated in future studies.
The purpose of the scaling exercise was to examine the ordinal and interval properties of the response choices. In both data sets, ordinality was clearly evident. Psychometric theory holds that ordinal rating scales of this kind are "quasi-interval" and can be treated as interval scales for analysis.⁴¹ Moreover, analyses of scales with unequal intervals have yielded results consistent with those of scales with equal intervals.⁴² When the Canadian French and Canadian English data sets are viewed individually, the mean values of the response choices in each response set appear roughly equally spaced, except in the "excellent" to "poor" and "none" to "very severe" response sets. US investigators have also had difficulty with these 2 response continua.⁴³ However, the actual intervals also differ when the 2 data sets are compared. In theory, to determine whether these differences matter, the SF-36 in Canada could be scored with the specific French and English mean values obtained in the scaling exercise rather than with the US scoring values. Experience with these sorts of rating scales suggests that such rescoring would not make a difference (at least within Canada), because the statistics used are very robust across nearly all monotonic transformations of the scale scores.⁴¹ Differences are more likely to arise between Canadian English and French scores, because some of the response sets (pas du tout ["not at all"] to énormément ["extremely"]) and response choices (moyennement ["moderately"]) are shifted to a higher level in the French data set. Although this could be a problem when Canadian French and English data are combined, it should be kept in mind that these results come from a single scaling exercise with 190 respondents. They would need to be replicated in other samples with varying sociodemographic and health characteristics to determine whether a real problem exists.
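The ordinality and interval checks described above can be sketched as follows, assuming the scaling exercise yields one mean rating per response choice on a common numeric line (the rating format is an assumption, not taken from the study). Ordinality means no reversals between successive choices. The ratio of the largest to the smallest gap is our own rough, hypothetical index of departure from equal intervals, not a statistic used by the authors.

```python
from typing import List, Sequence, Tuple


def check_response_scale(means: Sequence[float]) -> Tuple[bool, List[float], float]:
    """Inspect mean ratings of response choices listed in their intended order.

    Returns (ordinal, gaps, spread):
      ordinal: True if the means move strictly in one direction (no reversals);
      gaps:    differences between successive means;
      spread:  largest absolute gap / smallest absolute gap (1.0 = equal intervals).
    """
    if len(means) < 2:
        raise ValueError("need at least 2 response choices")
    gaps = [b - a for a, b in zip(means, means[1:])]
    ordinal = all(g > 0 for g in gaps) or all(g < 0 for g in gaps)
    sizes = [abs(g) for g in gaps]
    spread = max(sizes) / min(sizes) if min(sizes) > 0 else float("inf")
    return ordinal, gaps, spread


if __name__ == "__main__":
    # Hypothetical mean ratings on an assumed 0-100 line, "excellent" to "poor".
    print(check_response_scale([95, 80, 55, 25, 5]))
```

Comparing the gaps for the same response set in the French and English data is one simple way to see the kind of shifted intervals described above.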
Meanwhile, since the results of the scaling exercise in Canada generally conform to US results, we recommend that the US scoring values continue to be used in Canada.
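To make the recommendation concrete, here is a minimal sketch of how a general health scale score might be computed with US scoring values. It is based on our reading of the SF-36 manual (Ware et al., 1993, in the reference list): item 1 is recalibrated, items 11b and 11d are reversed, and the raw sum is linearly transformed to 0 to 100. The recalibration values and response codes should be verified against the manual before use. The function is hypothetical and does not implement missing-item rules.

```python
# Assumed US scoring values for item 1 (1 = excellent ... 5 = poor).
ITEM1_US_VALUES = {1: 5.0, 2: 4.4, 3: 3.4, 4: 2.0, 5: 1.0}


def general_health_score(q1: int, q11a: int, q11b: int, q11c: int, q11d: int) -> float:
    """0-100 general health score; higher = better health.

    Item 11 responses are assumed coded 1 = definitely true ... 5 = definitely false.
    Missing-item handling is not implemented.
    """
    for value in (q1, q11a, q11b, q11c, q11d):
        if value not in (1, 2, 3, 4, 5):
            raise ValueError(f"response {value!r} is outside 1-5")
    raw = (ITEM1_US_VALUES[q1]
           + q11a            # "sick a little easier": definitely false = best
           + (6 - q11b)      # "as healthy as anybody": reversed
           + q11c            # "expect my health to get worse": definitely false = best
           + (6 - q11d))     # "my health is excellent": reversed
    lowest, highest = 5.0, 25.0  # lowest and highest possible raw sums
    # Linear transformation: (raw - lowest) / (possible range) * 100
    return (raw - lowest) / (highest - lowest) * 100
```

Substituting Canadian French or English scaling means for `ITEM1_US_VALUES` would be one way to run the rescoring comparison discussed above.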
In general, this process of translation and cultural adaptation followed the guidelines proposed by Guillemin, Bombardier and Beaton.⁴⁴ These guidelines call for forward and backward translations by multiple translators, committee review and pretesting in monolingual and bilingual individuals. Reweighting of scores, which the guidelines also recommend, may be considered, but more data are needed. The process has resulted in acceptable Canadian French and Canadian English versions of the SF-36. Although the Canadian SF-36 forms are ready for use in clinical practice and research to evaluate the effects of treatment, further psychometric testing of reliability, validity and responsiveness in different patient samples is clearly necessary. As well, age- and sex-specific normative data for populations in English- and French-speaking areas of Canada would be valuable in monitoring the health of the population and in assessing the impact of health and social policies.⁴
Information about the Canadian French and English SF-36 forms may be obtained from the Medical Outcomes Trust, 20 Park Plaza, Suite 1014, Boston MA 02116-4313. The Trust is a nonprofit organization; prices for products are set to cover costs. Various packets that include SF-36 forms, scoring and interpretation manuals, and information about publications are available at different prices.
Acknowledgements
The IQOLA Project is sponsored by Glaxo Wellcome Inc., Research Triangle Park, NC, and Schering-Plough Corporation, Kenilworth, NJ. Associate sponsors are Astra, Procter & Gamble Pharmaceuticals, Searle and Solvay Duphar B.V. In addition, support for the work in Canada was provided by Berlex Canada, Boehringer Ingelheim Pharmaceuticals, F. Hoffmann-LaRoche AG, Hoechst-Roussel Canada, Janssen Research Foundation, Marion Merrell Dow Canada, Miles Canada, Ortho-McNeil, SmithKline Beecham Pharmaceuticals and Zeneca Pharmaceuticals.
We also wish to thank Claudette Corrigan and Adrian Levy for assistance with data collation and statistical analysis, respectively. We are also grateful for the assistance of the translation team members. In particular, Rosemarie Bélisle and Helena Scheffer provided almost instant translations and excellent advice, time and time again. Finally, the contributions of the other members of the IQOLA group are acknowledged: N.K. Aaronson, Amsterdam; C. Acquadro, Lyon; J. Alonso, Barcelona; G. Apolone, Milan; J. Bjørner, Copenhagen; J. Brazier, Sheffield; D. Bucquet, Montpellier; S. Kaasa, Trondheim; A. Leplège, Paris; M. Bullinger, Hamburg; S. Fukuhara, Tokyo; D. Razavi, Brussels; R. Sanson-Fisher, Newcastle; M. Sullivan, Gothenburg; A. Wagner, Boston; and J.E. Ware, Boston.

References

1. American College of Physicians. Comprehensive functional assessment for elderly patients. Ann Intern Med 1988;109:70-2.
2. Deyo RA, Carter WB. Strategies for improving and expanding the application of health status measures in clinical settings: a researcher-developer viewpoint. Med Care 1992;30(suppl):176S-86S.
3. Fowler FJ Jr, Cleary PD, Magaziner J, Patrick DL, Benjamin KL. Methodological issues in measuring patient-reported outcomes: the agenda of the work group on outcomes assessment. Med Care 1994;32:1565-76.
4. Ware JE, Davies AR. Monitoring health outcomes from the patients' point of view: a primer. Kenilworth (NJ): Integrated Therapeutics Group; 1995.
5. Patrick DL, Deyo RA. Generic and disease-specific measures in assessing health and quality of life. Med Care 1989;27(suppl):217S-32S.
6. Ware JE, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473-83.
7. Stewart AL, Ware JE, editors. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham (NC): Duke University Press; 1992.
8. McHorney CA, Ware JE, Rogers WH, Raczek AE, Lu JF. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts: results from the Medical Outcomes Study. Med Care 1992;30(suppl):253S-65S.
9. Aaronson NK, Acquadro C, Alonso J, Apolone G, Bucquet D, Bullinger ME, et al. International Quality of Life Assessment (IQOLA) Project. Qual Life Res 1992;1:349-51.
10. Ware JE, Keller SD, Gandek B, Brazier JE, Sullivan M and the IQOLA Project Group. Evaluating translations of health status questionnaires. Methods from the IQOLA Project. Int J Technol Assess Health Care 1995;11:525-51.
11. McHorney CA, Kosinski M, Ware JE. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: results from a national survey. Med Care 1994;32:551-67.
12. Likert R. A technique for the measurement of attitudes. Arch Psychol 1932;140:5-55.
13. Ware JE, Snow KK, Kosinski M, Gandek B. SF-36 Health Survey Manual and Interpretation Guide. Boston (MA): The Health Institute, New England Medical Center; 1993.
14. Ware JE, Kosinski M, Bayliss MS, McHorney CA, Rogers WH, Raczek AE. Comparison of methods for scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care 1995;33(suppl):S264-79.
15. Ware JE, Kosinski M, Keller SD. SF-36 physical and mental health summary scales: a user's manual. Boston (MA): The Health Institute, New England Medical Center; 1994.
16. Medical Outcomes Trust. SF-36 health survey scoring manual for English language adaptations: Australia/New Zealand, Canada, United Kingdom. Boston (MA): Medical Outcomes Trust; 1994.
17. Weinberger M, Samsa GP, Hanlon JT, Schmader K, Doyle ME, Cowper PA, et al. An evaluation of a brief health status measure in elderly veterans. J Am Geriatr Soc 1991;39:691-4.
18. Brazier JE, Harper R, Jones NMB, O'Cathain A, Thomas KJ, Usherwood T, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. BMJ 1992;305:160-4.
19. Jenkinson C, Wright L, Coulter A. Criterion validity and reliability of the SF-36 in a population sample. Qual Life Res 1994;3:7-12.
20. EuroQol Group. EuroQol: a new facility for the measurement of health-related quality of life. Health Policy 1990;16:199-208.
21. McHorney CA, Ware JE, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care 1993;31:247-63.
22. Garratt AM, Macdonald LM, Ruta DA, Russell IT, Buckingham JK, Krukowski ZH. Towards measurement of outcomes for patients with varicose veins. Qual Health Care 1993;2:5-10.
23. Lyons RA, Perry HM, Littlepage BNC. Evidence for the validity of the Short-Form 36 Questionnaire (SF-36) in an elderly population. Age Ageing 1994;23:182-4.
24. McCallum J. The SF-36 in an Australian sample: validating a new, generic health status measure. Aust J Public Health 1995;19:160-6.
25. McHorney CA, Ware JE, Lu JFR, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions and reliability across diverse patient groups. Med Care 1994;32:40-66.
26. Muller MJ, Aaronson NK, te Velde A, Sprangers MAG, Buitelaar AC, Abbink EM. Psychometric properties of the MOS SF-36 health survey in a population of patients with cancer [abstract]. Qual Life Res 1995;4:465.
27. Bousquet J, Knani J, Dhivert H, Richard A, Chicoye A, Ware JE, et al. Quality-of-life in asthma: I. Internal consistency and validity of the SF-36 questionnaire. Am J Respir Crit Care Med 1994;149:371-5.
28. Ruta DA, Garratt AM, Wardlaw D, Russell IT. Developing a valid and reliable measure of health outcome for patients with low back pain. Spine 1994;19:1887-96.
29. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in the health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol 1997;50:79-93.
30. Katz JN, Larson MG, Phillips CB, Fossel AH, Liang MH. Comparative measurement sensitivity of short and longer health status instruments. Med Care 1992;30:917-25.
31. Jenkinson C, Lawrence K, McWhinnie D, Gordon J. Sensitivity to change of health status measures in a randomized controlled trial: comparison of the COOP charts and the SF-36. Qual Life Res 1995;4:47-52.
32. Sullivan M, Karlsson J, Ware JE. The Swedish SF-36 health survey: I. Evaluation of data quality, scaling assumptions, reliability, and construct validity across general populations in Sweden. Soc Sci Med 1995;41:1349-58.
33. Jenkinson C, Wright L, Coulter A. Quality of life measurement in health care: a review of measures and population norms for the UK SF-36. Oxford (England): Health Services Research Unit, Department of Public Health and Primary Care, University of Oxford; 1993.
34. Aaronson NK, Ahmedzai S, Bergman B, Bullinger M, Cull A, Duez N, et al. The European Organization for Research and Treatment of Cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. J Natl Cancer Inst 1993;85:365-76.
35. Thurstone LL, Chave EJ. The measurement of attitudes. Chicago: University of Chicago Press; 1929.
36. Hays RD, Hayashi T, Carson S, Ware J. User's guide for the Multitrait Analysis Program (MAP). Santa Monica (CA): The RAND Corporation; 1988. Report no. N-2786-RC.
37. Howard KI, Forehand GG. A method for correcting item-total correlations for the effect of relevant item inclusion. Educ Psychol Measure 1962;22:731-5.
38. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951;16:297-334.
39. Nunnally JC. Psychometric theory. 2nd ed. New York: McGraw-Hill; 1978.
40. Hayes V, Morris J, Wolfe C, Morgan M. The SF-36 Health Survey Questionnaire: Is it suitable for use with older adults? Age Ageing 1995;24:120-3.
41. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
42. Spector PE. Ratings of equal and unequal response choice intervals. J Soc Psychol 1980;112:115-9.
43. Ware JE, Nelson EC, Sherbourne CD, Stewart AL. Preliminary tests of a 6-item General Health Survey: a patient application. In: Stewart AL, Ware JE, editors. Measuring functioning and well-being: the Medical Outcomes Study approach. Durham (NC): Duke University Press; 1992. p. 291-303.
44. Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol 1993;46:1417-32.

| CIM: August 1997 / MCE : août 1997 |
