

Volume 21, No.3 - 2000  

 

Public Health Agency of Canada (PHAC)

Estimation of Youth Smoking Behaviours in Canada

William Pickett, Anita Koushik, Taron Faelker and K Stephen Brown




Abstract

This study estimated the prevalence of current smoking and smoking initiation among Canadian youth. Logistic regression was used to relate socio-demographic predictors to the occurrence of the smoking indicators among youth (15-24 years) in the 1994/95 National Population Health Survey (NPHS). Models were then applied to provincial youth populations in the 1996/97 NPHS and the 1996 census of Canada. Model-generated estimates were compared with direct estimates obtained from NPHS data. The models accurately predicted provincial rates of current youth smoking for 1994/95. When applied to the 1996/97 NPHS, the current smoking models performed reasonably well, but were less predictive when applied to 1996 census data. Modelling of youth smoking initiation was not successful. This suggests that although simple estimation models of youth smoking can be derived, these models may not be portable across different populations or time periods.

Key words: population health surveys; small area estimation; smoking; tobacco control; youth



Introduction

Smoking is an important and preventable cause of death and illness.1,2 Among Canadians, smoking causes 45,200 deaths annually;3 it is a major cause of respiratory disease, cancer and circulatory disease;4-9 and it contributes enormous burdens to Canadian society in terms of lost economic productivity and health care expenditure.10,11 Despite considerable public health effort, rates of cessation in the general and youth populations are low, particularly among regular smokers.1 Smoking initiation occurs primarily among adolescents,1 and Canadian youth smoking rates appear to have been on the rise during the past few years.12,13 Programs that prevent the uptake of tobacco use by youth are therefore of considerable importance to public health.

To design effective tobacco control programs, current data on the incidence and prevalence of youth smoking are required. Population-based health program planning often requires smoking estimates that are specific to the particular region. General surveys such as the National Population Health Survey (NPHS) in Canada14 are designed to provide reliable estimates for a relatively large geographic area, such as an entire country or province. Health planning efforts generally target smaller areas, such as local health regions or units. Direct surveys of these smaller area populations can be expensive, and alternative techniques are required to obtain estimates of health indicators.

Traditional strategies used to obtain small area statistics include synthetic, multiple regression and combined estimation approaches, and these methods have been used to estimate rates of disability, cause-specific mortality and unemployment in small areas.15-20 Such techniques assume that strong and stable associations exist between socio-demographic variables and health-related characteristics in a population.16

Synthetic estimation involves the application of stratum-specific estimates of the health behaviour (e.g. smoking rates) to the population of the small area defined by the same socio-demographic strata.15,16 Multiple regression approaches to estimation use geographic subunits (e.g. counties, provinces) as the unit of analysis. Data from population-based surveys are used to develop a regression equation that relates area-level characteristics to rates of the health behaviour. Values for the small areas are then substituted into the regression equation to determine the prevalence/incidence of the health behaviour.19 Combined approaches to small area estimation incorporate both synthetic and regression properties.17 A common version of this approach involves determining national rates for subgroups by means of regression analysis and then applying these rates to the population distribution of the small area.18
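As a minimal sketch of the synthetic approach described above, the following Python fragment applies stratum-specific rates from a large-area survey to the population structure of a small area. The rates, the counts, and the function name `synthetic_estimate` are hypothetical and are used for illustration only; they are not taken from the NPHS or the census.

```python
def synthetic_estimate(stratum_rates, area_counts):
    """Apply large-area stratum rates to a small area's population structure."""
    total = sum(area_counts.values())
    expected_cases = sum(stratum_rates[s] * n for s, n in area_counts.items())
    return expected_cases / total

# Hypothetical smoking rates by (age group, sex) from a large-area survey.
national_rates = {
    ("15-19", "M"): 0.25,
    ("15-19", "F"): 0.27,
    ("20-24", "M"): 0.36,
    ("20-24", "F"): 0.32,
}

# Hypothetical population counts for the same strata in one small area.
small_area_counts = {
    ("15-19", "M"): 400,
    ("15-19", "F"): 380,
    ("20-24", "M"): 520,
    ("20-24", "F"): 500,
}

estimate = synthetic_estimate(national_rates, small_area_counts)
print(f"Synthetic prevalence estimate: {estimate:.1%}")
```

The estimate is simply a population-weighted average of the stratum rates, which is why the approach depends on the strata being defined identically in the survey and in the small-area data.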

The primary objectives of the present study were to use a regression approach along with data from the 1994/95 and 1996/97 NPHS14,21 and the Canada Census of Population (census)22 to develop mathematical models that would estimate the prevalence of current smoking and incidence (initiation) of daily smoking among different populations of youth in Canada in 1996. The predictive capability of the models was evaluated on the basis of pre-specified criteria using direct estimates from the 1996/97 version of the NPHS.21 If successful, this approach will eventually permit the estimation of youth smoking behaviours for a small area, based entirely upon demographic data that are available routinely from the population census. As a starting point, this estimation and evaluation was done at the provincial level, with further work planned for smaller area populations if the process worked successfully.


Methods


Overview

A regression approach to estimation was used. Youth aged 15-24 who participated in the 1994/95 NPHS (n = 2,597) formed the study population for model development. Logistic regression equations were derived to relate socio-demographic predictors to the occurrence of smoking outcomes of interest in this study population. Individuals were the unit of analysis, and standardized survey weights were incorporated into the modelling process. Eight separate multivariate regression models were fitted for each outcome. Individual predicted logits were back-transformed to predicted probabilities of the smoking outcome, given the relations between socio-demographic characteristics and smoking indicated by each model. Individual probabilities were then applied to population counts for strata similarly defined by the categories of the predictor variables, to determine prevalence (current smoking) and incidence (new daily smoking) estimates for each province of Canada in 1996. Population strata counts were obtained first from the 1996/97 NPHS (n = 9,601) and then the 1996 census. To evaluate the models, the model-generated estimates for each province were compared with direct estimates obtained from the 1996/97 NPHS.


Current Smoking Outcome

The first smoking indicator examined was the prevalence of current smoking, derived from question SMOK-Q2 (current smoking status) on the 1994/95 NPHS. Current smoking is the most comprehensive indicator of the prevalence of smoking23 and includes daily and non-daily smoking of cigarettes. This outcome was defined as the proportion of the population aged 15-24 that smoked daily or occasionally. The analogous variable at the individual level, which was used for the regression models, categorized individuals as either "currently smoking" (daily or occasionally) or "not at all."


Smoking Initiation Outcome

The second smoking indicator was the incidence (or uptake) of daily smoking, defined as the proportion of the population that became daily smokers in the previous year. This was also a derived outcome variable (questions SMOK-Q2 [current smoking status] and DVSMKY94 [number of years smoking]).14 Youth were categorized as "new/incident daily smokers" if they had been smoking daily for one year or less.


Predictors of Youth Smoking

The goal of this project was to develop methods for small area estimation that could be applied in any region of Canada for which census data were available. Thus the socio-demographic factors used as predictors were included only if they were available in both versions of the NPHS (1994/95 and 1996/97) and in cross-tabulated form from the 1996 census. Different combinations of the following predictor variables were available: age, sex, language, education and unemployment. Each of the predictor variables was dichotomized in order to simplify the cross-tabulations. For the logistic regression models, province was included as an independent variable in order to control for any provincial effects. All possible combinations of variables were considered in the selection of models for estimation. Five models that included only main effect terms were promising and were thus selected for presentation (models 1-5, Table 1). Three additional models (models 6-8) including interaction terms that were postulated a priori were also considered.


TABLE 1
Description of variables and models

Variable/model                       Description
Dependent variables
  Prevalence of current smoking (a)  Proportion of current smokers (daily or occasional) in the population
  Incidence of daily smoking (b)     Proportion of persons who began smoking daily in the previous year
Predictor variables (c)
  Age                                15-19 vs. 20-24
  Sex                                Male vs. female
  Language                           English and/or French vs. other
  Unemployment                       Looking for work vs. all other labour force characteristics
  Education                          Not currently attending school vs. currently attending school
Models
  Model 1                            Age, sex, province
  Model 2                            Sex, language, province
  Model 3                            Age, sex, unemployment, province
  Model 4                            Age, sex, education, province
  Model 5                            Age, sex, unemployment, education, province
  Model 6                            Age, sex, unemployment, age by unemployment, province
  Model 7                            Age, sex, education, age by education, province
  Model 8                            Age, sex, unemployment, education, age by unemployment, age by education, province

(a) In the regression models current smokers were compared with non-smokers.
(b) In the regression models new daily smokers were compared with all other respondents.
(c) All variables are dichotomized.

   

Estimation

For each of the eight models, logistic regression was used to model relations between the predictor variables and the outcome of interest and to generate maximum likelihood parameter estimates. With respect to the first study outcome (current smoking), for example, the logit of the probability that an individual currently smokes given his or her socio-demographic characteristics (e.g. age, sex, province) was estimated. Predicted probabilities were then calculated by back-transformation for each combination of socio-demographic characteristics and province. Under the assumption that all individuals with certain characteristics had equal probabilities of the outcome, the probabilities were applied to the socio-demographic structure of the youth populations. Population strata counts were obtained first from the 1996/97 NPHS and then the 1996 census.
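The estimation step can be sketched as follows. This is an illustrative fragment rather than the authors' implementation: it uses the main-effect coefficients reported for model 4 in Table 3, omits the provincial terms (which are not published), assumes each predictor is coded as a 0/1 indicator, and uses hypothetical stratum counts.

```python
import math
from itertools import product

# Main-effect coefficients for model 4 as reported in Table 3.
# Provincial terms are not published, so they are omitted here (assumption).
BETA = {"intercept": -1.384, "age": -0.045, "sex": 0.148, "education": 0.926}

def predicted_probability(age, sex, education):
    """Back-transform the logit to a probability; arguments are 0/1 indicators."""
    logit = (BETA["intercept"]
             + BETA["age"] * age
             + BETA["sex"] * sex
             + BETA["education"] * education)
    return 1.0 / (1.0 + math.exp(-logit))

def area_prevalence(strata_counts):
    """Weight each stratum's predicted probability by its population count."""
    total = sum(strata_counts.values())
    expected = sum(n * predicted_probability(*stratum)
                   for stratum, n in strata_counts.items())
    return expected / total

# Hypothetical counts for the eight (age, sex, education) strata of one area.
hypothetical_counts = {s: 1000 for s in product((0, 1), repeat=3)}

prevalence = area_prevalence(hypothetical_counts)
print(f"Model-generated prevalence: {prevalence:.3f}")
```

Because every individual in a stratum is assigned the same probability, the area estimate changes only when the stratum counts change, which is what allows census cross-tabulations to stand in for survey data.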


Model Validation

Direct estimates of both study outcomes (with associated 95% confidence intervals [CI]) were calculated for each province from the 1996/97 NPHS. Model-generated estimates were then compared with the direct estimates. There was an a priori assumption that the models were reasonable if the modelled provincial estimates were in approximately the same rank order as the direct provincial estimates and if the mean of the absolute differences between the modelled and direct estimates was small (near zero) relative to the estimates themselves. If both of these criteria were met, this would indicate that the modelling process resulted in reasonably accurate estimates. Spearman correlation coefficients24 and associated two-tailed p values were also calculated to quantify correlations between provincial rankings obtained by direct and model estimation.
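The two validation criteria can be illustrated with the direct and model 4 figures published in Table 2. The sketch below is an assumption-laden illustration, not the authors' code: it uses the published ranks as given (Newfoundland and Prince Edward Island share a rounded estimate of 33.6% but are ranked 3 and 2), applies the no-ties formula for Spearman's coefficient, and does not compute the p value. The function names are hypothetical.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def mean_absolute_difference(direct, modelled):
    """Mean of |direct - modelled| across provinces, in percentage points."""
    return sum(abs(d - m) for d, m in zip(direct, modelled)) / len(direct)

# Table 2 values, provinces in table order: Nfld, NS, PEI, NB, Que, Ont, Man, Sask, Alb, BC.
direct_pct = [33.6, 31.0, 33.6, 30.7, 40.1, 27.9, 31.5, 30.4, 31.9, 29.2]
model4_pct = [32.2, 31.0, 32.3, 33.0, 40.3, 26.5, 35.4, 31.6, 30.0, 29.1]
direct_rank = [3, 6, 2, 7, 1, 10, 5, 8, 4, 9]
model4_rank = [5, 7, 4, 3, 1, 10, 2, 6, 8, 9]

rho = spearman_from_ranks(direct_rank, model4_rank)
mad = mean_absolute_difference(direct_pct, model4_pct)
print(f"Spearman rho = {rho:.2f}, mean absolute difference = {mad:.1f}")
```

With these inputs the calculation gives a coefficient of 0.67 and a mean absolute difference of 1.4 percentage points, matching the model 4 column of Table 2.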


Results


General Patterns of Association

Parameter estimates obtained from the logistic regression modelling process were determined for both study outcomes. Unemployment and lower levels of education were found to be consistently associated with current smoking. For smoking initiation, age was the most consistent predictor. Education and unemployment were significantly associated with smoking initiation only in models that included both predictors simultaneously. These two predictors were positively correlated (higher education, higher unemployment), since full-time students were classified as unemployed within these data sources. Terms that described the interaction between age and both unemployment and education were not significantly associated with either study outcome.


Current Smoking

Provincial estimates of the current smoking outcome (both direct and model-generated) are shown in Table 2. These models relied upon stratum-specific population distribution figures from the 1996/97 NPHS. Direct estimates varied from 27.9% in Ontario to 40.1% in Quebec, and when Quebec was excluded the range of point estimates was quite small (27.9-33.6%). Model-generated estimates were in approximately the same order as the direct estimates, except for New Brunswick and Alberta, which each of the eight models consistently ranked differently from the direct estimates. With respect to the second validation criterion, the mean of the absolute differences between direct and model-generated estimates was smallest for models 4 and 7. For illustration and reference, Table 3 shows the beta coefficients and associated 95% CI derived for logistic regression model 4. Model 4 was considered the "best" of these two models because it was more parsimonious; Table 3 shows that the coefficients for age and sex were not statistically different from zero, and that education was modestly associated with current smoking.


TABLE 2
Direct and model-generated estimates of the prevalence of current smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific population from the 1996/97 NPHS)

Province  Direct                  Model 1      Model 2      Model 3      Model 4
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      33.6 (26.1-41.1)  3     32.3   5     32.7   5     22.8   4     32.2   5
NS        31.0 (23.4-38.6)  6     29.7   7     28.8   9     21.1   7     31.0   7
PEI       33.6 (25.2-42.0)  2     34.2   4     35.1   4     22.2   5     32.3   4
NB        30.7 (23.7-37.7)  7     34.4   3     35.3   3     25.5   3     33.0   3
Que       40.1 (34.9-45.3)  1     39.6   1     39.2   1     30.4   1     40.3   1
Ont       27.9 (26.6-29.2)  10    27.6   10    27.9   10    20.1   10    26.5   10
Man       31.5 (27.4-35.6)  5     37.2   2     37.3   2     25.6   2     35.4   2
Sask      30.4 (23.0-37.8)  8     31.2   6     31.5   6     21.5   6     31.6   6
Alb       31.9 (25.1-38.7)  4     29.6   8     30.1   7     20.2   9     30.0   8
BC        29.2 (23.1-35.3)  9     29.3   9     29.2   8     20.5   8     29.1   9

Correlation of ranks with direct estimates (p value)
                                  0.67 (0.03)  0.66 (0.04)  0.58 (0.08)  0.67 (0.04)
Mean of the absolute differences
                                  1.7          1.9          9.0          1.4


TABLE 2 (cont'd)
Direct and model-generated estimates of the prevalence of current smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific population from the 1996/97 NPHS)

Province  Direct                  Model 5      Model 6      Model 7      Model 8
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      33.6 (26.1-41.1)  3     24.9   4     23.0   4     32.3   5     25.3   4
NS        31.0 (23.4-38.6)  6     23.3   7     21.4   7     30.9   7     24.2   6
PEI       33.6 (25.2-42.0)  2     24.4   5     22.4   5     32.4   4     24.8   5
NB        30.7 (23.7-37.7)  7     27.1   3     25.7   3     33.1   3     27.6   3
Que       40.1 (34.9-45.3)  1     32.8   1     30.7   1     40.3   1     33.6   1
Ont       27.9 (26.6-29.2)  10    21.4   10    20.4   10    26.5   10    22.0   10
Man       31.5 (27.4-35.6)  5     27.7   2     25.9   2     35.3   2     28.3   2
Sask      30.4 (23.0-37.8)  8     23.8   6     21.7   6     31.6   6     24.2   7
Alb       31.9 (25.1-38.7)  4     22.4   9     20.5   8     29.9   8     22.9   9
BC        29.2 (23.1-35.3)  9     22.3   8     20.8   9     29.1   9     22.9   8

Correlation of ranks with direct estimates (p value)
                                  0.66 (0.04)  0.60 (0.07)  0.67 (0.03)  0.62 (0.06)
Mean of the absolute differences
                                  7.0          8.7          1.4          6.4

TABLE 3
Parameter estimates (a) from logistic regression model used to estimate the prevalence of current smoking (model 4)

                Intercept            Age                 Sex                 Education
Baseline level                       15-19               Males               Not attending school
Beta            -1.384               -0.045              0.148               0.926
(95% CI)        (-1.666 to -1.102)   (-0.234 to 0.144)   (-0.021 to 0.317)   (0.737 to 1.116)

(a) Although considered in the model, parameter estimates are not presented for individual provinces.
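For readers who prefer odds ratios, the beta coefficients in Table 3 can be exponentiated. The short sketch below does this for the published model 4 coefficients and their 95% confidence limits; the dictionary layout and function name are illustrative assumptions, and the direction of each contrast depends on how the indicator variables were coded, which the table does not state explicitly.

```python
import math

# Beta coefficients with lower and upper 95% CI limits, as reported in Table 3.
table3 = {
    "age": (-0.045, -0.234, 0.144),
    "sex": (0.148, -0.021, 0.317),
    "education": (0.926, 0.737, 1.116),
}

def to_odds_ratio(beta, lower, upper):
    """Exponentiate a log-odds coefficient and its confidence limits."""
    return math.exp(beta), math.exp(lower), math.exp(upper)

odds_ratios = {name: to_odds_ratio(*values) for name, values in table3.items()}

for name, (point, low, high) in odds_ratios.items():
    # An interval that excludes 1 corresponds to a beta interval that excludes 0.
    excludes_one = low > 1.0 or high < 1.0
    print(f"{name}: OR = {point:.2f} ({low:.2f} to {high:.2f}), excludes 1: {excludes_one}")
```

Only the education interval excludes 1, which agrees with the statement in the text that the age and sex coefficients were not statistically different from zero.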

   


Table 4 presents direct and model-generated estimates of current smoking, but this time based upon stratum-specific population distributions from the 1996 census. The model-generated estimates followed a similar pattern to those estimated directly using the 1996/97 NPHS, although the correlation between the direct and model-generated estimates was not as strong.

Smoking Initiation

Direct and model-generated estimates of the incidence (or uptake) of daily smoking are shown in Table 5 (based on 1996/97 NPHS data) and Table 6 (based on the 1996 census). This outcome was quite rare, varying from 1.1% to 4.7% in the 10 provinces (direct estimates). In general and for both data sources, the direct and model-generated estimates were poorly correlated, and the mean differences between modelled and direct estimates were large relative to the proportions of youth who were initiating smoking.

Temporal Considerations

In Table 7, estimates and rankings from the best model in Table 2 (i.e. model 4) are presented along with direct estimates and rankings from the 1994/95 and 1996/97 NPHS. Direct estimates in 1994/95 and 1996/97 changed at least slightly in most provinces and quite substantially in New Brunswick and Manitoba. The provincial rankings also changed between the two NPHS survey years: the 1996/97 model-generated estimates and ranks were consistent with those directly obtained from the 1994/95 NPHS and less strongly associated with the 1996/97 direct ranks.


TABLE 4
Direct and model-generated estimates of the prevalence of current smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific population from the 1996 census of Canada)

Province  Direct                  Model 1      Model 2      Model 3      Model 4
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      33.6 (26.1-41.1)  3     32.9   5     33.3   5     25.2   4     30.0   6
NS        31.0 (23.4-38.6)  6     29.3   9     29.5   10    23.7   6     29.1   7
PEI       33.6 (25.2-42.0)  2     34.3   4     35.1   4     24.2   5     30.1   5
NB        30.7 (23.7-37.7)  7     35.1   3     35.4   3     27.7   2     33.7   3
Que       40.1 (34.9-45.3)  1     39.7   1     41.6   1     32.2   1     37.9   1
Ont       27.9 (26.6-29.2)  10    27.7   10    31.3   9     21.7   8     26.4   10
Man       31.5 (27.4-35.6)  5     37.3   2     39.9   2     27.4   3     35.6   2
Sask      30.4 (23.0-37.8)  8     31.6   6     32.8   7     23.2   7     31.9   4
Alb       31.9 (25.1-38.7)  4     29.9   7     32.0   8     21.5   10    29.1   8
BC        29.2 (23.1-35.3)  9     29.9   8     33.3   6     21.7   9     28.6   9

Correlation of ranks with direct estimates (p value)
                                  0.57 (0.09)  0.43 (0.21)  0.51 (0.14)  0.49 (0.15)
Mean of the absolute differences
                                  1.7          2.8          7.1          2.4

TABLE 4 (cont'd)
Direct and model-generated estimates of the prevalence of current smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific population from the 1996 census of Canada)

Province  Direct                  Model 5      Model 6      Model 7      Model 8
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      33.6 (26.1-41.1)  3     25.9   4     25.4   4     30.1   6     26.3   4
NS        31.0 (23.4-38.6)  6     24.7   7     23.9   6     29.0   8     25.2   7
PEI       33.6 (25.2-42.0)  2     24.8   6     24.4   5     31.0   5     25.9   5
NB        30.7 (23.7-37.7)  7     28.9   3     28.1   2     33.8   3     29.5   3
Que       40.1 (34.9-45.3)  1     41.6   1     32.5   1     37.9   1     34.0   1
Ont       27.9 (26.6-29.2)  10    22.6   10    22.0   9     26.5   10    23.1   10
Man       31.5 (27.4-35.6)  5     29.1   2     27.7   3     35.6   2     30.0   2
Sask      30.4 (23.0-37.8)  8     25.1   6     23.5   7     31.9   4     25.6   6
Alb       31.9 (25.1-38.7)  4     23.0   8     21.7   10    29.1   7     23.5   8
BC        29.2 (23.1-35.3)  9     22.9   9     22.0   8     28.3   9     23.5   9

Correlation of ranks with direct estimates (p value)
                                  0.54 (0.11)  0.51 (0.14)  0.49 (0.15)  0.61 (0.06)
Mean of the absolute differences
                                  5.9          6.9          2.4          5.4


TABLE 5
Direct and model-generated estimates of the incidence of daily smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific populations from the 1996/97 NPHS)

Province  Direct                  Model 1      Model 2      Model 3      Model 4
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      2.8 (0.4-5.2)     5     1.9    8     1.9    8     2.0    7     1.9    8
NS        3.6 (0.4-6.8)     4     3.7    3     4.0    2     1.3    10    3.7    3
PEI       1.6 (0.4-2.8)     8     4.0    2     3.8    3     4.1    2     3.9    2
NB        4.7 (1.4-8.0)     1     1.6    9     1.5    9     1.7    8     1.6    9
Que       4.3 (1.7-6.9)     2     2.9    4     3.0    4     2.9    3     2.9    4
Ont       2.1 (1.7-2.5)     7     2.3    7     2.3    6     2.4    6     2.3    7
Man       1.4 (0.4-2.4)     9     2.5    5     2.5    5     2.6    4     2.4    5
Sask      3.7 (0.7-6.7)     3     6.3    1     6.3    1     6.5    1     6.4    1
Alb       2.7 (0.3-5.1)     6     1.5    10    1.5    10    1.5    9     1.5    10
BC        1.1 (0.0-2.6)     10    2.3    6     2.3    7     2.4    5     2.3    6

Correlation of ranks with direct estimates (p value)
                                  -0.01 (0.99) -0.02 (0.96) -0.15 (0.68) -0.14 (0.70)
Mean of the absolute differences
                                  1.4          1.4          1.7          1.4


TABLE 5 (cont'd)
Direct and model-generated estimates of the incidence of daily smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific populations from the 1996/97 NPHS)

Province  Direct                  Model 5      Model 6      Model 7      Model 8
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      2.8 (0.4-5.2)     5     3.1    8     2.0    8     2.0    8     2.9    7
NS        3.6 (0.4-6.8)     4     5.7    3     3.9    3     3.6    3     4.7    3
PEI       1.6 (0.4-2.8)     8     6.4    2     4.2    2     4.0    2     6.1    2
NB        4.7 (1.4-8.0)     1     2.3    10    1.7    9     1.6    9     2.1    10
Que       4.3 (1.7-6.9)     2     4.3    4     3.0    4     2.9    4     3.9    4
Ont       2.1 (1.7-2.5)     7     3.1    7     2.4    7     2.3    7     2.8    8
Man       1.4 (0.4-2.4)     9     3.9    5     2.6    5     2.4    5     3.5    5
Sask      3.7 (0.7-6.7)     3     10.0   1     6.6    1     6.4    1     9.1    1
Alb       2.7 (0.3-5.1)     6     2.4    8     1.6    10    1.5    10    2.2    9
BC        1.1 (0.0-2.6)     10    3.6    6     2.4    6     2.3    6     3.3    6

Correlation of ranks with direct estimates (p value)
                                  -0.23 (0.52) -0.01 (0.99) -0.14 (0.70) -0.04 (0.91)
Mean of the absolute differences
                                  2.2          1.5          1.4          2.0


TABLE 6
Direct and model-generated estimates of the incidence of daily smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific populations from the 1996 census of Canada)

Province  Direct                  Model 1      Model 2      Model 3      Model 4
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      2.8 (0.4-5.2)     5     1.8    8     1.9    8     1.9    7     1.8    8
NS        3.6 (0.4-6.8)     4     3.8    3     4.0    2     1.3    10    3.8    2
PEI       1.6 (0.4-2.8)     8     3.9    2     3.7    3     4.0    2     3.7    3
NB        4.7 (1.4-8.0)     1     1.5    9     1.5    10    1.5    8     1.5    9
Que       4.3 (1.7-6.9)     2     2.8    4     3.2    4     2.9    3     2.8    4
Ont       2.1 (1.7-2.5)     7     2.3    6     2.5    6     2.3    5     2.2    6
Man       1.4 (0.4-2.4)     9     2.5    5     2.6    5     2.5    4     2.4    5
Sask      3.7 (0.7-6.7)     3     6.3    1     6.5    1     6.5    1     6.4    1
Alb       2.7 (0.3-5.1)     6     1.5    10    1.6    9     1.5    9     1.4    10
BC        1.1 (0.0-2.6)     10    2.2    7     2.5    7     2.3    6     2.2    7

Correlation of ranks with direct estimates (p value)
                                  -0.04 (0.91) 0.02 (0.96)  -0.12 (0.75) 0.08 (0.83)
Mean of the absolute differences
                                  1.4          1.5          1.7          1.4


TABLE 6 (cont'd)
Direct and model-generated estimates of the incidence of daily smoking
among Canadian youth aged 15-24 in 1996
(estimates generated using stratum-specific populations from the 1996 census of Canada)

Province  Direct                  Model 5      Model 6      Model 7      Model 8
          % (95% CI)        Rank  %      Rank  %      Rank  %      Rank  %      Rank
Nfld      2.8 (0.4-5.2)     5     2.6    8     1.9    8     1.8    8     2.3    8
NS        3.6 (0.4-6.8)     4     5.2    3     4.0    3     3.8    3     4.6    3
PEI       1.6 (0.4-2.8)     8     5.5    2     4.1    2     3.8    3     5.0    2
NB        4.7 (1.4-8.0)     1     2.1    10    1.6    9     1.5    9     1.9    10
Que       4.3 (1.7-6.9)     2     3.7    5     3.0    4     2.8    4     3.4    5
Ont       2.1 (1.7-2.5)     7     3.0    7     2.4    6     2.3    6     2.7    7
Man       1.4 (0.4-2.4)     9     3.8    4     2.6    5     2.4    5     3.5    4
Sask      3.7 (0.7-6.7)     3     9.5    1     6.6    1     6.4    1     8.7    1
Alb       2.7 (0.3-5.1)     6     2.2    9     1.5    10    1.5    10    2.0    9
BC        1.1 (0.0-2.6)     10    3.2    6     2.3    7     2.2    7     3.0    6

Correlation of ranks with direct estimates (p value)
                                  -0.15 (0.68) 0.03 (0.93)  0.08 (0.83)  -0.15 (0.68)
Mean of the absolute differences
                                  2.1          1.5          1.4          1.9


TABLE 7
Comparison of model 4-generated estimates/rankings and direct estimates/rankings from the 1994/95 and 1996/97 NPHS for the prevalence of current smoking

Province  Model 4-generated       1994/95 direct          1996/97 direct
          from 1996/97 NPHS
          Estimate (%)  Rank      Estimate (%)  Rank      Estimate (%)  Rank
Nfld      32.2          5         32.7          5         33.6          3
NS        31.0          7         29.1          9         31.0          6
PEI       32.3          4         34.7          4         33.6          2
NB        33.0          3         35.0          3         30.7          7
Que       40.3          1         39.1          1         40.1          1
Ont       26.5          10        27.6          10        27.9          10
Man       35.4          2         37.4          2         31.5          5
Sask      31.6          6         31.6          6         30.4          8
Alb       30.0          8         29.7          7         31.9          4
BC        29.1          9         29.5          8         29.2          9

Correlation of ranks with direct estimates (p value)
                                  0.96 (< 0.0001)         0.67 (0.04)
Mean of the absolute differences
                                  0.5                     0.2


FIGURE 1
Comparison of direct and model 4-generated estimates of current smoking prevalence among Canadian youth aged 15-24 in 1996 (from the 1996/97 NPHS)


   

Discussion

In this study we attempted to estimate the prevalence and initiation of youth smoking in Canada using existing data from the NPHS and the Canadian census. A regression estimation approach was used. Analyses focused upon the derivation of estimates of youth smoking at the provincial level, and these were compared with direct estimates of youth smoking that were thought to be fairly stable and accurate.

A number of criteria for the estimation process were established before this study was conducted. First, for a model to be acceptable, we felt that it had to produce accurate estimates. Second, the techniques used for estimation had to be fairly simple, as we felt they would be applied by personnel with basic statistical and spreadsheet training. The techniques themselves involved use of area-specific socio-demographic information from the census, in cross-tabulated form, in order to generate estimates of youth smoking. This meant that the predictors included in the models were limited to those available in the census, and that any model developed would ideally be parsimonious. Finally, it was hoped that the final models would be portable (i.e. applicable to different Canadian populations of youth both geographically and across time).

Parameter estimates for models of current smoking were derived from the 1994/95 NPHS data. When the latter estimates were applied to random subsets of socio-demographic data for youth in that survey, they performed almost perfectly (data not shown), as would be expected given that the models were generated from the 1994/95 data set. When they were applied to socio-demographic data taken from the 1996/97 NPHS, the models performed reasonably well, but not perfectly. Performance declined further when these same models were applied to 1996 socio-demographic data from the census. As seen in Table 7, the model-generated estimates based on the 1996/97 NPHS data tended to be more similar in magnitude and rankings to the direct estimates from 1994/95 than those of 1996/97. This casts doubt on whether even the best of the predictive models was portable across time.

Population distributions of the socio-demographic factors considered in the models were relatively stable and would not, themselves, account for the observed temporal variations in youth smoking. This indicates that associations between the predictors and smoking changed over time, or that one or more explanatory variables that were not considered in the models were unstable. Thus, in general, the socio-demographic variables available for use were not sufficient to accurately predict youth smoking behaviours and changes in these behaviours.

For current smoking, although the models performed reasonably well according to both validation criteria, performance was inconsistent by province. Models that included unemployment (models 3, 5, 6 and 8) tended to underestimate the prevalence of current smoking. One would expect employment status to be an inaccurate measure for individuals in this age group, particularly as most youth are not viewed as being in the traditional labour force. Models that included education but not unemployment (models 4 and 7) provided estimates that were more accurate than those that considered solely age and sex (model 1). Inclusion of the age-by-education interaction term did not improve the estimates. (Here, accuracy and its improvement refer to the degree to which direct and model-generated estimates were similar in rank and magnitude.)

Modelling of the second youth smoking indicator (initiation of daily smoking) was not successful. In general, there was a poor correlation between the rank order of the provincial estimates obtained directly and the model-generated rank estimates. Because the outcome was quite rare, there was also considerable variability in the direct estimates as indicated by their relatively large confidence intervals.

The study was limited by the sample sizes available in the two versions of the NPHS (2,597 and 9,601 for the 1994/95 and 1996/97 versions respectively). Direct estimates of smoking and the predictive models were less stable in some provinces than was anticipated. This also meant that additional subanalyses (within various age/sex strata, for example) were not possible, and that a split-data approach to model development and validation could not be undertaken. On a population level, the lack of interprovincial variation in the two smoking indicators complicated the estimation process. Similarly, for many of the eight models considered in our analysis, there was little variation between provinces in the underlying distributions of salient predictors. This also contributed to our difficulties in developing stable, predictive models.

To illustrate these last points, Figure 1 provides a visual summary of the provincial rates of current smoking, estimated both directly from the 1996/97 NPHS and by application of model 4 to the stratum-specific population from that survey. The direct and model-generated estimates were generally quite close, but this figure shows the inherent difficulty in using province-level data to perform small area estimation. With the exception of Quebec, there was actually only minor variation between the provinces in the prevalence of current smoking. It is possible, and even likely, that within-province variations in rates of youth smoking actually exceeded the variations observed between the provinces. The modelling procedures and their imputations may have performed better had they been derived from smaller areas than provinces.

Existing studies that have used synthetic, regression and combined estimation approaches have met with mixed success, and our results are consistent with this. For example, MacKenzie et al.15 applied data from the United States National Health Interview Survey to local area data from the US census to estimate a variety of health indicators. Local estimates were then validated using a large, population-based telephone survey to generate "gold standard" values. Synthetic and regression estimates were found to approximate the gold standard results for some health-related variables, but not for others. There was no consistency with respect to the types of variables that produced accurate and poor approximations. Spasoff et al.16 used synthetic and regression estimation to perform similar procedures with the 1990/91 Ontario Health Survey and the 1986 Canadian census. Again, small area estimation did not perform well in approximating gold standard estimates.

The findings observed in both of these studies were attributed to limitations in the design of the evaluations. These limitations included flawed choices of gold standards and inaccurate sources of data on which to base estimation models. Our experience suggests that, in contrast to the results of these efforts, simple estimation models of youth smoking can be derived. However, these models may not be portable across different populations and time periods, and it is perhaps unrealistic to expect that complex behaviours like youth smoking can be predicted solely on the basis of socio-demographic factors and simple estimation approaches. More sophisticated analytic methods (such as the newer Empirical Bayes approaches20) may be required, or models may need to incorporate predictors that are not routinely available through the census.


Acknowledgements

This study was financially supported by grant 6605-6550-NPHS from the National Health Research and Development Program, Health Canada. Dr Pickett is a Career Scientist funded by the Ontario Ministry of Health and Long-term Care.


References

1. Marcus SE, Giovino GA, Pierce JP, Harel Y. Measuring tobacco use among adolescents. Public Health Rep 1993;108 (Suppl 1):20-4.

2. Stephens M, Siroonian J. Smoking prevalence, quit attempts and successes. Health Rep 1998;9(4):31-7.

3. Makomaski Illing EM, Kaiserman MJ. Mortality attributable to tobacco use in Canada and its regions, 1994 and 1996. Chronic Dis Can 1999;20(3):111-7.

4. Mao Y, Gibbons L, Wong T. The impact of decreased prevalence of smoking in Canada. Can J Public Health 1992;83:413-6.

5. Risch HA, Howe GR, Jain M, Burch JD, Holowaty EJ, Miller AB. Are female smokers at higher risk for lung cancer than male smokers? A case-control analysis by histologic type. Am J Epidemiol 1993;138:281-93.

6. Poulin C, Elliot D. Alcohol, tobacco and cannabis use among Nova Scotia adolescents: implications for prevention and harm reduction. Can Med Assoc J 1997;156:1387-93.

7. Miller AB. The brave new world-what can we realistically expect to achieve through cancer control early in the new millennium? [commentary]. Chronic Dis Can 1999;20(4):139-50.

8. Single E, Rehm J, Robson L, Truong MV. The relative risks and etiologic fractions of different causes of death and disease attributable to alcohol, tobacco and illicit drug use in Canada. Can Med Assoc J 2000;162:1669-75.

9. Single E, Robson L, Rehm J, Xie X. Morbidity and mortality attributable to alcohol, tobacco, and illicit drug use in Canada. Am J Public Health 1999;89:385-90.

10. Kaiserman MJ. The cost of smoking in Canada, 1991. Chronic Dis Can 1997;18(1):13-9.

11. Single E, Robson L, Xie X, Rehm J. The costs of substance abuse in Canada: a best estimation study. Ottawa: Canadian Centre on Substance Abuse, 1996.

12. Health Canada (Stephens T, Morin M, editors). Youth Smoking Survey, 1994: technical report. Ottawa, 1996; Cat H49-98/1-1994E.

13. Hobbs FM, Pickett W, Ferrence RG, Brown KS, Madill C, Adlaf EM. Youth smoking in Ontario 1981-87: a cause for concern. Can J Public Health 1999;90:80-2.

14. Statistics Canada. National Population Health Survey 1994-95. Public use microdata files. Ottawa, 1995; Cat 82F0001-XBD.

15. MacKenzie EJ, Shapiro S, Yaffe R. The utility of synthetic and regression estimation techniques for local health planning. Med Care 1985;23:1-13.

16. Spasoff RA, Strike CJ, Nair RC, Dunkley GC, Boulet JR. Small group estimation for public health. Can J Public Health 1996;87:130-4.

17. Levy PS, French DK. Synthetic estimation of state health characteristics based on the health interview survey. Washington: US Department of Health, Education and Welfare, 1977; Vital and Health Statistics, Series 2, No 75, DHEW Pub No (PHS)78-1349.

18. Lafata JE, Koch GG, Weissert WG. Estimating activity limitation in the noninstitutionalized population: a method for small areas. Am J Public Health 1994;84:1813-7.

19. Purcell NJ, Kish L. Estimation for small domains. Biometrics 1979;35:365-84.

20. Ghosh M, Rao JNK. Small area estimation: an appraisal. Stat Sci 1994;9:55-93.

21. Statistics Canada. National Population Health Survey 1996-97 [unpublished tabulations]. Ottawa, 1999.

22. Statistics Canada. 1996 census dictionary. Ottawa, 1997; Cat 92-351-XPE.

23. Mills C, Stephens T, Wilkins K. Summary report of the workshop on data for monitoring tobacco use [workshop report]. Chronic Dis Can 1994;15(3):105-10.

24. Rosner B. Fundamentals of biostatistics. 3rd ed. Belmont (CA): Duxbury, 1989:452-5. 


Author References

William Pickett, Departments of Community Health and Epidemiology and of Emergency Medicine, Queen's University, Kingston, Ontario; and Ontario Tobacco Research Unit, Toronto, Ontario

Anita Koushik, Department of Community Health and Epidemiology, Queen's University, Kingston, Ontario; and Department of Epidemiology and Biostatistics, McGill University, Montreal, Quebec

Taron Faelker, Department of Community Health and Epidemiology, Queen's University, Kingston, Ontario

K Stephen Brown, Ontario Tobacco Research Unit, Toronto, Ontario; and Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario

Correspondence: Dr William Pickett, Queen's University, c/o Emergency Medicine Research, Angada 3, Kingston General Hospital, 76 Stuart Street, Kingston, Ontario  K7L 2V7; Fax: (613) 548-1381; E-mail: PickettW@post.queensu.ca

Last Updated: 2002-10-04