Chronic Diseases and Injuries in Canada

Volume 32, no. 1, December 2011

Validating the CANRISK prognostic model for assessing diabetes risk in Canada’s multi-ethnic population

C. A. Robinson, MA (1); G. Agarwal, MBBS (2); K. Nerenberg, MD, MSc (3)

This article has been peer reviewed.

Author references:

(1) Centre for Chronic Disease Prevention and Control, Public Health Agency of Canada, Ottawa, Ontario, Canada
(2) Department of Family Medicine, McMaster University, Hamilton, Ontario, Canada
(3) Department of Medicine, Royal Alexandra Hospital, Edmonton, Alberta, Canada

Suggested citation: Robinson CA, Agarwal G, Nerenberg K. Validating the CANRISK prognostic model for assessing diabetes risk in Canada’s multi-ethnic population. Chronic Dis Inj Can. 2011;32(1):XX-XX.

Correspondence: Chris Robinson, Centre for Chronic Disease Prevention and Control, Public Health Agency of Canada, 785 Carling Avenue, Ottawa, ON K1A 0K9; Tel.: (613) 957-9874; Fax: (613) 941-2633; Email: chris.robinson@phac-aspc.gc.ca

Abstract

Introduction: Despite high rates of undiagnosed diabetes and prediabetes, suitable risk assessment tools for estimating personal diabetes risk in Canada are currently lacking.

Methods: We conducted a cross-sectional screening study that evaluated the accuracy and discrimination of the new Canadian Diabetes Risk Assessment Questionnaire (CANRISK) for detecting diabetes and prediabetes (dysglycemia) in 6223 adults of various ethnicities. All participants had their glycemic status confirmed with the oral glucose tolerance test (OGTT). We developed electronic and paper-based CANRISK scores using logistic regression, and then validated them against reference standard blood tests using test-set methods. We used area under the curve (AUC) summary statistics from receiver operating characteristic (ROC) analyses to compare CANRISK with other alternative risk-scoring models in terms of their ability to discern true dysglycemia.

Results: The AUC for electronic and paper-based CANRISK scores were 0.75 (95% CI: 0.73–0.78) and 0.75 (95% CI: 0.73–0.78) respectively, as compared with 0.66 (95% CI: 0.63–0.69) for the Finnish FINDRISC score and 0.69 (95% CI: 0.66–0.72) for a simple Obesity model that included age, BMI, waist circumference and sex.

Conclusion: CANRISK is a statistically valid tool that may be suitable for assessing diabetes risk in Canada’s multi-ethnic population. CANRISK was significantly more accurate than both the FINDRISC score and the simple Obesity model.

Keywords: diabetes, prediabetes, screening, risk assessment, FINDRISC, blood sugar, public health.

Introduction

Despite high rates of undiagnosed diabetes and prediabetes in Canada, the assessment tools currently used to estimate an individual’s risk of diabetes are lacking. It is clinically important to be able to identify individuals at risk for diabetes. First, undiagnosed diabetes often remains undetected for 4 to 7 years before clinical diagnosis, and many newly diagnosed patients already exhibit signs of microvascular and macrovascular complications.^1,2 Second, individuals with prediabetes (impaired fasting glucose [IFG] and/or impaired glucose tolerance [IGT]) have a high likelihood of developing type 2 diabetes—10 to 20 times that of normoglycemic persons.^3,4 As such, adults with prediabetes are the most likely to benefit from early interventions.^3,4

Large randomized experimental studies such as the Finnish Diabetes Prevention Study⁵ and the US Diabetes Prevention Program⁶ have demonstrated that lifestyle intervention can effectively reduce the incidence of diabetes among those with prediabetes. Risk-scoring questionnaires may be useful to enhance individual risk assessment and lifestyle education. They could also lead to more cost-effective diabetes screening approaches.

Several prognostic risk-scoring models for type 2 diabetes are currently available for clinical use.^7-14 However, most require specific blood test results, which presumes that a clinical encounter or diagnostic testing has already taken place. This limits widespread use of these models from a public health perspective. A diabetes risk assessment approach that relies only upon information a participant can self-complete without detailed knowledge of specific laboratory test values has been developed in Finland. The Finnish Diabetes Risk Score¹⁵ (FINDRISC) is a key element of Finland’s national FIN-D2D diabetes prevention program, which has successfully screened over 10% of the Finnish population so far. FINDRISC has been used in Finland to identify high-risk individuals who might benefit from interventions or who would merit further investigation using the oral glucose tolerance test (OGTT). Among those detected by the Finnish study as being at high risk of developing diabetes, 60% of men and 45% of women had already developed abnormal glucose tolerance at baseline.¹⁶ The incidence of diabetes at one-year follow-up was between 18% and 22% among those who had high-risk prediabetes (i.e. both IFG and IGT) at baseline. Of those who completed a lifestyle education program, 17% reduced their body weight by over 5%; as a result, their risk of developing diabetes was 69% lower than that of those with stable weight.¹⁷

However, the generalizability of FINDRISC is limited by the different ethnic make-up of Canada compared to that of Finland. As a result, Canadian diabetes experts adapted FINDRISC to include ethnicity and other key variables (sex, education, macrosomia) to create the Canadian Diabetes Risk Assessment Questionnaire (CANRISK).¹⁸ *

This paper describes three main objectives of our study: (1) to develop a risk-scoring prognostic model (similar to FINDRISC score) suitable for Canada’s multi-ethnic population (CANRISK); (2) to validate the resulting scoring model using a test-set methodology to assess dysglycemia from measured blood tests; and (3) to compare the predictive accuracy of the new CANRISK model to FINDRISC.

* http://www.diabetes.ca/documents/for-professionals/NBI-CANRISK.pdf.

Methods

Data source

Between 2007 and 2011, 6475 Canadian adults from seven provinces (British Columbia, Saskatchewan, Manitoba, Ontario, New Brunswick, Nova Scotia and Prince Edward Island) were recruited in a screening study to detect diabetes and prediabetes using the CANRISK questionnaire. Several large urban sites were deliberately included to ensure a diverse multi-ethnic sample of participants. All participants had their glycemic status confirmed with the oral glucose tolerance test (OGTT, i.e. fasting plasma glucose [FPG] and plasma glucose 2 hours after a 75 g glucose challenge). A subset of participants at three CANRISK sites also had their glycated hemoglobin (HbA1c) measured.

Most participants were recruited through face-to-face encounters during opportunistic visits at community health centres;¹⁹ some were recruited through local mailouts.²⁰ Most participants were aged 40 to 74 years, although some sites chose to include younger Aboriginal participants and those from other non-White ethnic groups.

Eligibility criteria for inclusion in the study included the following: no previous diagnosis of diabetes (or prediabetes at some pilot sites); not currently pregnant; able to complete the CANRISK questionnaire in English or French, with assistance if required (most sites, although other language versions were also available at several urban pilot sites); not currently using metformin or other glucose-modifying prescription drugs (some pilot sites); and living within the local study area.

Data restrictions (core data)

For estimating the various prognostic models we restricted the CANRISK dataset to those participants who had complete data for key variables (blood test results, age, sex, ethnicity, height, weight). We imputed missing waist circumference (6% of core cases) from mean values obtained from participants with valid data, stratified by age, sex, and body mass index (BMI) (see Table 1). Missing family history was also imputed (i.e. assumed to be “no” for 13% of core cases). Cases with item-missing data for other variables were dropped from the final regression models.
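The stratified mean imputation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' SPSS procedure: the record layout, the `stratum` helper with its ten-year age bands and five-unit BMI bands, and the sample values are all hypothetical, since the text does not state which bands defined the strata.

```python
from collections import defaultdict
from statistics import mean


def stratum(record):
    """Hypothetical stratum key: ten-year age band, sex, five-unit BMI band."""
    return (record["age"] // 10, record["sex"], int(record["bmi"] // 5))


def impute_waist(records):
    """Return copies of the records with missing waist values (None)
    replaced by the mean waist of participants in the same stratum."""
    observed = defaultdict(list)
    for r in records:
        if r["waist"] is not None:
            observed[stratum(r)].append(r["waist"])

    completed = []
    for r in records:
        r = dict(r)  # copy, so the input list is left untouched
        key = stratum(r)
        if r["waist"] is None and key in observed:
            r["waist"] = mean(observed[key])
        completed.append(r)
    return completed


# Hypothetical participants; the third one has no waist measurement.
sample = [
    {"age": 50, "sex": "F", "bmi": 27.0, "waist": 86.0},
    {"age": 52, "sex": "F", "bmi": 28.5, "waist": 90.0},
    {"age": 57, "sex": "F", "bmi": 26.0, "waist": None},
    {"age": 60, "sex": "M", "bmi": 31.0, "waist": 104.0},
]
imputed = impute_waist(sample)
print(imputed[2]["waist"])
```

A record whose stratum contains no observed waist values is left as missing in this sketch; how such cases were handled in the study is not described.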

**Table 1. Characteristics of core CANRISK participants (n = 6223)**
Q	Characteristics by response to CANRISK questions^a	Percentage, %	Valid number, n	Number with missing data
Abbreviations: BMI, body mass index; CANRISK, Canadian Diabetes Risk Assessment Questionnaire; Q, question number from CANRISK. ^a For the complete version of the CANRISK questions, see http://www.diabetes.ca/documents/for-professionals/NBI-CANRISK.pdf. ^b From self-reported weight and height. ^c Imputed missing waist circumference (6% of core cases) from mean values obtained from participants with valid data. ^d Missing family history (13% of core cases) was assumed to be “no”. ^e These responses come from selected pilot sites only.
3	Male	36.4	2263	0
1	Age, years (mean = 52.6; SD = 12.5)			0
	19–44	26.4	1644
	45–54	27.5	1712
	55–64	28.5	1774
	65–78	17.6	1093
2	BMI (kg/m²)^b			0
	Normal/underweight (< 25)	42.8	2666
	Overweight (25–29.9)	33.0	2052
	Obese, non-morbid (30–34.9)	15.8	982
	Obese, morbid (35+)	8.4	523
3	Waist circumference (cm)			368^c
	Male < 94 / Female < 80	19.5	1213
	Male 94–102 / Female 80–88	26.4	1643
	Male > 102 / Female > 88	54.1	3367
4	Daily brisk physical activity ≥ 30 minutes
4	No	37.8	2350	13
5	Daily consumption of fruit/vegetables
5	No	23.9	1484	4
6	High blood pressure diagnosed by a doctor or nurse / has taken medication for blood pressure
6	Yes	31.6	1954	46
7	High blood sugar confirmed by a blood test / during an illness / during pregnancy
7	Yes	13.5	822	141
8	Positive family history of diabetes^d
	Mother	25.7	1390	824
	Father	20.2	1039	1077
	Sibling	24.6	1301	933
	Child	2.5	148	326
	Other relatives	33.2	1795	824
9	Ethnicity (mother)
	White (Caucasian)	65.7	4089	0
	Aboriginal	12.1	756	0
	Black	3.5	220	0
	Latin American	2.8	175	0
	South Asian	5.3	328	0
	East Asian	10.1	629	0
	Other	1.0	63	0
10	Ethnicity (father)
	White (Caucasian)	66.0	4084	34
	Aboriginal	11.3	698	31
	Black	3.6	222	31
	Latin American	2.7	169	30
	South Asian	5.3	327	30
	East Asian	10.2	632	30
	Other	1.2	72	34
11	Education			16
	Some high school or less	23.2	1443
	High school diploma	21.4	1330
	Some college or university	26.8	1669
	University or college degree	28.6	1781
12	Self-rated health status			27
	Excellent	10.4	648
	Very good	33.2	2067
	Good	42.1	2618
	Fair/poor	14.3	890
13	Smoking status^e
13	Daily cigarettes	13.6	534	2294
15	History of gestational diabetes (% females)	7.5	258	268
16	History of macrosomia (% females)	22.0	678	202

Predictor variables

We derived certain predictor variables from answers to the CANRISK questionnaire (e.g. BMI from weight and height). We converted continuous variables such as age and BMI into categorical variables and then adopted a dummy variable approach for logistic regression analysis. This allowed non-linearities in the predictor variables while still generating a practical scoring algorithm where scores can be summed using simple arithmetic (e.g. the paper-based version of the CANRISK scoring tool). Smoking status was only available for selected pilot sites (63% of total observations) since this question was added to the CANRISK questionnaire during the last phase of data collection. (The smoking variable was intended for use in other potential data linkage studies regarding cardiovascular risk. For this reason, and because of the large percentage of item-missing data, smoking was not included as a predictor in the CANRISK dysglycemia prognostic model.)
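As a rough illustration of the dummy-variable approach, the sketch below bands a continuous age or BMI value using the category boundaries shown in Table 3 and then creates one 0/1 indicator per non-reference category. The function and constant names are hypothetical and are not taken from the study's analysis files.

```python
from bisect import bisect_right

# Category boundaries as listed in Table 3; the first label is the reference group.
AGE_CUTS = [45, 55, 65]
AGE_LABELS = ["19-44", "45-54", "55-64", "65-78"]
BMI_CUTS = [25, 30, 35]
BMI_LABELS = ["<25", "25-29.9", "30-34.9", "35+"]


def categorize(value, cuts, labels):
    """Map a continuous value to its category label.

    A value equal to a cut point falls in the higher category
    (e.g. age 45 belongs to "45-54")."""
    return labels[bisect_right(cuts, value)]


def dummy_code(category, labels):
    """One 0/1 indicator per non-reference category (labels[0] is the reference)."""
    return {label: int(label == category) for label in labels[1:]}


age_category = categorize(58, AGE_CUTS, AGE_LABELS)
age_dummies = dummy_code(age_category, AGE_LABELS)
bmi_dummies = dummy_code(categorize(24.9, BMI_CUTS, BMI_LABELS), BMI_LABELS)
print(age_category, age_dummies, bmi_dummies)
```

Because each category receives its own coefficient, the fitted effect of age or BMI does not have to be linear, and each coefficient can later be converted into a points value for the category.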

Outcome variable

For the purposes of validation, the outcome for the prognostic model was dysglycemia based on the collective results of participants’ blood tests (FPG and 2-hour 75 g OGTT value) according to standard World Health Organization 2006 criteria.^21,22

Model validation and performance: general approach

Following standard statistical methods, we validated the CANRISK model using the split-sample test-set approach.²³ This process of internal validation involved randomly splitting the core CANRISK dataset into a derivation “test” dataset made up of 70% of the available cases (n = 4366), with the remaining 30% “set” data (n = 1857) serving as the validation dataset. In the first step, we used the “test” training data to estimate the prognostic model using logistic regression. The Hosmer-Lemeshow summary statistic and the associated Brier score²⁴ were used to assess the goodness-of-fit of the model. We then used the resulting regression coefficients to predict dysglycemia in the “set” dataset. We assessed the accuracy of the regression model (i.e. discrimination in terms of correctly classifying true-positive cases with dysglycemia) using receiver operating characteristic (ROC) curves. For measuring the overall performance of the regression model in terms of predictive validity, we used the area under the curve (AUC) summary statistic (i.e. the concordance c statistic).
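The split-sample workflow and its two summary measures can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the random seed and the four toy predictions are hypothetical, and the AUC is computed by direct pairwise comparison (the concordance definition) rather than by whatever routine the statistical package used.

```python
import random


def split_sample(cases, derivation_fraction=0.7, seed=0):
    """Randomly split cases into a derivation subset and a validation subset."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(scores, outcomes):
    """Concordance (c) statistic: the share of positive/negative pairs in which
    the positive case has the higher score; ties count as one half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return concordant / (len(positives) * len(negatives))


def brier(probabilities, outcomes):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Participant identifiers only; an exact 70% cut need not match the realized study split.
derivation, validation = split_sample(range(6223))

# Hypothetical predicted probabilities and observed dysglycemia status (1 = yes).
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
print(len(derivation), len(validation), auc(predicted, observed), brier(predicted, observed))
```

In the toy data, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.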

Finally, for various potential CANRISK score thresholds, we calculated standard measures of sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) in order to assess the diagnostic validity of the screening test at each threshold.
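For a single threshold, the four measures follow directly from the 2 x 2 classification of screening result against blood test result. The sketch below assumes that a score at or above the threshold counts as a positive screen; the scores and outcomes are hypothetical and the function names are illustrative.

```python
def confusion_counts(scores, outcomes, threshold):
    """Count (tp, fp, fn, tn), treating score >= threshold as a positive screen."""
    tp = fp = fn = tn = 0
    for score, has_dysglycemia in zip(scores, outcomes):
        if score >= threshold:
            if has_dysglycemia:
                tp += 1
            else:
                fp += 1
        elif has_dysglycemia:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def screening_measures(tp, fp, fn, tn):
    """Standard screening-test measures from the four cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among those with the condition
        "specificity": tn / (tn + fp),  # true negatives among those without it
        "ppv": tp / (tp + fp),          # condition present among positive screens
        "npv": tn / (tn + fn),          # condition absent among negative screens
    }


# Hypothetical risk scores and blood-test outcomes (1 = dysglycemia).
scores = [18, 25, 30, 33, 35, 40, 45, 50]
outcomes = [0, 0, 1, 1, 0, 1, 1, 1]
counts = confusion_counts(scores, outcomes, threshold=32)
measures = screening_measures(*counts)
print(counts, measures)
```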

Creating the CANRISK prognostic model for dysglycemia

As the first step, we used data from the cross-sectional test subsample to estimate three logistic regression models to predict the dysglycemia outcome: (1) the Obesity model, using BMI, waist circumference, age and sex (this basic model was intended to reflect observable risk factors commonly used for diabetes screening); (2) the FINDRISC Variables model, using the eight questions in FINDRISC, i.e. the first eight questions on CANRISK (this model reflected how well the FINDRISC variables predicted dysglycemia in a cross-sectional analysis within the CANRISK dataset); and (3) the CANRISK model, using all the variables available from the CANRISK questionnaire (this “full information” model reflected ethnicity and other variables added to the basic FINDRISC Variables model).

Statistical analysis

In developing the CANRISK prognostic model we recognized that the existing FINDRISC scores derived from 10-year cumulative incidence (i.e. definitive long-term diabetes outcome) should be retained and enhanced, rather than replaced with an entirely new prognostic model based on current dysglycemia (i.e. short-term risk condition from blood testing on one occasion). Our statistical methods therefore reflect our analytical objective to adapt the existing FINDRISC prognostic model by including ethnicity and other key variables to ensure generalizability to the Canadian population. Minimizing the number of predictor variables was not paramount in this case.

Using the “test” training dataset, we proceeded to develop the CANRISK prognostic model according to the following steps:

(1) We assessed correlations between the dependent variable (dysglycemia) and various independent variables (predictors). We also assessed correlations between predictors to identify potential multicollinearity, which would violate the independent variable assumption.
(2) We conducted univariate analyses to determine the strength of association between dysglycemia and individual predictors. We used these results to determine the order of entry of the Canadian predictors into the CANRISK model.
(3) We forced FINDRISC’s eight questions into a logistic regression to create the FINDRISC Variables model, measuring its performance in terms of goodness-of-fit and accuracy.
(4) We added ethnicity and other potential predictors to the basic FINDRISC Variables model in a series of steps, assessing gains in model performance at each step, and using the likelihood ratio to assess the added predictive power.

Variable selection in the final CANRISK prognostic model therefore involved maximizing the correct classification of true-positive cases by the overall model, while ensuring goodness-of-fit as well as statistical significance of the overall model and individual predictors at α = 0.05. Each variable in the final CANRISK model was also subject to a priori expectations regarding the correct sign, meaning that a known risk factor should have a positive coefficient and a known protective factor should be negative. Statistical analyses were performed using SPSS version 15.0 for Windows.²⁵

Results

The study population

Figure 1 illustrates how the available data were organized for analysis. We excluded 3.9% of participants with missing data for key variables from the “core” dataset. Table 1 describes ethnicity and other key characteristics of the 6223 persons remaining in the core dataset and related item-missing data for individual variables.

Figure 1 - CANRISK data

Blood test results (Table 2) showed that 20.5% of the participants tested positive for dysglycemia (15.7% prediabetes; 4.8% newly detected diabetes). Of the 1273 cases of dysglycemia identified, only 545 (43%) would have been identified using fasting glucose alone.

**Table 2. Blood test results used for validating CANRISK prognostic model**
	Blood test results^a	Percentage of total,^b,c %	Cases detected, n
Abbreviations: FPG, fasting plasma glucose; HbA1c, glycated haemoglobin; IFG, impaired fasting glucose; IGT, impaired glucose tolerance; OGTT, 2-hour 75 g oral glucose tolerance test. ^a Results are based on standard 2006 World Health Organization diagnostic criteria.^21,22 ^b n = 6223 participants in the core dataset. ^c Values may not add up to the total due to rounding. ^d Only selected pilot sites measured HbA1c.
A	Isolated IFG	3.8	238
B	Isolated IGT	9.2	573
C	High-risk prediabetes (IFG and IGT)	2.6	163
D	Total cases of prediabetes = A + B + C	15.7	974
E	Diabetes detected via FPG only	0.8	52
F	Diabetes detected via OGTT glucose challenge only	2.5	155
G	Diabetes detected via both FPG and OGTT glucose challenge	1.5	92
H	Total cases of screen-detected diabetes = E + F + G	4.8	299
	Total cases of dysglycemia = D + H	20.5	1273
	Cases with HbA1c > 6.5% from subset of 1057 participants^d	4.2	44
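The totals in Table 2, and the share of cases that fasting glucose alone would have found, can be re-derived from the row counts. The short check below assumes that rows A, C, E and G are the ones with an abnormal fasting result, which follows from the row definitions; the variable names are illustrative.

```python
# Case counts by row of Table 2.
rows = {"A": 238, "B": 573, "C": 163, "E": 52, "F": 155, "G": 92}
core_participants = 6223

prediabetes = rows["A"] + rows["B"] + rows["C"]   # row D
diabetes = rows["E"] + rows["F"] + rows["G"]      # row H
dysglycemia = prediabetes + diabetes              # D + H

# Rows with an abnormal fasting result: isolated IFG (A), IFG with IGT (C),
# diabetes by FPG only (E) and diabetes by both FPG and OGTT (G).
fasting_detectable = rows["A"] + rows["C"] + rows["E"] + rows["G"]

print(prediabetes, diabetes, dysglycemia)
print(round(100 * dysglycemia / core_participants, 1))    # prevalence, %
print(fasting_detectable, round(100 * fasting_detectable / dysglycemia))
```

The remaining cases (rows B and F) were detected only by the 2-hour glucose challenge.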

Estimation of the CANRISK prognostic model

Table 3 presents the three different prognostic models that we estimated using logistic regression methods applied to the core CANRISK data. In terms of goodness-of-fit and overall significance, all three models were highly significant based on likelihood ratio and Pearson chi-square (χ²) at p < .001. The Hosmer-Lemeshow summary statistic also indicated that each of the models was a good fit. The Brier score²⁴ for the CANRISK prognostic model was 0.002; the typical range is 0 (perfect) to 0.25 (no predictive value).

**Table 3. Comparison of three estimated logistic regression models based on outcome of dysglycemia**
	CANRISK^a (n = 4091 test obs)				FINDRISC Variables^b (n = 4251 test obs)			Obesity^c (n = 4366 test obs)
	Logistic regression model
Abbreviations: BMI, body mass index; CANRISK, Canadian Diabetes Risk Assessment Questionnaire; CI, confidence interval; DM, diabetes mellitus; eCANRISK, electronic-based CANRISK score; F, female; FINDRISC, Finnish Diabetes Risk Score; M, male; N/A, not applicable; obs, observations; OR, odds ratio; pCANRISK, paper-based CANRISK score; ref, reference. Notes: Shaded cells in FINDRISC Variables and Obesity models were not part of the assessment. ^a Uses all the variables available from the CANRISK questionnaire. ^b Uses the eight questions in FINDRISC (i.e. the first eight questions on CANRISK) and reflects how well the FINDRISC variables predicted dysglycemia in a cross-sectional analysis within the CANRISK dataset. ^c Uses BMI, waist circumference, age and sex to reflect observable risk factors commonly used for diabetes screening. ^d Maximum pCANRISK score is 81 for females, 86 for males. ^e In the FINDRISC Variables model, this group is combined with BMI ≥ 35 to represent body mass index of 30+ (i.e. similar to FINDRISC score variables). ^f Not statistically significant but retained in the model for educational purposes. ^g In the CANRISK model, this group counts the number of categories of first-degree relatives affected, while in the FINDRISC model this group indicates whether any first-degree relative was affected. ^h Statistically insignificant in the CANRISK model and with the wrong sign (negative coefficient). ^i Black ethnicity was not statistically significant but showed the correct sign (positive coefficient) and was plausible based on other epidemiological studies,^26-28 and was therefore retained. ^j Having a high school diploma was not statistically significant but it was retained to reflect the increasing risk associated with patterns of low education. * p < .05
Number of dysglycemia events in each model subsample, n	852				873			902
	OR	95% CI	eCANRISK score (β)*	pCANRISK^d score	OR	95% CI	β coefficient*	OR	95% CI	β coefficient*
Intercept			-3.84				-3.31			-3.25
Variable
Age, years
19–44 (ref)	1.00				1.00			1.00
45–54	2.01	1.53–2.63	0.70	7	1.77	1.37–2.28	0.57	1.98	1.55–2.52	0.68
55–64	3.33	2.55–4.37	1.20	13	2.81	2.20–3.59	1.03	3.27	2.59–4.13	1.19
65–78	4.21	3.12–5.69	1.44	15	3.65	2.78–4.79	1.29	4.33	3.37–5.57	1.47
BMI, kg/m²
< 25 (ref)	1.00				1.00			1.00
25–29.9	1.43	1.10–1.86	0.36	4	1.43	1.12–1.83	0.36	1.29	1.01–1.64	0.25
30–34.9^e	2.43	1.78–3.33	0.89	9	2.74	2.07–3.63	1.01	2.12	1.59–2.82	0.75
35+	3.70	2.61–5.24	1.31	14				3.55	2.60–4.84	1.27
Waist circumference, cm
M < 94 / F < 80 (ref)	1.00				1.00			1.00
M 94–102/ F 80–88	1.51	1.11–2.06	0.41	4	1.27	0.94–1.70	0.24	1.46	1.10–1.95	0.38
M >102 / F > 88	1.74	1.24–2.45	0.56	6	1.29	0.95–1.76	0.26	1.77	1.30–2.42	0.57
Physical activity ≥ 30 min/day
Yes (ref)	1.00				1.00
No^f	1.12	0.94–1.33	0.11	1	1.09	0.92–1.29	0.09
Eats fruit/vegetables every day
Yes (ref)	1.00				1.00
No^f	1.16	0.95–1.43	0.15	2	1.30	1.07–1.57	0.26
History of high blood pressure
No (ref)	1.00				1.00
Yes	1.43	1.20–1.70	0.36	4	1.42	1.20–1.68	0.35
History of high blood glucose
No (ref)	1.00				1.00
Yes	3.88	3.14–4.79	1.36	14	3.72	3.04–4.55	1.31
Family history of diabetes
None (ref)	1.00				1.00
First-degree relative with DM^g	1.21	1.09–1.34	0.19	2	1.31	1.11–1.54	0.27
Any second degree relative affected^h	—	—	—	—	0.74	0.61–0.89	-0.31
Sex
Female (ref)	1.00							1.00
Male	1.68	1.39–2.04	0.52	6				1.56	1.32–1.84	0.44
Ethnicity
White (ref)	1.00
Aboriginal	1.35	1.004–1.82	0.30	3
Blackⁱ	1.53	0.92–2.54	0.43	5
East Asian	2.61	1.93–3.52	0.96	10
South Asian	2.69	1.90–3.82	0.99	11
Macrosomia (women) ^f
No or N/A (ref)	1.00
Yes	1.06	0.81–1.39	0.06	1
Education
Some college/university (ref)	1.00
High school diploma^j	1.13	0.91–1.40	0.12	1
Less than high school	1.60	1.31–1.96	0.47	5

The resulting CANRISK prognostic model includes several key risk factors, notably ethnicity, as well as family history, waist circumference, BMI and other key variables. As indicated by the odds ratios (ORs) in Table 3, non-White ethnicity was a significant risk factor compared to the White reference group (e.g. OR = 2.69 for South Asian people; 2.61 for East Asian people; 1.35 for Aboriginal people). Black ethnicity (OR = 1.53; 95% CI: 0.92–2.54) was not statistically significant but showed the correct sign (positive coefficient) and was plausible based on other epidemiological studies^26-28 and was therefore retained. Latin American ethnicity and Other ethnicity were both statistically insignificant. Compared to high educational attainment at the university or college level, low educational attainment (OR = 1.60 for less than high school) was statistically significant as a risk factor, although having only a high school diploma was not. We retained the latter to reflect the increasing risk associated with patterns of low education. Being male (OR = 1.68) was another significant risk factor in the CANRISK model. (It was excluded from the original FINDRISC model.) Compared to no family history of diabetes, positive family history (i.e. OR = 1.21 for the number of categories of first-degree relatives affected with diabetes: mother, father, sibling, child) was also significant in the CANRISK model (family history of diabetes had not been directly estimated in FINDRISC). Family history for second-degree relatives was statistically insignificant and had the wrong sign (negative coefficient), and was therefore rejected. Diet and physical activity variables were not statistically significant but did generate the correct a priori sign (positive coefficient). In keeping with the FINDRISC approach, we retained these lifestyle variables in the model for educational purposes. For similar reasons, we also retained macrosomia (i.e. women who gave birth to a child weighing 4.1 kg or more) in the CANRISK model despite its statistical insignificance.

Other potential variables such as self-reported health status were tried but rejected due to implausible sign and statistical insignificance of the coefficient. Two variables were dropped due to multicollinearity: history of gestational diabetes was highly correlated with history of high blood sugar, and father’s ethnicity was highly correlated (0.92) with mother’s ethnicity. Including these variables in the model led to counterintuitive signs on the coefficients and decreased the goodness-of-fit in the model. (Note that this does not mean that father’s ethnicity is unimportant or should not be measured. Rather, it means that mother’s ethnicity can serve as a proxy measure for both parents when estimating the relevant model coefficient.)

Electronic and paper-based CANRISK scores

In order to implement the CANRISK model, specific threshold scores are required as potential credible cut-offs for determining broad categories of diabetes risk: low, medium and high. Because CANRISK scores may be applied in various public health and primary care settings, the scores have been calculated for two different formats: (1) a detailed “electronic” format (eCANRISK) suitable for programmed risk calculators (e.g. iPad App, online web calculator) and (2) a “paper-based” format (pCANRISK) based on simple arithmetic and rounded coefficients (as in FINDRISC). For the detailed electronic version, we calculated eCANRISK scores by summing the relevant beta coefficients from the logistic equation in Table 3 for applicable variables. For example, a 58-year-old White man with no other risk factors save for his mother having diabetes would have an eCANRISK score calculated as: −3.84 (intercept) + 1.20 (aged 55–64 years) + 0.52 (male) + 0.19 (multiplied by 1, since only one category of first-degree relative was affected with diabetes) + 0.00 (normal BMI, waist, etc.) = −1.93.
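The worked example can be reproduced with coefficients read from Table 3 (intercept −3.84; 1.20 for age 55–64; 0.52 for male; 0.19 per category of affected first-degree relative). The sketch below also converts the score to a probability with the standard inverse-logit transformation of a logistic regression; that conversion is an assumption here, since Appendix A is not reproduced in this section, and the function names are hypothetical.

```python
import math

INTERCEPT = -3.84  # CANRISK model intercept, Table 3


def ecanrisk_score(betas):
    """Sum the intercept and the beta coefficients that apply to one person."""
    return INTERCEPT + sum(betas)


def dysglycemia_probability(score):
    """Inverse-logit transformation of a logistic regression score."""
    return 1.0 / (1.0 + math.exp(-score))


# 58-year-old White man whose mother has diabetes, no other risk factors.
applicable = [
    1.20,      # aged 55-64 years
    0.52,      # male
    0.19 * 1,  # one category of first-degree relative affected
]
score = ecanrisk_score(applicable)
probability = dysglycemia_probability(score)
print(round(score, 2), round(probability, 3))
```

Under these assumptions the score corresponds to a predicted probability of dysglycemia of roughly 13%.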

For the pCANRISK score, we followed the approach used by Sullivan et al.²⁹ The score was calculated based on a rescaled, rounded version of the detailed beta coefficients that make up the eCANRISK score. The basic eCANRISK values were rescaled using the formula beta∕0.09393 to total a maximum of 81 points for women and 86 points for men. Rescaling to a larger number was intended to minimize the effect of rounding error on the paper-based scores. Using the same example of a 58-year-old White man with no other risk factors except for his mother having diabetes the pCANRISK score would be calculated as: 13 (aged 55–64 years) + 6 (male) + 2 (multiplied by 1, since only one category of first-degree relative was affected with diabetes) = 21. This is low compared with the median paper-based pCANRISK score (28) for the entire study population. (See Appendix A for a detailed explanation of how electronic and paper-based CANRISK scores may be used to estimate the probability of dysglycemia.)
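The rescaling step can be checked against Table 3. The sketch below divides each CANRISK beta coefficient by 0.09393, rounds to the nearest whole point, and then recomputes the worked example and the maximum possible scores. The dictionary layout and names are illustrative, and the maximum assumes all four categories of first-degree relative (mother, father, sibling, child) are affected.

```python
SCALE = 0.09393

# eCANRISK beta coefficients for the CANRISK model, grouped by variable (Table 3).
BETAS = {
    "age": {"45-54": 0.70, "55-64": 1.20, "65-78": 1.44},
    "bmi": {"25-29.9": 0.36, "30-34.9": 0.89, "35+": 1.31},
    "waist": {"M 94-102 / F 80-88": 0.41, "M >102 / F >88": 0.56},
    "inactive": {"yes": 0.11},
    "no_daily_fruit_veg": {"yes": 0.15},
    "high_blood_pressure": {"yes": 0.36},
    "high_blood_glucose": {"yes": 1.36},
    "first_degree_relative": {"per category": 0.19},
    "male": {"yes": 0.52},
    "ethnicity": {"Aboriginal": 0.30, "Black": 0.43,
                  "East Asian": 0.96, "South Asian": 0.99},
    "macrosomia": {"yes": 0.06},
    "education": {"High school diploma": 0.12, "Less than high school": 0.47},
}

# Paper-based points: rescaled and rounded coefficients.
POINTS = {
    variable: {level: round(beta / SCALE) for level, beta in levels.items()}
    for variable, levels in BETAS.items()
}


def max_points(sex):
    """Highest attainable pCANRISK score for sex 'F' or 'M'."""
    total = 0
    for variable, levels in POINTS.items():
        if variable == "male" and sex == "F":
            continue
        if variable == "macrosomia" and sex == "M":
            continue
        top = max(levels.values())
        if variable == "first_degree_relative":
            top *= 4  # mother, father, sibling and child all affected
        total += top
    return total


example = (POINTS["age"]["55-64"] + POINTS["male"]["yes"]
           + 1 * POINTS["first_degree_relative"]["per category"])
print(example, max_points("F"), max_points("M"))
```

The rounded points reproduce the pCANRISK column of Table 3, including the stated maxima of 81 for women and 86 for men.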

Figure 2 conveys the complex risk factor relationships underlying the CANRISK score and illustrates the strong positive relationship between CANRISK score and true dysglycemia outcome, where dysglycemia prevalence in the highest CANRISK decile (57%) is 25 times higher than in the lowest decile (2%).

Figure 2 - Dysglycemia by CANRISK decile

Assessing CANRISK’s overall performance: validating the model

We created CANRISK scores using the “test” training data, which were then applied using ROC analysis against the evaluation “set” dataset in order to validate the CANRISK logistic model against reference standard blood tests (FPG and 2-hour glucose challenge). This ROC analysis evaluated how well CANRISK is able to predict true dysglycemia (i.e. discrimination of true-positive and negative cases).

As shown in Table 4, the discriminating power of each CANRISK model across the full range of possible risk score cut-offs is indicated by the AUC summary statistic. (This is also illustrated graphically by the ROC curve in Figure 3.) Based on the 30% validation “set” data, the AUC for eCANRISK and pCANRISK were both 0.75.

**Table 4. AUC results for ROC curve analyses**
Model	AUC (validation “set” data, n = 1676)	95% CI
Abbreviations: AUC, area under the curve; CANRISK, Canadian Diabetes Risk Assessment Questionnaire; CI, confidence interval; FINDRISC, Finnish Diabetes Risk Score; ROC, receiver operating characteristic.
Electronic score (eCANRISK)	0.75	0.73–0.78
Paper-based score (pCANRISK)	0.75	0.73–0.78
FINDRISC Variables	0.73	0.70–0.76
Obesity model	0.69	0.66–0.72
FINDRISC score	0.66	0.63–0.69

Figure 3 - ROC curves

Comparing CANRISK and FINDRISC scores

As shown in Table 4 and Figure 3, the ROC results compare the performance of various models in terms of their ability to accurately detect true dysglycemia. AUC results indicate that both the pCANRISK (0.75) and eCANRISK scores (0.75) are significantly more accurate than the FINDRISC score (0.66) and the simple Obesity model (0.69) at greater than the 95% confidence level. CANRISK appears to be slightly more accurate than the FINDRISC Variables model, though their confidence intervals overlap.

Finally, we established the diagnostic validity of pCANRISK as a potential screening test using selected scoring thresholds for detecting dysglycemia in the validation dataset (Table 5). These selected threshold scores include three pCANRISK scores corresponding to FINDRISC cut-off scores in use in Finland, as well as a balanced score. This “optimal score”³⁰ attempts to balance the sensitivity and specificity of the test where the point on the ROC curve is closest to the (0, 1)-point denoting perfect discrimination. It assumes that false positives are equally important as false negatives. The balanced cut-off for pCANRISK is 32.
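The closest-to-(0, 1) criterion can be illustrated with the sensitivity and specificity pairs reported in Table 5. This sketch searches only those five tabulated thresholds, whereas a full analysis would consider every possible cut-off; the names are illustrative.

```python
import math

# (pCANRISK threshold, sensitivity, specificity) as reported in Table 5.
TABLE_5 = [
    (21, 0.95, 0.28),
    (29, 0.80, 0.55),
    (32, 0.70, 0.67),
    (33, 0.66, 0.70),
    (43, 0.30, 0.94),
]


def distance_to_perfect(sensitivity, specificity):
    """Euclidean distance on the ROC plane from the point
    (false-positive rate, sensitivity) to the perfect corner (0, 1)."""
    return math.hypot(1 - specificity, 1 - sensitivity)


distances = {t: distance_to_perfect(se, sp) for t, se, sp in TABLE_5}
balanced_threshold = min(distances, key=distances.get)
print(balanced_threshold, round(distances[balanced_threshold], 3))
```

Among the tabulated thresholds, 32 lies nearest the perfect corner, with 33 a close second.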

**Table 5. Predictive accuracy of CANRISK model at various scoring thresholds**
pCANRISK threshold score	Threshold label	Sensitivity (detecting true dysglycemia)	Specificity	False-positive rate (1−specificity)	PPV	NPV	Percent of total CANRISK participants with scores below threshold score (screened out), %
Abbreviations: CANRISK, Canadian Diabetes Risk Assessment Questionnaire; FINDRISC, Finnish Diabetes Risk Score; NPV, negative predictive value; pCANRISK, paper-based CANRISK; PPV, positive predictive value.
21	Slightly elevated	0.95	0.28	0.72	0.25	0.96	25
29	Moderate	0.80	0.55	0.45	0.31	0.92	50
32	Balanced	0.70	0.67	0.33	0.35	0.90	61
33	High	0.66	0.70	0.30	0.36	0.89	64
43	Very high	0.30	0.94	0.06	0.55	0.84	89

Table 5 shows the performance of pCANRISK at these five selected screening thresholds. (Note that these are arbitrary and do not necessarily indicate desirable screening thresholds). For a relatively low score equating with FINDRISC’s “slightly elevated” threshold, a pCANRISK score of 21 or higher would have sensitivity of 95% and specificity of 28% (72% false-positive rate). The positive predictive values (PPV) and negative predictive values (NPV) for this threshold would be 25% and 96% respectively. At the other extreme, restricting screening to those with a score of 43 or higher (i.e. FINDRISC’s “very high-risk” threshold) would markedly increase specificity and the proportion of CANRISK participants who would be screened out (for whom follow-up testing or intensive educational intervention would not be recommended), but would substantially decrease sensitivity and NPV. At the balanced cut-off score of 32, the sensitivity would be 70%, specificity 67%, PPV 35%, and NPV 90%.
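The predictive values in Table 5 depend on prevalence as well as on sensitivity and specificity. As a rough consistency check, the sketch below applies Bayes' theorem using the 20.5% dysglycemia prevalence of the core dataset (Table 2). This prevalence is an assumption for the validation subsample, so small rounding differences from the tabulated values are to be expected.

```python
PREVALENCE = 0.205  # assumed: dysglycemia prevalence in the core dataset (Table 2)


def ppv(sensitivity, specificity, prevalence=PREVALENCE):
    """Probability of dysglycemia given a positive screen."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence=PREVALENCE):
    """Probability of no dysglycemia given a negative screen."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Balanced cut-off (pCANRISK 32): sensitivity 0.70, specificity 0.67.
balanced_ppv = ppv(0.70, 0.67)
balanced_npv = npv(0.70, 0.67)
# "Slightly elevated" cut-off (pCANRISK 21): sensitivity 0.95, specificity 0.28.
low_ppv = ppv(0.95, 0.28)
low_npv = npv(0.95, 0.28)
print(round(balanced_ppv, 2), round(balanced_npv, 2), round(low_ppv, 2), round(low_npv, 2))
```

Because PPV rises and NPV falls with prevalence, the same thresholds would yield different predictive values in a population with a different underlying rate of dysglycemia.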

Figure 4 illustrates the relationship between CANRISK and FINDRISC scores. For slightly elevated, moderate-risk, high-risk and very high-risk categories, the comparable (median) paper-based CANRISK cut-offs are 21, 29, 33 and 43 respectively. These correspond to FINDRISC scores of 7, 12, 15 and 21 respectively. For each FINDRISC category, Figure 4 shows the corresponding mean and 95% confidence interval for pCANRISK scores within the entire FINDRISC category (i.e. not the cut-off score itself). As expected, the CANRISK scores increase monotonically across the FINDRISC categories. This is useful for relating information about future diabetes incidence from the Finnish Diabetes Prevention Study⁵ to the CANRISK scores. According to FINDRISC,³¹ more than 1 in 3 high-risk cases would likely develop diabetes over the next 10 years, as compared with 1 in 6 for those with moderate-risk scores and 1 in 25 for slightly elevated-risk scores.
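As a rough illustration of how these cut-offs could be applied, the following sketch maps a pCANRISK score to its FINDRISC-equivalent category. It assumes that each cut-off is an inclusive lower bound (consistent with the phrase "a score of 21 or higher"), and the label for scores under 21 is a placeholder of our own. Only the three 10-year risks quoted in the text are included.

```python
from bisect import bisect_right

# pCANRISK cut-offs quoted in the text (FINDRISC equivalents: 7, 12, 15, 21).
CUTOFFS = [21, 29, 33, 43]

# Category labels; the first label is a hypothetical placeholder for scores
# below the lowest cut-off and does not come from the paper.
CATEGORIES = [
    "below slightly elevated",
    "slightly elevated",
    "moderate",
    "high",
    "very high",
]

# Approximate 10-year diabetes risk according to FINDRISC, as quoted in the
# text. Categories without a quoted figure are deliberately left out.
TEN_YEAR_RISK = {
    "slightly elevated": "1 in 25",
    "moderate": "1 in 6",
    "high": "more than 1 in 3",
}


def findrisc_category(pcanrisk_score):
    """Map a pCANRISK score to a category, treating cut-offs as inclusive."""
    return CATEGORIES[bisect_right(CUTOFFS, pcanrisk_score)]


if __name__ == "__main__":
    for score in (20, 21, 32, 33, 43):
        category = findrisc_category(score)
        print(score, category, TEN_YEAR_RISK.get(category, "not quoted"))
```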

Figure 4 - pCANRISK score by FINDRISC category

Discussion

Model building

The CANRISK model includes terms for age, BMI, waist circumference, physical activity, fruit/vegetable consumption, history of high blood pressure, history of high blood glucose, family history of diabetes, sex, ethnicity, maternal history of macrosomia, and education. Four of these terms (sex, ethnicity, macrosomia and education) were not part of the original FINDRISC scoring metric. As anticipated, ethnicity was strongly predictive of dysglycemia. The OR associated with Aboriginal ethnicity was lower than for some other non-White ethnic groups, because part of this group’s excess risk is already captured by other predictors such as BMI, waist circumference and educational attainment.

Regarding predictive validity, the AUCs for eCANRISK and pCANRISK were both 0.75, indicating that both electronic and paper-based CANRISK scores provide good discrimination³⁰ (i.e. an ability to distinguish true-positive from true-negative cases based on reference standard blood test results). This confirms the predictive validity of both CANRISK scores in this multi-ethnic study population. In other words, the AUC results indicate that these prognostic models can effectively distinguish low-risk from high-risk cases. An AUC of 1 would indicate perfect discrimination (100% accuracy), and an AUC of 0.5 would indicate discrimination no better than chance. (A recent review of prognostic models for predicting mortality³² found a median AUC of 0.77 among a total of 94 eligible studies.) The Brier score²⁴ for the model was 0.002, which also indicated good predictive accuracy.
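To make the two summary statistics concrete, the sketch below computes an AUC and a Brier score from first principles. It is illustrative only: the outcome labels and scores are made-up toy values, not CANRISK data, and the function names are our own. The AUC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties = 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in non_cases)
    return wins / (len(cases) * len(non_cases))


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Made-up toy outcomes (1 = dysglycemia, 0 = normal); not study data.
labels = [0, 0, 0, 1, 1]

perfect = auc([10, 20, 30, 40, 50], labels)  # every case outscores every non-case
chance = auc([7, 7, 7, 7, 7], labels)        # identical scores carry no information
partial = auc([10, 40, 30, 20, 50], labels)  # one case scores below two non-cases

exact_brier = brier_score([0.0, 0.0, 0.0, 1.0, 1.0], labels)
coin_flip_brier = brier_score([0.5] * 5, labels)

if __name__ == "__main__":
    print(perfect, chance, round(partial, 3))
    print(exact_brier, coin_flip_brier)
```

The perfectly separated scores give an AUC of 1 and the uninformative scores give 0.5, matching the two reference points described above. A lower Brier score indicates predicted probabilities closer to the observed outcomes.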

These results also indicate that CANRISK is more accurate than the FINDRISC Score model and the simple Obesity model for detecting dysglycemia in this multi-ethnic Canadian population.

However, a statistically validated model need not be clinically valid,²³ and more research is necessary to establish the clinical utility of the model.

Screening thresholds

The aim of CANRISK was to develop a simple risk calculator that could be used both in the primary care setting and by individuals themselves. To use it for screening, specific CANRISK scores must first be selected as thresholds. The choice of threshold score will determine the accuracy of CANRISK at that particular cut-off. A lower cut-off score would tend to increase sensitivity but would also increase the number of false positives being referred for follow-up diagnostic testing. The choice of cut-off will also depend on the amount of available resources for subsequent diagnostic testing.

The choice of specific cut-off has both potential clinical and economic implications; in a clinical setting, the choice would affect the triaged portion referred for follow-up (i.e. diagnostic testing or lifestyle education). For instance, with a paper CANRISK score of 29 as a moderate cut-off, only 50% of CANRISK-assessed cases (i.e. scores 29+) would be referred for follow-up. The remaining 50% of screened-out cases might still receive diagnostic testing on an individual basis at a later date if their family doctor were to order further testing based on symptoms or other clinical indications. Note that these screened-out percentages would likely differ for the eventual target population because the age and ethnic distributions of the overall population would likely differ from those of the core CANRISK sample.
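The dependence on the population mix can be sketched numerically. The example below takes the sensitivity and specificity of the moderate cut-off (score 29) from Table 5 and shows how PPV, NPV and the screened-out fraction change with the prevalence of dysglycemia. The prevalence of 0.20 is an assumption back-calculated to approximately reproduce the Table 5 predictive values, not a figure stated in this section, and the 0.10 comparison population is purely hypothetical.

```python
def screening_summary(sensitivity, specificity, prevalence):
    """PPV, NPV and fraction screened out at a given dysglycemia prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        # Everyone scoring below the cut-off: true negatives plus false negatives.
        "screened_out": true_neg + false_neg,
    }


# Moderate cut-off (pCANRISK 29), values from Table 5.
SENSITIVITY, SPECIFICITY = 0.80, 0.55

# Assumption: back-calculated from Table 5, not reported in this section.
ASSUMED_PREVALENCE = 0.20

baseline = screening_summary(SENSITIVITY, SPECIFICITY, ASSUMED_PREVALENCE)
# Hypothetical population with half the assumed prevalence.
lower = screening_summary(SENSITIVITY, SPECIFICITY, 0.10)

if __name__ == "__main__":
    for name, result in (("baseline", baseline), ("lower", lower)):
        print(name, {k: round(v, 3) for k, v in result.items()})
```

Under these assumptions the baseline case approximately reproduces the PPV of 0.31 and NPV of 0.92 in Table 5, while the lower-prevalence population yields a lower PPV and a larger screened-out share at the same cut-off.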

For the purpose of validation, the outcome for the prognostic model was based on the collective results of participants’ blood tests (FPG and 2-hour 75g OGTT value). Dysglycemia detection rates based on the FPG alone would have significantly underestimated prevalent dysglycemia: 59% of prediabetes and 52% of diabetes would have been missed without the 2-hour glucose challenge component of the OGTT. The CANRISK prognostic model therefore presumes that those referred by the risk assessment will receive a diagnostic assessment involving the OGTT. However, a recent Ontario study³³ noted that the reference standard OGTT test is underutilized in practice, being used in less than 1% of all diabetes screening tests among asymptomatic adults.

This same study³³ also found that a significant amount of opportunistic screening effort is already being expended each year to detect diabetes among asymptomatic Canadian adults. Over 63% of adults without diabetes had received a diabetes screening blood test within the previous 3 years. The large majority of this ad hoc screening involves FPG and increasingly HbA1c. An organized triaged approach to screening involving CANRISK for initial risk assessment may help increase the cost-effectiveness of detection efforts.

We intend to confirm current CANRISK scores by following up the CANRISK cohort in order to assess cumulative diabetes incidence among various ethnic groups and risk categories. For now, the specific variables underlying the current dysglycemia-based CANRISK score aim to broaden the risk assessment discussion with screened participants by quantifying the risks posed by ethnicity, obesity, sex, family history of diabetes, macrosomia and other socio-economic factors.

Study limitations

Item-missing data was an issue for several variables, particularly for family history of diabetes. In the CANRISK model, it has been assumed that persons who either did not know or who provided no response for history of diabetes for their mother or a sibling were equivalent to “no.” This assumption requires further confirmation through additional data collection and analysis. Other potential sources of response bias may exist due to the self-reported nature of predictor variables (e.g. height and weight). A further limitation of the study was that individual study centres used different eligibility criteria regarding those with previously diagnosed prediabetes (all centres excluded those known to have diabetes). Similarly, during the second phase of their recruitment, one study site (PEI) excluded any persons with prediabetes who were being prescribed the drug metformin. (Most Canadian family physicians do not prescribe metformin for patients with prediabetes but use lifestyle treatment instead.³⁴)

Participants in this CANRISK study were recruited as volunteers, not as part of a randomly selected population-based sample. The resulting convenience sample of CANRISK participants does not reflect the proportions of the Canadian population at large. However, obtaining a representative sample was not the primary objective of the study. Rather, the study group was recruited in order to provide sufficient numbers from various major ethnic groups so as to provide adequate statistical power for analyzing ethnicity as a risk factor. As such, the convenience sample developed for this study represents the intended target groups. However, the fact that the sample is not representative of the Canadian population means that the overall performance of the model and the importance of ethnicity (and perhaps some other risk factors) in the general Canadian population may have been over-estimated.

Future research

Further work would be necessary to determine the acceptability of CANRISK in a clinical setting. For CANRISK to be applied in a clinical context, practical clinical decision rules based on specific cut-off scores will need to be determined by evaluating prospective economic trade-offs between likely resulting costs and health benefits. These decision rules would need to strike a balance between clinical priorities towards maximizing prevention and other practical operational constraints (e.g. testing capacity of local laboratories) concerning the cost of various diabetes screening scenarios. The actual cost of diabetes risk assessment with CANRISK will depend on local circumstances affecting economies of scale in implementation (i.e. scoring thresholds for specific follow-up and testing) and the mode of delivery. A further consideration needs to be the non-monetary costs of false positives (worry) and false negatives (false reassurance).

One potential use of CANRISK is in a non-clinical setting by individuals. The utility of CANRISK as an educational tool in this context needs to be investigated. Further research is also required to evaluate practical implementation issues in various settings. The model could be extended to address other specific ethnic groups, such as Latin Americans (i.e. non-White Hispanics), which would help to broaden the applicability of CANRISK to other North American jurisdictions. Current variables describing diet and physical activity could also be enhanced through further data collection and validation studies. Demonstrating the transportability of the CANRISK score to other geographic areas and to the Canadian population as a whole will help to further establish the external validity of this new prognostic model.

Successful implementation of the CANRISK scoring tool will depend not only on the successful uptake of the risk-scoring questionnaire itself but also on the creation of lifestyle intervention programs for those persons assessed at moderate or high risk of dysglycemia. Current evidence suggests that effective lifestyle change requires a “critical dose of prevention” involving 5 or 6 hours of facilitated discussion over the course of 8 to 12 months.⁵,⁶ Based on current economic studies, diabetes prevention strategies involving group lifestyle interventions targeted to persons with prediabetes are cost-effective³⁵⁻³⁷ and may even generate long-term cost savings for the health care system.

Conclusion

This study has demonstrated that CANRISK is a statistically valid tool that may prove to be suitable for assessing diabetes risk in Canada’s multi-ethnic population. The addition of ethnicity to the basic FINDRISC scoring model improves the ability to distinguish diabetes and prediabetes for early detection and intervention in a Canadian context. Because this new risk assessment tool is both inexpensive and evidence-based, CANRISK may help to enhance the efficiency and effectiveness of targeted diabetes prevention among those at moderate or high risk of developing type 2 diabetes.

Acknowledgements

The authors wish to acknowledge the contributions of several organizations, without whose assistance this research would have been impossible. These contributions reflect data-sharing agreements and research ethics approvals with the following provincial organizations: Health PEI (Charlottetown, Summerside and O’Leary sites), the Diabetes Care Program of Nova Scotia (Kentville and Antigonish sites), New Brunswick Health and Wellness (Fredericton and Lameque sites), Ontario Health and Wellness (the Mississauga site at Credit Valley Hospital), Manitoba Health and Wellness (Brandon and Winnipeg sites), the Saskatoon Regional Health Authority, and the Vancouver Coastal Health Authority. We also wish to acknowledge the valuable comments provided by researchers Markku Peltonen and Jaakko Tuomilehto from Finland’s national public health agency (THL) who reviewed our preliminary results. We also wish to thank the editors at Chronic Diseases and Injuries in Canada who provided useful comments in revising the article.

Appendix A: Estimating the probability of current dysglycemia based on CANRISK scores

The probability of current dysglycemia can be estimated for an individual based on either of the following two formulae, depending on whether the score is based on eCANRISK or pCANRISK:

(1) Using electronic scores (eCANRISK):

Pₓ = 1 ÷ [1 + e^(−z)]

where z = α₀ + β₁X₁ + β₂X₂ + … + βₙXₙ, such that α₀ = −3.842 is the intercept term of the logistic regression model, and βᵢ are the beta coefficients (eCANRISK scores) for each of the respective Xᵢ predictors, from i = 1 to n. Based on the characteristics of the individual mentioned in the main text of the paper (a 58-year-old White man with no risk factors other than his mother having diabetes), z = −1.929, yielding an absolute risk of 0.13.

(2) Using paper-based scores (pCANRISK):

Pₓ = 1 ÷ [1 + e^(−m)]

where m = α₀ + σ(P₁X₁ + P₂X₂ + … + PₙXₙ), such that α₀ = −3.842 is the intercept term, Pᵢ are the paper-based scores (pCANRISK) for each of the respective Xᵢ predictors, and σ = 0.09393 (i.e. the rescaling factor for converting betas into paper scores). In our example, m = −1.869, yielding an absolute probability of 0.13.
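Both formulae can be reproduced with a few lines of code. The sketch below is a minimal illustration using only the constants given in this appendix; the function names are our own. The total paper score of 21 used for the worked example is an assumption back-calculated from m = −1.869, since the appendix does not state the individual's pCANRISK total.

```python
import math

ALPHA_0 = -3.842  # intercept of the logistic regression model (Appendix A)
SIGMA = 0.09393   # rescaling factor between paper scores and betas (Appendix A)


def logistic(x):
    """Standard logistic function, 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def probability_from_z(z):
    """Formula (1): z is the linear predictor, intercept already included."""
    return logistic(z)


def probability_from_paper_score(total_paper_score):
    """Formula (2): rescale the summed pCANRISK points, then add the intercept."""
    m = ALPHA_0 + SIGMA * total_paper_score
    return logistic(m)


# Worked example from the appendix: z = -1.929 for the eCANRISK version.
p_electronic = probability_from_z(-1.929)

# Assumption: a total paper score of 21 is inferred from m = -1.869 and is
# not stated explicitly in the appendix.
p_paper = probability_from_paper_score(21)

if __name__ == "__main__":
    print(round(p_electronic, 2), round(p_paper, 2))
```

Under that assumption, both versions round to the absolute probability of 0.13 reported above.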

References

1. Hypertension in Diabetes Study (HDS): I. Prevalence of hypertension in newly presenting type 2 diabetic patients and the association with risk factors for cardiovascular and diabetic complications. J Hypertens. 1993;11(3):309-17.
2. Harris MI, Klein R, Welborn TA, Knuiman MW. Onset of NIDDM occurs at least 4-7 years before clinical diagnosis. Diabetes Care. 1992;15(7):815-9.
3. Unwin N, Shaw J, Zimmet P, Alberti KG. Impaired glucose tolerance and impaired fasting glycaemia: the current status on definition and intervention. Diabet Med. 2002;19:708-23.
4. de Vegt F, Dekker JM, Jager A, Hienkens E, Kostense PJ, Stehouwer CD, et al; Finnish Diabetes Prevention Study Group. Relation of impaired fasting and postload glucose with incident type 2 diabetes in a Dutch population: the Hoorn Study. JAMA. 2001;285(16):2109-13.
5. Tuomilehto J, Lindstrom J, Eriksson JG, Valle TT, Hamalainen H, Ilanne-Parikka P, et al. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. N Engl J Med. 2001;344:1343-50.
6. Knowler WC, Barrett-Connor E, Fowler SE, Haman RF, Lachin JM, Walker EA, et al; The Diabetes Prevention Program Research Group. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med. 2002;346:393-403.
7. Buijsse B, Simmons RK, Griffin SJ, Schulze MB. Risk assessment tools for identifying individuals at risk of developing type 2 diabetes. Epidemiol Rev. 2011;33:46-62.
8. Griffin SJ, Little PS, Hales CN, Kinmonth AL, Wareham NJ. Diabetes risk score: towards earlier detection of type 2 diabetes in general practice. Diabetes Metab Res Rev. 2000;16(3):164-71.
9. He G, Sentell T, Schillinger D. A new public health tool for risk assessment of abnormal glucose levels. Prev Chronic Dis. 2010;7(2):1-9.
10. Heikes KE, Eddy DM, Arondekar B, Schlessinger L. Diabetes Risk Calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care. 2008;31:1040-5.
11. Koopman RJ, Mainous AG, Everett CJ, Carter RE. Tool to assess likelihood of fasting glucose impairment (TAG-IT). Ann Fam Med. 2008;6(6):555-61.
12. Nelson KM, Boyko EJ; Third National Health and Nutrition Examination Survey. Predicting impaired glucose tolerance using common clinical information: data from the third National Health and Nutrition Examination Survey. Diabetes Care. 2003;26(7):2058-62.
13. Park PJ, Griffin SJ, Sargeant L, Wareham NJ. The performance of a risk score in predicting undiagnosed hyperglycemia. Diabetes Care. 2002;25:984-8.
14. Schmidt MI, Duncan BB, Vigo A, Pankow J, Ballantyne CM, Couper D, et al. Detection of undiagnosed diabetes and other hyperglycemia states: the Atherosclerosis Risk in Communities Study. Diabetes Care. 2003;26(5):1338-43.
15. Lindstrom J, Tuomilehto J. The Diabetes Risk Score: a practical tool to predict type 2 diabetes risk. Diabetes Care. 2003;26(3):725-31.
16. Saaristo T, Moilanen L, Jokelainen J, Korpi-Hyovalti E, Vanhala M, Saltevo J, et al. Cardiometabolic profile of people screened for being at high risk of type 2 diabetes in a national diabetes prevention program (FIN-D2D). Prim Care Diabetes. 2010;4:231-9.
17. Saaristo T, Moilanen L, Korpi-Hyovalti E, Vanhala M, Saltevo J, Niskanen L, et al. Lifestyle intervention for prevention of type 2 diabetes in primary health care: one-year follow-up of the Finnish national diabetes prevention program (FIN-D2D). Diabetes Care. 2010;33(10):2146-51.
18. Kaczorowski J, Robinson C, Nerenberg K. Development of the CANRISK questionnaire to screen for prediabetes and undiagnosed type 2 diabetes. Can J Diabetes. 2009;33(4):381-5.
19. Papineau D, Fong M. Piloting the CANRISK tool in Vancouver Coastal Health. Chronic Dis Inj Can. 2011;32(1):14-20.
20. Talbot P, Dunbar M. Nova Scotia Prediabetes Project: upstream screening and community intervention for prediabetes and undiagnosed type 2 diabetes. Chronic Dis Inj Can. 2011;32(1):5-13.
21. World Health Organization; International Diabetes Federation. Definition and diagnosis of diabetes mellitus and intermediate hyperglycemia: report of a WHO/IDF consultation. Geneva (CH): World Health Organization; 2006.
22. Alberti K, Zimmet P; for a WHO Consultation. Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: Diagnosis and classification of diabetes mellitus. Geneva (CH): World Health Organization; 1999.
23. Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19:453-73.
24. Arkes HR, Dawson NV, Speroff T, Harrell FE Jr, Alzola C, Phillips R, et al. The covariance decomposition of the probability score and its use in evaluating prognostic estimates. Med Decis Making. 1995;15:120-31.
25. SPSS version 15.0 for Windows [statistical software]. Armonk (NY): SPSS Statistics; 2006.
26. Dreyer G, Hull S, Aitken Z, Chesser A, Yaqoob MM. The effect of ethnicity on the prevalence of diabetes and associated chronic kidney disease. QJM. 2009;102:261-9.
27. Cowie CC, Harris MI, Silverman RE, Johnson EW, Rust KF. Effect of multiple risk factors on differences between blacks and whites in the prevalence of non-insulin-dependent diabetes mellitus in the United States. Am J Epidemiol. 1993;137:719-32.
28. Brancati FL, Kao WH, Folsom AR, Watson RL, Szklo M. Incident type 2 diabetes mellitus in African American and white adults: the Atherosclerosis Risk in Communities Study. JAMA. 2000;283:2253-9.
29. Sullivan LM, Massaro JM, D'Agostino RB Sr. Presentation of multivariate data for clinical use: The Framingham Study risk score functions. Stat Med. 2004;23:1631-60.
30. Akobeng AK. Understanding diagnostic tests 3: receiver operating characteristics curves. Acta Paediatr. 2007;96:644-7.
31. Type 2 diabetes risk assessment form [Internet]. [place unknown]: Finnish Diabetes Association; [cited 2011 Sep 9]. Available from: http://www.diabetes.fi/files/1100/Type2diabetesRiskTest_.jpg
32. Siontis GC, Tzoulaki I, Ioannidis JP. Predicting death: an empirical evaluation of predictive tools for mortality. Arch Intern Med. 2011 July 25. doi:10.1001/archinternmed.2011.334.
33. Wilson SE, Lipscombe LL, Rosella LC, Manuel DG. Trends in laboratory testing for diabetes in Ontario, Canada 1995-2005: a population-based study. BMC Health Serv Res. 2009;9:41.
34. Lily M, Godwin M. Treating prediabetes with metformin: systematic review and meta-analysis. Can Fam Physician. 2009;55:363-9.
35. Zhang P, Engelgau M, Norris SL, Gregg EW, Narayan KM. Application of economic analysis to diabetes and diabetes care. Ann Intern Med. 2004;140:972-7.
36. Williamson DF, Vinicor F, Bowman BA; Centers for Disease Control and Prevention Primary Prevention Working Group. Primary prevention of type 2 diabetes mellitus by lifestyle intervention: implications for health policy. Ann Intern Med. 2004;140:951-7.
37. Hoerger TJ, Hicks KA, Sorensen SW, Herman WH, Ratner RE, Ackermann RT, et al. Cost-effectiveness of screening for pre-diabetes among overweight and obese U.S. adults. Diabetes Care. 2007;30:2874-9.
