SYNTHETIC SAMPLE GENERATION REPRESENTING THE ENGLISH POPULATION USING SPEARMAN RANK CORRELATION AND CHOLESKY DECOMPOSITION

Author(s)

Martin C, Springate CE
Crystallise, East Tilbury, UK

OBJECTIVES: To generate a synthetic sample of 1 million individuals that reflects the characteristics of the population recorded in the Health Survey for England (HSE).

METHODS: We used data from the HSE to determine the age- and gender-dependent distributions of continuous risk factors (height, weight, BMI, systolic blood pressure, total and HDL cholesterol and their ratio, number of cigarettes/day and units of alcohol/week) and the prevalence of binary risk factors (smoking status, diabetes). Spearman rank correlations, including age and gender, were determined for these risk factors. A table of normally distributed random numbers was generated, and Cholesky decomposition was used to impose the observed Spearman rank correlations on that table. Rank correlations that included binary variables were recalibrated to adjust for the large number of tied values. The sample was then generated from the random percentiles: for continuous variables, by an inverse look-up of the value of the gamma distribution at that percentile; for binary variables, by setting the variable to 1 when the random percentile fell below the prevalence threshold.
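The steps described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `synthetic_sample`, the gamma shape and scale values, the target correlation matrix and the prevalence are all hypothetical, and the conversion from Spearman to Pearson correlation (r = 2 sin(pi * rho / 6), which holds for normal variables) is a standard step that the abstract does not mention. The recalibration for tied values in binary variables is not reproduced.

```python
import numpy as np
from scipy import stats


def synthetic_sample(target_spearman, gamma_params, prevalence, n, seed=0):
    """Draw n rows whose rank correlations approximate target_spearman.

    Columns are ordered with continuous variables first, then binary ones.
    gamma_params holds one (shape, scale) pair per continuous variable;
    prevalence holds one probability per binary variable.
    """
    rng = np.random.default_rng(seed)
    rho_s = np.asarray(target_spearman, dtype=float)

    # Assumption (not in the abstract): for normal variables, the Pearson
    # correlation that yields Spearman correlation rho is 2*sin(pi*rho/6).
    pearson = 2.0 * np.sin(np.pi * rho_s / 6.0)
    np.fill_diagonal(pearson, 1.0)

    # Cholesky factor L with L @ L.T == pearson; multiplying independent
    # standard normals by L.T gives columns with that correlation.
    chol = np.linalg.cholesky(pearson)
    z = rng.standard_normal((n, rho_s.shape[0])) @ chol.T

    # Convert each correlated normal to a percentile in (0, 1).
    u = stats.norm.cdf(z)

    # Continuous variables: inverse look-up of the gamma distribution.
    k = len(gamma_params)
    cont = np.column_stack([
        stats.gamma.ppf(u[:, j], a=shape, scale=scale)
        for j, (shape, scale) in enumerate(gamma_params)
    ])

    # Binary variables: 1 when the percentile falls below the prevalence.
    binary = (u[:, k:] < np.asarray(prevalence, dtype=float)).astype(int)
    return cont, binary


# Hypothetical inputs: two continuous variables and one binary variable.
target = [[1.0, 0.4, 0.2],
          [0.4, 1.0, 0.1],
          [0.2, 0.1, 1.0]]
gamma_params = [(25.0, 1.1), (30.0, 4.2)]  # illustrative (shape, scale) pairs
prevalence = [0.18]                        # illustrative prevalence

cont, binary = synthetic_sample(target, gamma_params, prevalence, n=100_000)
```

Because the gamma inverse look-up is a monotonic transform of the percentile, the rank correlation between the continuous columns is preserved. The binary column is set to 1 for low percentiles, so its association with the other columns has the opposite sign to the latent correlation and is attenuated by ties, which is presumably why the rank correlations involving binary variables needed recalibration.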

RESULTS: Differences in correlation coefficients between the synthetic sample (SS) and the HSE sample were no more than 0.5% for any continuous variable. The prevalence of binary factors in the SS was very well matched with the HSE sample. Smoking prevalence was 18.8% and 16.7% in the SS versus 18.4% and 16.5% in the HSE sample, for males and females respectively. Prevalence of diabetes in the SS was 13.3% and 7.7% versus 13.2% and 7.8%, and prevalence of cardiovascular disease was 17.6% and 14.1% versus 18.2% and 14.6%. Comparing the 25th, 50th and 75th quantiles, the maximum differences between the original and synthetic values for BMI and TC/HDL ratio were 0.6 kg/m² and 0.3 respectively.

CONCLUSIONS: Our new approach generates large synthetic samples with risk factor distributions very closely matching those of the real HSE population. This sample can be used to model the likely impact of new therapies or predict mortality.

Conference/Value in Health Info

2018-05, ISPOR 2018, Baltimore, MD, USA

Value in Health, Vol. 21, S1 (May 2018)

Code

PRM66

Topic

Methodological & Statistical Research

Topic Subcategory

Modeling and simulation

Disease

Multiple Diseases
