PREDICTING FERTILITY INTENTIONS USING SOCIODEMOGRAPHIC AND REPRODUCTIVE FACTORS IN THE UNITED STATES: A MACHINE LEARNING APPROACH
Author(s)
Chi-Han Cheng, BSc, MS1, Benjamin Langworthy, PhD2, Wendy St. Peter, PharmD, FCCP, FASN, FNKF3
1College of Pharmacy, University of Minnesota, Social and Administrative Pharmacy, Minneapolis, MN, USA, 2University of Minnesota School of Public Health, Division of Biostatistics & Health Data Science, Minneapolis, MN, USA, 3University of Minnesota, Department of Pharmaceutical Care & Health Systems, Minneapolis, MN, USA
OBJECTIVES: Declining fertility rates in developed countries have led to wide-ranging societal consequences, including reduced productivity and increased financial burdens on younger generations. Understanding fertility intention as a multifactorial phenomenon is therefore essential. This study applied machine learning methods to identify key determinants of fertility intention among U.S. adults.
METHODS: Data were obtained from the Panel Study of Income Dynamics, a nationally representative U.S. survey. The primary outcome was a binary indicator of whether participants reported wanting to have children. A total of 28 predictors were included, encompassing patient characteristics (age, gender, marital history), social determinants of health (health insurance, employment status, educational history), and reproductive factors (partner’s desire for children and contraceptive methods). The dataset was split into a 70% training set and a 30% test set. Five supervised models were developed: logistic regression, LASSO, random forest, XGBoost, and support vector machine (SVM). Hyperparameters were tuned using 5-fold cross-validation, and model performance was evaluated on the test set using the area under the receiver operating characteristic curve (AUC). Variable importance analyses were also conducted.
RESULTS: A total of 5,639 U.S. adults were included in the analysis, of whom 1,867 (33.1%) reported wanting to have children. Compared with those who did not, individuals expressing fertility intentions were younger, had shorter marriage durations, and had fewer prior marriages. Logistic regression achieved an AUC of 0.902, and LASSO a comparable AUC of 0.904. The random forest, XGBoost, and SVM models showed slightly lower AUCs of 0.899, 0.890, and 0.888, respectively. The strongest predictors in the LASSO model included partner’s desire for children, vasectomy status, and lack of concern about pregnancy. Additional important predictors across the random forest, XGBoost, and SVM models included age, years since highest education, and marital status.
CONCLUSIONS: Predictive performance was strong across all five models, with AUCs above 0.88. These findings may inform population-level reproductive health planning and targeted counseling, while future work incorporating longitudinal data and causal inference could clarify fertility determinants.
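The evaluation steps named in METHODS (a 70%/30% split and test-set AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_70_30` and `auc` are hypothetical, the toy labels and scores are invented for illustration only, and the abstract does not state whether the split was stratified or how ties were handled. The AUC here uses the standard pairwise definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as one half.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in a 70/30 ratio.

    Assumption: a simple random split; the abstract does not say
    whether the original split was stratified by outcome.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative).

    labels are 0/1 outcomes (1 = wants children in this illustration),
    scores are model outputs where higher means more likely positive.
    Tied positive/negative pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy illustration only (not study data): four cases, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 3 of 4 positive/negative pairs ranked correctly -> 0.75

train, test = split_70_30(list(range(10)))
print(len(train), len(test))  # 7 3
```

The pairwise loop is quadratic in sample size, which is fine for illustration; for a cohort of several thousand respondents a rank-based computation or an established library routine would be the more practical choice.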
Conference/Value in Health Info
2026-05, ISPOR 2026, Philadelphia, PA, USA
Value in Health, Volume 29, Issue S6
Code
EPH119
Topic
Epidemiology & Public Health
Topic Subcategory
Public Health
Disease
SDC: Reproductive & Sexual Health