Predicting the complex, multilevel ecology of US county-level diabetes prevalence: a cross-sectional, artificial intelligence analysis Journal Article
Overview
abstract
  • BACKGROUND: Diabetes and its risk factors are embedded in a complex multilevel ecology. Both upstream factors (i.e., 'forcing factors': the fundamental population-based social, economic, and political structures that shape health outcomes long before disease develops) and downstream risk factors (i.e., the individual-level characteristics and outcomes that result from upstream forcing factors) need to be considered when predicting diabetes. This study aims to predict diabetes prevalence at the United States (US) county level using analytical methods that account for the contextual complexity of diabetes. METHODS: US county-level datasets incorporating 27 predictor variables were analysed cross-sectionally. A Light Gradient Boosting Machine (LightGBM) model was trained to predict county-level diabetes prevalence, after which model performance and feature importance were evaluated. FINDINGS: The final model retained 17 features and explained 95% (R² = 0.952) of the variance in county-level diabetes prevalence. Physical inactivity, racial and ethnic minority status, frequent physical distress, and obesity showed the highest SHapley Additive exPlanations (SHAP) importance values, exceeding those of all remaining features. INTERPRETATION: This study delineates the most important predictors of diabetes prevalence based on a complex multilevel ecology. Efforts to reduce diabetes prevalence should address both upstream and downstream risk factors, emphasising physical activity, obesity, equity, and cultural considerations.
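
The "variance explained" figure in the FINDINGS is a coefficient of determination, R² = 1 − SS_res / SS_tot. As a minimal sketch of how such a value is computed from observed and predicted prevalence, the following uses only the Python standard library. The function name `r_squared` and the four sample values are hypothetical illustrations, not the study's code or data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variance of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical county prevalence values (%), invented for illustration only.
observed = [8.0, 10.0, 12.0, 14.0]
predicted = [8.5, 9.5, 12.5, 13.5]
score = r_squared(observed, predicted)
```

With these made-up numbers, SS_res is 1 and SS_tot is 20, so the score is 0.95: the predictions account for 95% of the variation in the observed values, which is how a reported R² of 0.952 would be read.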

    publication date
  • 2026
  • Research
    keywords
  • Artificial Intelligence
  • Cross-Sectional Studies
  • Diabetes
  • Forecasting
  • Health Promotion
  • Prevention
  • Risk Factors
  • Additional Document Info
    volume
  • 42
  • issue
  • 5