Predicting malaria outbreaks in Somaliland using an XGBoost machine learning framework
2026 (English). In: Discover public health, ISSN 3005-0774, Vol. 23, no 1, article id 718. Article in journal (Refereed), Published
Abstract [en]
Background: Malaria remains a significant public health challenge in Somaliland. This study evaluates a preliminary machine learning approach, rather than a full operational system, to predict malaria outbreak years in a data-scarce environment using a limited historical dataset (2002–2021). Methods: A retrospective study was conducted using annual data. An Extreme Gradient Boosting (XGBoost) model performed binary classification of malaria incidence into ‘Outbreak’ and ‘Non-Outbreak’ years. To address the methodological constraints of the small sample size (N = 20) and mitigate the risk of overfitting, a Leave-One-Year-Out Cross-Validation (LOYOCV) strategy was employed, and results were compared against a Logistic Regression baseline. Predictor variables included temperature, rainfall, 1-year lagged rainfall, urbanization, and land-use patterns. Results: The XGBoost model achieved an AUC of 0.880, significantly outperforming the baseline (AUC 0.710). At the optimal threshold, the model yielded a sensitivity of 0.750 and a precision of 0.600. However, the discrete staircase appearance of the resulting ROC curve reflects the model’s high sensitivity to individual data points within the small sample, indicating that these performance metrics should be interpreted with caution. Conclusion: While promising, these results are preliminary. The small sample size and the temporal clustering of outbreaks in the early 2000s suggest that this work serves as a proof-of-concept for data-scarce regions rather than a definitive surveillance tool. Further prospective validation with higher-resolution temporal data is required to ensure the reliability and generalizability of these associations for operational early warning. © The Author(s) 2026.
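The leave-one-year-out procedure and the pooled metrics named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline: the data are synthetic, the outbreak labels and the two features are hypothetical, and a simple nearest-centroid scorer stands in for XGBoost and Logistic Regression purely to keep the example self-contained. All function names (`fit_centroids`, `centroid_score`, `loo_scores`, `auc`, `sensitivity_precision`) are introduced here for illustration only.

```python
import random

def fit_centroids(X, y):
    """Stand-in 'model' (assumption, not XGBoost): mean feature vector per class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return mean(pos), mean(neg)

def centroid_score(model, x):
    """Higher score = closer to the outbreak centroid than to the other one."""
    pos_c, neg_c = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    return d_neg - d_pos

def loo_scores(X, y, fit, score):
    """Leave-one-year-out: each year is held out once and scored by a
    model trained on the remaining years."""
    out = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        out.append(score(model, X[i]))
    return out

def auc(y, s):
    """Rank-based AUC: share of (outbreak, non-outbreak) pairs ranked
    correctly, with ties counted as one half."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_precision(y, s, threshold):
    """Sensitivity and precision when years with score >= threshold are flagged."""
    pred = [int(si >= threshold) for si in s]
    tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return sens, prec

# Synthetic data (hypothetical): 20 annual records, two standardized features.
rng = random.Random(0)
years = list(range(2002, 2022))
y = [1] * 5 + [0] * 15  # made-up labels, not the study's outbreak years
X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.5 if t else 0.0, 1.0)]
     for t in y]

scores = loo_scores(X, y, fit_centroids, centroid_score)
print("pooled out-of-fold AUC:", round(auc(y, scores), 3))
print("sensitivity, precision at 0.0:", sensitivity_precision(y, scores, 0.0))
```

Because each fold contributes a single held-out year, the AUC is computed once on the 20 pooled out-of-fold scores rather than per fold. With only 20 scores, the ROC curve has at most 21 distinct operating points, which is one way to see why it looks like a staircase and why a single year changing rank can move the metrics noticeably.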
Place, publisher, year, edition, pages
BioMed Central Ltd, 2026. Vol. 23, no 1, article id 718
Keywords [en]
Data-scarce modeling, Machine learning, Malaria, Outbreak prediction, Preliminary approach, Somaliland, XGBoost
National Category
Public Health, Global Health and Social Medicine
Identifiers
URN: urn:nbn:se:du-53764; DOI: 10.1186/s12982-026-02070-2; ISI: 001768011300001; Scopus ID: 2-s2.0-105039112375; OAI: oai:DiVA.org:du-53764; DiVA, id: diva2:2064134
2026-06-01; 2026-06-01; 2026-06-01; Bibliographically approved