Predicting Customer Churn in a Subscription-Based E-Commerce Platform Using Machine Learning Techniques
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
This study investigates the performance of Logistic Regression, k-Nearest Neighbors (KNN), and Random Forest algorithms in predicting customer churn on an e-commerce platform. These algorithms were chosen for the characteristics of the dataset and for the distinct perspective and value each one offers. The models were examined iteratively, covering preprocessing techniques, feature engineering, and rigorous evaluation. Logistic Regression showed moderate predictive capability but lagged in accurately identifying potential churners, owing to its assumption of a linear relationship between the log odds and the predictors. KNN emerged as the most accurate classifier, achieving the highest sensitivity and specificity (98.22% and 96.35%, respectively) of the three models. Random Forest (sensitivity 91.75%, specificity 95.83%) performed strongly on specificity but lagged in sensitivity. Feature importance analysis highlighted "Tenure" as the most impactful variable for churn prediction. The effect of preprocessing techniques differed across models, which emphasizes the importance of tailoring preprocessing to each model. The findings underscore the significance of continuous model refinement and optimization in addressing complex business challenges such as customer churn. The insights serve as a foundation for businesses to implement targeted retention strategies, mitigate customer attrition, and promote growth on e-commerce platforms.
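For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how sensitivity and specificity can be derived from confusion-matrix counts. It is an illustration only, not the thesis code: the function names (`confusion_counts`, `sensitivity`, `specificity`) are hypothetical, the toy labels are invented, and churners are assumed to be encoded as the positive class `1`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, tn, fp) for a binary classification task."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # churner correctly flagged
            else:
                fn += 1  # churner missed
        else:
            if predicted == positive:
                fp += 1  # non-churner wrongly flagged
            else:
                tn += 1  # non-churner correctly passed
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """Share of actual churners that the model identifies."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of actual non-churners that the model identifies."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Invented toy labels (assumption): 1 = churned, 0 = retained.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
print(f"specificity = {specificity(tn, fp):.2%}")
```

Under this reading, a model with high specificity but lower sensitivity rarely flags loyal customers by mistake, yet misses a larger share of the customers who actually churn.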
Place, publisher, year, edition, pages
2024.
Keywords [en]
Customer churn prediction, E-commerce, Machine learning algorithms, Logistic Regression, k-Nearest Neighbors (KNN), Random Forest, Feature engineering, Preprocessing techniques, Model evaluation, performance measures, supervised machine learning, classification, confusion matrix.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:du-48495
OAI: oai:DiVA.org:du-48495
DiVA, id: diva2:1857189
Subject / course
Microdata Analysis
2024-05-13 2024-05-13