This project is an end-to-end machine learning system for customer churn prediction, designed not only to predict churn but to optimize business decisions under different operational strategies.
Instead of focusing solely on predictive performance, the system emphasizes:
- Probability calibration
- Profit-driven decision thresholds
- Business constraint trade-offs (coverage vs efficiency)
- Customer segmentation
- Model explainability (SHAP)
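Probability calibration matters here because the profit calculations treat predicted probabilities as real churn likelihoods. The following is a minimal sketch of one common way to calibrate a classifier with scikit-learn. The README does not say which model or calibration method the project uses. The random forest, isotonic calibration and synthetic imbalanced data below are assumptions chosen for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for churn data (assumption: not the real dataset)
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.73], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Wrap the base model so its scores are mapped to calibrated probabilities
# by isotonic regression fitted on internal cross-validation folds.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic",
    cv=5,
)
calibrated.fit(X_train, y_train)
churn_prob = calibrated.predict_proba(X_test)[:, 1]

# Reliability check: observed positive rate vs mean predicted probability per bin
observed_rate, mean_predicted = calibration_curve(y_test, churn_prob, n_bins=10)
for obs, pred in zip(observed_rate, mean_predicted):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

When the two columns printed by the loop agree closely, a probability of 0.3 can be read as "about 30% of such customers churn". Expected-value reasoning about retention campaigns relies on exactly that reading.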
The final output is not just a model: it is a decision-making framework for customer retention.
The objective is to identify customers likely to churn and design a retention strategy that maximizes profit (ROI).
- 💰 Cost per retention action: €50
- 💰 Value of retained customer: €500
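The campaign economics can be written as follows. This assumes every contacted churner is retained, which the README does not state explicitly. Here N_targeted is the number of customers contacted and N_retained is the number of contacted churners. The ROI definition is inferred, because it is the one that reproduces the reported ROI figures.

$$
\begin{aligned}
\text{Value saved} &= 500 \cdot N_{\text{retained}} \\
\text{Campaign cost} &= 50 \cdot N_{\text{targeted}} \\
\text{ROI} &= \frac{\text{Value saved} - \text{Campaign cost}}{\text{Campaign cost}}
\end{aligned}
$$

Under the same assumption, contacting one customer with a well-calibrated churn probability $p$ pays off on average only when the expected value exceeds the cost:

$$
\mathbb{E}[\text{profit per contacted customer}] = 500\,p - 50 > 0 \iff p > \frac{50}{500} = 0.10
$$

This break-even probability of 0.10 coincides with the best threshold of 0.10 reported for the low-threshold strategy.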
- Telco Customer Churn dataset
- Binary classification:
  - `1` → Customer churns
  - `0` → Customer stays
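The following is a minimal sketch of the target encoding. It assumes the raw CSV stores churn in a `Churn` column with `"Yes"`/`"No"` values, as in the commonly distributed version of this dataset. The helper name `encode_target` is hypothetical.

```python
import pandas as pd

def encode_target(df: pd.DataFrame, column: str = "Churn") -> pd.Series:
    """Map 'Yes'/'No' churn labels to 1/0 (column name and values are assumptions)."""
    y = df[column].map({"Yes": 1, "No": 0})
    # Fail loudly on unexpected labels instead of silently producing NaN
    if y.isna().any():
        raise ValueError(f"Unexpected values in {column!r}: {set(df.loc[y.isna(), column])}")
    return y.astype(int)

print(encode_target(pd.DataFrame({"Churn": ["No", "Yes", "No"]})).tolist())
```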
- ROC-AUC: 0.8376
- Accuracy: 0.78
| Class | Precision | Recall | F1-score |
|---|---|---|---|
| 0 (No churn) | 0.82 | 0.91 | 0.86 |
| 1 (Churn) | 0.63 | 0.45 | 0.52 |
The model identifies non-churners much better than churners (recall 0.91 vs 0.45), which is typical of imbalanced classification problems where churners are the minority class.
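ROC-AUC is computed directly from predicted probabilities and does not depend on any cutoff. Accuracy and the per-class precision/recall above, by contrast, require turning probabilities into labels at some threshold. The scikit-learn sketch below makes that split explicit on a toy example. The default cutoff of 0.5 and the helper name `evaluate_at_threshold` are assumptions, since the README does not state which cutoff produced the classification report.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

def evaluate_at_threshold(y_true, y_prob, threshold=0.5):
    """Threshold-free ranking metric plus threshold-dependent label metrics."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, y_prob),  # uses probabilities only
        "accuracy": accuracy_score(y_true, y_pred),  # depends on the cutoff
        "report": classification_report(y_true, y_pred, output_dict=True, zero_division=0),
    }

y_true = np.array([0, 0, 0, 0, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.9])
metrics = evaluate_at_threshold(y_true, y_prob)
print(metrics["roc_auc"], metrics["accuracy"], metrics["report"]["1"]["recall"])
```

Lowering `threshold` changes accuracy and churn recall but leaves `roc_auc` unchanged. This is why the project treats threshold choice as a separate decision.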
- SHAP output shape: `(1407, 35)`, i.e. one SHAP value per customer and feature
- Provides global feature importance
- Makes individual predictions interpretable
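Global feature importance from SHAP is commonly computed as the mean absolute SHAP value of each feature across all explained rows. The NumPy-only sketch below applies this to a SHAP matrix such as the `(1407, 35)` one above. The function name, feature names and toy values are illustrative.

```python
import numpy as np

def global_importance(shap_values, feature_names):
    """Rank features by mean |SHAP value| over all rows (n_samples x n_features)."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # largest contribution first
    return [(feature_names[i], float(mean_abs[i])) for i in order]

toy_shap = np.array([[0.5, -0.1, 0.0],
                     [-0.3, 0.2, 0.1]])
print(global_importance(toy_shap, ["feature_a", "feature_b", "feature_c"]))
```

Taking the absolute value before averaging keeps positive and negative contributions from cancelling out. Without it, a feature that strongly pushes some customers toward churn and others away from it would look unimportant.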
This project does NOT use a fixed 0.5 threshold.
Instead, it evaluates multiple operating points based on business value.
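A minimal sketch of such an evaluation sweeps a grid of thresholds and computes the campaign economics at each one. It then picks the threshold that maximizes either ROI or net profit. The sketch uses the €50/€500 figures from above and assumes every contacted churner is retained. `evaluate_threshold` and `best_thresholds` are hypothetical helpers, not the project's `find_best_threshold_profit()`.

```python
import numpy as np

COST_PER_ACTION = 50.0      # € per retention action
VALUE_PER_RETAINED = 500.0  # € value of a retained customer

def evaluate_threshold(y_true, y_prob, threshold,
                       cost=COST_PER_ACTION, value=VALUE_PER_RETAINED):
    """Campaign economics if every customer with p >= threshold is contacted.

    Assumption: every contacted churner (true positive) is retained.
    """
    targeted = np.asarray(y_prob) >= threshold
    n_targeted = int(targeted.sum())
    n_retained = int((targeted & (np.asarray(y_true) == 1)).sum())
    value_saved = n_retained * value
    campaign_cost = n_targeted * cost
    profit = value_saved - campaign_cost
    roi = profit / campaign_cost if campaign_cost else float("nan")
    return {"threshold": float(threshold), "targeted": n_targeted,
            "retained": n_retained, "profit": profit, "roi": roi}

def best_thresholds(y_true, y_prob, thresholds):
    """Return (ROI-maximizing row, profit-maximizing row) over a threshold grid."""
    rows = [evaluate_threshold(y_true, y_prob, t) for t in thresholds]
    valid = [r for r in rows if r["targeted"] > 0]  # ROI is undefined for an empty campaign
    best_roi = max(valid, key=lambda r: r["roi"])
    best_profit = max(valid, key=lambda r: r["profit"])
    return best_roi, best_profit

y_true = np.array([1, 1, 0, 1, 1, 0, 0, 0])
y_prob = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05])
best_roi, best_profit = best_thresholds(y_true, y_prob, [0.7, 0.5, 0.35, 0.25, 0.15, 0.0])
print(best_roi)
print(best_profit)
```

On this toy data the two objectives disagree. ROI peaks at a threshold of 0.7, while net profit peaks at 0.25. This is the same kind of split as between the high- and low-threshold strategies below.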
High-threshold strategy:
- Best threshold: 0.74
- Retained customers: 83
- Value saved: β¬41,500
- Campaign cost: β¬5,100
- ROI: 7.13
- Focuses only on highest-risk customers
- Maximizes efficiency per customer
- Low coverage, high precision strategy
Low-threshold strategy:
- Best threshold: 0.10
- Retained customers: 345
- Value saved: β¬172,500
- Campaign cost: β¬42,050
- ROI: 3.10
- Captures more potential churners
- Higher total business impact
- Lower efficiency but higher coverage
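The reported figures can be reproduced with a few lines of arithmetic. The number of customers contacted is not reported directly. The counts 102 and 841 below are back-calculated from the campaign costs (€5,100 / €50 and €42,050 / €50). ROI is assumed to be (value saved - cost) / cost.

```python
COST_PER_ACTION = 50      # € per retention action
VALUE_PER_RETAINED = 500  # € per retained customer

def campaign_economics(n_targeted, n_retained,
                       cost=COST_PER_ACTION, value=VALUE_PER_RETAINED):
    """Value saved, campaign cost, net profit and ROI for one campaign."""
    value_saved = n_retained * value
    campaign_cost = n_targeted * cost
    net_profit = value_saved - campaign_cost
    roi = net_profit / campaign_cost
    return value_saved, campaign_cost, net_profit, roi

# Targeted counts back-calculated from the reported costs: 5,100 / 50 and 42,050 / 50
strategies = {"High threshold (0.74)": (102, 83), "Low threshold (0.10)": (841, 345)}
for name, (n_targeted, n_retained) in strategies.items():
    value_saved, cost, profit, roi = campaign_economics(n_targeted, n_retained)
    print(f"{name}: value €{value_saved:,}, cost €{cost:,}, profit €{profit:,}, ROI {roi:.2f}")
```

The code reproduces the reported value saved, campaign cost and ROI (to within rounding). It also yields each strategy's net profit: €36,400 at 0.74 versus €130,450 at 0.10. That gap is why the lower-ROI strategy still has the higher total business impact.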
This project demonstrates a fundamental business trade-off:
| Strategy | Precision | Recall | ROI per Customer | Total Impact |
|---|---|---|---|---|
| High Threshold (0.74) | High | Low | High | Medium |
| Low Threshold (0.10) | Low | High | Medium | High |
There is no single “best threshold”, only different business strategies.
The ROI curve shows how business performance changes depending on the decision threshold.
Key observations:
- Higher thresholds shrink the campaign but raise ROI (return per euro spent)
- Lower thresholds increase coverage but lower ROI
- The optimal operating point depends on business constraints, such as required coverage versus efficiency
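The following sketch draws such an ROI curve with matplotlib, plotting ROI and campaign size against the threshold on twin axes. The function name, the threshold grid and the assumption that every contacted churner is retained are illustrative. This is not the project's `plot_roi_curve()`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_roi_curve_sketch(y_true, y_prob, cost=50.0, value=500.0, n_points=99):
    """Plot ROI and number of customers targeted for thresholds in [0.01, 0.99]."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    thresholds = np.linspace(0.01, 0.99, n_points)
    rois, n_targeted = [], []
    for t in thresholds:
        targeted = y_prob >= t
        n = int(targeted.sum())
        tp = int((targeted & (y_true == 1)).sum())  # assumed retained if contacted
        spend = n * cost
        rois.append((tp * value - spend) / spend if spend > 0 else np.nan)
        n_targeted.append(n)

    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(thresholds, rois, color="tab:blue")
    ax.set_xlabel("Decision threshold")
    ax.set_ylabel("ROI", color="tab:blue")
    ax2 = ax.twinx()  # second y-axis for campaign size
    ax2.plot(thresholds, n_targeted, linestyle="--", color="gray")
    ax2.set_ylabel("Customers targeted", color="gray")
    fig.tight_layout()
    return fig, thresholds, np.array(rois, dtype=float)
```

Plotting campaign size next to ROI makes the trade-off visible in a single figure. As the threshold rises, the dashed line falls while ROI generally rises.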
Customers are grouped into risk segments based on predicted churn probability:
| Segment | Customers | Churn Rate | Avg Probability |
|---|---|---|---|
| Low | 356 | 1.7% | 0.02 |
| Medium | 354 | 11.8% | 0.11 |
| High | 368 | 32.6% | 0.33 |
| Very High | 329 | 62.6% | 0.65 |
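- Risk is strongly separated across segments: the observed churn rate rises from 1.7% (Low) to 62.6% (Very High)
- The average predicted probability closely tracks the observed churn rate in every segment (e.g. 0.65 vs 62.6% for Very High), consistent with well-calibrated probabilities
- The High and Very High segments together contain roughly 87% of observed churners and represent the core revenue opportunity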
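The following is a pandas sketch of this kind of segment table. The README does not state how segment boundaries were chosen, so the cut-offs below are hypothetical. `pd.qcut` would be the alternative if equal-sized groups were wanted. Comparing `churn_rate` with `avg_probability` per segment also serves as a quick calibration check.

```python
import numpy as np
import pandas as pd

def segment_customers(y_true, y_prob, bins=(0.0, 0.05, 0.2, 0.5, 1.0)):
    """Group customers into risk segments by predicted churn probability.

    The bin edges are hypothetical; the project's actual cut-offs are not stated.
    """
    labels = ["Low", "Medium", "High", "Very High"]
    df = pd.DataFrame({"churn": np.asarray(y_true), "prob": np.asarray(y_prob)})
    df["segment"] = pd.cut(df["prob"], bins=list(bins), labels=labels, include_lowest=True)
    return df.groupby("segment", observed=False).agg(
        customers=("churn", "count"),
        churn_rate=("churn", "mean"),      # observed churn share per segment
        avg_probability=("prob", "mean"),  # mean predicted probability per segment
    )

print(segment_customers([0, 0, 1, 0, 1, 1], [0.01, 0.1, 0.3, 0.3, 0.7, 0.9]))
```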
The lift curve demonstrates how effectively the model prioritizes churners compared to random targeting.
- The model clearly outperforms random selection
- Lift is highest among the top-ranked customers
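Cumulative lift at a given targeting depth is the churn rate among the top-ranked fraction of customers divided by the overall churn rate. For example, a lift of 3 at 10% means the top decile contains three times as many churners as a random 10% sample would on average. The following is a NumPy sketch. The function name and fractions are illustrative, and this is not the project's `plot_lift_curve()`.

```python
import numpy as np

def cumulative_lift(y_true, y_prob, fractions=(0.1, 0.2, 0.3, 0.5, 1.0)):
    """Lift of the top-k% customers (ranked by predicted probability) over random targeting."""
    y_true = np.asarray(y_true)
    base_rate = y_true.mean()
    if base_rate == 0:
        raise ValueError("Lift is undefined when there are no positive cases")
    order = np.argsort(-np.asarray(y_prob), kind="stable")  # highest risk first
    ranked = y_true[order]
    lifts = {}
    for f in fractions:
        k = max(1, int(round(f * len(y_true))))
        lifts[f] = ranked[:k].mean() / base_rate
    return lifts

print(cumulative_lift([1, 1, 0, 1, 0, 0, 0, 0, 0, 0], np.linspace(0.9, 0.0, 10)))
```

Lift at 100% is always 1, because targeting everyone is the same as random targeting. A useful model therefore shows lift well above 1 at small depths that decays toward 1.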
Main functions:
- `find_best_threshold_profit()`: optimizes business profit
- `business_impact()`: evaluates campaign ROI
- `plot_roi_curve()`: threshold sensitivity analysis
- `plot_lift_curve()`: targeting effectiveness
- `segment_analysis()`: customer risk segmentation
Key insights:
- Accuracy alone is not sufficient for churn problems
- Threshold selection is a business decision, not a modeling decision
- Different thresholds correspond to different business strategies
- Optimizing for profit leads to better decisions than optimizing classification metrics alone
- Customer segmentation enables targeted retention strategies
Tech stack:
- Python
- pandas, numpy
- scikit-learn
- matplotlib, seaborn
- SHAP
This project demonstrates a complete machine learning decision system, where predictions are directly translated into business actions.
Instead of building a static classifier, we build a dynamic decision framework that adapts to business goals such as:
- Maximizing ROI
- Maximizing coverage
- Balancing cost vs value
The key takeaway:
In real-world data science, the best model is not the most accurate one, but the one that makes the best decisions.