Vol. XII · No. 04 · Apr 2026
Jake Cuth.

A production pipeline,
run in public.

Every chart on this page is computed live in your browser from the same JSON files the Python script writes to /assets/data/churn/. No hosted inference, no screenshots, no hand-picked numbers.


Keeping a customer is cheaper than winning one back. About 1 in 5 bank customers will quietly leave in a given cycle. At roughly $500 of lost lifetime value per departing customer, that's $100,000 of revenue walking out the door for every 1,000 customers on the books — before anyone has a chance to intervene.

A churn model fixes that by scoring every customer on how likely they are to leave next. Retention teams then spend a small outreach budget on the customers most at risk — rather than blanketing everyone, or (more commonly) finding out too late. Tuned against real dollars (see § VI), the model on this page cuts the expected loss from about $100,000 per 1,000 customers down to $28,000 — a 72% reduction. At a 10,000-customer book that's roughly $720,000 saved every decision cycle, with no other change to the business.
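The arithmetic behind those headline figures, spelled out (the 20% churn rate and $500 lost lifetime value are the assumptions stated above; the $28,000 residual comes from the threshold tuning in § VI):

```python
# Back-of-envelope expected-loss math. Assumed inputs: 20% churn rate,
# $500 of lifetime value lost per departing customer.
churn_rate = 0.20
ltv_lost = 500        # dollars per departing customer
book = 1_000          # customers on the books

baseline_loss = churn_rate * book * ltv_lost   # do nothing
print(baseline_loss)                           # 100000.0

# With the cost-tuned model (see § VI) the residual loss per 1,000
# customers is ~$28k, a 72% reduction:
model_loss = 28_000
savings_per_1k = baseline_loss - model_loss
print(savings_per_1k)                          # 72000.0
```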

Everything below is computed live in your browser from the linked Python script. The production model I built at Hancock Whitney is under NDA and performs better than this public demo — so the point here isn't the absolute accuracy. It's the shape of a responsible workflow: compare several models, explain the winner, audit it for disparate impact, and tune the threshold against the dollars that actually matter.



Class imbalance is handled on every model (class_weight='balanced' for LR and RF, scale_pos_weight for XGB). Winner per column highlighted.
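Those two knobs do the same job in different units. A minimal numpy sketch of the arithmetic behind them, on toy label counts rather than the real training set:

```python
import numpy as np

# Stand-in for the training labels: ~20% positive (churn) class.
y = np.array([0] * 800 + [1] * 200)
n = len(y)

# sklearn's class_weight='balanced' reweights each class by
# n_samples / (n_classes * n_in_class):
w_neg = n / (2 * (y == 0).sum())   # weight on the majority (stayed) class
w_pos = n / (2 * (y == 1).sum())   # weight on the minority (churned) class

# XGBoost's equivalent is a single ratio of negatives to positives,
# passed as scale_pos_weight to the classifier:
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
```

With an 80/20 split of labels this gives weights of 0.625 and 2.5 for sklearn and a ratio of 4.0 for XGBoost — the same correction expressed two ways.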

Model · Accuracy · Precision · Recall · F1 · ROC AUC · PR AUC

HOVER — for threshold / TPR / FPR

Fig. G.1 — Higher = better. Diagonal would be a coin flip. Each curve traces the trade-off between catching churners (TPR) and wrongly flagging loyal customers (FPR) as the decision threshold sweeps from 1 to 0.


Inputs are standardized with the same means and standard deviations the training scaler fit. The logit, probability, and feature contributions are all computed in your browser with the model's actual coefficients.

P(churn)
Top drivers, this customer
pushing toward churn
    pushing toward staying

      Live scoring uses the logistic regression model for interpretability. XGBoost performs better on aggregate metrics (see § III) but its prediction is a sum over 300 trees — not something a slider explains.
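What the slider panel computes, sketched in Python. The coefficients, intercept, and scaler statistics below are illustrative placeholders, not the real exported model; the browser uses the actual fitted values:

```python
import numpy as np

# Hypothetical logistic-regression export: coefficients, intercept, and
# the training scaler's means / standard deviations (three features only).
coef      = np.array([0.75, -0.45, 0.30])
intercept = -1.2
mean      = np.array([38.9, 76_485.0, 1.5])
scale     = np.array([10.5, 62_397.0, 0.58])

def score(x_raw):
    """Standardize with the training scaler, then apply the linear model."""
    z = (np.asarray(x_raw) - mean) / scale
    contribs = coef * z                        # per-feature logit contributions
    logit = intercept + contribs.sum()
    p = 1.0 / (1.0 + np.exp(-logit))           # sigmoid -> P(churn)
    return p, contribs

p, contribs = score([45.0, 120_000.0, 1.0])
```

Because the logit is a plain sum, each feature's contribution is a single signed number — exactly what the "pushing toward churn / staying" bars display.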


      Real models ship with a decision threshold, not just a score. Move the slider to see how precision, recall, and per-thousand-customer cost shift. The accent dot marks the cost-minimizing threshold.

threshold = 0.50

                  Predicted stayed     Predicted churned
Actual stayed     0 (true neg.)        0 (false pos.)
Actual churned    0 (false neg.)       0 (true pos.)

precision · recall · net cost / 1k cust. · flag rate
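The cost curve behind the accent dot can be sketched like this. The $500-per-missed-churner and $50-per-outreach unit costs are illustrative stand-ins for the § VI dollars, and the labels and scores are toy data:

```python
import numpy as np

COST_FN = 500.0        # lost lifetime value per missed churner (assumed)
COST_OUTREACH = 50.0   # cost to contact one flagged customer (assumed)

def net_cost_per_1k(y_true, p_hat, t):
    """Dollar cost of deciding at threshold t, normalized per 1,000 customers."""
    flagged = p_hat >= t
    missed = np.sum(~flagged & y_true)         # churners we fail to flag
    contacted = np.sum(flagged)                # everyone we spend outreach on
    total = COST_FN * missed + COST_OUTREACH * contacted
    return 1000 * total / len(y_true)

# Toy labels and scores (churners tend to score higher).
rng = np.random.default_rng(42)
y = rng.random(2000) < 0.2
p = np.clip(y * 0.5 + rng.random(2000) * 0.5, 0, 1)

# Sweep 0.05 -> 0.95 in 0.01 steps, as the pipeline script does.
ts = np.arange(0.05, 0.951, 0.01)
costs = [net_cost_per_1k(y, p, t) for t in ts]
best_t = ts[int(np.argmin(costs))]
```

The minimum of `costs` is the accent dot: the threshold where outreach spend and missed-churner losses balance.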

      Tree SHAP decomposes each XGBoost prediction into per-feature contributions and averages their absolute values across a 500-row test sample. Read it as: which features move the score the most, regardless of direction.
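The aggregation step itself is one line. A sketch with a toy matrix standing in for the shap.TreeExplainer output on the 500-row sample (feature names are columns from the Kaggle dataset):

```python
import numpy as np

# Toy stand-in for Tree SHAP output: one row per test-sample prediction,
# one column per feature, each cell a signed contribution to that score.
rng = np.random.default_rng(42)
shap_values = rng.normal(size=(500, 6))
feature_names = ["Age", "NumOfProducts", "IsActiveMember",
                 "Balance", "Geography_Germany", "CreditScore"]

# Global importance: mean absolute contribution per feature.
# Sign is deliberately ignored -- this ranks magnitude of influence only.
mean_abs = np.abs(shap_values).mean(axis=0)
order = np.argsort(mean_abs)[::-1]   # most influential first
```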


      X-axis: the raw feature value for one customer. Y-axis: how much that feature moved their churn score, positive or negative. Non-linearity shows up as curves — something a linear model can't see.


      Disparate impact is a live concern in regulated banking — the OCC and FDIC examine retention models for fair-lending proxies. This panel shows the same model sliced by protected-attribute-adjacent fields. Even on a public dataset, gaps show up.
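The slicing logic, sketched on toy arrays; the real panel runs the same computation on held-out test predictions grouped by fields like Gender and Geography:

```python
import numpy as np

# Toy data: group membership, true churn labels, and model flags.
rng = np.random.default_rng(42)
group   = rng.choice(["France", "Germany", "Spain"], size=2000)
y_true  = rng.random(2000) < 0.2
flagged = rng.random(2000) < 0.3

for g in ["France", "Germany", "Spain"]:
    m = group == g
    flag_rate = flagged[m].mean()                 # share of group contacted
    tp = flagged[m & y_true].sum()                # churners correctly flagged
    recall = tp / max(y_true[m].sum(), 1)         # per-group recall
    print(f"{g:8s} flag rate {flag_rate:.2f}  recall {recall:.2f}")
```

A disparate-impact review compares these per-group rates: a model that flags one geography or gender far more often than another, at equal risk, is exactly what examiners look for.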


      Dataset

      Kaggle · shrutimechlearn/churn-modelling ↗ — 10,000 synthetic bank customers across France, Germany, Spain.

      Pipeline script

      notebooks/churn_model.py ↗ — drop IDs, label-encode Gender, one-hot Geography, stratified 80/20 split, StandardScaler on numeric features, three models, SHAP on XGB, fairness slice on the best model, threshold sweep 0.05 → 0.95.
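The preprocessing steps in that list, sketched on a ten-row stand-in for the Kaggle CSV. Column names match the real dataset; the values are made up, and LabelEncoder / get_dummies are assumed to be the encoders the script uses:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny inline frame standing in for the Kaggle CSV.
df = pd.DataFrame({
    "RowNumber":   range(1, 11),
    "CustomerId":  range(101, 111),
    "Surname":     list("ABCDEFGHIJ"),
    "CreditScore": [619, 608, 502, 699, 850, 645, 822, 376, 501, 684],
    "Geography":   ["France", "Spain", "France", "France", "Spain",
                    "Spain", "France", "Germany", "France", "France"],
    "Gender":      ["Female", "Female", "Female", "Female", "Male",
                    "Male", "Male", "Female", "Male", "Male"],
    "Age":         [42, 41, 42, 39, 43, 44, 50, 29, 44, 27],
    "Exited":      [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])  # drop IDs
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])     # label-encode
df = pd.get_dummies(df, columns=["Geography"])                # one-hot

X, y = df.drop(columns=["Exited"]), df["Exited"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)         # stratified 80/20

# Fit the scaler on the training split only, as the live-scoring
# panel's note requires (same means / std devs reused at inference).
scaler = StandardScaler().fit(X_tr[["CreditScore", "Age"]])
```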

      Production model

      The Hancock Whitney churn model — its feature library, decision thresholds, and lift numbers — is under NDA. This page mirrors the workflow, not the proprietary model.

      Reproducibility

random_state=42 everywhere. Last regenerated . Clone the repo and run python notebooks/churn_model.py to regenerate all nine JSON files.