Vol. XII · No. 04 · Apr 2026
Jake Cuth.

A production pipeline,
run in public.

Every chart on this page is computed live in your browser from the same JSON files the Python script writes to /assets/data/churn/. No hosted inference, no screenshots, no hand-picked numbers.


Keeping a customer is cheaper than winning one back. About 1 in 5 bank customers will quietly leave in a given cycle. At roughly $500 of lost lifetime value per departing customer, that's $100,000 of revenue walking out the door for every 1,000 customers on the books — before anyone has a chance to intervene.

A churn model fixes that by scoring every customer on how likely they are to leave next. Retention teams then spend a small outreach budget on the customers most at risk — rather than blanketing everyone, or (more commonly) finding out too late. Tuned against real dollars (see § VI), the model on this page cuts the expected loss from about $100,000 per 1,000 customers down to $28,000 — a 72% reduction. At a 10,000-customer book that's roughly $720,000 saved every decision cycle, with no other change to the business.
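The arithmetic behind those headline figures, spelled out (the 20% churn rate and $500 lost lifetime value are the assumptions stated above; the $28,000 residual comes from the threshold tuning in § VI):

```python
# Back-of-envelope expected-loss math. Assumed inputs: 20% churn rate,
# $500 of lifetime value lost per departing customer.
churn_rate = 0.20
ltv_lost = 500        # dollars per departing customer
book = 1_000          # customers on the books

baseline_loss = churn_rate * book * ltv_lost   # do nothing
print(baseline_loss)                           # 100000.0

# With the cost-tuned model (see § VI) the residual loss per 1,000
# customers is ~$28k, a 72% reduction:
model_loss = 28_000
savings_per_1k = baseline_loss - model_loss
print(savings_per_1k)                          # 72000.0
```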

Everything below is computed live in your browser from the linked Python script. The production model I built at Hancock Whitney is under NDA and performs better than this public demo — so the point here isn't the absolute accuracy. It's the shape of a responsible workflow: compare several models, explain the winner, audit it for disparate impact, and tune the threshold against the dollars that actually matter.



Class imbalance is handled on every model (class_weight='balanced' for LR and RF, scale_pos_weight for XGB). Winner per column highlighted.
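Those two knobs do the same job in different units. A minimal numpy sketch of the arithmetic behind them, on toy label counts rather than the real training set:

```python
import numpy as np

# Stand-in for the training labels: ~20% positive (churn) class.
y = np.array([0] * 800 + [1] * 200)
n = len(y)

# sklearn's class_weight='balanced' reweights each class by
# n_samples / (n_classes * n_in_class):
w_neg = n / (2 * (y == 0).sum())   # weight on the majority (stayed) class
w_pos = n / (2 * (y == 1).sum())   # weight on the minority (churned) class

# XGBoost's equivalent is a single ratio of negatives to positives,
# passed as scale_pos_weight to the classifier:
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
```

With an 80/20 split of labels this gives weights of 0.625 and 2.5 for sklearn and a ratio of 4.0 for XGBoost — the same correction expressed two ways.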

Model · Accuracy · Precision · Recall · F1 · ROC AUC · PR AUC

HOVER — for threshold / TPR / FPR

Fig. G.1 — Higher = better. Diagonal would be a coin flip. Each curve traces the trade-off between catching churners (TPR) and wrongly flagging loyal customers (FPR) as the decision threshold sweeps from 1 to 0.


Inputs are standardized with the same means and standard deviations the training scaler fit. The logit, probability, and feature contributions are all computed in your browser with the model's actual coefficients.

P(churn)
Top drivers, this customer
pushing toward churn
    pushing toward staying

      Live scoring uses the logistic regression model for interpretability. XGBoost performs better on aggregate metrics (see § III) but its prediction is a sum over 300 trees — not something a slider explains.
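What the slider panel computes, sketched in Python. The coefficients, intercept, and scaler statistics below are illustrative placeholders, not the real exported model; the browser uses the actual fitted values:

```python
import numpy as np

# Hypothetical logistic-regression export: coefficients, intercept, and
# the training scaler's means / standard deviations (three features only).
coef      = np.array([0.75, -0.45, 0.30])
intercept = -1.2
mean      = np.array([38.9, 76_485.0, 1.5])
scale     = np.array([10.5, 62_397.0, 0.58])

def score(x_raw):
    """Standardize with the training scaler, then apply the linear model."""
    z = (np.asarray(x_raw) - mean) / scale
    contribs = coef * z                        # per-feature logit contributions
    logit = intercept + contribs.sum()
    p = 1.0 / (1.0 + np.exp(-logit))           # sigmoid -> P(churn)
    return p, contribs

p, contribs = score([45.0, 120_000.0, 1.0])
```

Because the logit is a plain sum, each feature's contribution is a single signed number — exactly what the "pushing toward churn / staying" bars display.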


      Real models ship with a decision threshold, not just a score. Move the slider to see how precision, recall, and per-thousand-customer cost shift. The accent dot marks the cost-minimizing threshold.

threshold = 0.50

                  Predicted stayed     Predicted churned
Actual stayed     0 (true neg.)        0 (false pos.)
Actual churned    0 (false neg.)       0 (true pos.)

precision · recall · net cost / 1k cust. · flag rate
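The cost curve behind the accent dot can be sketched like this. The $500-per-missed-churner and $50-per-outreach unit costs are illustrative stand-ins for the § VI dollars, and the labels and scores are toy data:

```python
import numpy as np

COST_FN = 500.0        # lost lifetime value per missed churner (assumed)
COST_OUTREACH = 50.0   # cost to contact one flagged customer (assumed)

def net_cost_per_1k(y_true, p_hat, t):
    """Dollar cost of deciding at threshold t, normalized per 1,000 customers."""
    flagged = p_hat >= t
    missed = np.sum(~flagged & y_true)         # churners we fail to flag
    contacted = np.sum(flagged)                # everyone we spend outreach on
    total = COST_FN * missed + COST_OUTREACH * contacted
    return 1000 * total / len(y_true)

# Toy labels and scores (churners tend to score higher).
rng = np.random.default_rng(42)
y = rng.random(2000) < 0.2
p = np.clip(y * 0.5 + rng.random(2000) * 0.5, 0, 1)

# Sweep 0.05 -> 0.95 in 0.01 steps, as the pipeline script does.
ts = np.arange(0.05, 0.951, 0.01)
costs = [net_cost_per_1k(y, p, t) for t in ts]
best_t = ts[int(np.argmin(costs))]
```

The minimum of `costs` is the accent dot: the threshold where outreach spend and missed-churner losses balance.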

      Tree SHAP decomposes each XGBoost prediction into per-feature contributions and averages their absolute values across a 500-row test sample. Read it as: which features move the score the most, regardless of direction.
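The aggregation step itself is one line. A sketch with a toy matrix standing in for the shap.TreeExplainer output on the 500-row sample (feature names are columns from the Kaggle dataset):

```python
import numpy as np

# Toy stand-in for Tree SHAP output: one row per test-sample prediction,
# one column per feature, each cell a signed contribution to that score.
rng = np.random.default_rng(42)
shap_values = rng.normal(size=(500, 6))
feature_names = ["Age", "NumOfProducts", "IsActiveMember",
                 "Balance", "Geography_Germany", "CreditScore"]

# Global importance: mean absolute contribution per feature.
# Sign is deliberately ignored -- this ranks magnitude of influence only.
mean_abs = np.abs(shap_values).mean(axis=0)
order = np.argsort(mean_abs)[::-1]   # most influential first
```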


      X-axis: the raw feature value for one customer. Y-axis: how much that feature moved their churn score, positive or negative. Non-linearity shows up as curves — something a linear model can't see.


      Disparate impact is a live concern in regulated banking — the OCC and FDIC examine retention models for fair-lending proxies. This panel shows the same model sliced by protected-attribute-adjacent fields. Even on a public dataset, gaps show up.
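The slicing logic, sketched on toy arrays; the real panel runs the same computation on held-out test predictions grouped by fields like Gender and Geography:

```python
import numpy as np

# Toy data: group membership, true churn labels, and model flags.
rng = np.random.default_rng(42)
group   = rng.choice(["France", "Germany", "Spain"], size=2000)
y_true  = rng.random(2000) < 0.2
flagged = rng.random(2000) < 0.3

for g in ["France", "Germany", "Spain"]:
    m = group == g
    flag_rate = flagged[m].mean()                 # share of group contacted
    tp = flagged[m & y_true].sum()                # churners correctly flagged
    recall = tp / max(y_true[m].sum(), 1)         # per-group recall
    print(f"{g:8s} flag rate {flag_rate:.2f}  recall {recall:.2f}")
```

A disparate-impact review compares these per-group rates: a model that flags one geography or gender far more often than another, at equal risk, is exactly what examiners look for.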


      Dataset

      Kaggle · shrutimechlearn/churn-modelling ↗ — 10,000 synthetic bank customers across France, Germany, Spain.

      Pipeline script

      notebooks/churn_model.py ↗ — drop IDs, label-encode Gender, one-hot Geography, stratified 80/20 split, StandardScaler on numeric features, three models, SHAP on XGB, fairness slice on the best model, threshold sweep 0.05 → 0.95.
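The preprocessing steps in that list, sketched on a ten-row stand-in for the Kaggle CSV. Column names match the real dataset; the values are made up, and LabelEncoder / get_dummies are assumed to be the encoders the script uses:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny inline frame standing in for the Kaggle CSV.
df = pd.DataFrame({
    "RowNumber":   range(1, 11),
    "CustomerId":  range(101, 111),
    "Surname":     list("ABCDEFGHIJ"),
    "CreditScore": [619, 608, 502, 699, 850, 645, 822, 376, 501, 684],
    "Geography":   ["France", "Spain", "France", "France", "Spain",
                    "Spain", "France", "Germany", "France", "France"],
    "Gender":      ["Female", "Female", "Female", "Female", "Male",
                    "Male", "Male", "Female", "Male", "Male"],
    "Age":         [42, 41, 42, 39, 43, 44, 50, 29, 44, 27],
    "Exited":      [1, 0, 1, 0, 0, 1, 0, 1, 0, 0],
})

df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])  # drop IDs
df["Gender"] = LabelEncoder().fit_transform(df["Gender"])     # label-encode
df = pd.get_dummies(df, columns=["Geography"])                # one-hot

X, y = df.drop(columns=["Exited"]), df["Exited"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)         # stratified 80/20

# Fit the scaler on the training split only, as the live-scoring
# panel's note requires (same means / std devs reused at inference).
scaler = StandardScaler().fit(X_tr[["CreditScore", "Age"]])
```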

      Production model

      The Hancock Whitney churn model — its feature library, decision thresholds, and lift numbers — is under NDA. This page mirrors the workflow, not the proprietary model.

      Reproducibility

random_state=42 everywhere. Last regenerated . Clone the repo and run python notebooks/churn_model.py to regenerate all nine JSON files.