Vol. XII · No. 04 · Apr 2026
Jake Cuth.

Who your customers actually are,
not who you wish they were.

Every chart on this page is computed live in your browser from the JSON files the Python script writes to /assets/data/segmentation/. No hosted inference, no screenshots, no hand-picked numbers.


Mass marketing dilutes both signal and budget. A 1% response rate from a blanket email is a 99% waste — and the 1% who responded were going to buy anyway. Segmentation replaces that with targeted outreach: the right message to the right subset, priced against the revenue each segment actually produces.

The 30+ dynamic segments I run at Hancock Whitney drive a ~30% lift in campaign ROI versus blanket sends. That work is under NDA. This page shows the same method — RFM features, three clustering algorithms compared, each segment profiled against its revenue contribution — run against the public UCI Online Retail II dataset so every number is reproducible from the linked script.

The winning model here is K-Means at k=4. Four segments unpack a real 1M-transaction book into groups with dramatically different economics — as you're about to see, one of them earns roughly 64% of revenue on 21% of customers.



All three methods run on the same log1p-transformed, standardized R/F/M matrix. Higher silhouette and Calinski-Harabasz are better; lower Davies-Bouldin is better. No single metric decides on its own, but silhouette is the primary criterion here. The winner row is highlighted.

Method | Best k | Silhouette | Calinski-Harabasz | Davies-Bouldin | Inertia
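To make the primary criterion concrete, here is a minimal from-scratch sketch of the mean silhouette score. It is illustrative only: the function name and the toy points are assumptions, and the pipeline script may well compute the metric differently (for example through a library call).

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette over all points; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight pairs, far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # pairs kept together
bad_labels = [0, 1, 0, 1]   # pairs split across clusters
```

On the toy data, the labelling that keeps each pair together scores close to 1, while the labelling that splits the pairs scores below 0, which is the behaviour the "higher is better" rule relies on.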


Left: silhouette score vs k; higher means tighter, better-separated clusters. The accent dot marks the chosen operational k. Right: K-Means inertia elbow, showing diminishing returns after the operational k.

Fig. H.1 a · Silhouette vs k
Fig. H.1 b · K-Means inertia elbow
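The quantity on the elbow chart can be sketched in a few lines. This is a hedged illustration, not the script's code: the centroids below are fixed by hand rather than fitted, and all names and values are hypothetical.

```python
def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


# Toy data (hypothetical): two groups of two points each.
elbow_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_centroid = [(5.0, 1.0)]                 # k = 1
two_centroids = [(0.0, 1.0), (10.0, 1.0)]   # k = 2
```

Inertia can only fall as k grows, so the chart is read for the point where the drop flattens rather than for a minimum.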

Each point is one customer. Axes are the top two principal components of the log1p-transformed, standardized R/F/M matrix. Hover a point for raw values. Click a legend entry to isolate a cluster.

Fig. H.2 · Segment scatter (PCA)

Inputs are log1p-transformed and standardized with the same scaler the model was trained on, then assigned to the nearest K-Means centroid by Euclidean distance. Move the sliders — assignment updates in real time.

Assigned segment · Customers · Of base · Of revenue
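The assignment step described above (log1p, standardize with the training-time scaler, nearest centroid by Euclidean distance) can be sketched as follows. The scaler statistics and centroids are made-up placeholders, not the values the page loads from its JSON files, and the function name is hypothetical.

```python
import math


def assign_segment(rfm, means, stds, centroids):
    """Return the index of the nearest centroid for one raw (R, F, M) triple."""
    # log1p each feature, then standardize with the training-time mean and std
    z = [(math.log1p(x) - m) / s for x, m, s in zip(rfm, means, stds)]
    distances = [math.dist(z, c) for c in centroids]
    return distances.index(min(distances))


# Placeholder scaler statistics and centroids (assumptions for illustration).
scaler_means = [3.5, 1.2, 6.0]
scaler_stds = [1.4, 0.9, 1.3]
toy_centroids = [
    (-1.0, 1.2, 1.1),   # recent, frequent, high spend
    (1.0, -0.8, -0.9),  # lapsed, infrequent, low spend
]

recent_big_spender = (5, 20, 5000.0)  # recency in days, orders, total spend
lapsed_one_off = (300, 1, 50.0)
```

Reusing the stored mean and standard deviation matters: refitting a scaler on the slider values alone would place the new point in a different space from the centroids.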

This is the same K-Means model used in Fig. H.2. The inset re-projects through the same PCA so the "you" dot lives in the same space as the scatter above.
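Re-projecting a new point through an already-fitted PCA amounts to centering it and taking dot products with the stored axes. A minimal sketch, assuming the fitted mean and the two component vectors are available; the numbers here are placeholders chosen to be orthonormal, not the page's actual components.

```python
def project(z, pca_mean, components):
    """Project one standardized point onto previously fitted principal axes."""
    centered = [x - m for x, m in zip(z, pca_mean)]
    return [sum(c * x for c, x in zip(axis, centered)) for axis in components]


# Placeholder PCA parameters (assumptions): two orthonormal axes in 3-D.
toy_pca_mean = (0.0, 0.0, 0.0)
toy_components = [
    (0.6, 0.8, 0.0),
    (-0.8, 0.6, 0.0),
]

you_xy = project((1.0, 2.0, 5.0), toy_pca_mean, toy_components)
```

Because the axes are reused rather than refitted, the resulting coordinates are directly comparable with the points already drawn in the scatter.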


One card per cluster. The small bars show where each segment's median customer lands on the overall distribution for Recency, Frequency, and Monetary. The revenue-vs-customers bar tells you whether the segment is pulling its weight.
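One way to compute the two quantities on each card is sketched below: the percentile rank of the segment's median on the overall distribution, and the ratio of revenue share to customer share. The helper names are hypothetical and the "at or below" percentile convention is an assumption; the toy values are invented.

```python
from statistics import median


def percentile_rank(value, population):
    """Share of the population at or below value, from 0 to 1."""
    return sum(1 for x in population if x <= value) / len(population)


def median_position(segment_values, all_values):
    """Where the segment's median customer lands on the overall distribution."""
    return percentile_rank(median(segment_values), all_values)


def weight_ratio(revenue_share, customer_share):
    """Above 1 means the segment earns more than its headcount would suggest."""
    return revenue_share / customer_share


# Toy monetary values (hypothetical).
all_monetary = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
top_segment = [80, 90, 100]
```

For the segment mentioned earlier on this page (roughly 64% of revenue on 21% of customers), `weight_ratio(0.64, 0.21)` comes out at about 3.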


Pick a segment, set a per-customer campaign cost and an expected lift multiplier. The baseline response rate is the segment's own measured 60-day repurchase rate — not a fabricated number. The ROI math and all intermediates are computed live.

Contacted
Baseline buyers = size × repurchase rate
With-campaign buyers = baseline × lift
Incremental buyers
Incremental revenue = incremental buyers × AOV
Campaign cost = size × cost per customer
Net ROI
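The intermediates listed above chain together as in the sketch below. Two pieces are assumptions, since the page does not spell them out: incremental buyers are taken as with-campaign buyers minus baseline buyers, and net ROI is taken as (incremental revenue minus campaign cost) divided by campaign cost. The input values are invented.

```python
def campaign_roi(size, repurchase_rate, lift, aov, cost_per_customer):
    """Walk through the calculator's intermediates for one segment."""
    baseline_buyers = size * repurchase_rate
    with_campaign_buyers = baseline_buyers * lift
    # Assumption: incremental buyers = with-campaign minus baseline.
    incremental_buyers = with_campaign_buyers - baseline_buyers
    incremental_revenue = incremental_buyers * aov
    campaign_cost = size * cost_per_customer
    # Assumption: net ROI = (incremental revenue - cost) / cost.
    net_roi = (
        (incremental_revenue - campaign_cost) / campaign_cost
        if campaign_cost
        else float("nan")
    )
    return {
        "contacted": size,
        "baseline_buyers": baseline_buyers,
        "with_campaign_buyers": with_campaign_buyers,
        "incremental_buyers": incremental_buyers,
        "incremental_revenue": incremental_revenue,
        "campaign_cost": campaign_cost,
        "net_roi": net_roi,
    }


# Hypothetical inputs: 1,000 customers, 20% baseline, 1.5x lift, AOV 100, cost 2 each.
example = campaign_roi(1000, 0.20, 1.5, 100.0, 2.0)
```

Note that only the incremental buyers count toward revenue here; the baseline buyers would have purchased without the campaign, so crediting them would overstate the return.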

The 60-day repurchase rate is measured directly from the data — fraction of each segment's customers who bought in the last 60 days of the snapshot. The lift multiplier is user-adjustable — industry direct-response benchmarks typically land between 1.3× and 2.0× for win-back campaigns. Your mileage, as always, varies.
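A minimal sketch of that measurement, assuming each customer's last purchase date is already known. The function name, the snapshot date, the sample customers, and the treatment of the window boundary (strictly after the cutoff) are all assumptions for illustration.

```python
from datetime import date, timedelta


def repurchase_rate(last_purchase_by_customer, snapshot, window_days=60):
    """Fraction of customers whose last purchase falls inside the final window."""
    cutoff = snapshot - timedelta(days=window_days)
    recent = sum(1 for day in last_purchase_by_customer.values() if day > cutoff)
    return recent / len(last_purchase_by_customer)


# Hypothetical segment of four customers.
toy_snapshot = date(2011, 12, 9)
toy_last_purchase = {
    "A": date(2011, 11, 20),  # inside the window
    "B": date(2011, 6, 1),
    "C": date(2011, 10, 15),  # inside the window
    "D": date(2010, 12, 1),
}
```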


Dataset

UCI ML Repository · Online Retail II ↗ — two years of UK online retail transactions, roughly 1M rows before cleaning.

Pipeline script

notebooks/segmentation_model.py ↗ — RFM aggregation, log1p + StandardScaler, three clustering methods × k grid, silhouette / Calinski-Harabasz / Davies-Bouldin, PCA(2) for the map, deterministic percentile-band auto-namer.
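For orientation, the RFM aggregation step might look like the sketch below. The definitions used are common ones and are assumptions about this pipeline: recency as days since the last purchase, frequency as the number of distinct invoices, monetary as total spend. The record layout and sample rows are invented, and the linked script remains the reference.

```python
from collections import defaultdict
from datetime import date


def rfm(transactions, snapshot):
    """Aggregate (customer_id, invoice_id, invoice_date, amount) rows to R/F/M."""
    last_seen = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in transactions:
        last_seen[customer] = max(day, last_seen.get(customer, day))
        invoices[customer].add(invoice)
        spend[customer] += amount
    return {
        customer: (
            (snapshot - last_seen[customer]).days,  # recency in days
            len(invoices[customer]),                # frequency: distinct invoices
            spend[customer],                        # monetary: total spend
        )
        for customer in last_seen
    }


# Hypothetical rows: customer A has two invoices, one with two line items.
toy_rows = [
    ("A", "inv-1", date(2024, 1, 1), 10.0),
    ("A", "inv-1", date(2024, 1, 1), 5.0),
    ("A", "inv-2", date(2024, 1, 10), 20.0),
    ("B", "inv-3", date(2024, 1, 30), 7.5),
]
toy_rfm = rfm(toy_rows, snapshot=date(2024, 1, 31))
```

Counting distinct invoices rather than rows keeps a single large basket from inflating frequency.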

Production program

The Hancock Whitney segmentation program — its feature library, third-party enrichment sources (Epsilon, Equifax), and activation playbook — is under NDA. This page mirrors the method, not the production segments.

Reproducibility

random_state=42 everywhere. Last regenerated . Running python notebooks/segmentation_model.py twice on the same CSV produces byte-identical JSON.
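A byte-identical claim can be checked by hashing the output of two runs. The sketch below shows one way to serialize JSON deterministically and compare digests; whether the script sorts keys or uses these separators is not stated on this page, so treat both as assumptions, and the payloads as invented.

```python
import hashlib
import json


def stable_json_bytes(payload):
    """Serialize with sorted keys and fixed separators so key order cannot vary."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(payload):
    """SHA-256 hex digest of the stable serialization."""
    return hashlib.sha256(stable_json_bytes(payload)).hexdigest()


# Two hypothetical runs producing the same content in a different key order.
run_one = {"k": 4, "method": "kmeans", "random_state": 42}
run_two = {"random_state": 42, "method": "kmeans", "k": 4}
```

Matching digests across runs indicate identical bytes; a mismatch points to a source of nondeterminism such as an unseeded step or an embedded timestamp.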