Who your customers actually are,
not who you wish they were.
Every chart on this page is computed live in your browser from the JSON files the Python script writes to /assets/data/segmentation/. No hosted inference, no screenshots, no hand-picked numbers.
Mass marketing dilutes both signal and budget. A 1% response rate from a blanket email is a 99% waste — and the 1% who responded were going to buy anyway. Segmentation replaces that with targeted outreach: the right message to the right subset, priced against the revenue each segment actually produces.
The 30+ dynamic segments I run at Hancock Whitney drive a ~30% lift in campaign ROI versus blanket sends. That work is under NDA. This page shows the same method — RFM features, three clustering algorithms compared, each segment profiled against its revenue contribution — run against the public UCI Online Retail II dataset so every number is reproducible from the linked script.
The winning model here is K-Means at k=4. Four segments split a real 1M-transaction book into groups with dramatically different economics; as you're about to see, one of them earns roughly 64% of revenue from just 21% of customers.
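The RFM step that feeds every chart on this page can be sketched in a few lines. This is a minimal version, not the script itself: the column names (`customer_id`, `invoice_date`, `amount`) are assumptions, and the real pipeline in `notebooks/segmentation_model.py` also handles cleaning before aggregation.

```python
import numpy as np
import pandas as pd

def rfm_features(tx: pd.DataFrame, snapshot=None) -> pd.DataFrame:
    """Collapse a transaction log into one Recency/Frequency/Monetary
    row per customer. Column names are illustrative assumptions."""
    snapshot = snapshot or tx["invoice_date"].max()
    rfm = tx.groupby("customer_id").agg(
        recency=("invoice_date", lambda d: (snapshot - d.max()).days),
        frequency=("invoice_date", "nunique"),  # distinct purchase dates
        monetary=("amount", "sum"),
    )
    # log1p tames the heavy right tail before standardization
    return np.log1p(rfm)
```

The log1p transform matters: raw monetary values span four orders of magnitude, and Euclidean-distance clustering on untransformed features would be dominated by a handful of whales.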
All three methods are scored on the same log1p-transformed, standardized R/F/M matrix. Higher silhouette and Calinski-Harabasz are better; lower Davies-Bouldin is better. No single metric decides; silhouette is the primary criterion here. The winner row is highlighted.
| Method | Best k | Silhouette | Calinski-Harabasz | Davies-Bouldin | Inertia |
|---|---|---|---|---|---|
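The table above is filled from a grid search that is easy to reproduce. A sketch of that loop for K-Means, using only standard scikit-learn calls (the function name and k range are mine, not the script's):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

def score_k_grid(X, ks=range(2, 9), seed=42):
    """Fit K-Means at each k on the scaled R/F/M matrix X and
    collect the three validity metrics plus inertia."""
    rows = []
    for k in ks:
        km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(X)
        rows.append({
            "k": k,
            "silhouette": silhouette_score(X, km.labels_),               # higher is better
            "calinski_harabasz": calinski_harabasz_score(X, km.labels_), # higher is better
            "davies_bouldin": davies_bouldin_score(X, km.labels_),       # lower is better
            "inertia": km.inertia_,
        })
    return rows
```

The same loop runs once per clustering method; only the estimator swaps out.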
Left: silhouette score vs k (higher means tighter, better-separated clusters). Accent dot marks the chosen operational k. Right: K-Means inertia elbow, showing diminishing returns after the operational k.
Each point is one customer. Axes are the top two principal components of the log-scaled R/F/M matrix. Hover a point for raw values. Click a legend entry to isolate a cluster.
Inputs are log1p-transformed and standardized with the same scaler the model was trained on, then assigned to the nearest K-Means centroid by Euclidean distance. Move the sliders — assignment updates in real time.
- Customers: —
- Of base: —
- Of revenue: —
This is the same K-Means model used in Fig. H.2. The inset re-projects through the same PCA so the "you" dot lives in the same space as the scatter above.
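Scoring a hypothetical customer is three deterministic steps: log1p, the training-time scaler, nearest centroid; the PCA projection only places the dot on the map. A sketch assuming the fitted `scaler`, `kmeans`, and `pca` objects the script would have saved:

```python
import numpy as np

def assign_segment(recency, frequency, monetary, scaler, kmeans, pca):
    """Score one customer the way the sliders do: log1p, then the
    SAME StandardScaler the model was trained with, then nearest
    K-Means centroid (Euclidean). Returns label + PCA(2) coords."""
    x = np.log1p([[recency, frequency, monetary]])
    z = scaler.transform(x)                  # reuse training scaler, never refit
    label = int(kmeans.predict(z)[0])        # nearest centroid
    xy = pca.transform(z)[0]                 # position in the scatter's plane
    return label, xy
```

Reusing the fitted scaler is the critical detail: refitting on a single point would zero it out and send every slider position to the same centroid.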
One card per cluster. The small bars show where each segment's median customer lands on the overall distribution for Recency, Frequency, and Monetary. The revenue-vs-customers bar tells you whether the segment is pulling its weight.
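The small bars reduce to one number per feature: the percentile of the segment's median customer within the overall distribution. A minimal sketch of that computation (my formulation; the script's auto-namer may bucket percentiles differently):

```python
import numpy as np

def median_percentile(segment_values, all_values):
    """Where the segment's median customer lands on the overall
    distribution, as a 0-100 percentile (what each small bar shows)."""
    med = np.median(segment_values)
    return 100.0 * np.mean(np.asarray(all_values) <= med)
```

A segment whose median Monetary percentile is 95 but whose median Recency percentile is 90 is the classic "lapsed whale": high spend, long silence.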
Pick a segment, set a per-customer campaign cost and an expected lift multiplier. The baseline response rate is the segment's own measured 60-day repurchase rate — not a fabricated number. The ROI math and all intermediates are computed live.
Expected responders = segment size × repurchase rate
Lifted response rate = baseline rate × lift multiplier
The 60-day repurchase rate is measured directly from the data — fraction of each segment's customers who bought in the last 60 days of the snapshot. The lift multiplier is user-adjustable — industry direct-response benchmarks typically land between 1.3× and 2.0× for win-back campaigns. Your mileage, as always, varies.
UCI ML Repository · Online Retail II ↗ — two years of UK online retail transactions, roughly 1M rows before cleaning.
notebooks/segmentation_model.py ↗ — RFM aggregation, log1p + StandardScaler, three clustering methods × k grid, silhouette / Calinski-Harabasz / Davies-Bouldin, PCA(2) for the map, deterministic percentile-band auto-namer.
The Hancock Whitney segmentation program — its feature library, third-party enrichment sources (Epsilon, Equifax), and activation playbook — is under NDA. This page mirrors the method, not the production segments.
random_state=42 everywhere. Last regenerated —. Running python notebooks/segmentation_model.py twice on the same CSV produces byte-identical JSON.
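Fixing `random_state` handles the model; byte-identical JSON also needs a deterministic serializer. One way to get that guarantee (a sketch of the idea, not necessarily how the script writes its files): sorted keys, fixed separators, explicit encoding.

```python
import hashlib
import json

def write_deterministic(obj, path):
    """Serialize with a fixed key order and fixed separators so two
    runs on identical inputs produce byte-identical files.
    Returns a SHA-256 digest for a quick byte-level comparison."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")) + "\n"
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Diffing the digests of two runs is a one-line reproducibility check.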