Top-20 Universe & the April Whipsaw — Diagnostic + Backtest ⛔ top-20 FAILS

Date: 2026-06-12
Analyst: Claude (for Jacques)
Engine: /root/backtest_xsmom_PY-00027_100626.py (unchanged)
Diagnostics: /root/xsmom_whipsaw_diag_120626.py, /root/xsmom_top20_120626.py
Data outputs: /root/research/2026-06-12_april-whipsaw-weekly.csv
Builds on: 2026-06-10_cross-sectional-momentum.md (the strategy that PASSED).

1. Question

The passing strategy (regime-gated long/short 21-day momentum, top-100 PIT universe) had one ugly month, April 2026 (−20.9%), flagged as open question (a): what caused the April whipsaw, and can a second filter fix it without overfitting? Jacques asked to try a quality filter: trade only the top 20 coins "by market cap."

Data caveat (unchanged from prior reports): there is no market-cap field in any DB. The honest proxy used throughout the project is trailing-30d dollar volume, recomputed weekly point-in-time. "Top 20 by market cap" is therefore implemented as top 20 by dollar volume, PIT.
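A minimal sketch of the proxy described above, assuming daily bars are available as (date, symbol, close, volume) tuples. The function name `top_n_by_dollar_volume`, the bar layout and the toy tickers are hypothetical and not taken from the engine. The sketch also assumes "point-in-time" means only bars dated strictly before the rebalance date are used.

```python
from collections import defaultdict
from datetime import date, timedelta


def top_n_by_dollar_volume(bars, asof, n=20, window_days=30):
    """Rank symbols by trailing dollar volume using only bars dated before `asof`.

    bars: iterable of (date, symbol, close, volume) tuples (hypothetical layout).
    """
    start = asof - timedelta(days=window_days)
    totals = defaultdict(float)
    for day, symbol, close, volume in bars:
        if start <= day < asof:  # strictly before asof: no same-day data leaks in
            totals[symbol] += close * volume
    # Largest dollar volume first; ties broken alphabetically for determinism.
    ranked = sorted(totals, key=lambda s: (-totals[s], s))
    return ranked[:n]


# Toy bars with made-up tickers and numbers, for illustration only.
bars = [
    (date(2026, 3, 30), "AAA", 10.0, 1_000),
    (date(2026, 3, 30), "BBB", 2.0, 1_000),
    (date(2026, 3, 31), "CCC", 1.0, 50_000),
    (date(2026, 4, 1), "BBB", 2.0, 1_000_000),  # dated on the rebalance day, so ignored
]
universe = top_n_by_dollar_volume(bars, asof=date(2026, 4, 1), n=2)
```

With these toy bars, BBB's large print on the rebalance date is excluded, so it does not enter the universe until the following recomputation.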

2. What the April whipsaw actually is

Per-week reconstruction of the headline variant over 2026 (full log in the CSV):

BTC rose +18.5% across April; the BTC regime gate read BULL every week. The filter was not wrong — the market was up. The damage was idiosyncratic to the picks.
The short leg made money in April (+94% summed net). The long leg lost −186%. The entire whipsaw is the long side.
The long leg buys the 3 strongest coins = last week's biggest pumpers, which then revert violently:

Week	BTC	Strategy	Long picks
Mar 25–Apr 01	+3.4%	−3.4%	SIREN −87%, ARIA +49%, BEAT −32%
Apr 08–15	+5.3%	−14.6%	ARIA −87%, NOM −47%, JCT −2%
Apr 15–22	+4.5%	−4.0%	RAVE −90%, LAB +13%, BIO −6%

SIREN went +132% → −87% in consecutive weeks. These same hype microcaps (RIVER, SIREN, ARIA, RAVE) also produced January's +107% and March's +132% wins. So the long leg is a high-variance lottery on a few extremely volatile coins, equal-weighted as if they carried ordinary volatility. April is the stretch where the lottery lost three weeks running.
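To make the concentration of damage concrete, the arithmetic below takes the long picks from the table above and computes two things per week: the plain equal-weighted average of the three picks, and the share of that week's summed losses that came from the single worst pick. This is a sketch on the table's numbers only. The averages are not the Strategy column, which presumably also reflects the short leg, costs and the engine's own sizing.

```python
# Long picks and their weekly returns in percent, copied from the table above.
long_picks = {
    "Mar 25-Apr 01": {"SIREN": -87, "ARIA": 49, "BEAT": -32},
    "Apr 08-15": {"ARIA": -87, "NOM": -47, "JCT": -2},
    "Apr 15-22": {"RAVE": -90, "LAB": 13, "BIO": -6},
}


def equal_weight_mean(returns):
    """Plain average of the picks' returns (each pick gets weight 1/N)."""
    return sum(returns.values()) / len(returns)


def worst_pick_share(returns):
    """Fraction of the summed losses contributed by the single worst pick."""
    losses = [r for r in returns.values() if r < 0]
    return min(losses) / sum(losses)


summary = {
    week: (round(equal_weight_mean(r), 1), round(worst_pick_share(r), 2))
    for week, r in long_picks.items()
}
for week, (mean_ret, share) in summary.items():
    print(f"{week}: equal-weight mean {mean_ret}%, worst pick = {share:.0%} of losses")
```

The raw three-pick averages come out at roughly −23%, −45% and −28%, and in each week one coin accounts for well over half of the losing picks' total loss.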

Diagnosis: not a regime-filter miss — a position-sizing / pick-quality problem.

3. The top-20 test (Jacques' requested fix)

Same headline params (regime_ls, L21, skip0, N3, btc_ma50), full costs, max-8, PIT universe; only the universe size changed, to 20. Tune and OOS are reported separately.

Universe	Tune (<2026)	OOS 2026	April
TOP-20	PF 1.26, ret +57%, DD −13.6%, Sharpe 1.37	PF 0.94, ret −7.8%, DD −26.1%, Sharpe −0.32	−25.4%
TOP-100 (baseline)	PF 1.24, ret +62%, DD −31.8%, Sharpe 0.88	PF 1.26, ret +19.8%, DD −20.9%, Sharpe +1.05	−20.9%

Top-20 looks better in tuning (higher Sharpe, lower drawdown) and then fails OOS: the classic overfitting trap, where an in-sample improvement does not survive unseen data. Its 2026 result (PF 0.94, −7.8%) sits right on top of the earlier top-10 failure (PF 0.99, −8.5%).
April got worse, not better: −25.4% vs −20.9%.
The bombs are still in the top-20: SIREN −87% (Mar 25), RAVE −90% (Apr 15).

Why concentrating doesn't help (the key lesson)

We rank by dollar volume, which explodes exactly when a coin is pumping. SIREN and RAVE trade enormous volume because they are being hyped, so a "top-20 by volume" screen pulls them in at the worst moment, not out. Even with true market-cap data the result would likely be similar — these coins' caps also balloon 5–10× during the pump. Concentration throws away the breadth that gave top-100 its edge while keeping the bombs — worst of both worlds, matching the top-10 report's conclusion.
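A toy illustration of the mechanism, with made-up numbers: one coin trades a steady $1M per day, while another trades $0.1M per day until a final hyped week at $10M per day. The helper name `trailing_dollar_volume` and all figures are hypothetical; the only point is that a single pump week is enough to dominate a trailing-30d sum.

```python
def trailing_dollar_volume(daily_dollar_volume, window=30):
    """Sum of the last `window` daily dollar-volume prints."""
    return sum(daily_dollar_volume[-window:])


# Made-up daily dollar volumes, oldest first.
steady = [1_000_000] * 30                      # large, boring coin
quiet = [100_000] * 30                         # the microcap before any hype
pumped = [100_000] * 23 + [10_000_000] * 7     # same microcap after one hyped week

steady_total = trailing_dollar_volume(steady)
quiet_total = trailing_dollar_volume(quiet)
pumped_total = trailing_dollar_volume(pumped)

# The microcap ranks far below the steady coin before the pump and above it after.
ranks_above_before = quiet_total > steady_total
ranks_above_after = pumped_total > steady_total
```

In this toy case the microcap's trailing total goes from a tenth of the steady coin's to more than double it, so a tighter volume cutoff admits it exactly when it is most stretched.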

4. Honest conclusion

Top-20 FAILS out-of-sample (PF 0.94) and worsens the April whipsaw. Do not pursue. The cross-sectional momentum edge lives in breadth (top-100), confirmed a second time.

The whipsaw is a long-leg sizing problem, not a universe problem: the strategy equal-weights coins that can move ±90%/week. The more promising fixes, still unaddressed, are volatility-scaled position sizing (shrink the bet on wild coins) and/or a "don't buy the parabolic" rule (skip a long whose prior-week return is already in an extreme top percentile). Either targets the actual mechanism instead of the universe.
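Both candidate fixes can be sketched in a few lines. This is an illustration under stated assumptions, not the engine's code and not a tested result: the function names, the `skip_frac` threshold, the volatility inputs (for example a trailing standard deviation of returns) and the toy tickers are all hypothetical.

```python
def inverse_vol_weights(vols):
    """Weights proportional to 1/volatility, normalized to sum to 1."""
    inverse = {symbol: 1.0 / vol for symbol, vol in vols.items()}
    total = sum(inverse.values())
    return {symbol: value / total for symbol, value in inverse.items()}


def skip_parabolic(candidates, prior_week_returns, skip_frac=0.10):
    """Drop long candidates whose prior-week return is in the top `skip_frac` of the universe.

    prior_week_returns: prior-week return for every coin in the universe.
    """
    n = len(prior_week_returns)
    kept = []
    for symbol in candidates:
        r = prior_week_returns[symbol]
        frac_above = sum(1 for x in prior_week_returns.values() if x > r) / n
        if frac_above >= skip_frac:  # enough coins did better: not an extreme mover
            kept.append(symbol)
    return kept


# Toy universe of 10 coins with made-up prior-week returns (as fractions).
prior_week_returns = {
    "WILD": 1.32, "MID": 0.40, "CALM": 0.15, "C4": 0.10, "C5": 0.05,
    "C6": 0.00, "C7": -0.05, "C8": -0.10, "C9": -0.20, "C10": -0.30,
}
kept = skip_parabolic(["WILD", "MID", "CALM"], prior_week_returns)

# Made-up weekly volatilities for three long candidates.
weights = inverse_vol_weights({"CALM": 0.10, "MID": 0.30, "WILD": 0.90})
```

In the toy numbers, the coin with nine times the volatility of the calmest one receives one ninth of its weight, and the single most extreme prior-week mover is excluded from the long leg. Whether either rule helps out-of-sample is exactly what the next test would need to establish.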

Status: top-20 = FAIL. Top-100 strategy stands. Next lead: vol-scaled sizing.