A kNN lens on earnings-driven turbulence
Thin liquidity around holidays, discrete information shocks from earnings, and clustered volatility create a recurring microstructure: returns around like states rhyme. We exploit this with a minimalist k-nearest neighbours (kNN) classifier that maps current market state vectors to a historical neighbourhood and infers a short-horizon directional bias. The model is intentionally simple to keep the inductive bias legible; complexity is reserved for the features, not the learner.
State vector
For each symbol i, we form a monthly-to-daily fused state at end of day t:
- Momentum & curvature: z-scores of 5/10/20-day log-returns; discrete trend sign; quadratic-fit curvature over 10 days.
- Volatility regime: EWMA vol (10, 20), realized quarticity, and a GARCH(1,1)-implied next-day sigma.
- Event proximity: business-day distance to scheduled earnings; binary window flags [-3, +3].
- Microstructure: % gaps, overnight/regular-session split, volume percentile, order-imbalance proxy.
- Cross-sectional context: beta to Nasdaq-100, semis factor, and residual idiosyncratic return.
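A few of the momentum and volatility entries can be sketched in plain Python. This is a minimal illustration, not the note's actual feature pipeline: the function names, the 60-day z-score lookback, the choice to take the trend sign from the 20-day return, fitting the quadratic to log prices, and the EWMA decay alpha = 2 / (span + 1) are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def horizon_zscore(closes, horizon, lookback=60):
    """z-score of the latest `horizon`-day log-return against its trailing history."""
    rets = [math.log(closes[i] / closes[i - horizon]) for i in range(horizon, len(closes))]
    window = rets[-lookback:]  # assumed lookback; includes the latest observation
    sd = pstdev(window)
    return 0.0 if sd == 0 else (window[-1] - mean(window)) / sd


def ewma_vol(returns, span):
    """EWMA volatility with alpha = 2 / (span + 1), seeded with the first squared return."""
    alpha = 2.0 / (span + 1)
    var = returns[0] ** 2
    for r in returns[1:]:
        var = alpha * r * r + (1.0 - alpha) * var
    return math.sqrt(var)


def quadratic_curvature(values):
    """Leading coefficient of a least-squares quadratic fit on equally spaced points."""
    n = len(values)
    xs = [i - (n - 1) / 2 for i in range(n)]  # centred time index
    m2 = sum(x * x for x in xs) / n
    basis = [x * x - m2 for x in xs]  # orthogonal to 1 and x on a symmetric grid
    return sum(b * v for b, v in zip(basis, values)) / sum(b * b for b in basis)


def momentum_vol_features(closes):
    """Momentum, curvature and EWMA-vol entries of the state vector at the last close."""
    daily = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    r20 = math.log(closes[-1] / closes[-21])
    feats = {f"z_ret_{h}d": horizon_zscore(closes, h) for h in (5, 10, 20)}
    feats["trend_sign"] = (r20 > 0) - (r20 < 0)  # assumed: sign of the 20-day return
    feats["curvature_10d"] = quadratic_curvature([math.log(c) for c in closes[-10:]])
    feats["ewma_vol_10"] = ewma_vol(daily, 10)
    feats["ewma_vol_20"] = ewma_vol(daily, 20)
    return feats
```

Calling `momentum_vol_features(closes)` on a list of at least a few months of daily closes returns a dictionary with seven entries. The remaining groups (GARCH sigma, event proximity, microstructure, cross-sectional betas) need data sources that this sketch does not attempt to model.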
Distance, neighbourhood, label
We use Mahalanobis distance, i.e. Euclidean distance on a feature space whitened with a robust covariance estimate, together with an event-weighted kernel that downranks non-earnings neighbours when today falls in an earnings window. The label is sign(r_{t+1}), the sign of the next-session close-to-close return, and we report class probability as neighbour vote proportions with temperature scaling.
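The distance, kernel and vote steps might look roughly like the sketch below. Several choices here are assumptions rather than the note's method: a ridge-regularised sample covariance stands in for the robust estimator, the event kernel is read as a multiplicative distance penalty on non-earnings rows, and temperature scaling is applied to the logit of the vote proportion with a temperature that would in practice be fitted on held-out data. All function and parameter names are hypothetical.

```python
import math


def fit_whitener(X, ridge=1e-6):
    """Return (mean, Cholesky factor L) of the ridge-regularised sample covariance."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in X) / (n - 1)
            for b in range(d)] for a in range(d)]
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] + (ridge if i == j else 0.0)
            s -= sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return mu, L


def whiten(x, mu, L):
    """Solve L z = x - mu by forward substitution. Euclidean distance between
    whitened vectors equals Mahalanobis distance in the original space."""
    z = []
    for i in range(len(x)):
        z.append((x[i] - mu[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return z


def temper(p, temperature=1.0, eps=1e-6):
    """Temperature-scale a binary probability through its logit."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0) on unanimous votes
    return 1.0 / (1.0 + math.exp(-math.log(p / (1.0 - p)) / temperature))


def knn_prob_up(X, y, in_window, x_t, today_in_window, k=21, penalty=2.0, temperature=1.0):
    """Share of +1 labels among the k nearest rows, temperature-scaled."""
    mu, L = fit_whitener(X)
    z_t = whiten(x_t, mu, L)
    scored = []
    for i, row in enumerate(X):
        dist = math.dist(whiten(row, mu, L), z_t)
        if today_in_window and not in_window[i]:
            dist *= penalty  # assumed kernel: downrank non-earnings neighbours
        scored.append((dist, i))
    nearest = [i for _, i in sorted(scored)[:k]]
    p_raw = sum(1 for i in nearest if y[i] == 1) / len(nearest)
    return temper(p_raw, temperature)
```

Here `X` is a list of historical state vectors, `y` holds labels in {-1, +1}, and `in_window` marks which historical rows fell inside an earnings window. With `temperature=1.0` the function returns the raw vote proportion (clipped away from 0 and 1); temperatures above 1 pull the probability toward 0.5.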
Cross-validation & leakage controls
- Purged, embargoed, expanding-window CV to avoid look-ahead around events.
- Time-kNN baseline vs. feature-kNN ablation to attribute lift correctly.
- Post-earnings drift filter: require positive drift in neighbours when t is in [0, +2] post-ER.
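One way to generate purged, expanding-window folds is sketched below. It assumes rows are in time order and that the label of row i depends on data up to i + label_horizon. Because no training row follows the test block in a walk-forward split, the embargo is applied here as an extra gap between the end of training and the start of testing; that reading, the function name and all default values are assumptions for illustration.

```python
def purged_expanding_splits(n_samples, n_folds=5, label_horizon=1, embargo=5, min_train=100):
    """Yield (train_idx, test_idx) pairs for walk-forward validation.

    Training always starts at row 0 and grows with each fold. Rows whose label
    window would reach into the test block are purged, and a further `embargo`
    rows are dropped as a buffer.
    """
    fold_size = (n_samples - min_train) // n_folds
    for f in range(n_folds):
        test_start = min_train + f * fold_size
        test_end = n_samples if f == n_folds - 1 else test_start + fold_size
        # Last usable training row i must satisfy i + label_horizon < test_start,
        # then step back a further `embargo` rows.
        train_end = max(test_start - label_horizon - embargo, 0)
        yield list(range(train_end)), list(range(test_start, test_end))
```

For example, `purged_expanding_splits(300, n_folds=4)` produces test blocks starting at rows 100, 150, 200 and 250, each preceded by a training set that stops six rows short of the test block.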
Signal construction
We emit a positive-bias flag when Pr[r_{t+1} > 0] ≥ 0.55, neighbour count ≥ k = 21, and entropy < threshold. Position sizing (if any) is outside scope; this note is informational only.
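The three gates can be written as a small function. The sketch assumes the entropy in question is the Shannon entropy (in bits) of the up/down vote; the threshold value of 0.95 and the names are hypothetical. Note that for a binary vote the entropy depends only on the vote proportion, and a proportion of exactly 0.55 has entropy of roughly 0.993 bits, so any threshold below that acts as a stricter probability cut.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def emit_signal(neighbour_labels, k=21, p_min=0.55, max_entropy=0.95):
    """Return 'positive-bias' only if all three gates pass, else 'no-edge'.

    neighbour_labels: labels in {-1, +1} of the retrieved neighbours.
    max_entropy: hypothetical threshold; the note does not state a value.
    """
    if len(neighbour_labels) < k:
        return "no-edge"
    prob_up = sum(1 for lab in neighbour_labels if lab == 1) / len(neighbour_labels)
    if prob_up >= p_min and binary_entropy(prob_up) < max_entropy:
        return "positive-bias"
    return "no-edge"
```

With 21 neighbours and this threshold, 15 up-votes pass all gates, while 12 up-votes clear the 0.55 probability bar but fail on entropy.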
Screen — 10 Nasdaq names with elevated event volatility (earnings window this week; model bias ≥ 0.55)
| Ticker | Company | Event window | Likely catalyst | kNN bias |
|---|---|---|---|---|
| FAST | Fastenal Co. | Mon 13 Oct (BMO) | Q3 Earnings | Positive |
| JBHT | J.B. Hunt Transport | Wed 15 Oct (AMC) | Q3 Earnings | Positive |
| ASML | ASML Holding (ADR) | Wed 15 Oct (BMO) | Q3 Earnings | Positive |
| IBKR | Interactive Brokers | Thu 16 Oct (AMC) | Q3 Earnings | Positive |
| PNFP | Pinnacle Financial | Wed 15 Oct (AMC) | Q3 Earnings | Positive |
| PLBC | Plumas Bancorp | Wed 15 Oct (BMO) | Q3 Earnings | Positive |
| WINA | Winmark Corp. | Wed 15 Oct (BMO) | Q3 Earnings | Positive |
| CTBI | Community Trust Bancorp | Wed 15 Oct (BMO) | Q3 Earnings | Positive |
| MBCN | Middlefield Banc | Thu 16 Oct (BMO) | Q3 Earnings | Positive |
| FFIN | First Financial Bankshares | Thu 16 Oct (BMO) | Q3 Earnings | Positive |
"Positive" here denotes that the model’s neighbour majority implies a >55% probability of a next-session positive close; it is not a recommendation. The list is intentionally diversified across sectors to avoid spurious factor concentration.
Why earnings windows amplify kNN signal
Around discrete disclosures, conditional distributions of returns are multi-modal and regime-dependent. A non-parametric learner like kNN can adapt to these local structures without imposing a Gaussian prior. When features encode event proximity and volatility regime, the neighbourhood collapses to a small manifold of historically analogous states (e.g., banks when rate-cut probabilities are rising, semis during AI capex cycles), which can improve calibration.
Pseudocode
// X: feature matrix (T × d), y: sign of next-day returns, today: x_t
k = 21
N = neighbourhood(X, x_t, metric = mahalanobis, kernel = event_weighted)
prob_up = mean(y[N] == +1)
if (prob_up ≥ 0.55 and |N| ≥ k and entropy(y[N]) < τ) emit("positive-bias")
else emit("no-edge")