Markets as Language: Why k-Nearest Neighbours Works for Stock Picking
This paper begins with the history and mathematics of nearest-neighbour methods, then explains, within a disciplined, production-grade framework, why a two-indicator k-Nearest Neighbours (kNN) model can extract signal from equity markets. We connect geometric growth theory, information and distance metrics, capacity and microstructure, and walk-forward governance into a single architecture deployable by a boutique, “obscure” hedge fund.
- Nearest Neighbour: History, Geometry, and Consistency
- Metrics, Topology, and the Choice of Distance
- The Curse of Dimensionality and Why Two Features Can Be Enough
- From IID to Markets: Time Series, Regimes, and Stationarity
- The Two-Indicator kNN Model for Stock Picking
- Feature Engineering: Momentum, Volatility, WPR, and COG
- Labeling the Target: Direction, Horizons, and Class Imbalance
- Validation: Repainting, Leakage, Walk-Forward, and Deflated Sharpe
- From Signals to Portfolio: Fractional Kelly and Drawdown Control
- Execution, Capacity, and Impact
- Robustness: Adversarial Markets, Stress, and Outages
- Appendix: FAQ and SEO Notes for Boutique/Obscure Hedge Funds
1) Nearest Neighbour: History, Geometry, and Consistency
Nearest-neighbour rules emerged from early pattern recognition and non-parametric statistics. Conceptually simple, the method embeds observations in a metric space and assigns the label of the closest training examples to an unlabeled point. In the 1-NN rule, the predicted label equals that of the single closest observation; in k-NN, the label is determined by the majority (classification) or average (regression) of the k nearest points.
Foundational results show why such a simple rule can be powerful. Under mild regularity, as the sample size \(n\to\infty\), and with \(k\to\infty\) but \(k/n\to 0\), the k-NN classifier is consistent, converging in risk to the Bayes optimal classifier (the minimum possible classification error). This explains the method’s resilience: it estimates the decision boundary implicitly via local neighborhoods and thus adapts to nonlinear manifolds without explicit parametric form.
Interpretation for finance: if the probability of an up-move, conditional on indicator state, varies smoothly across nearby states, then with sufficient history k-NN can approach the optimal up/down boundary for those states, provided we respect time order and regime changes (the consistency result assumes IID samples, which market data are not).
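A toy simulation can make the consistency result concrete. It is a sketch under assumed conditions, not a market test: one feature uniform on [0, 1], P(up | x) = 0.8 above 0.5 and 0.2 below (so the Bayes error is exactly 0.2), and k ≈ √n, which satisfies k → ∞ and k/n → 0. The function names are hypothetical.

```python
import numpy as np

def knn_predict(x_train, y_train, x_test, k):
    """Majority-vote k-NN on 1-D inputs with labels in {-1, +1}; odd k rules out ties."""
    d = np.abs(x_test[:, None] - x_train[None, :])    # (n_test, n_train) distances
    nn = np.argpartition(d, k - 1, axis=1)[:, :k]      # k nearest training points per test point
    return np.sign(y_train[nn].sum(axis=1))

def knn_test_error(n, n_test=1000, seed=0):
    """Test error of k-NN with k ~ sqrt(n) on a toy problem whose Bayes error is 0.2."""
    rng = np.random.default_rng(seed)

    def draw(m):
        x = rng.uniform(0.0, 1.0, m)
        p_up = np.where(x > 0.5, 0.8, 0.2)             # P(y = +1 | x)
        y = np.where(rng.uniform(0.0, 1.0, m) < p_up, 1, -1)
        return x, y

    x_train, y_train = draw(n)
    x_test, y_test = draw(n_test)
    k = int(np.sqrt(n)) | 1                            # k grows, k/n shrinks; "| 1" forces k odd
    return float(np.mean(knn_predict(x_train, y_train, x_test, k) != y_test))

for n in (50, 500, 2000):
    print(n, knn_test_error(n))
```

As n grows, the test error should settle near the Bayes error of 0.2, up to sampling noise in the test set.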
2) Metrics, Topology, and the Choice of Distance
All nearest-neighbour methods live and die on the metric. Euclidean distance assumes isotropy; Manhattan (L1) emphasizes coordinate-wise deviations; Mahalanobis rescales by covariance so that distances are measured in units of standard deviation and account for correlation. In practice:
- Euclidean / standardized Euclidean: simple, works when features are normalized and roughly isotropic.
- Mahalanobis: appropriate when features are correlated (common for technical indicators); requires stable covariance estimation.
- Dynamic Time Warping (DTW): for aligning sequences that evolve at different speeds; overkill for single-bar features, and its quadratic cost in sequence length makes it risky in latency-sensitive pipelines.
- Cosine / correlation distance: for directional similarity when scale is less informative than angle.
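Weighted k-NN generalizes further: nearer neighbors carry higher weights, \(w_i \propto 1/d_i^\alpha\) with \(\alpha > 0\) (in practice \(1/(d_i+\varepsilon)^\alpha\), so a neighbor at zero distance does not cause a division by zero). This is particularly helpful when noise inflates the radius needed to collect k points.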
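The metrics listed above can be sketched in a few lines of NumPy. This is an illustrative sketch: the function names are hypothetical, `scale` would typically be a rolling standard deviation, and `cov` a covariance matrix estimated on the same window.

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def standardized_euclidean(x, y, scale):
    """Euclidean distance after dividing each coordinate difference by its scale."""
    return float(np.linalg.norm((x - y) / scale))

def manhattan(x, y):
    return float(np.sum(np.abs(x - y)))

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)); solve() avoids forming the inverse explicitly."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def cosine_distance(x, y):
    """1 - cos(angle between x and y): ignores scale, compares direction only."""
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(x, y), manhattan(x, y))   # 5.0 7.0

# Correlated features: a move along the shared direction is "cheap" under Mahalanobis.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
origin = np.zeros(2)
print(mahalanobis(np.array([1.0, 1.0]), origin, cov), mahalanobis(np.array([1.0, -1.0]), origin, cov))
```

With correlation 0.9, the two moves of equal Euclidean length come out at roughly 1.03 and 4.47 in Mahalanobis units, which is why correlated indicators call for covariance-aware scaling.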
3) The Curse of Dimensionality and Why Two Features Can Be Enough
As dimension \(d\) increases, volume grows so fast that points become sparse: nearest neighbors become “far,” and distances concentrate (the nearest and farthest points end up at nearly the same distance). The consequence is brutal: non-parametric estimators demand exponentially more data as dimension grows. For trading, this is a hidden gift: strong two-feature models can outperform high-dimensional ones out-of-sample because they avoid variance blow-ups and overfitting. The two-indicator kNN described here (feature1, feature2) is not naive; it is a principled defense against the curse, provided those features capture regime structure (e.g., momentum × volatility).
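Distance concentration can be checked numerically. The sketch below assumes uniform points in the unit cube, which is a stand-in rather than real feature data, and measures the relative contrast \((d_{\max} - d_{\min})/d_{\min}\) between a query and its nearest and farthest points; the function name is hypothetical.

```python
import numpy as np

def relative_contrast(d, n_points=1000, n_queries=20, seed=0):
    """Mean of (d_max - d_min) / d_min from random queries to n_points uniform points in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, d))
    Q = rng.uniform(size=(n_queries, d))
    dist = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)   # (n_queries, n_points)
    d_min, d_max = dist.min(axis=1), dist.max(axis=1)
    return float(np.mean((d_max - d_min) / d_min))

for d in (2, 10, 100):
    print(d, round(relative_contrast(d), 2))
```

In two dimensions the nearest point is far closer than the farthest; by 100 dimensions the contrast collapses, so "nearest" carries little information.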
4) From IID to Markets: Time Series, Regimes, and Stationarity
Classical kNN proofs assume IID samples. Markets are autocorrelated, heteroskedastic, and non-stationary. The workaround is architectural:
- Limit kNN to short-horizon regime inference (e.g., next-bar or next-day direction) where local stationarity is plausible.
- Use rolling windows to ensure neighbors come from comparable regimes (recent market states).
- Re-normalize features per window to keep comparable scales through volatility regimes.
With those controls, local neighborhoods become meaningful “phrases” in the market’s language.
5) The Two-Indicator kNN Model for Stock Picking
Specification
- Features: feature1, feature2 (e.g., a momentum oscillator and a volatility/overbought-oversold measure).
- Label (direction): +1 if next period’s return > 0, −1 otherwise (defined on bar close to avoid repainting).
- k-NN: classify current point by the majority class among its k nearest historical neighbors in {feature1, feature2} space.
- Assets: Equities, indices, ETFs (also applicable to FX/futures with careful contract roll handling).
Supervised vs. unsupervised? For predicting the next move (a labeled up/down outcome), kNN is a supervised, instance-based learner. The “unsupervised” phrasing is sometimes used informally because kNN has no explicit training step; nevertheless, classification uses labels and is supervised.
5.1 Decision rule
Given current point \(x_0=(f_1, f_2)\) and historical sample \(\{(x_i, y_i)\}_{i=1}^n\) with \(y_i \in \{-1, +1\}\), let \(N_k(x_0)\) be the indices of the k nearest \(x_i\) under distance \(d(\cdot,\cdot)\). Predict \( \hat{y}(x_0) = \mathrm{sign}\left(\sum_{i\in N_k(x_0)} w_i y_i\right)\) with weights \(w_i \ge 0\) that do not increase with distance (uniform, \(w_i = 1\), or inverse-distance, \(w_i \propto 1/d_i^\alpha\)). A vote of exactly zero is a tie and is treated as no signal.
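The decision rule can be sketched directly. This is a minimal version assuming plain Euclidean distance on already-standardized features; `knn_vote`, `alpha`, and `eps` are hypothetical names, with `alpha = 0` reproducing uniform weights.

```python
import numpy as np

def knn_vote(X, y, x0, k, alpha=1.0, eps=1e-9):
    """sign(sum_{i in N_k(x0)} w_i * y_i) with w_i = 1 / (d_i + eps)**alpha.

    alpha = 0 gives uniform weights; a return value of 0 is a tie (no signal)."""
    X, y, x0 = np.asarray(X, dtype=float), np.asarray(y, dtype=float), np.asarray(x0, dtype=float)
    d = np.linalg.norm(X - x0, axis=1)        # Euclidean distance to every historical point
    nn = np.argsort(d, kind="stable")[:k]      # N_k(x0): the k nearest points
    w = 1.0 / (d[nn] + eps) ** alpha
    return int(np.sign(np.sum(w * y[nn])))

# Two moderately close "up" states and one very close "down" state: weighting flips the call.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y = [+1, +1, -1, -1]
print(knn_vote(X, y, [0.0, 0.9], k=3, alpha=0.0), knn_vote(X, y, [0.0, 0.9], k=3, alpha=1.0))   # 1 -1
```

The uniform vote follows the majority of the three neighbors, while inverse-distance weighting defers to the single neighbor that almost coincides with the query.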
5.2 Why this can work
- Local smoothness: Indicators map state to a manifold where nearby points share similar conditional returns.
- Model-free: No parametric boundary to mis-specify; the neighborhood adapts to nonlinearities.
- Robust to misspecification: if features are informative, the majority label in a small neighborhood approximates the Bayes-optimal decision for that state.
- Regime localization: Using recent windows ensures neighbors reflect current market microphysics.
6) Feature Engineering: Momentum, Volatility, WPR, and COG
Two-dimensional design forces discipline. A practical pair that generalizes:
- feature1 — Momentum oscillator (e.g., normalized rate-of-change, short/medium SMA crossover delta, or RSI-derived z-score).
- feature2 — Overbought/oversold & volatility context, e.g., Williams %R (WPR) or a percentile of ATR/true range, or Center of Gravity (COG) oscillator for turning-point structure.
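One concrete pair from the list above, an n-bar rate of change for feature1 and Williams %R for feature2, can be computed as follows. The lookback `n` and the function names are assumptions for illustration; COG and ATR percentiles are omitted.

```python
import numpy as np

def rate_of_change(close, n):
    """feature1 candidate: n-bar rate of change (n >= 1), NaN until n bars of history exist."""
    close = np.asarray(close, dtype=float)
    out = np.full_like(close, np.nan)
    out[n:] = close[n:] / close[:-n] - 1.0
    return out

def williams_r(high, low, close, n):
    """feature2 candidate: Williams %R in [-100, 0]; -100 = close at the n-bar low, 0 = at the high."""
    high, low, close = (np.asarray(a, dtype=float) for a in (high, low, close))
    out = np.full_like(close, np.nan)
    for t in range(n - 1, len(close)):
        hh = high[t - n + 1 : t + 1].max()     # highest high over the last n bars (inclusive)
        ll = low[t - n + 1 : t + 1].min()      # lowest low over the last n bars (inclusive)
        # Undefined (NaN) when the n-bar range is zero.
        out[t] = -100.0 * (hh - close[t]) / (hh - ll) if hh > ll else np.nan
    return out
```

Both use only bars up to and including t, so their values lock at the close of bar t.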
Normalize each feature per rolling window (z-score or robust median/MAD) so the metric is meaningful across regimes. Clip extreme outliers to reduce the influence of crash clusters in nearest-neighbour search.
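A robust per-window normalization with clipping might look like this. It is a sketch assuming a trailing window that ends at the current bar; the 1.4826 factor makes the MAD comparable to a standard deviation for Gaussian data, and the ±3 clip is an illustrative choice, not a recommendation.

```python
import numpy as np

def rolling_robust_z(x, window, clip=3.0):
    """(x_t - median) / (1.4826 * MAD) over the trailing window ending at t, clipped to [-clip, clip]."""
    x = np.asarray(x, dtype=float)
    z = np.full_like(x, np.nan)
    for t in range(window - 1, len(x)):
        w = x[t - window + 1 : t + 1]                 # only bars up to and including t
        med = np.median(w)
        mad = 1.4826 * np.median(np.abs(w - med))     # robust scale estimate
        if mad > 0:                                   # flat window: leave NaN
            z[t] = np.clip((x[t] - med) / mad, -clip, clip)
    return z
```

A crash bar that would sit 60 robust units from the median is capped at 3, so it cannot dominate the distance calculation.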
7) Labeling the Target: Direction, Horizons, and Class Imbalance
Define direction at bar close to avoid repainting:
\( y_t = +1 \) if \( \mathrm{Close}_{t+H} / \mathrm{Close}_t - 1 > 0 \), otherwise \( y_t = -1 \),
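with horizon \(H\) (e.g., one bar, which is one day on daily data). For intraday bars, microstructure noise such as bid-ask bounce can dominate short-horizon returns; for multi-day horizons, use forward-return thresholds (label +1 only if the return is \(\ge +\varepsilon\), −1 only if it is \(\le -\varepsilon\), and leave the band in between unlabeled) to reduce label ambiguity. If classes are imbalanced, reweight neighbors by class frequency or set the decision threshold on the estimated posterior to trade off precision against recall as desired.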
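The labeling rules above can be sketched as follows. The baseline reproduces "+1 if the forward return is positive, −1 otherwise"; passing `eps` switches to the dead-band variant with 0 for unlabeled bars. Inverse-frequency class weights are one assumed way to do the reweighting; all names are hypothetical.

```python
import numpy as np

def direction_labels(close, H=1, eps=None):
    """Forward H-bar direction labels; the last H bars have no forward return yet and get 0."""
    close = np.asarray(close, dtype=float)
    fwd = np.full(len(close), np.nan)
    fwd[:-H] = close[H:] / close[:-H] - 1.0
    if eps is None:                         # baseline: +1 if fwd > 0, else -1
        labels = np.where(fwd > 0, 1, -1)
    else:                                   # dead band: +1 if fwd >= eps, -1 if fwd <= -eps, else 0
        labels = np.where(fwd >= eps, 1, np.where(fwd <= -eps, -1, 0))
    labels[np.isnan(fwd)] = 0               # label not yet observable
    return labels

def class_weights(labels):
    """Inverse-frequency weights so the +1 and -1 classes carry equal total weight."""
    labels = np.asarray(labels)
    classes = [c for c in (-1, 1) if np.any(labels == c)]
    n = sum(int(np.sum(labels == c)) for c in classes)
    return {c: n / (len(classes) * int(np.sum(labels == c))) for c in classes}
```

In a weighted vote, each neighbor's weight \(w_i\) could be multiplied by the weight of its class, so a rare class is not outvoted merely because it is rare.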
8) Validation: Repainting, Leakage, Walk-Forward, and Deflated Sharpe
Repainting occurs when signals use future information or when indicators are recalculated with bars not yet closed. Controls:
- Compute features on fully closed bars; lock feature values at close.
- Label using next bar’s close (trade is executed at t+1 open/close, not t).
- Walk-forward: rolling train window → test on the next block; no peeking across the boundary.
- Latency realism: apply realistic slippage/fees; turn off look-ahead in your backtest engine.
- Multiple-testing: deflate Sharpe for trials; report confidence intervals; prefer stability to headline stats.
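A walk-forward splitter that also avoids label leakage at the boundary can be sketched as below. The assumption is that a label at bar i needs the price at i + H, so the last H training bars of each block are dropped ("purged"); the function name and sizes are hypothetical.

```python
def walk_forward_splits(n_bars, train_size, test_size, H=1):
    """Yield (train_indices, test_indices) for rolling walk-forward evaluation.

    Training bar i uses the price at i + H for its label, so the last H bars before
    the test block are excluded: their labels would peek into the test block."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size - H)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size                    # roll forward by one test block

for train, test in walk_forward_splits(10, train_size=5, test_size=2, H=1):
    print(list(train), list(test))
```

Parameters such as k or the lookback window would be chosen inside each training block only, never by looking at test-block results.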
TradingView users: use barstate.isconfirmed to ensure features/labels only update after bar close; avoid using replay data to tune k post-hoc.
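For the Sharpe deflation in the controls above, one widely used approach is the Deflated Sharpe Ratio of Bailey and López de Prado: a Probabilistic Sharpe Ratio evaluated against the best Sharpe that luck alone would produce after many trials. The sketch below assumes per-period (not annualized) Sharpe ratios and an estimate of their variance across the configurations tried; the function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329

def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew=0.0, kurt=3.0):
    """P(true SR > sr_benchmark) for an observed per-period Sharpe `sr` over n_obs returns.
    `kurt` is non-excess kurtosis (3.0 = Gaussian tails)."""
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return NormalDist().cdf((sr - sr_benchmark) * math.sqrt(n_obs - 1) / denom)

def expected_max_sharpe(n_trials, var_trial_sr):
    """Approximate expected best per-period Sharpe among n_trials strategies with zero true Sharpe."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = NormalDist().inv_cdf
    return math.sqrt(var_trial_sr) * ((1 - EULER_GAMMA) * z(1 - 1 / n_trials)
                                      + EULER_GAMMA * z(1 - 1 / (n_trials * math.e)))

def deflated_sharpe(sr, n_obs, n_trials, var_trial_sr, skew=0.0, kurt=3.0):
    """PSR measured against the Sharpe expected from the best of n_trials unskilled attempts."""
    return probabilistic_sharpe(sr, expected_max_sharpe(n_trials, var_trial_sr), n_obs, skew, kurt)

# Same observed daily Sharpe (0.1) over one year, after 10 versus 100 configurations tried.
print(deflated_sharpe(0.1, 252, 10, 0.0025), deflated_sharpe(0.1, 252, 100, 0.0025))
```

The same backtest earns a lower deflated score when it is the survivor of more trials, which is exactly the penalty that post-hoc tuning of k deserves.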
9) From Signals to Portfolio: Fractional Kelly and Drawdown Control
Converting a directional classifier into returns requires sizing and risk control. Let \(p\) be the estimated probability of an up-move in the neighborhood, and let \(\mu, \sigma^2\) be the conditional (excess) drift and variance. The continuous-time Kelly fraction \(f^* \approx \mu/\sigma^2\) (or, for a binary bet with win probability \(p\) and net odds \(b\), \(f^* = (bp - (1-p))/b\), i.e., edge divided by odds, which reduces to \(f^* = 2p - 1\) at even odds) is too aggressive under estimation error. We use fractional Kelly, \(f=\kappa f^*\) with \(0<\kappa<1\), calibrated to drawdown tolerances and parameter uncertainty.
- Cap single-name exposure; spread across independent signals to increase breadth.
- Use a veto filter (e.g., high-volatility or liquidity stress) to cut risk when the neighborhood is unreliable.
- Translate classifier confidence (vote margin) into position size bands.
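Fractional Kelly, the single-name cap, and size bands can be combined in a small sizing function. This sketch assumes an even-odds up/down bet (so \(f^* = 2p - 1\) and a negative value means short), with \(p\) taken, for example, as the share of up-labelled neighbors; `kappa = 0.25`, the 5% cap, and the band levels are hypothetical values.

```python
def kelly_binary(p, b=1.0):
    """Full Kelly fraction for a binary bet: win probability p, net odds b -> (b*p - (1 - p)) / b."""
    return (b * p - (1.0 - p)) / b

def position_size(p, kappa=0.25, cap=0.05, bands=(0.0, 0.01, 0.025, 0.05)):
    """Signed fraction of capital for one name, assuming an even-odds (b = 1) up/down bet.

    Fractional Kelly f = kappa * f*, capped per name, then rounded down to a size band."""
    f = kappa * kelly_binary(p)                  # at b = 1, f* = 2p - 1 (negative = short)
    size = min(abs(f), cap)                      # single-name exposure cap
    band = max(x for x in bands if x <= size)    # snap down to the nearest allowed band
    if band == 0.0:
        return 0.0
    return band if f > 0 else -band

for p in (0.50, 0.53, 0.70, 0.30):
    print(p, position_size(p))
```

Rounding down to bands keeps small, noisy changes in \(p\) from churning the book.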
10) Execution, Capacity, and Impact
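Market impact often scales sublinearly with order size: the empirical square-root law puts expected impact at roughly \(Y\sigma\sqrt{Q/V}\), where \(\sigma\) is daily volatility, \(Q\) the order size, \(V\) daily volume, and \(Y\) a constant of order one. Practical implications: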
- Throttle participation rate; prefer passive fills in benign regimes.
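- Skip signals when spreads widen abruptly; the market has likely moved away from the historical neighborhood the signal relies on.
- Monitor crowding: when many signals (yours or other participants’) cluster in the same neighborhood, the trades become correlated and the edge tends to decay.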
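The square-root law gives a quick capacity check. In this sketch `Y = 1.0` is an assumed placeholder that should be calibrated on the fund's own fills, and the function names are hypothetical.

```python
import math

def sqrt_impact_bps(order_shares, adv_shares, daily_vol, Y=1.0):
    """Expected impact ~ Y * sigma_daily * sqrt(Q / V), returned in basis points."""
    return 1e4 * Y * daily_vol * math.sqrt(order_shares / adv_shares)

def max_order_for_budget(budget_bps, adv_shares, daily_vol, Y=1.0):
    """Invert the law: the largest order whose expected impact stays within budget_bps."""
    return adv_shares * (budget_bps / (1e4 * Y * daily_vol)) ** 2

# 1% of ADV in a stock with 2% daily volatility costs about 20 bps under these assumptions.
print(sqrt_impact_bps(10_000, 1_000_000, 0.02), max_order_for_budget(20.0, 1_000_000, 0.02))
```

Because impact grows with the square root of size, quadrupling an order only doubles its per-share cost, but the total cost of the trade still rises faster than linearly.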
11) Robustness: Adversarial Markets, Stress, and Outages
Stress the system with historical shocks and synthetic volatility pulses. Ensure the classifier degrades gracefully by shrinking k or abstaining when local density drops below a minimum (no reliable neighbors).
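Shrinking k and abstaining can be expressed as a radius test on the neighbors. This is a sketch assuming standardized features; `max_radius` and `min_k` are hypothetical thresholds to be set from the distance distribution seen in walk-forward testing.

```python
import numpy as np

def knn_vote_or_abstain(X, y, x0, k=15, max_radius=1.5, min_k=5):
    """Majority vote using only neighbors within max_radius; abstain (0) if fewer than min_k remain."""
    d = np.linalg.norm(np.asarray(X, dtype=float) - np.asarray(x0, dtype=float), axis=1)
    nn = np.argsort(d)[:k]
    nn = nn[d[nn] <= max_radius]              # effective k shrinks as local density drops
    if len(nn) < min_k:
        return 0                              # no reliable neighbors: stand aside
    return int(np.sign(np.sum(np.asarray(y)[nn])))

X = np.zeros((20, 2))                         # dense cluster of past states at the origin
y = np.ones(20)
print(knn_vote_or_abstain(X, y, [0.0, 0.0]), knn_vote_or_abstain(X, y, [10.0, 10.0]))   # 1 0
```

A query far from any past state, as in a novel stress regime, produces no trade rather than a forced guess.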
12) Appendix: FAQ and SEO Notes for Boutique/Obscure Hedge Funds
Readers looking for “obscure hedge funds” and “boutique quant” often value technical clarity and operational sobriety. This article deliberately:
- Targets long-tail queries: “kNN stock picking,” “nearest neighbour trading,” “two-indicator strategy,” “walk-forward kNN,” “fractional Kelly.”
- Uses internal links to The Architecture of Compounding and The Mathematics of Discipline for topical authority.
- Includes structured data (Article + FAQPage) and canonical URL for clean indexing.
Implementation sketch (two-feature, rolling kNN)
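import numpy as np

def rolling_knn_signals(feature1, feature2, price, k=15, window=250, H=1, eps=1e-9):
    """Inputs: feature1[t], feature2[t], price[t] on closed bars. Params: k, window (lookback bars, >= H), horizon H."""
    f1, f2, px = (np.asarray(a, dtype=float) for a in (feature1, feature2, price))
    signal = np.zeros(len(px))  # +1 long next bar, -1 short/flat by policy, 0 no trade
    for t in range(window, len(px)):
        # Train on i in [t-window, t-H]: each label uses px[i+H] with i+H <= t (closed bars only).
        idx = np.arange(t - window, t - H + 1)
        y_train = np.where(px[idx + H] / px[idx] - 1 > 0, 1.0, -1.0)
        x0 = (f1[t], f2[t])  # current closed bar
        # Distance (standardized Euclidean); scales s1, s2 come from the training window only.
        s1, s2 = f1[idx].std() + eps, f2[idx].std() + eps
        d = np.sqrt(((f1[idx] - x0[0]) / s1) ** 2 + ((f2[idx] - x0[1]) / s2) ** 2)
        N = np.argsort(d)[:k]  # indices of the k smallest d(i)
        vote = np.sum(y_train[N] / (d[N] + eps))  # w_i = 1 / (d(i) + eps)
        signal[t] = np.sign(vote)
        # Downstream: size = kappa * f* (fractional Kelly or banded sizing),
        # then execute with a slippage model and update PnL.
    return signal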
Production: standardize per window, clip outliers, enforce liquidity filters, and log every decision (features, neighbors, distances, votes, size, fills).
Disclosures: This content is for information only, not investment advice. Past performance is not indicative of future results. Methods described herein are subject to change without notice.