
Markets as Language: Why k-Nearest Neighbours Works for Stock Picking

This paper begins with the history and mathematics of nearest-neighbour methods, then demonstrates, within a disciplined, production-grade framework, why a two-indicator k-Nearest Neighbours (kNN) model can extract signal from equity markets. We connect geometric growth theory, information and distance metrics, capacity and microstructure, and walk-forward governance into a single architecture deployable by a boutique, “obscure” hedge fund.

1) Nearest Neighbour: History, Geometry, and Consistency

Nearest-neighbour rules emerged from early pattern recognition and non-parametric statistics. Conceptually simple, the method embeds observations in a metric space and assigns the label of the closest training examples to an unlabeled point. In the 1-NN rule, the predicted label equals that of the single closest observation; in k-NN, the label is determined by the majority (classification) or average (regression) of the k nearest points.

Foundational results show why such a simple rule can be powerful. Under mild regularity conditions, as the sample size \(n\to\infty\) with \(k\to\infty\) but \(k/n\to 0\), the k-NN classifier is consistent: its risk converges to the Bayes risk, the minimum achievable classification error. This explains the method’s resilience: it estimates the decision boundary implicitly via local neighborhoods and thus adapts to nonlinear boundaries without an explicit parametric form.

Interpretation for finance: if returns conditional on certain indicator states are locally smooth, then with sufficient history k-NN can approach the optimal up/down boundary for that state, provided we respect time order and account for regime changes.

2) Metrics, Topology, and the Choice of Distance

All nearest-neighbour methods live and die on the metric. Euclidean distance assumes isotropy; Manhattan (L1) emphasizes coordinate-wise deviations; Mahalanobis rescales by covariance so that distances are measured in units of standard deviation and account for correlation. In practice:

Weighted k-NN further generalizes: nearer neighbors carry higher weights \(w_i \propto 1/d_i^\alpha\). This is particularly helpful when noise inflates the radius necessary to collect k points.
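
The three metrics and the inverse-distance weights above can be sketched in a few lines of standard-library Python. This is an illustrative sketch for the two-feature case only: the function names, the 2×2 covariance argument, and the small `eps` guard against division by zero are assumptions made for the example, not part of the original specification.

```python
import math

def euclidean(a, b):
    """L2 distance: treats every coordinate as equally scaled (isotropy)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """L1 distance: sum of coordinate-wise absolute deviations."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance for two features, given a 2x2 covariance matrix."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form with the closed-form inverse of a 2x2 matrix.
    q = (s22 * dx * dx - (s12 + s21) * dx * dy + s11 * dy * dy) / det
    return math.sqrt(q)

def inverse_distance_weights(dists, alpha=1.0, eps=1e-12):
    """Weights w_i proportional to 1 / d_i**alpha, normalized to sum to 1."""
    raw = [1.0 / (d + eps) ** alpha for d in dists]
    total = sum(raw)
    return [r / total for r in raw]
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean one; with unequal variances, a deviation along the noisier feature counts for less.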

3) The Curse of Dimensionality and Why Two Features Can Be Enough

As dimension \(d\) increases, volume grows so fast that points become sparse; nearest neighbors become “far,” and distances concentrate. The consequence is brutal: non-parametric estimators demand exponentially more data with dimension. For trading, this is a hidden gift: strong two-feature models can outperform high-dimensional ones out-of-sample because they avoid variance blow-ups and overfitting. The two-indicator kNN you specified—feature1, feature2—is not naive; it is a principled defense against the curse, provided those features capture regime structure (e.g., momentum × volatility).
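
A small simulation can make distance concentration concrete. The sketch below draws uniform random points in the unit cube (an assumption chosen purely for illustration, not a model of market features) and reports the ratio of the nearest to the farthest distance from a query point. As that ratio approaches 1, "nearest" stops meaning much.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio min(d) / max(d) from one random query to n_points random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)

# Print the ratio for a low, medium, and high number of features.
for dim in (2, 10, 100):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

In two dimensions the nearest neighbour is typically far closer than the farthest point; in a hundred dimensions the two distances are of similar size, which is the practical argument for keeping the feature space small.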

4) From IID to Markets: Time Series, Regimes, and Stationarity

Classical kNN proofs assume IID samples. Markets are autocorrelated, heteroskedastic, and non-stationary. The workaround is architectural:

With those controls, local neighborhoods become meaningful “phrases” in the market’s language.

5) The Two-Indicator kNN Model for Stock Picking

Specification (matching your brief)

  • Features: feature1, feature2 (e.g., a momentum oscillator and a volatility/overbought-oversold measure).
  • Label (direction): +1 if next period’s return > 0, −1 otherwise (defined on bar close to avoid repainting).
  • k-NN: classify current point by the majority class among its k nearest historical neighbors in {feature1, feature2} space.
  • Assets: Equities, indices, ETFs (also applicable to FX/futures with careful contract roll handling).

Supervised vs. unsupervised? For predicting the next move (a labeled up/down outcome), kNN is a supervised, instance-based learner. The “unsupervised” phrasing is sometimes used informally because kNN has no explicit training step; nevertheless, classification uses labels and is supervised.

5.1 Decision rule

Given current point \(x_0=(f_1, f_2)\) and historical sample \(\{(x_i, y_i)\}_{i=1}^n\) with \(y_i \in \{-1, +1\}\), let \(N_k(x_0)\) be indices of the k nearest \(x_i\) under distance \(d(\cdot,\cdot)\). Predict \( \hat{y}(x_0) = \mathrm{sign}\left(\sum_{i\in N_k(x_0)} w_i y_i\right)\) with weights \(w_i \ge 0\) that decrease with distance (uniform or inverse-distance).
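
The decision rule can be written directly as a short function. This is a minimal sketch, assuming Euclidean distance, inverse-distance weights with a small `eps` guard, and a return value of 0 when the weighted vote is exactly tied; the function name and those choices are illustrative rather than prescribed by the text.

```python
import math

def knn_direction(x0, X, y, k, weighted=True, eps=1e-9):
    """Return +1, -1, or 0 (tie) from the weighted vote of the k nearest points.

    x0: current point (f1, f2); X: historical points; y: labels in {-1, +1}.
    """
    if k < 1 or k > len(X):
        raise ValueError("k must be between 1 and the number of samples")
    # Sort historical points by distance to x0 and keep the k closest.
    nearest = sorted((math.dist(x0, xi), yi) for xi, yi in zip(X, y))[:k]
    score = sum((1.0 / (d + eps) if weighted else 1.0) * yi for d, yi in nearest)
    return (score > 0) - (score < 0)
```

For example, with two "up" points near the origin and three "down" points near (1, 1), a query at (0.05, 0) with k = 3 votes +1, because the two closest neighbours dominate the sum.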

5.2 Why this can work

6) Feature Engineering: Momentum, Volatility, WPR, and COG

Two-dimensional design forces discipline. A practical pair that generalizes:

Normalize each feature per rolling window (z-score or robust median/MAD) so the metric is meaningful across regimes. Clip extreme outliers to reduce the influence of crash clusters in nearest-neighbour search.
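
One way to implement the rolling robust normalization and clipping is sketched below. The trailing window that includes the current closed bar, the 1.4826 factor (which scales MAD to a standard deviation under normality), and the clip level of 3 are assumptions made for the example.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales MAD to a standard deviation for Gaussian data

def rolling_robust_z(values, window, clip=3.0):
    """Rolling (x - median) / (1.4826 * MAD), clipped to [-clip, +clip].

    Uses only the trailing `window` bars up to and including bar t, so no
    future information enters the score. Bars before warm-up are None.
    """
    out = [None] * len(values)
    for t in range(window - 1, len(values)):
        hist = values[t - window + 1 : t + 1]
        med = statistics.median(hist)
        mad = statistics.median([abs(v - med) for v in hist])
        scale = MAD_TO_SIGMA * mad
        z = 0.0 if scale == 0 else (values[t] - med) / scale
        out[t] = max(-clip, min(clip, z))
    return out
```

Applied to `[1, 2, 3, 4, 5, 100]` with a five-bar window, the final spike is clipped to 3.0 instead of dominating every subsequent neighbour search.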

7) Labeling the Target: Direction, Horizons, and Class Imbalance

Define direction at bar close to avoid repainting:

label_t = +1 if Close[t+H] / Close[t] - 1 > 0 else -1

with horizon \(H\) (e.g., 1 bar/day). For intraday bars, consider microstructure noise; for multi-day horizons, use forward-return thresholds (e.g., require a return ≥ +ε for an up label or ≤ −ε for a down label) to reduce label ambiguity. If classes are imbalanced, reweight neighbors or set a decision threshold on the estimated class probability to trade off precision against recall.
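
A labeling helper along these lines might look as follows. It is a sketch under stated assumptions: it follows the Section 5 convention that only a strictly positive return is labeled +1, it marks bars inside a nonzero ±ε band as 0 (to be dropped or treated as "no call"), and it returns `None` for the last H bars whose outcome is not yet known. The function name and those conventions are illustrative.

```python
def direction_labels(close, horizon=1, eps=0.0):
    """Label each bar by its forward return over `horizon` bars.

    +1 if return > eps; -1 if return < -eps (or return <= 0 when eps == 0);
    0 if the return falls inside the ambiguity band; None if the future
    close is not yet available.
    """
    labels = []
    for t in range(len(close)):
        if t + horizon >= len(close):
            labels.append(None)  # outcome unknown at this point in time
            continue
        ret = close[t + horizon] / close[t] - 1.0
        if ret > eps:
            labels.append(1)
        elif ret < -eps or eps == 0:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

Keeping the trailing `None` values explicit makes it harder to train by accident on labels that would only exist in the future.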

8) Validation: Repainting, Leakage, Walk-Forward, and Deflated Sharpe

Repainting occurs when signals use future information or when indicators are recalculated with bars not yet closed. Controls:

TradingView users: use barstate.isconfirmed to ensure features/labels only update after bar close; avoid using replay data to tune k post-hoc.
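
Outside TradingView, the same discipline can be enforced with an explicit walk-forward split. The sketch below is one possible scheme, not a prescribed one: it rolls a fixed-size training window forward and leaves a gap of H − 1 bars so that the last training label, which uses the close H bars ahead, never reaches past the first test bar. The function name and parameters are hypothetical.

```python
def walk_forward_splits(n_bars, train_size, test_size, horizon=1):
    """Return a list of (train_range, test_range) index pairs in time order.

    A training label at bar i uses the close at i + horizon, so the test
    block starts at train_end + horizon - 1 to keep labels out of test data.
    """
    if min(train_size, test_size, horizon) < 1:
        raise ValueError("sizes and horizon must be positive")
    splits = []
    start = 0
    while True:
        train_end = start + train_size          # exclusive
        test_start = train_end + horizon - 1    # purge gap of horizon - 1 bars
        test_end = test_start + test_size       # exclusive
        if test_end > n_bars:
            break
        splits.append((range(start, train_end), range(test_start, test_end)))
        start += test_size                      # roll forward by one test block
    return splits
```

Any tuning of k would then happen on the training range only, with the test range used once for evaluation.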

9) From Signals to Portfolio: Fractional Kelly and Drawdown Control

Converting a directional classifier into returns requires position sizing and risk control. Let \(p\) be the estimated probability of an up-move in the neighborhood and let \(\mu, \sigma^2\) be the conditional drift and variance. The full Kelly fraction, \(f^* \approx \mu/\sigma^2\) (or, for a Bernoulli bet with net odds \(o\) and edge \(b = p\,o - (1-p)\), \(f^* = b/o\)), is too aggressive under estimation error. We use fractional Kelly \(f=\kappa f^*\), \(0<\kappa<1\), calibrated to drawdown tolerances and parameter uncertainty.
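
The sizing formulas can be checked with a short numeric sketch. The function names, the default \(\kappa = 0.5\), and the leverage cap are assumptions for illustration; in the Bernoulli case the edge is taken to be \(p\,o - (1-p)\), where \(o\) is the net payoff per unit staked.

```python
def kelly_bernoulli(p, odds):
    """Full Kelly fraction for a bet that wins `odds` per unit with prob. p."""
    if not 0.0 <= p <= 1.0 or odds <= 0:
        raise ValueError("need 0 <= p <= 1 and odds > 0")
    edge = p * odds - (1.0 - p)
    return edge / odds

def kelly_gaussian(mu, sigma2):
    """Continuous approximation f* = mu / sigma^2."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    return mu / sigma2

def fractional_kelly(f_star, kappa=0.5, cap=1.0):
    """Scale full Kelly by kappa in (0, 1) and clip to [-cap, +cap]."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between 0 and 1")
    return max(-cap, min(cap, kappa * f_star))
```

With p = 0.55 at even odds, full Kelly is about 10% of capital and half Kelly about 5%. A neighbourhood vote of 55/45 therefore supports only a modest position, and less still once the uncertainty in p itself is acknowledged.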

10) Execution, Capacity, and Impact

Impact often scales sublinearly (square-root law). Practical implications:
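
The square-root scaling mentioned above is commonly written as impact ≈ Y · σ · sqrt(Q / V), where Q is order size, V is average daily volume, σ is daily volatility, and Y is an empirical coefficient of order one. The sketch below uses that form; the coefficient value, the function names, and the idea of inverting the formula for a capacity estimate are assumptions for illustration and would need calibration against actual fills.

```python
import math

def sqrt_impact(order_shares, adv_shares, daily_vol, y_coef=1.0):
    """Estimated impact (as a return fraction) under the square-root law."""
    if order_shares < 0 or adv_shares <= 0:
        raise ValueError("need order_shares >= 0 and adv_shares > 0")
    participation = order_shares / adv_shares
    return y_coef * daily_vol * math.sqrt(participation)

def max_order_for_impact(impact_budget, adv_shares, daily_vol, y_coef=1.0):
    """Largest order whose estimated impact stays within impact_budget."""
    return adv_shares * (impact_budget / (y_coef * daily_vol)) ** 2
```

Under these assumptions, an order of 1% of daily volume in a stock with 2% daily volatility costs roughly 20 basis points, and quadrupling the order size only doubles the estimated impact.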

11) Robustness: Adversarial Markets, Stress, and Outages

Stress the system with historical shocks and synthetic volatility pulses. Ensure the classifier degrades gracefully by shrinking k or abstaining when local density drops below a minimum (no reliable neighbors).
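
Graceful degradation can be expressed as a guard around the vote. In the sketch below, neighbours farther than `max_radius` are discarded (which effectively shrinks k), and the classifier abstains by returning 0 when fewer than `min_k` neighbours remain. Both thresholds and the function name are hypothetical and would need to be set from the standardized feature scale.

```python
import math

def knn_or_abstain(x0, X, y, k, max_radius, min_k=3):
    """Majority vote over neighbours within max_radius; 0 means abstain."""
    nearest = sorted((math.dist(x0, xi), yi) for xi, yi in zip(X, y))[:k]
    # Drop neighbours that are too far away to be informative.
    within = [(d, yi) for d, yi in nearest if d <= max_radius]
    if len(within) < min_k:
        return 0  # local density too low: no reliable neighbours
    score = sum(yi for _, yi in within)
    return (score > 0) - (score < 0)
```

A query that lands in an unvisited region of feature space, as can happen during a shock, then produces no trade instead of a vote borrowed from distant, unrelated history.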

12) Appendix: FAQ and SEO Notes for Boutique/Obscure Hedge Funds

Readers looking for “obscure hedge funds” and “boutique quant” often value technical clarity and operational sobriety. This article deliberately:


Implementation sketch (two-feature, rolling kNN)

Inputs: feature1[t], feature2[t], price[t]
Params: k, window = lookback bars, horizon H, Kelly fraction κ, small ε > 0

For each t ≥ window:
  I = [t-window, …, t-H]                     # only bars whose H-bar label is already known at t
  X_train = {(feature1[i], feature2[i]) for i in I}
  y_train = {+1 if price[i+H]/price[i] - 1 > 0 else -1 for i in I}  # label on closed bars only
  s1, s2 = std of feature1, feature2 over I  # per-window scale
  x0 = (feature1[t], feature2[t])            # current closed bar
  # distance (standardized Euclidean):
  d(i) = sqrt( ((feature1[i]-x0.f1)/s1)^2 + ((feature2[i]-x0.f2)/s2)^2 )
  N = indices of the k smallest d(i)
  vote = sum over i in N of y_train[i] * w_i, with w_i = 1 / (d(i)+ε)
  if vote > 0 then long next bar; if vote < 0 then short/flat by policy; if vote = 0 then no trade
  size = κ * f*(conditional)  # fractional Kelly or banded sizing
  execute with slippage model; update PnL

Production: standardize per window, clip outliers, enforce liquidity filters, and log every decision (features, neighbors, distances, votes, size, fills).
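
For readers who want something executable, the signal-generation part of the sketch can be written in plain Python roughly as follows. This is a hedged illustration, not the production system: it covers only the rolling vote (sizing, slippage, and logging are omitted), it assumes positive prices, it restricts training bars to those whose label is already known at bar t, and the function name and defaults are hypothetical.

```python
import math
import statistics

def rolling_knn_signals(f1, f2, price, k=5, window=50, horizon=1, eps=1e-9):
    """Return one signal per bar: +1, -1, 0 (tie), or None before warm-up."""
    n = len(price)
    signals = [None] * n
    for t in range(window, n):
        # Training bars i must satisfy i + horizon <= t (label already known).
        idx = range(t - window, t - horizon + 1)
        if len(idx) < k:
            continue
        # Per-window scale; fall back to 1.0 if a feature is constant.
        s1 = statistics.pstdev([f1[i] for i in idx]) or 1.0
        s2 = statistics.pstdev([f2[i] for i in idx]) or 1.0
        scored = []
        for i in idx:
            d = math.hypot((f1[i] - f1[t]) / s1, (f2[i] - f2[t]) / s2)
            label = 1 if price[i + horizon] > price[i] else -1
            scored.append((d, label))
        scored.sort()
        # Inverse-distance weighted vote over the k nearest neighbours.
        vote = sum(label / (d + eps) for d, label in scored[:k])
        signals[t] = (vote > 0) - (vote < 0)
    return signals
```

On a toy series in which the first feature alone determines the next bar's direction, the signals after warm-up reproduce that feature exactly. That checks the plumbing only and says nothing about real markets.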

Disclosures: This content is for information only, not investment advice. Past performance is not indicative of future results. Methods described herein are subject to change without notice.