Is kNN supervised or unsupervised in stock prediction?

For direction or return prediction, kNN is a supervised, instance-based learner: you use labeled examples (features, direction) and classify new points by their nearest neighbors. Unsupervised kNN variants exist for density/outlier tasks, but trading signals that predict up/down are supervised.

Why can simple two-feature kNN work?

When markets revisit similar regimes, local neighborhoods in a well-chosen indicator space carry conditional information about the next move. If conditional returns are smooth in that space and roughly stationary over short horizons, the kNN vote approximates the Bayes classifier. Whether that edge raises the portfolio's geometric (compounded) growth rate depends on validation, position sizing, and risk controls.

How do you avoid repainting and look-ahead bias?

Define labels on fully closed bars, lock indicator values at bar close, use walk-forward evaluation, delay trades to the next bar, and ensure feature computation uses only past information (no future peeks).


Markets as Language: Why k-Nearest Neighbours Works for Stock Picking

This paper begins with the history and mathematics of nearest-neighbour methods, then shows, within a disciplined, production-grade framework, why a two-indicator k-Nearest Neighbours (kNN) model can extract signal from equity markets. It connects geometric growth theory, information and distance metrics, capacity and microstructure, and walk-forward governance into a single architecture deployable by a boutique, “obscure” hedge fund.

Contents

1) Nearest Neighbour: History, Geometry, and Consistency
2) Metrics, Topology, and the Choice of Distance
3) The Curse of Dimensionality and Why Two Features Can Be Enough
4) From IID to Markets: Time Series, Regimes, and Stationarity
5) The Two-Indicator kNN Model for Stock Picking
6) Feature Engineering: Momentum, Volatility, WPR, and COG
7) Labeling the Target: Direction, Horizons, and Class Imbalance
8) Validation: Repainting, Leakage, Walk-Forward, and Deflated Sharpe
9) From Signals to Portfolio: Fractional Kelly and Drawdown Control
10) Execution, Capacity, and Impact
11) Robustness: Adversarial Markets, Stress, and Outages
12) Appendix: FAQ and SEO Notes for Boutique/Obscure Hedge Funds

1) Nearest Neighbour: History, Geometry, and Consistency

Nearest-neighbour rules emerged from early pattern recognition and non-parametric statistics. Conceptually simple, the method embeds observations in a metric space and assigns the label of the closest training examples to an unlabeled point. In the 1-NN rule, the predicted label equals that of the single closest observation; in k-NN, the label is determined by the majority (classification) or average (regression) of the k nearest points.

Foundational results show why such a simple rule can be powerful. Under mild regularity, as the sample size \(n\to\infty\), and with \(k\to\infty\) but \(k/n\to 0\), the k-NN classifier is consistent, converging in risk to the Bayes optimal classifier (the minimum possible classification error). This explains the method’s resilience: it estimates the decision boundary implicitly via local neighborhoods and thus adapts to nonlinear manifolds without explicit parametric form.

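Interpretation for finance: if returns conditional on certain indicator states are locally smooth, then with sufficient history k-NN approaches the Bayes-optimal up/down boundary for that state, provided we respect time order and regime changes. The optimal boundary can still have an error rate close to 50% when the features carry little information, so consistency is a statement about the estimator, not a promise of edge.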

2) Metrics, Topology, and the Choice of Distance

All nearest-neighbour methods live and die on the metric. Euclidean distance assumes isotropy; Manhattan (L1) emphasizes coordinate-wise deviations; Mahalanobis rescales by covariance so that distances are measured in units of standard deviation and account for correlation. In practice:

Euclidean / standardized Euclidean: simple, works when features are normalized and roughly isotropic.
Mahalanobis: appropriate when features are correlated (common for technical indicators); requires stable covariance estimation.
Dynamic Time Warping (DTW): for sequence alignment, but overkill for single-bar features; expensive and risky for latency.
Cosine / correlation distance: for directional similarity when scale is less informative than angle.

Weighted k-NN further generalizes: nearer neighbors carry higher weights \(w_i \propto 1/d_i^\alpha\). This is particularly helpful when noise inflates the radius necessary to collect k points.
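
As a rough illustration of the metrics and weighting above, the following Python sketch computes a standardized Euclidean distance, a two-feature Mahalanobis distance, and normalized inverse-distance weights. The function names, the `eps` guard against division by zero, and the normalization of weights to sum to one are assumptions made for this example, not part of the original specification.

```python
import math

def standardized_euclidean(a, b, scales):
    """Euclidean distance after dividing each coordinate gap by its scale (e.g., a std)."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance for two features, given cov = [[sxx, sxy], [sxy, syy]]."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    if sxx <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Quadratic form with the inverse of the 2x2 covariance, written out explicitly.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

def inverse_distance_weights(distances, alpha=1.0, eps=1e-9):
    """Weights proportional to 1 / d_i**alpha, normalized to sum to 1."""
    raw = [1.0 / (d + eps) ** alpha for d in distances]
    total = sum(raw)
    return [w / total for w in raw]
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean one, which is a quick sanity check when wiring in an estimated covariance.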

3) The Curse of Dimensionality and Why Two Features Can Be Enough

As dimension \(d\) increases, volume grows so fast that points become sparse; nearest neighbors become “far,” and distances concentrate. The consequence is brutal: non-parametric estimators demand exponentially more data with dimension. For trading, this is a hidden gift: strong two-feature models can outperform high-dimensional ones out-of-sample because they avoid variance blow-ups and overfitting. The two-indicator kNN you specified—feature1, feature2—is not naive; it is a principled defense against the curse, provided those features capture regime structure (e.g., momentum × volatility).
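
Distance concentration can be seen in a small simulation. The sketch below, which assumes uniformly random points in the unit cube (a toy setting, not market data), compares the nearest and farthest distances from a random query point as the dimension grows. The function name and parameter values are illustrative.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=500, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)

# A ratio near 0 means "near" and "far" are well separated;
# a ratio near 1 means all points look about equally distant.
for dim in (2, 10, 100):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

In this toy setting the ratio climbs toward 1 as the dimension increases, which is the sense in which the notion of a "nearest" neighbor loses meaning in high dimensions.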

4) From IID to Markets: Time Series, Regimes, and Stationarity

Classical kNN proofs assume IID samples. Markets are autocorrelated, heteroskedastic, and non-stationary. The workaround is architectural:

Limit kNN to short-horizon regime inference (e.g., next-bar or next-day direction) where local stationarity is plausible.
Use rolling windows to ensure neighbors come from comparable regimes (recent market states).
Re-normalize features per window to keep comparable scales through volatility regimes.

With those controls, local neighborhoods become meaningful “phrases” in the market’s language.

5) The Two-Indicator kNN Model for Stock Picking

Specification (matching your brief)

Features: feature1, feature2 (e.g., a momentum oscillator and a volatility/overbought-oversold measure).
Label (direction): +1 if next period’s return > 0, −1 otherwise (defined on bar close to avoid repainting).
k-NN: classify current point by the majority class among its k nearest historical neighbors in {feature1, feature2} space.
Assets: Equities, indices, ETFs (also applicable to FX/futures with careful contract roll handling).

Supervised vs. unsupervised? For predicting the next move (a labeled up/down outcome), kNN is a supervised, instance-based learner. The “unsupervised” phrasing is sometimes used informally because kNN has no explicit training step; nevertheless, classification uses labels and is supervised.

5.1 Decision rule

Given current point \(x_0=(f_1, f_2)\) and historical sample \(\{(x_i, y_i)\}_{i=1}^n\) with \(y_i \in \{-1, +1\}\), let \(N_k(x_0)\) be the indices of the k nearest \(x_i\) under distance \(d(\cdot,\cdot)\). Predict \( \hat{y}(x_0) = \mathrm{sign}\left(\sum_{i\in N_k(x_0)} w_i y_i\right)\) with weights \(w_i \ge 0\) that are either uniform (\(w_i = 1\)) or decreasing in distance (e.g., inverse-distance, \(w_i \propto 1/d_i^\alpha\)). A sum of exactly zero is a tie and is best treated as no signal.
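
A minimal Python sketch of this decision rule follows. It assumes plain Euclidean distance on features that have already been standardized, inverse-distance weights with exponent 1, and a return value of 0 for a tied vote. The function name `knn_direction` and the `eps` guard are hypothetical choices for illustration.

```python
import math

def knn_direction(x0, X, y, k, weighted=True, eps=1e-9):
    """Return +1, -1, or 0 (tie) from a vote of the k nearest historical points."""
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of samples")
    # Sort historical points by distance to x0 and keep the k closest.
    nearest = sorted((math.dist(x0, xi), yi) for xi, yi in zip(X, y))[:k]
    vote = sum((1.0 / (d + eps) if weighted else 1.0) * yi for d, yi in nearest)
    return (vote > 0) - (vote < 0)  # sign of the weighted vote

# Toy usage: two "up" points near the origin, two "down" points near (1, 1).
X = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (1.1, 0.9)]
y = [+1, +1, -1, -1]
print(knn_direction((0.05, 0.0), X, y, k=3))
```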

5.2 Why this can work

Local smoothness: Indicators map state to a manifold where nearby points share similar conditional returns.
Model-free: No parametric boundary to mis-specify; the neighborhood adapts to nonlinearities.
Robust to misspecification: If features are informative, the majority label in a small neighborhood approximates Bayes.
Regime localization: Using recent windows ensures neighbors reflect current market microphysics.

6) Feature Engineering: Momentum, Volatility, WPR, and COG

Two-dimensional design forces discipline. A practical pair that generalizes:

feature1 — Momentum oscillator (e.g., normalized rate-of-change, short/medium SMA crossover delta, or RSI-derived z-score).
feature2 — Overbought/oversold & volatility context, e.g., Williams %R (WPR) or a percentile of ATR/true range, or Center of Gravity (COG) oscillator for turning-point structure.

Normalize each feature per rolling window (z-score or robust median/MAD) so the metric is meaningful across regimes. Clip extreme outliers to reduce the influence of crash clusters in nearest-neighbour search.
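
One way to sketch this normalization in Python is shown below. It assumes each value is scaled against the `window` bars strictly before it (including the current closed bar in the window would also be free of look-ahead), a clip level of 3, and the conventional 1.4826 factor that puts MAD on the same scale as a standard deviation under normality. The function name is hypothetical.

```python
import statistics

def rolling_normalize(values, window, clip=3.0, robust=False):
    """Scale each value by statistics of the previous `window` values, then clip."""
    out = [None] * len(values)  # None during warm-up
    for t in range(window, len(values)):
        past = values[t - window:t]  # bars strictly before t
        if robust:
            center = statistics.median(past)
            scale = 1.4826 * statistics.median([abs(v - center) for v in past])
        else:
            center = statistics.fmean(past)
            scale = statistics.pstdev(past)
        # A flat window has no scale; treat the current value as neutral.
        z = 0.0 if scale == 0 else (values[t] - center) / scale
        out[t] = max(-clip, min(clip, z))
    return out
```

Clipping keeps a single crash bar from dominating the distance calculation, at the cost of treating all extreme readings beyond the clip level as identical.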

7) Labeling the Target: Direction, Horizons, and Class Imbalance

Define direction at bar close to avoid repainting:

label_t = +1 if Close[t+H] / Close[t] - 1 > 0 else -1

with horizon \(H\) (e.g., 1 bar/day). For intraday bars, consider microstructure noise; for multi-day horizons, use forward-return thresholds (e.g., require >= +ε or <= −ε) to reduce label ambiguity. If classes are imbalanced, reweight neighbors or set decision threshold on the posterior to trade precision/recall as desired.
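
The labeling rule can be sketched in Python as follows. This version assumes a strictly positive forward return maps to +1 (matching the specification in section 5), marks returns inside the ±ε band as 0 so they can be dropped or handled separately, and leaves the last H bars unlabeled because their outcome is not yet known. The function name and the use of 0 for ambiguous bars are assumptions for illustration.

```python
def direction_labels(close, horizon=1, eps=0.0):
    """Label bar t from the return between close[t] and close[t + horizon]."""
    labels = [None] * len(close)  # trailing bars stay None: outcome unknown
    for t in range(len(close) - horizon):
        r = close[t + horizon] / close[t] - 1.0
        if r > eps:
            labels[t] = 1
        elif r < -eps or eps == 0.0:
            # With eps == 0 this is the two-class rule: non-positive return -> -1.
            labels[t] = -1
        else:
            labels[t] = 0  # inside the dead zone: ambiguous
    return labels

print(direction_labels([100, 101, 100, 100]))
print(direction_labels([100, 105, 100], eps=0.02))
```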

8) Validation: Repainting, Leakage, Walk-Forward, and Deflated Sharpe

Repainting occurs when signals use future information or when indicators are recalculated with bars not yet closed. Controls:

Compute features on fully closed bars; lock feature values at close.
Label using the next bar's close, and execute the trade at the t+1 open or close, never at bar t itself.
Walk-forward: rolling train window → test on the next block; no peeking across the boundary.
Latency realism: apply realistic slippage/fees; turn off look-ahead in your backtest engine.
Multiple-testing: deflate Sharpe for trials; report confidence intervals; prefer stability to headline stats.
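
The walk-forward control above can be sketched as an index generator. The version below assumes fixed-size rolling train and test blocks and leaves a gap of `horizon - 1` bars between them, so that every training label is already known at the close of the first test bar. The function name and parameters are hypothetical.

```python
def walk_forward_splits(n_bars, train_size, test_size, horizon=1):
    """Yield (train_range, test_range) pairs with training strictly before testing."""
    start = 0
    while start + train_size + horizon - 1 + test_size <= n_bars:
        train = range(start, start + train_size)
        # The last training label uses the close at (last train index + horizon),
        # so testing may begin at that bar and no earlier.
        test_start = start + train_size + horizon - 1
        test = range(test_start, test_start + test_size)
        yield train, test
        start += test_size  # roll forward by one test block

for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train), list(test))
```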

TradingView users: use barstate.isconfirmed to ensure features/labels only update after bar close; avoid using replay data to tune k post-hoc.

9) From Signals to Portfolio: Fractional Kelly and Drawdown Control

Converting a directional classifier into returns requires sizing and risk. Let \(p\) be the estimated probability of an up-move in the neighborhood and let \(\mu, \sigma^2\) be the conditional drift and variance. In the continuous approximation the Kelly fraction is \(f^* \approx \mu/\sigma^2\); for a binary bet with win probability \(p\), loss probability \(q = 1-p\), and net odds \(o\), it is \(f^* = (p\,o - q)/o\), that is, edge over odds. Full Kelly is too aggressive under estimation error, so we use fractional Kelly \(f=\kappa f^*, 0<\kappa<1\), calibrated to drawdown tolerances and parameter uncertainty.

Cap single-name exposure; spread across independent signals to increase breadth.
Use a veto filter (e.g., high-volatility or liquidity stress) to cut risk when the neighborhood is unreliable.
Translate classifier confidence (vote margin) into position size bands.
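
A small sizing sketch, under stated assumptions: the binary Kelly formula f* = (p·o − q)/o with q = 1 − p and net odds o, a fractional multiplier κ, a hard cap on exposure, and no position when the estimated edge is negative. The default values for `kappa` and `cap` are placeholders, not recommendations, and the function names are hypothetical.

```python
def kelly_fraction(p_up, payoff_ratio=1.0):
    """Full-Kelly fraction for a binary bet: (p * o - q) / o, with q = 1 - p."""
    if not 0.0 <= p_up <= 1.0 or payoff_ratio <= 0:
        raise ValueError("need 0 <= p_up <= 1 and payoff_ratio > 0")
    q = 1.0 - p_up
    return (p_up * payoff_ratio - q) / payoff_ratio

def fractional_kelly_size(p_up, payoff_ratio=1.0, kappa=0.25, cap=0.05):
    """Scale full Kelly by kappa, floor at zero (no edge, no bet), cap exposure."""
    f = kappa * kelly_fraction(p_up, payoff_ratio)
    return max(0.0, min(cap, f))

# p = 0.55 at even odds gives a full-Kelly fraction of about 0.10.
print(kelly_fraction(0.55), fractional_kelly_size(0.55))
```

Here `p_up` could be taken from the neighborhood vote share, though a raw vote share from a small k is a noisy probability estimate, which is one more reason to keep κ well below 1.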

10) Execution, Capacity, and Impact

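Market impact often scales sublinearly with order size: under the square-root law, impact per share grows roughly with the square root of order size relative to daily volume. Practical implications: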

Throttle participation rate; prefer passive fills in benign regimes.
Skip signals when spreads widen abruptly; your local neighborhood likely changed topology.
Monitor crowding; if too many points share the same neighborhood, edges decay.

11) Robustness: Adversarial Markets, Stress, and Outages

Stress the system with historical shocks and synthetic volatility pulses. Ensure the classifier degrades gracefully by shrinking k or abstaining when local density drops below a minimum (no reliable neighbors).
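
The abstention rule might look like the sketch below. It assumes the distance to the k-th nearest neighbor is used as a proxy for local density, with a hypothetical `max_radius` threshold that would need calibrating on normalized features. The function operates on precomputed distances and labels, and its name is illustrative.

```python
def vote_or_abstain(distances, labels, k, max_radius):
    """Return +1/-1/0 from a majority vote, or None when neighbors are too sparse."""
    if len(distances) < k:
        return None  # not enough history to form a neighborhood
    nearest = sorted(zip(distances, labels))[:k]
    kth_distance = nearest[-1][0]
    if kth_distance > max_radius:
        return None  # neighborhood too wide: no reliable neighbors
    vote = sum(label for _, label in nearest)
    return (vote > 0) - (vote < 0)

print(vote_or_abstain([0.1, 0.2, 0.3, 5.0], [1, 1, -1, -1], k=3, max_radius=1.0))
print(vote_or_abstain([0.1, 0.2, 0.3, 5.0], [1, 1, -1, -1], k=3, max_radius=0.25))
```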

12) Appendix: FAQ and SEO Notes for Boutique/Obscure Hedge Funds

Readers looking for “obscure hedge funds” and “boutique quant” often value technical clarity and operational sobriety. This article deliberately:

Targets long-tail queries: “kNN stock picking,” “nearest neighbour trading,” “two-indicator strategy,” “walk-forward kNN,” “fractional Kelly.”
Uses internal links to The Architecture of Compounding and The Mathematics of Discipline for topical authority.
Includes structured data (Article + FAQPage) and canonical URL for clean indexing.

Implementation sketch (two-feature, rolling kNN)

Inputs: f1[t] = feature1, f2[t] = feature2, price[t]
Params: k, window = lookback bars, horizon H, weight floor ε, Kelly multiplier κ

For each t ≥ window:
  # use only bars whose H-bar-ahead outcome is already known at the close of bar t
  T = {i : t-window ≤ i ≤ t-H}
  X_train = {(f1[i], f2[i]) for i in T}
  y_train[i] = +1 if price[i+H]/price[i] - 1 > 0 else -1   # label on closed bars only
  x0 = (f1[t], f2[t])                        # current closed bar
  s1, s2 = standard deviations of f1, f2 over T
  # distance (standardized Euclidean):
  d(i) = sqrt( ((f1[i]-x0.f1)/s1)^2 + ((f2[i]-x0.f2)/s2)^2 )
  N = indices of the k smallest d(i), i in T
  vote = sum over i in N of y_train[i] * w_i, with w_i = 1 / (d(i)+ε)
  if vote > 0 then long next bar; if vote < 0 then short/flat by policy
  size = κ * f*(conditional)  # fractional Kelly or banded sizing
  execute with slippage model; update PnL

Production: standardize per window, clip outliers, enforce liquidity filters, and log every decision (features, neighbors, distances, votes, size, fills).
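
For readers who want something runnable, here is one possible Python rendering of the signal-generation part of the sketch. It is a simplified illustration, not a production implementation: sizing, execution, slippage, clipping, and liquidity filters are left out, and the function name, default parameters, and the fallback scale of 1.0 for a constant feature are assumptions. Training bars are restricted to those whose H-bar-ahead label is already known at the close of bar t.

```python
import math
import statistics

def rolling_knn_signals(f1, f2, price, k=5, window=50, horizon=1, eps=1e-9):
    """Signals per bar: +1 long, -1 short/flat, 0 tie, None before warm-up."""
    n = len(price)
    signals = [None] * n
    for t in range(window, n):
        # Only bars i with i + horizon <= t have a known label at the close of t.
        idx = range(t - window, t - horizon + 1)
        if len(idx) < k:
            continue
        # Per-window scales for standardized Euclidean distance.
        s1 = statistics.pstdev([f1[i] for i in idx]) or 1.0
        s2 = statistics.pstdev([f2[i] for i in idx]) or 1.0
        scored = []
        for i in idx:
            label = 1 if price[i + horizon] > price[i] else -1
            d = math.hypot((f1[i] - f1[t]) / s1, (f2[i] - f2[t]) / s2)
            scored.append((d, label))
        nearest = sorted(scored)[:k]
        # Inverse-distance weighted vote over the k nearest bars.
        vote = sum(label / (d + eps) for d, label in nearest)
        signals[t] = (vote > 0) - (vote < 0)
    return signals
```

A signal produced at the close of bar t is intended to be acted on at bar t+1, in line with the validation rules above.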


Disclosures: This content is for information only, not investment advice. Past performance is not indicative of future results. Methods described herein are subject to change without notice.