01 — Context

The problem

The core question: are daily closing-price trajectories correlated, which would be evidence of real trends, or are they effectively random? Answering it required both rigorous statistical testing and a predictive model to quantify how much signal is actually recoverable.

02 — Method

Approach

The workflow runs from data preprocessing through exploratory analysis, clustering, correlation testing, and finally predictive modeling.

  • Imputation of missing values in the far_price and near_price columns.
  • Min-Max scaling and feature engineering on bid-ask spreads and reference prices.
  • K-means clustering, with the number of clusters chosen via the elbow method, plus t-SNE and hierarchical clustering to explore structure.
  • Permutation testing and daily-correlation heatmaps to test the trend-versus-random hypothesis.
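
The permutation-testing step above can be illustrated with a minimal sketch. It is not the project's actual test: the statistic (mean pairwise correlation between days), the shuffling scheme (time steps permuted independently within each day), the helper names, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def mean_offdiag_corr(trajectories):
    """Mean pairwise Pearson correlation between rows (one row per day)."""
    corr = np.corrcoef(trajectories)  # (n_days, n_days); the matrix a heatmap would display
    n = corr.shape[0]
    return corr[~np.eye(n, dtype=bool)].mean()


def permutation_test(trajectories, n_permutations=1000, seed=0):
    """One-sided permutation test of the mean day-to-day correlation."""
    rng = np.random.default_rng(seed)
    observed = mean_offdiag_corr(trajectories)
    null_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffle the time steps within each day independently, which
        # destroys any pattern the days share while keeping each day's values.
        shuffled = rng.permuted(trajectories, axis=1)
        null_stats[i] = mean_offdiag_corr(shuffled)
    # The +1 correction keeps the estimated p-value strictly above zero.
    p_value = (1 + np.sum(null_stats >= observed)) / (1 + n_permutations)
    return observed, p_value


# Synthetic stand-in data (hypothetical): 20 days sharing one upward drift plus noise.
rng = np.random.default_rng(42)
n_days, n_steps = 20, 50
shared_trend = np.linspace(0.0, 1.0, n_steps)
days = shared_trend + rng.normal(scale=0.3, size=(n_days, n_steps))

observed, p_value = permutation_test(days, n_permutations=500)
print(f"mean correlation = {observed:.3f}, p-value = {p_value:.4f}")
```

A small p-value here means the observed day-to-day correlation is larger than what shuffled, structure-free trajectories tend to produce, which is the sense in which the "random" hypothesis would be rejected.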

03 — Build

Tech deep-dive

Four regression models were compared under 5-fold cross-validation: Linear Regression, Ridge, Lasso, and HistGradientBoosting.

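from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# X is the engineered feature matrix and y the target from the preprocessing steps
model = HistGradientBoostingRegressor()
# 5-fold cross-validation; with no scoring argument, regressors are scored by R^2
scores = cross_val_score(model, X, y, cv=5)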

04 — Result

Outcomes & learnings

HistGradientBoostingRegressor was the clear winner, outperforming the three linear baselines (Linear Regression, Ridge, and Lasso) at predicting the target. The correlation and permutation analysis showed structure in daily closing prices that randomness alone does not explain: a modest, exploitable signal rather than pure noise.