I ended up being sick all weekend and couldn’t get much done on MAG. 🙁
What to change for MAG
- Target: `y_mag = (close_1100[T+2] - close_1100[T+1]) / close_1100[T+1]` (a 1-day forward return, measured from day T+1 to day T+2; scale to bps if you prefer).
- Preprocess: winsorize y at the 1st/99th percentiles to tame the tails; optionally standardize.
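- Model: `XGBRegressor` (not the Classifier). Start with a Huber-type loss: XGBoost exposes one as `objective='reg:pseudohubererror'`, with `huber_slope` setting where the loss switches from quadratic to linear. Set it on the scale of y; the default of 1.0 is far larger than a typical daily return, so the loss would behave like plain squared error. A practical alternative is to keep `reg:squarederror` and rely on the winsorized y for robustness. If you want uncertainty bands, add quantile heads: two extra regressors with `objective='reg:quantileerror'` and `quantile_alpha=0.15` and `0.85` (XGBoost 2.0+). If that's unavailable, train on the pinball loss with LightGBM (`objective='quantile'`, `alpha=...`) or scikit-learn's `GradientBoostingRegressor(loss='quantile', alpha=...)`.
- Weights: optionally down-weight high-vol days with `sample_weight = 1 / (rolling_vol + eps)`.
- Eval: MAE (in bps), median AE, and Spearman correlation with the realized return. Don't use accuracy; this is a regression target.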
Some solid default params:
```python
import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=200,
    learning_rate=0.08,
    max_depth=5,
    min_child_weight=5,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_lambda=1.0,
    reg_alpha=0.0,
    tree_method='hist',
    max_bin=256,
    n_jobs=-1,
    random_state=42,
    objective='reg:squarederror',
)
```
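The Huber and quantile (pinball) losses mentioned above differ from squared error mainly in how hard they punish large residuals. A small NumPy sketch of the three loss shapes, plus a brute-force check that minimizing the pinball loss at alpha=0.85 lands on the 85th percentile. This is illustration only: it doesn't call XGBoost, and `best_constant` is a hypothetical helper.

```python
import numpy as np


def squared_loss(residual):
    """Squared error, written as 0.5 * r^2: grows quadratically, so tail days dominate."""
    r = np.asarray(residual, dtype=float)
    return 0.5 * r ** 2


def pseudo_huber_loss(residual, slope=1.0):
    """Pseudo-Huber: slope^2 * (sqrt(1 + (r/slope)^2) - 1).

    About 0.5 * r^2 when |r| << slope and about slope * |r| when |r| >> slope.
    """
    r = np.asarray(residual, dtype=float)
    return slope ** 2 * (np.sqrt(1.0 + (r / slope) ** 2) - 1.0)


def pinball_loss(residual, alpha):
    """Quantile (pinball) loss for r = y - q: alpha * r if r >= 0, else (alpha - 1) * r."""
    r = np.asarray(residual, dtype=float)
    return np.where(r >= 0, alpha * r, (alpha - 1.0) * r)


def best_constant(sample, loss_fn):
    """Brute-force the constant prediction with the lowest mean loss (illustration only)."""
    sample = np.asarray(sample, dtype=float)
    candidates = np.unique(sample)
    losses = [loss_fn(sample - q).mean() for q in candidates]
    return candidates[int(np.argmin(losses))]


# The 0.85 pinball loss is minimized near the 85th percentile of the sample.
q85 = best_constant(np.arange(1.0, 101.0), lambda r: pinball_loss(r, 0.85))
```

Note the scale issue: with a residual of 0.02 (a 2% day) and `slope=1.0`, `pseudo_huber_loss` is within about 0.01% of `squared_loss`, which is why the slope needs to be set relative to the size of y.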
Minimal MAG forecaster (drop-in shape)
```python
import numpy as np
import pandas as pd
import xgboost as xgb

# Feature selection: columns starting with one of these prefixes, plus a few exact names.
FEATURE_PREFIXES = (
    'close_', 'volume_', 'ema_', 'rsi_', 'macd_', 'bb_', 'obv_', 'pbf_',
    'vwap', 'price_vs_vwap', 'volume_imbalance', 'intraday_realized_vol',
    'vol_regime_change', 'open_gap', 'open_volume_spike',
    'volume_concentration', 'volume_acceleration',
    'uptick_', 'high_rejection', 'low_rejection',
    'volume_price_divergence', 'divergence_strength',
    'cumulative_delta', 'delta_acceleration',
    'range_position', 'range_size', 'range_expansion',
    'momentum_acceleration', 'momentum_consistency',
    'volume_price_confirmation', 'volume_surge_momentum',
    'price_extension', 'consecutive_up', 'consecutive_down', 'exhaustion_score',
    'daily_return', 'failed_breakout', 'failed_breakdown',
    'recent_range_position', 'new_3d_high', 'new_3d_low',
)
FEATURE_EXACT = ['intraday_momentum', 'mfi', 'day_of_week', 'is_monday',
                 'is_friday', 'month', 'is_month_end']


class MagForecaster:
    def __init__(self):
        self.model = None
        self.feature_columns = None
        self.data = None

    def _make_target(self, df):
        # y[T] = (close_1100[T+2] - close_1100[T+1]) / close_1100[T+1] (signed return).
        # NaN where either close is missing, where close_1100[T+1] == 0, and on the last two rows.
        t1 = df['close_1100'].shift(-1).astype(float)
        t2 = df['close_1100'].shift(-2).astype(float)
        return ((t2 - t1) / t1.where(t1 != 0)).to_numpy()

    def train(self, df):
        self.feature_columns = [c for c in df.columns
                                if c.startswith(FEATURE_PREFIXES) or c in FEATURE_EXACT]
        y = self._make_target(df)
        mask = ~np.isnan(y)
        if mask.sum() < 30:
            raise ValueError("Not enough samples for magnitude regression")

        # Winsorize the target at the 1st/99th percentiles.
        y_obs = pd.Series(y[mask])
        y_train = y_obs.clip(lower=y_obs.quantile(0.01), upper=y_obs.quantile(0.99)).to_numpy()
        X_train = df.loc[mask, self.feature_columns].astype(np.float32)

        # Optional inverse-vol weights. Missing vol is filled with the median rather than 0
        # (0 would give those rows a weight of ~1e6); weights are rescaled to average 1.
        w = None
        vol = df.get('intraday_realized_vol')
        if vol is not None and vol.notna().any():
            vol = vol.ffill()
            vol = vol.fillna(vol.median())
            w = 1.0 / (vol.to_numpy()[mask] + 1e-6)
            w = w / w.mean()

        self.model = xgb.XGBRegressor(
            n_estimators=200, learning_rate=0.08, max_depth=5,
            min_child_weight=5, subsample=0.9, colsample_bytree=0.9,
            reg_lambda=1.0, reg_alpha=0.0, tree_method='hist', max_bin=256,
            n_jobs=-1, random_state=42, objective='reg:squarederror',
        )
        self.model.fit(X_train, y_train, sample_weight=w)
        # Keep the training frame so forecast() can run without new data.
        self.data = df.copy()
        return {'trained': True}

    def forecast(self, df=None, preserve_history=False):
        if self.model is None or self.feature_columns is None:
            raise ValueError("Model not fitted. Call train() first.")
        # Score the frame passed in; fall back to the training data if none is given.
        out = (self.data if df is None else df).copy()
        # The last two rows have no realized target yet, so those are the ones to forecast.
        X_last2 = out[self.feature_columns].iloc[-2:].astype(np.float32)
        out['predicted_mag'] = np.nan
        out.iloc[-2:, out.columns.get_loc('predicted_mag')] = self.model.predict(X_last2)
        return out if preserve_history else out.iloc[-2:]
```
Variants worth trying (often help MAG)
- Two-stage: classify sign (your current Forecaster), then regress |return| and reapply sign.
- Quantile heads: predict P15/P50/P85 to get an interval; use P50 as the point forecast and the band for risk.
- Heteroskedastic modeling: train a second regressor on the absolute error (a proxy for uncertainty) and use it to size positions.
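A rough sketch of how these variants could fit together at prediction time, assuming you already have the sign classifier's P(up), an |return| regression, the three quantile predictions, and an uncertainty estimate (either the P85 minus P15 band or the absolute-error regressor's output). All function names and the sizing rule (forecast divided by uncertainty, clipped to ±1) are hypothetical. Sorting the three quantiles per row is one simple way to handle heads that cross, since they're trained independently.

```python
import numpy as np


def two_stage_return(p_up, abs_mag, threshold=0.5):
    """Two-stage: take the sign from the classifier and reapply it to the |return| regressor."""
    p_up = np.asarray(p_up, dtype=float)
    abs_mag = np.abs(np.asarray(abs_mag, dtype=float))
    sign = np.where(p_up >= threshold, 1.0, -1.0)
    return sign * abs_mag


def quantile_band(q15, q50, q85):
    """Return (point, lower, upper) = (P50, P15, P85), sorted per row in case heads cross."""
    q = np.sort(np.column_stack([q15, q50, q85]).astype(float), axis=1)
    return q[:, 1], q[:, 0], q[:, 2]


def position_size(point, uncertainty, max_abs=1.0, eps=1e-9):
    """Hypothetical sizing rule: forecast / predicted uncertainty, clipped to +/- max_abs."""
    point = np.asarray(point, dtype=float)
    unc = np.asarray(uncertainty, dtype=float)
    return np.clip(point / (unc + eps), -max_abs, max_abs)


# Example: two days of quantile predictions, using the band width as the uncertainty.
point, lo, hi = quantile_band([-0.01, 0.02], [0.0, 0.01], [0.01, 0.0])
sizes = position_size(point, hi - lo)
```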
I just don’t have enough time to tie it all together in between all the coughing and sneezing and blech’ing.