Tetsuo MKV Centurion is dead. Long live Tetsuo.
The build went well, but it did not perform as simulated; it lost a shocking amount of money very rapidly. I’m not even going to post the graph.
I pulled the whole thing apart and found two massive flaws in:
- the way in which forecasts were being calculated
- the way in which quality metrics for forecasts were being generated
These are expected growing pains, I guess. The system peaked at about 53% real accuracy and would have long spurts of high performance, only to be followed by short spurts of catastrophic performance that defied the numbers. Then, as I started building correlations between forecast metrics and historical performance, it became clear that even the simplest metric, weighted directional accuracy, did not correlate with the actual directional accuracy observed in historical performance.
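For reference, here’s roughly what I mean by those two metrics; the exponential recency weighting below is an illustrative choice, not the exact scheme the project uses:

```python
import numpy as np

def directional_accuracy(actual_returns, forecast_returns):
    # Fraction of periods where the forecast got the sign of the return right
    hits = np.sign(actual_returns) == np.sign(forecast_returns)
    return float(np.mean(hits))

def weighted_directional_accuracy(actual_returns, forecast_returns, halflife=10):
    # Same hit/miss series, but biased toward recent periods
    # (exponential decay here is my illustrative choice)
    hits = (np.sign(actual_returns) == np.sign(forecast_returns)).astype(float)
    weights = 0.5 ** (np.arange(len(hits))[::-1] / halflife)  # newest period weighted highest
    return float(np.average(hits, weights=weights))
```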
The algorithms for generating these metrics were correct. Impressive, even. The data being fed into them was not. I will explain:
```python
# we will populate a tmp_forecast with the forecasted log returns for the next two trading days as 'y'
# Predict the next two trading days using the volume regressor
tmp_forecast_result_dataframe = prophet.predict(tmp_forecast_training_dataframe)
# this creates a huge dataframe with a lot of columns, but we only care about
# 'ds' and 'yhat' for the forecasted values
```
That data is eventually fed to:
```python
def calculate_forecast_errors(original_data, forecast_data):
    # original_data contains these columns:
    #   ds          - date
    #   y           - log return
    #   volume      - log volume
    #   y_raw_price - raw price
    #   volume_raw  - raw volume
    # forecast_data contains these columns:
    #   ds          - date
    #   y           - forecasted log return
    # we need to align the forecast_data y values (log return) with the original_data t_raw_return
    # Align the forecast data with the test data
    y_true = original_data['y_raw_price'].values
    y_pred = forecast_data['y_raw_price'].values
    # disable the scaling for now
    scale_factor = 1
    y_true_scaled = y_true * scale_factor
    y_pred_scaled = y_pred * scale_factor
    # Calculate RMSE
    rmse = calculate_rmse(y_true_scaled, y_pred_scaled)
    # Calculate NRMSE (normalized by price range)
    nrmse = calculate_nrmse(y_true_scaled, y_pred_scaled)
    ...
```
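The calculate_rmse and calculate_nrmse helpers aren’t shown here; for context, they’re just the standard formulas, roughly like this (my sketch, not the exact project code):

```python
import numpy as np

def calculate_rmse(y_true, y_pred):
    # Root mean squared error between actual and predicted values
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def calculate_nrmse(y_true, y_pred):
    # RMSE normalized by the observed price range, per the comment above
    y_true = np.asarray(y_true)
    return calculate_rmse(y_true, y_pred) / float(np.max(y_true) - np.min(y_true))
```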
In these snippets, the code is populating a dataframe with forecasts. Except the values for historical dates in the returned dataframe are not a walked-back, back-filled forecast to calculate metrics from; they are fitted historical data. That produces artificially high quality metrics that reflect the alignment between the historical data and Prophet’s in-sample fit rather than the accuracy of the forecast. That’s not very useful, and it explains why I’ve had such a hell of a time making sense of forecast metrics that were out of alignment with actual performance.
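To make the distinction concrete, here’s a minimal sketch of generic Prophet usage (not the project’s actual code, and it ignores the volume regressor and the trading calendar):

```python
import pandas as pd
from prophet import Prophet

def split_fit_from_forecast(history: pd.DataFrame, horizon: int = 2):
    """history has 'ds' and 'y' columns; returns (in-sample fit rows, true forecast rows)."""
    model = Prophet()
    model.fit(history)
    # make_future_dataframe() returns every historical date plus `horizon` new ones
    result = model.predict(model.make_future_dataframe(periods=horizon))
    cutoff = history['ds'].max()
    fitted_rows = result[result['ds'] <= cutoff]    # Prophet's in-sample fit, NOT forecasts
    forecast_rows = result[result['ds'] > cutoff]   # the only true out-of-sample rows
    return fitted_rows, forecast_rows
```

The quality metrics above were effectively being scored against something like fitted_rows, when they only mean anything scored against forecast_rows.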
It also explains the behaviour. We’re gathering data for 2,000+ NYSE symbols going back roughly 700 days. When the market is stable and trending well, the fitted historical data closely aligns with the actual historical data, so a set of forecasts sorted and filtered by error metrics calculated between those two sets is going to yield a positive return. When the market is volatile or losing overall, the reverse happens, and it loses money. So what I’ve been observing is a sampling of market fluctuations, not an actual forecast sort. Using properly walked-back forecasts for y_pred matters here because the forecast horizon is only 2 days, while the training data (y_true) spans 300-700 days.
It’s a tough pill to swallow, and honestly the blame lies almost entirely with Facebook Prophet’s complete and utter lack of comprehensive documentation. It cost me hundreds of hours of work and now requires a full project rewrite and an entirely new approach to forecasting. I’m kind of pissed off that these idiots stirred up this much hype over such a poorly documented project.
In any case, it’s back to the drawing board. I’ve got forecasts down to 2-20 minutes for a full set of all 2,000 symbols, not including calibration runs. If I introduce walking forecasts, I could do a 20-step reverse walk of forecasts to calculate a proper y_pred. That would push even optimized, multi-threaded runs to 40-400 minutes, so, worst case, a 20-minute process becomes a roughly 6-hour process. And calibration runs for filter analytics need 20 days of forecasts, so my cycle time for one adjustment might stretch to as much as 5 days unless I learn something fantastic about this library or another one. It’s a real pile of bullshit that was entirely avoidable if these so-called engineers had actually documented their work instead of requiring everyone to reverse engineer it while pretending they were useful.
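For reference, the reverse walk I’m describing would look roughly like this; a sketch with made-up names, again ignoring the volume regressor and calendar handling:

```python
import pandas as pd
from prophet import Prophet

def walked_back_forecasts(history: pd.DataFrame, steps: int = 20, horizon: int = 2) -> pd.DataFrame:
    """Build a y_pred series out of genuinely out-of-sample forecasts.
    history has 'ds' and 'y' columns, oldest row first."""
    collected = []
    for step in range(steps, 0, -1):
        train = history.iloc[: len(history) - step * horizon]
        model = Prophet()  # a fresh model for each cutoff
        model.fit(train)
        result = model.predict(model.make_future_dataframe(periods=horizon))
        # Keep only the rows past this cutoff: the true forecasts for this step
        collected.append(result[result['ds'] > train['ds'].max()][['ds', 'yhat']])
    # In practice you'd merge this on 'ds' against the held-out actuals to get y_true/y_pred pairs
    return pd.concat(collected, ignore_index=True)
```

That’s 20 separate fits per symbol instead of one, which is exactly where the 20x runtime multiplier in the estimate above comes from.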
But here’s the bright side: the general framework I built around Prophet works. Every aspect of it. The problem is the use of the Prophet library itself. These setbacks are actually critical milestones in getting a working system going. It just means the next version will be more functional, even if it’s so slow that it isn’t feasible to do everything it needs to do every day (like aligning changepoint prior scale to weighted directional accuracy based on recency).
I’m going to investigate SARIMAX models for a bit and then start designing next steps.
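As a starting point, here’s a minimal statsmodels sketch of the kind of model I’ll be evaluating; the (1, 0, 1) order is a placeholder, not a tuned choice:

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

def sarimax_two_day_forecast(log_returns: pd.Series) -> pd.Series:
    # Placeholder (p, d, q) order; real use would select an order per symbol
    model = SARIMAX(log_returns, order=(1, 0, 1))
    fitted = model.fit(disp=False)
    # get_forecast() only ever returns out-of-sample values, which is exactly
    # the property that was missing from the Prophet workflow above
    return fitted.get_forecast(steps=2).predicted_mean
```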