Sneak peek at an early draft after bringing in CatBoost support:
https://paste.silogroup.org/xoqayipuni.py
The new model is slow as shit, but it works. At 700 trading days, regressor forecasts take about a minute per symbol. With some clipping optimizations, that can be reduced to about 3 seconds per symbol.
The actual walking price forecast using those regressors, though, takes about 50 seconds per symbol.
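For anyone unfamiliar with what I mean by a walking forecast, here's a minimal sketch of the idea. The lag construction, hyperparameters, and the `walk_forecast` helper are all illustrative, not the actual pipeline in the paste:

```python
# Minimal sketch of a walking (recursive) forecast with a CatBoost regressor:
# fit on a lag matrix built from the training window, then step forward,
# feeding each prediction back in as the newest lag. Names are illustrative.
import pandas as pd
from catboost import CatBoostRegressor

def walk_forecast(prices: pd.Series, horizon: int = 2, lags: int = 5) -> list:
    # Build a simple lag matrix from the price history.
    frame = pd.DataFrame({f"lag_{i}": prices.shift(i) for i in range(1, lags + 1)})
    frame["target"] = prices
    frame = frame.dropna()

    model = CatBoostRegressor(iterations=300, depth=6, verbose=False)
    model.fit(frame.drop(columns="target"), frame["target"])

    window = list(prices.iloc[-lags:])  # last known prices, oldest first
    steps = []
    for _ in range(horizon):
        recent = window[-lags:]
        # lag_1 is the most recent value, lag_2 the one before it, and so on.
        features = pd.DataFrame([{f"lag_{i}": recent[-i] for i in range(1, lags + 1)}])
        step = float(model.predict(features)[0])
        steps.append(step)
        window.append(step)  # walk forward: the prediction becomes the newest lag
    return steps
```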
So in this early, crude state of the new design, while metric relevance has gone way, way up in a more verifiable way, we're looking at just under 2 minutes per symbol before reintroducing multithreading.
With multithreading, on the server we're running on, I'm expecting about a 2x speedup without paying my hosting provider for an upgrade.
There are over 2200 symbols on the NYSE, so, 4400 minutes for a full run on a single thread. That’s 73 hours and 20 minutes.
So, I need to cut that runtime in half a few times. This is with 700 training days, which isn't ideal for next-day forecasts with a 2-day horizon, and forecast run time per symbol scales roughly linearly with the number of training days. So if I reduce to, say, 200 training days, I'd expect that to come down to about 34 seconds per symbol. With the optimization I have in mind for regressor forecasting, that would bring it to 20 seconds per symbol on a single thread. That's about 12 hours for a full run. Better.
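Quick sanity check on that math, assuming strictly linear scaling with training days:

```python
# Rough runtime math from the numbers above, assuming linear scaling.
symbols = 2200
sec_per_symbol_700d = 120                                # ~2 min/symbol at 700 training days

sec_per_symbol_200d = sec_per_symbol_700d * 200 / 700    # ~34 s/symbol
sec_per_symbol_optimized = 20                            # target after the regressor optimization

hours_single_thread = symbols * sec_per_symbol_optimized / 3600
print(round(sec_per_symbol_200d), round(hours_single_thread, 1))   # 34, ~12.2
```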
So reintroducing multithreading should bring that down to about 6 hours. That's just enough time for a full market analysis on the new CatBoost model.
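Roughly the shape that reintroduced parallelism could take; this is a sketch, and `forecast_symbol` is a hypothetical stand-in for the real per-symbol work:

```python
# Sketch of fanning the per-symbol forecasts out across worker processes.
from concurrent.futures import ProcessPoolExecutor

def forecast_symbol(symbol: str) -> dict:
    # Placeholder: fit regressors + run the walking forecast for one symbol.
    return {"symbol": symbol, "forecast": []}

def run_market(symbols, workers=4):
    # Each symbol is independent, so this parallelizes cleanly.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(zip(symbols, pool.map(forecast_symbol, symbols)))
```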
Bear in mind, this new design negates the need for the bulky “rollback simulations” I’ve been performing, because it builds that walking forecast into the forecast process itself. One run will produce much more useful metrics than 20 runs did on the old Centurion and Vulcan builds. It will require a rework of the retroanalytics framework that does back forecasting for calibration, but that framework will be able to use the walking forecast data produced to go back as far as I want without any additional wait time beyond the forecast process itself.
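To illustrate the idea (column names made up): each step of the walking forecast gets stored against the date it targets, so back forecasting for calibration becomes a join against actuals rather than another run.

```python
# Sketch: store every walking-forecast step with the date it targets, so
# calibration can compare stored predictions to actuals later with no re-run.
import pandas as pd

def record_walk(target_dates, predictions) -> pd.DataFrame:
    # One row per forecast step produced during the walk.
    return pd.DataFrame({"target_date": target_dates, "predicted": predictions})

def calibration_errors(walk: pd.DataFrame, actuals: pd.Series) -> pd.Series:
    # `actuals` is indexed by date; join stored predictions against realized prices.
    joined = walk.set_index("target_date").join(actuals.rename("actual"))
    return (joined["predicted"] - joined["actual"]).abs()
```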
This also establishes a performance profile for the other models I’ll be adding, and absolutely rules out any form of ensemble (multiple models) forecasting without doubling, tripling or quadrupling hosting costs.
There are some caveats to the new design. The Prophet forecasts will not be anywhere near as fast as CatBoost, for example, and a full walking forecast is unlikely to be possible with some models. So there are abstraction issues to work out when I get to the slower models, if I still want good observability and performance.
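One possible shape for that abstraction, purely a sketch of an option rather than anything built: a per-model flag for whether a full walking forecast is feasible, so slower models can fall back to a one-shot multi-step forecast behind the same interface.

```python
# Hypothetical sketch of a model abstraction that tolerates slow models.
from abc import ABC, abstractmethod

class ForecastModel(ABC):
    # Fast models walk every step; slow ones would forecast the horizon in one shot.
    supports_full_walk: bool = True

    @abstractmethod
    def forecast(self, history, horizon: int) -> list:
        """Return `horizon` forecast steps for one symbol."""

class CatBoostWalker(ForecastModel):
    supports_full_walk = True

    def forecast(self, history, horizon: int) -> list:
        return []  # placeholder for the walking CatBoost forecast

class ProphetOneShot(ForecastModel):
    supports_full_walk = False  # too slow to re-fit at every step

    def forecast(self, history, horizon: int) -> list:
        return []  # placeholder for a single multi-step Prophet forecast
```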
I’m not going to lie, this was kind of a bitch to flesh out.