SIG is coming along.
It configures itself per symbol, so each individual symbol uses the features that suit it.
It also forecasts for all symbols in inventory.
It isn’t anywhere near as slow as I thought, but there are kinks to be on the lookout for. Apparently XGBoost in Python doesn’t lend itself well to parallelism. I’ve found some ways to get around that, but I have concerns about cache collisions that I need to look into before I use the multithreaded features.
In multithreaded mode, on my machine at least, which is much faster than the server it’ll be running on when it’s done, it takes 10m20s to forecast. Outfuckingstanding. Configuring takes longer.
The problem is that configuring takes like 37 hours in series. That’s a long enough wait that it needs to be observed more closely. Nothing serious, though. I just need to understand the ballpark of run times so I can set up a proper cadence and make sure it’s doing everything it’s supposed to with no degradation in regressor accuracy.
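For ballparking that cadence, here is a rough sketch using Amdahl's law. The 37 hours comes from the serial run above. The worker counts and the parallel fraction are made-up inputs for illustration, not measurements, and `configure_wall_clock_hours` is a hypothetical helper name.

```python
def configure_wall_clock_hours(serial_hours: float,
                               workers: int,
                               parallel_fraction: float = 1.0) -> float:
    """Amdahl's-law estimate of wall-clock time for the configure step.

    serial_hours      -- measured time when everything runs in series
    workers           -- number of parallel workers (assumed, not measured)
    parallel_fraction -- share of the work that can actually run in
                         parallel (assumed; 1.0 is the ideal case)
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_part = 1.0 - parallel_fraction
    return serial_hours * (serial_part + parallel_fraction / workers)


# Hypothetical scenarios: ideal scaling vs. 90% of the work parallelisable.
for n in (1, 2, 4, 8):
    ideal = configure_wall_clock_hours(37.0, n)
    lossy = configure_wall_clock_hours(37.0, n, parallel_fraction=0.9)
    print(f"{n} workers: ideal {ideal:.1f}h, 90% parallel {lossy:.1f}h")
```

Real numbers will depend on how well the per-symbol work actually parallelises, so treat this as an upper bound on how much the workers can help.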
There was a reassuring moment, though. I picked a random NYSE symbol from the results of last night’s test run and checked its forecast. Then I did my regular office work and just kept an eye on the chart throughout the day.
It was right.
It’s far from polished enough to deploy. This is a first-draft prototype. Based on how I usually go with these things, I’ll probably end up rewriting it like 3 times before it’s production stable, two of those after deploying it.
10-minute sign forecasts for whole-market analysis ain’t bad. Ain’t bad at all. In terms of accuracy, I’m seeing a good healthy distribution:
On a set of 451 symbols, doing 10-day incremental walkbacks (trim off the most recent dates, forecast them, and compare to what really happened on those days):
- 2 were at 40% over those 10 days (meaning 10 days of forecasts were only right 40% of the time)
- 20 were at 50%
- 54 were at 60%
- 135 were at 70%
- 142 were at 80%
- 82 were at 90%
- 16 were at 100%
So the distribution centers on the 70-80% buckets (277 of the 451 symbols, with a mean of about 75.6%), with a strong healthy amount higher than that: 98 symbols at 90% or better. That’s good enough to forecast on if it’s not a result of overfitting on a rolling window.
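For reference, here is a minimal sketch of the walkback idea and of the bucket arithmetic. It is not the SIG code: `walkback_hit_rate` and `momentum_sign` are hypothetical names, and the toy momentum forecaster just stands in for the real per-symbol regressor. Only the bucket counts come from the run above.

```python
from typing import Callable, Sequence


def sign(x: float) -> int:
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def walkback_hit_rate(closes: Sequence[float],
                      forecast_sign: Callable[[Sequence[float]], int],
                      steps: int = 10) -> float:
    """Fraction of the last `steps` days whose direction was forecast right.

    For each of the most recent `steps` days, trim that day and everything
    after it, forecast the next move from what is left, and compare with
    what really happened.
    """
    if len(closes) < steps + 2:
        raise ValueError("need at least steps + 2 closes")
    hits = 0
    for k in range(steps, 0, -1):
        cutoff = len(closes) - k          # index of the day being forecast
        history = closes[:cutoff]         # only data before that day
        actual = sign(closes[cutoff] - closes[cutoff - 1])
        if forecast_sign(history) == actual:
            hits += 1
    return hits / steps


def momentum_sign(history: Sequence[float]) -> int:
    """Toy stand-in forecaster: tomorrow moves the way today did."""
    return sign(history[-1] - history[-2])


# Bucket counts from the 451-symbol run: hit rate (%) -> number of symbols.
buckets = {40: 2, 50: 20, 60: 54, 70: 135, 80: 142, 90: 82, 100: 16}
total = sum(buckets.values())
mean_hit_rate = sum(rate * n for rate, n in buckets.items()) / total
share_70_plus = sum(n for rate, n in buckets.items() if rate >= 70) / total

print(f"symbols: {total}")
print(f"mean hit rate: {mean_hit_rate:.1f}%")
print(f"share at 70% or better: {share_70_plus:.1%}")
```

Because each forecast only sees `closes[:cutoff]`, the comparison day never leaks into the history. That covers leakage inside the walkback loop, but not the overfitting worry if features were configured on data that includes those last 10 days.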
I need to revisit this multithreading issue. It might just not be doable due to constraints in that library. If so, I’ll likely have to dedicate a sidecar server to just the SIG piece.