I’ve got a rebuild of the SIG tool; I’ll call it SIG-R until it’s been burned in for a while. It’s live at:
Sometimes stuff just clicks together if you hammer it enough.
I pulled out multithreading entirely. XGBoost in Python is natively multithreaded, so it doesn’t behave well if you layer your own multithreading on top of it. It was going to turn a 35-hour process into a 900-hour process.
Ultimately there were wins and losses: configuration is going to take about 36 hours worst case, because it runs a battery of 10-day walking forecasts for each of the 29 feature sets available to configure. It’s a lightweight forecasting system, but that’s a lot of spinning.
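To make the shape of that battery concrete, here’s a minimal sketch of a 10-day walk-forward loop. Everything here is an assumption about SIG-R’s internals: `fit_and_score` stands in for whatever model training and scoring the real tool does, and the split logic is just the generic walking-window pattern.

```python
def walk_forward_splits(n_rows, horizon=10):
    """Yield (train_end, test_end) index pairs for 10-day walking forecasts.

    Each window trains on everything before train_end and tests on the
    next `horizon` rows, then the window slides forward one step.
    """
    for train_end in range(horizon, n_rows - horizon + 1):
        yield train_end, train_end + horizon


def score_feature_set(series, features, fit_and_score, horizon=10):
    """Average a feature set's score across every walking window.

    `fit_and_score(train, test, features)` is a hypothetical hook; the
    configuration run would call this once per feature set (29 times).
    """
    scores = [
        fit_and_score(series[:train_end], series[train_end:test_end], features)
        for train_end, test_end in walk_forward_splits(len(series), horizon)
    ]
    return sum(scores) / len(scores)
```

Run sequentially over the 29 feature sets, with XGBoost left to use its own threads inside `fit_and_score`, this is exactly the kind of loop that adds up to a 36-hour worst case.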
Forecasting takes about 30 minutes worst case. A whole different scale of time.
So it’ll have two modes: one to configure, and one to forecast using those configurations.
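A two-mode entry point could look like the sketch below. The subcommand names and the `--config` flag are my guesses at how SIG-R might expose this, not its actual interface.

```python
import argparse

def build_parser():
    """Build a CLI with the two modes: 'configure' and 'forecast'."""
    parser = argparse.ArgumentParser(prog="sig-r")
    sub = parser.add_subparsers(dest="mode", required=True)
    # The long (~36h) feature-set battery, run occasionally (e.g. Saturdays).
    sub.add_parser("configure", help="run the feature-set battery")
    # The fast (~30min) daily run, driven by a saved configuration.
    fc = sub.add_parser("forecast", help="run the daily sign forecast")
    fc.add_argument("--config", default="sig_config.json",
                    help="configuration file produced by the configure mode")
    return parser
```

Keeping the modes as subcommands of one binary keeps the Saturday configure run and the daily forecast run from drifting apart in setup code.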
Unless it’s overfitting to get the numbers I’m seeing, configuration won’t need to be done every day, so it taking longer than 24 hours isn’t an issue. I can run it on Saturday, and then the 30-minute sign forecasting is easy timewise.
One of the ways I got a nice boost in performance was to stop caching all data locally to operate off of. It grabs data as it goes from TDM, since TDM is public. It runs on the same host, but all the pieces are designed to operate independently. It’s kinder to the disks, and it puts me in a pipeline mindset instead of having different points talking to the same list. That list was low in accuracy because some symbols didn’t have adequate market history to forecast on, and those needed to be weeded out from TDM.
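The pull-as-you-go pattern with inline weeding could be sketched like this. `fetch_history` is a stand-in for whatever client talks to TDM, and the 250-day cutoff is an assumed threshold, not the real one.

```python
MIN_HISTORY_DAYS = 250  # assumed cutoff; the real threshold may differ

def forecastable_symbols(symbols, fetch_history, min_days=MIN_HISTORY_DAYS):
    """Stream (symbol, history) pairs straight from the source.

    Nothing is cached to disk: each symbol's history is fetched on
    demand, and symbols without enough market history are skipped
    right here in the pipeline instead of polluting a shared list.
    """
    for symbol in symbols:
        history = fetch_history(symbol)
        if len(history) >= min_days:
            yield symbol, history
```

Because it’s a generator, downstream stages consume one symbol at a time, which is what keeps the disks quiet and the pieces independent.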
It generates a daily report of its forecasts, so I’ll be able to have a monthly report generate that displays accuracy metrics as it goes.
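One way the daily reports could roll up into a monthly accuracy number, sketched with an assumed record shape (the real report schema may differ):

```python
def directional_accuracy(records):
    """Fraction of forecasts whose predicted sign matched the actual move.

    Each record is assumed to carry a boolean `predicted_up` from the
    daily forecast file and an `actual_change` filled in once the
    day's real price move is known.
    """
    if not records:
        return 0.0
    hits = sum(1 for r in records
               if r["predicted_up"] == (r["actual_change"] > 0))
    return hits / len(records)
```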
From here it’s a little less direct. I need to wait for the configuration run to finish before I can even really start the next piece, which is magnitude forecasting, and yes, it will be called MAG.
MAG will read SIG’s daily forecast file for that day and grab the ones it thinks are going up, and of those, forecast how much it thinks each one will go up.
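The SIG-to-MAG handoff might look something like this sketch. The row fields and `estimate_magnitude` hook are placeholders for SIG’s actual forecast file format and whatever regressor MAG ends up using.

```python
def mag_candidates(sig_forecasts, estimate_magnitude):
    """From SIG's daily forecast rows, keep the 'up' calls and attach
    a magnitude estimate for each one.

    `estimate_magnitude(symbol)` stands in for MAG's real model.
    """
    return [
        {**row, "expected_gain": estimate_magnitude(row["symbol"])}
        for row in sig_forecasts
        if row["direction"] == "up"
    ]
```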
After MAG, unless the direction changes, comes a piece that brings in SAP results (sentiment numbers), EXP results (expectations/upsets forecasting), and confidence scores from SIG and MAG, and then uses those sets of data to rank MAG’s forecasts and select the best 5 stocks to buy that day in a tidy little “winners” report.
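A rough shape for that final ranking stage, with loudly placeholder weights: the linear blend and the 0.5 coefficients below are illustrative only, since the real scorer will weight SAP, EXP, and the SIG/MAG confidences however the data justifies.

```python
def pick_winners(candidates, top_n=5):
    """Rank MAG candidates by a blended score and keep the top N.

    The blend here is a placeholder: expected gain plus hypothetical
    weights on SAP sentiment, EXP score, and the product of the SIG
    and MAG confidence scores.
    """
    def score(c):
        return (c["expected_gain"]
                + 0.5 * c["sap_sentiment"]
                + 0.5 * c["exp_score"]
                + c["sig_confidence"] * c["mag_confidence"])
    return sorted(candidates, key=score, reverse=True)[:top_n]
```

Whatever the final formula is, keeping it in one small `score` function makes it cheap to swap blends as the accuracy reports come in.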
Still a lot of work to do, but there’s great momentum at this point, and the numbers are looking a lot more coherent than last year’s crazy back and forth, learning every lesson about this kind of thing the hard way. I just didn’t have enough data, the data I had wasn’t very clean, and my implementations were rushed and messy in testing. I was also using the wrong model for that use case; Prophet isn’t good at short-horizon forecasting.