So far, Tetsuo is doing rather well, but, as I’ve discussed in recent posts, it only pulled maybe 2.5% a week. That’s pretty good. Really it is. But that’s a leaky faucet. I need it to rain.
As also discussed, it needed some filters for junk data, and it needed to pull both NASDAQ and NYSE symbols, which presented a problem: there are too many stocks across the two exchanges for my current hardware to process this way, unless I want to scale the order of distribution in my topology, which requires much, much more hardware investment than I’m willing to make until it’s paying for itself.
So, first, I designed, built and deployed UIP (Universal Inventory Provider) to serve as the starting inventory and provide some sanity filters to drop obviously junk symbols or symbols that hadn’t been on the exchange long enough to trade on.
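To make the idea of sanity filters concrete, here is a minimal sketch of what that kind of inventory filtering could look like. It is not UIP's actual code: the `Listing` record, the `is_tradable` and `build_inventory` helpers, the 365-day minimum listing age, and the "plain letters, five characters or fewer" rule for spotting junk symbols are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable

# Hypothetical record for one exchange listing; field names are assumptions.
@dataclass(frozen=True)
class Listing:
    symbol: str
    exchange: str
    listed_on: date

ALLOWED_EXCHANGES = {"NASDAQ", "NYSE"}
MIN_LISTED_DAYS = 365  # assumed threshold, not a value from the post


def is_tradable(listing: Listing, today: date, min_days: int = MIN_LISTED_DAYS) -> bool:
    """Return True if the listing passes the basic sanity filters."""
    if listing.exchange not in ALLOWED_EXCHANGES:
        return False
    # Assumed junk rule: drop symbols with suffixes or punctuation
    # (warrants, units, test tickers) and anything longer than 5 letters.
    if not listing.symbol.isalpha() or len(listing.symbol) > 5:
        return False
    # Drop symbols that have not been listed long enough to have history.
    return (today - listing.listed_on) >= timedelta(days=min_days)


def build_inventory(listings: Iterable[Listing], today: date) -> list[str]:
    """Produce the sorted starting inventory of symbols that pass the filters."""
    return sorted(item.symbol for item in listings if is_tradable(item, today))
```

Keeping each rule as a separate early return makes it easy to add or loosen a filter later without touching the others.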
This required a modification to MAG and SIG (I’ve suspended SAP and EXP until this round of things is done). Then we ran into another issue: it would take 120 hours to run the feature selection battery forecasts in MAG and 60 hours in SIG. This is a significant problem, as it would prevent the system from doing daily forecasts if there were enough symbol/exchange listing changes that day. It turned out I was using thread-based parallelism, which, in Python, is problematic for heavily CPU-bound tasks due to the Global Interpreter Lock, so I had to rebuild the concurrency strategy for both components. This brought MAG down to about 40 hours and SIG down to about 13, which is still too long, but I did that processing on my local machine and moved the artifacts over to buy some time. The only way this will work over long periods is to have a sidecar system that handles feature selection reconfiguration over time, keeping the batteries needed on the daily run down to a manageable level.
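For anyone who hasn't hit the GIL problem before, here is a minimal sketch of the general fix, which is to swap a thread pool for a process pool on CPU-bound work. It is not the actual MAG or SIG code: `score_symbol` is a hypothetical stand-in for one feature selection battery, and `run_batteries` is an illustrative wrapper around the standard library's `concurrent.futures`.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Iterable, Optional


def score_symbol(symbol: str) -> tuple[str, int]:
    """Hypothetical stand-in for one CPU-bound feature selection battery."""
    total = 0
    for i in range(200_000):
        total = (total + i * len(symbol)) % 1_000_003
    return symbol, total


def run_batteries(
    symbols: Iterable[str],
    executor_cls: type[Executor] = ProcessPoolExecutor,
    max_workers: Optional[int] = None,
) -> dict[str, int]:
    """Run score_symbol for every symbol using the given executor class.

    ProcessPoolExecutor gives each worker its own interpreter (and its own
    GIL), so pure-Python CPU-bound work can use multiple cores.
    ThreadPoolExecutor shares one GIL, so the same work runs mostly serially.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # chunksize batches work per task for process pools;
        # thread pools accept the argument but ignore it.
        return dict(pool.map(score_symbol, symbols, chunksize=8))
```

When using the process pool from a script, call `run_batteries(...)` under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly. Passing `executor_cls=ThreadPoolExecutor` gives the GIL-bound baseline for comparison.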
Because I anticipated this bottleneck being a possibility later, the design allowed me to just stage the feature lists once I had generated them locally, which saved a massive amount of time (24:4, so about a 6x speedup on top of already cutting the runtime to a third, which technically makes it roughly 18x faster overall).
As to the sidecar configurator, I have just the fix: the old NAS server, which I’ve rebuilt 3 times and need to rebuild again now due to a spilled soda problem. After rearranging my office to prevent that from ever happening again, I took the hit, bought the replacement hardware, and am waiting for it to arrive. The changes I’ve made, unless something goes horribly wrong, have likely bought me months of time, but it will need that server doing support configuration in the long run to stay performant.
It’s a good segue to taking a few days off, cleaning the house, especially my destroyed office, and getting ready to pick at this here and there as it comes together. I’ve got drinks to drink, weight to lose, muscles to rehabilitate, and people to reconnect with. If you follow my adventures, you can probably tell I just go all in when I’m focused on a project.
What’s in place looks like it’s already profitable, so it’s all incremental improvements from here: rebuilding EXP and SAP, setting up the sidecar configuration on the NAS machine (which has just as much firepower as my local machine), and then adding some basic reporting to start analysis and see if there are other improvements I can make further down the line. These are all very easy things. Time to relax for a while.