
Distributed Tetsuo

I’ve started to revisit the problems that need solving behind Tetsuo, learning from previous failures.

At this point I have a good proof of concept for how a system like this could be built, without any real concern placed on topology optimization or component responsibility.

That can help speed things up for a POC, but it’s becoming burdensome now that I need to build in a different direction. So far I’ve taken a fairly closed-minded approach to the whole thing, even if there were plenty of lessons learned getting it to the point it’s at today. It’s still not successful, and I’m no longer convinced it’s guaranteed to ever be, let alone in its current form; hence the major slowdown in development even when I do have free time from my day job.

Now that I know what I’m doing a little better, I’m going to start breaking this up into dedicated components with specific responsibilities so that I’m not rebuilding the whole solution every time I want to change fundamental aspects of the forecasting pipeline.

Currently, I’m comparing global models against the individual ranked forecasts we’ve been doing, and Precision@K is demonstrably higher with global models on the same training data. In other words, a single model trained on the whole-market dataset does a better job forecasting all of the T+2 prices needed for a 1-trade-per-trading-day duty cycle. LightGBM, in some of its forms, appears to be doing well in this area.
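
As a concrete illustration, Precision@K here can be read as: of the K symbols the model ranks highest, what fraction actually moved the right way. A minimal sketch; the hit criterion and function names are my own and not necessarily how the project scores it:

```python
import numpy as np

def precision_at_k(predicted_returns, actual_returns, k=1):
    """Of the k symbols ranked highest by forecast return, the fraction
    whose realized T+2 return was actually positive (a 'hit')."""
    top_k = np.argsort(predicted_returns)[::-1][:k]  # indices of the top-k forecasts
    hits = sum(1 for i in top_k if actual_returns[i] > 0)
    return hits / k

# Toy data: the model's top pick (index 2) did rise; its second pick did not.
pred = np.array([0.010, -0.020, 0.050, 0.003])
real = np.array([-0.010, -0.030, 0.020, 0.010])
print(precision_at_k(pred, real, k=1))  # 1.0
print(precision_at_k(pred, real, k=2))  # 0.5
```

For a 1-trade-per-day duty cycle, K=1 is the number that matters: it is the fraction of days on which the single traded pick actually went up.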

What has not improved is directional accuracy, a problem that has plagued this project from day one. Well, the global model improves it a little, by about 5%, but it remains within the margin of error of random selection. Directional accuracy at that level is simply unacceptable for trading on ranked forecasts.
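
For reference, the directional-accuracy number being compared against a coin flip is just the sign-match rate between forecast and realized moves; a quick sketch with a hypothetical function name:

```python
import numpy as np

def directional_accuracy(predicted_returns, actual_returns):
    """Fraction of forecasts whose sign matches the realized move.
    ~0.5 is the random-selection baseline the post compares against."""
    return float(np.mean(np.sign(predicted_returns) == np.sign(actual_returns)))

pred = np.array([0.02, -0.01, 0.03, -0.02])
real = np.array([0.01, 0.02, 0.04, -0.01])
print(directional_accuracy(pred, real))  # 0.75: three of four signs match
```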

Breaking the forecast engine into two components, one for the directional forecast and one for the %Δ forecast, will allow each to specialize in one of the two aspects needed for a better final result, as opposed to trying to do both at once and getting a result that is technically impressive but inadequate for trading.

This should also improve efficiency: directional forecasting should be less computationally expensive this way, and %Δ forecasts won’t be necessary for symbols with a negative directional forecast, so we will be able to skip those.
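
The gating described above could look something like this; the model interface, stub models, and names are hypothetical stand-ins, not project code:

```python
class StubModel:
    """Stand-in for a trained model with a .predict() interface (hypothetical)."""
    def __init__(self, fn):
        self.fn = fn
    def predict(self, x):
        return self.fn(x)

def forecast_symbols(symbols, direction_model, pct_change_model, features):
    """Two-stage pipeline: the direction model gates the costlier %-change
    model, so symbols with a negative directional forecast are skipped."""
    ranked = []
    for sym in symbols:
        x = features[sym]
        if direction_model.predict(x) <= 0:  # forecast to fall: skip the %Δ stage
            continue
        ranked.append((sym, pct_change_model.predict(x)))
    # Rank the survivors by forecast %Δ for the 1-trade-per-trading-day cycle.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Toy run: BBB is forecast to fall, so its %Δ forecast is never computed.
features = {"AAA": 1.0, "BBB": -2.0, "CCC": 3.0}
direction = StubModel(lambda x: 1 if x > 0 else -1)
pct_change = StubModel(lambda x: x * 0.5)
print(forecast_symbols(["AAA", "BBB", "CCC"], direction, pct_change, features))
# [('CCC', 1.5), ('AAA', 0.5)]
```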

This also allows me to move away from the buffet bar of models I’ve been building in previous versions to compare model performance, allowing each component to be much simpler at the cost of some component-orchestration overhead.

Then there’s the data management. I don’t want to keep rebuilding it. So, to fix that, I’m going to rebuild it one more time as a standalone component whose output is ready for a larger pipeline to consume.

Beyond the forecasting components and the data management component, there’s also the trading aspect itself, which can likewise be standardized and isolated in responsibility.

So:

  • Training Data Management
  • Directional Accuracy Forecast Engine
  • %Δ Forecast Engine
  • Trading Engine
  • Orchestration Engine

TDM (Training Data Manager)

  • Has two storage pools.
    • One for minute-by-minute price and volume data, by day, partitioned by symbol. This will be the Interday storage pool.
    • One for 1100 ET price and volume, by day, partitioned by symbol. This will be the Intraday storage pool.
  • Maintains a list of currently active trading symbols.
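
One way to lay out the two pools on disk is hive-style, symbol-partitioned paths; the directory scheme, file format, and function name below are purely illustrative, with only the pool names ("interday", "intraday") taken from the post:

```python
from pathlib import Path

VALID_POOLS = ("interday", "intraday")  # names as defined in the post

def partition_path(root, pool, symbol, day):
    """Build the storage path for one symbol/day partition, e.g.
    <root>/interday/symbol=MSFT/date=2025-01-02.parquet (on POSIX)."""
    if pool not in VALID_POOLS:
        raise ValueError(f"unknown pool: {pool}")
    return Path(root) / pool / f"symbol={symbol}" / f"date={day}.parquet"

print(partition_path("/var/tdm", "interday", "MSFT", "2025-01-02"))
```

Keeping the partition key as `symbol` makes the 1630 cleanup cheap: dropping a delisted symbol is a single directory removal per pool.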

Normal Operation Mode:

Grabs the data for today, then organizes and annexes it to storage for retrieval by other components.

  • Determine if today is a trading day. If it’s not, assume the data to grab is for the last trading day prior to today.
  • 1630: update its list of active symbols on the NYSE, stripping all historical data from the interday and intraday storage pools for any symbols no longer on the list after the update.
  • 1635: pull the day’s minute-by-minute price and volume for each symbol from the data broker API and annex it to the interday pool, trimming off the 700th day back (a configurable limit) if there is one, so that the interday pool never holds more than 700 days of data per symbol. As it does this, it will also pull the 1100 ET price and volume for each symbol for that day and annex it to the intraday pool under the same 700-day limit.
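
The annex-and-trim step in the 1635 task amounts to a rolling retention window. A sketch with storage modeled as an in-memory dict keyed by ISO date (the real pools would be on disk, and all names here are my own):

```python
def annex_and_trim(pool, symbol, day, data, limit=700):
    """Append one day of data for `symbol`, then drop the oldest days so the
    pool never holds more than `limit` days per symbol (700 in the post,
    but configurable)."""
    days = pool.setdefault(symbol, {})
    days[day] = data
    for old_day in sorted(days)[:-limit]:  # ISO dates sort chronologically
        del days[old_day]
    return sorted(days)

# Toy run with limit=3: annexing a fourth day pushes out the oldest one.
pool = {}
for d in ["2025-01-02", "2025-01-03", "2025-01-06", "2025-01-07"]:
    kept = annex_and_trim(pool, "MSFT", d, data={}, limit=3)
print(kept)  # ['2025-01-03', '2025-01-06', '2025-01-07']
```

The same routine serves both pools, since the 700-day limit applies to each.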

Full Pull Mode:

An ad-hoc execution that refreshes all interday and intraday data after updating its active trading symbols list.

After this is fleshed out, I’ll post about the directional accuracy forecast engine, as it’s next on the list. I will call this distributed variation of Tetsuo “Sol”.


© 2025 Phanes' Canon

The Personal Blog of Chris Punches