Premium Market Analysis, Trader Education, Software, and Trading Strategies. Thirty Years Of Skin In The Game

DLPAL software

Machine Learning With DLPAL PRO: Part Two

In this article we show how to prepare historical files with continues features and a discrete target for an ensemble of securities using DLPAL PRO. Then, we apply machine learning on a random sample of half of the data and test the results on the remaining half. The results are promising since the quality of generated features is high.

Data generation and preparation

The most important step in any machine leaning application is gathering data and generating relevant features with an appropriate target class. Although most work is usually focused on the use of algorithms, generating the proper data is more of an art and where the actual edge is found.

In a nutshell, to use machine learning a data file is required with a sufficient number of rows (instances) and columns that correspond to attributes. In trading applications, the simplest form of machine learning involves a group of continuous attributes and a discrete attribute for the target. The attributes (features predictors, etc.) can be values of technical indicators and fundamental data. The target is usually a binary class of the next period return in the timeframe under consideration. The objective is to determine the probability of the sign of the returns. Then, we can rank predictions according to objectives. But without the proper attributes, this is an exercise in futility.

One advantage of DLPAL PRO is that it generates features based on the proprietary p-indicator. These features are unique and offer opportunities for identification of price action anomalies. In Part One, we presented feature generation for a single security. In this article, we use features generated from a group of securities and specifically from historical data of D0w-30 stocks.

We select data for Dow-30 stocks since 01/2000 and we create a file with 500 instances (rows) of features using DLPAL PRO’s P-Indicator history tool, as shown below:

dlpalpro_wks

We used percent exits of 2%, entry at the open, a Normal cluster and the detrend option for the feature calculation. It took the program a few hours to generate the history file due to the enormous amount of calculations involved.

We then combine the generated file with features and historical data for SPY. This is how the file looks like:

spy_file

There are five features: avgPL, avgPS, Pratio, avgPLS, avgPSS. We employ some feature engineering to reduce these to three by taking the difference of the first two and last two, since it represents the directional bias. We also add a target based on two day returns. This is how the resulting file looks like:

spy_file_ml

We use the 2-day return because we know from experience that features from ensembles of securities predict returns two days in the future. This is where experience comes into play. However, we will also use machine leaning to determine if this experience is biased.

Training and testing

The objective of supervised machine learning is to determine the probability of class “Yes”. We start with binary logistic regression and we randomly sample 50% of the instances to use in training and the remaining 50% to use for testing.  Below is the confusion matrix from the training sample:

Yes No Sum
Yes 82 45 127
No 60 61 121
Sum 142 106 248

The error rate is 0.4234. Next we are interested in the error in the test set. The confusion matrix is shown below:

Yes No Sum
Yes 81 70 151
No 47 50 97
Sum 128 120 248

The test sample error rate is 0.4718.

We next train the classifier in the whole sample and then use bootstrap to determine the average test set error. This comes out to 0.4522.  We can then conclude that there is some prediction potential with this simple classifier.

We also used Support Vector Machines with RBF Kernel on the whole sample and then bootstrap. The resulting average test set error was 0.4343.  The results look promising and we can proceed with backtesting.

We implement the logistic function in Amibroker with the coefficients calculated from the training sample.  The backtest for the combined sample is shown below:

spy_eq

In the above chart, Volume is difavg, open interest is Pratio and AUX1 is difavgs. The logistic function BLR is also shown.  In a period from  11/05/2014 to 10/26/2016, CAR is 27.3%, net return is 61%, max DD is -9.2%, the win rate is 59% and profit factor is 1.82. The stop is two bars after the open and all positions are at the open. Commission of $0.01 per share was used and equity was fully invested.

Of course, half of the backtest is on training data but sampling was random to reduce bias. Although this is not a strategy that can be used in actual trading, the results show that the concept is promising but additional work is required to get to the stage where there is something usable. However, the ability of DLPAL PRO to generate unique features provides a good starting point.

If you have any questions or comments, happy to connect on Twitter: @priceactionlab

You can download a demo of DLPAL from this link.

Subscribe via RSS or Email, or follow us on Twitter.

Disclaimer

Copyright Notice