Fooled by Machine Learning Applied to Trading Algo Development

The chances of finding a profitable trading algo by applying machine learning are, for all practical purposes, zero. There is always a small probability of success when using machine learning, but one problem is that the results cannot be easily evaluated for significance because of multiple comparisons. Only in hindsight can one determine that an algo found by applying machine learning is profitable, and finding that out can be costly.

  • Mistake No. 1: Data-mining bias is not only due to curve-fitting (see related post)
  • Mistake No. 2: Conventional statistical tests cannot be used with machine learning
  • Mistake No. 3: Ignoring that machine learning is a method of the past

Leda Braga, manager of the well-known BlueTrend Fund and one of the most successful trading system developers, made the following statement in a recent interview:

“There’s a creative moment when you think of a hypothesis, maybe it’s that interest rate data drives currency rates. So we think about that first before mining the data. We don’t mine the data to come up with ideas.”

Why did Leda Braga say that? Those who use machine learning to develop trading algos should pay close attention. She said it because using data-mining to generate algos is essentially a data-fishing process with an extremely low probability of success. Relentlessly trying different combinations of technical and fundamental indicators is a process that involves multiple comparisons. As the number of trials increases, it becomes virtually certain that some result will perform well in an out-of-sample test, and even pass all cross-validation tests, purely by chance.

In general, the probability of finding, by mining the data, a trading algo that also performs well in an out-of-sample test, denoted here as P(algo), approaches 1 as the number of tests gets very high:

P(algo) = lim[1 – a(n)/n], as n goes to infinity

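where n is the number of tests made and a(n) is a real function of n. The probability approaches 1 when a(n) grows more slowly than n. In the case that a(n) also tends to infinity, the ratio a(n)/n is an indeterminate form and the limit depends on how fast a(n) grows relative to n.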
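As a rough illustration of the limit above, suppose (an assumption not made in the original) that the n tests are independent and that an algo with no edge passes both the in-sample and out-of-sample checks with some small probability p. The chance of at least one pass is then 1 - (1 - p)^n, which has the form 1 - a(n)/n with a(n) = n(1 - p)^n. The names prob_at_least_one_fluke and p_fluke are hypothetical, and the value 0.0025 is only an example (a 5% chance in each of two independent samples).

```python
def prob_at_least_one_fluke(n_tests: int, p_fluke: float) -> float:
    """Chance that at least one of n independent no-edge algos passes by luck.

    Assumes independent tests, which real data-mining runs rarely satisfy.
    """
    return 1.0 - (1.0 - p_fluke) ** n_tests


# Hypothetical example: 5% in-sample times 5% out-of-sample false-pass rate.
P_FLUKE = 0.05 * 0.05

for n in (1, 10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: P(at least one fluke) = {prob_at_least_one_fluke(n, P_FLUKE):.4f}")
```

Under these assumptions the probability climbs toward 1 as n grows, even though every individual algo is worthless.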

Mistake No. 1: Data-mining bias is not only due to curve-fitting

Some trading system developers think that data-mining bias arises only from curve-fitting. Although the result of data-mining can be a curve-fitted model, in general a result that works by chance on an out-of-sample need not be curve-fitted in any particular sense. Data-mining bias arises from the practice of reusing data to test many different algos. The final selection of some algo that cross-validates on an out-of-sample ignores the fact that many other algos did not pass the tests, and that increases the probability of it being a fluke, not necessarily fitted in some particular sense. It could be an algo that survived by chance.
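The point that an algo can survive by chance without being fitted to anything can be sketched with a small simulation. This is only an illustration under stated assumptions: each "algo" is a stream of zero-mean Gaussian returns (so no parameters are fitted and no edge exists), and the function names, the per-period Sharpe threshold of 0.1, and the sample sizes are all hypothetical choices.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean / stdev), not annualized."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def count_survivors(n_algos=2000, n_periods=250, threshold=0.1, seed=42):
    """Count no-edge algos whose in-sample AND out-of-sample Sharpe beat the threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    survivors = 0
    for _ in range(n_algos):
        # Both samples are pure noise: nothing is fitted to either one.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        if sharpe(in_sample) > threshold and sharpe(out_sample) > threshold:
            survivors += 1
    return survivors


print("No-edge algos that 'validated' out-of-sample:", count_survivors())
```

Any algo counted here passed both samples with no fitting at all; reporting only the survivors hides the many that failed.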

Mistake No. 2: Conventional statistical tests cannot be used with machine learning

Most trading system developers who avoid machine learning do so not because they think it necessarily leads to flukes, but because they know that conventional statistical tests cannot be used when multiple hypotheses are involved. One cannot simply take the Sharpe ratio and multiply it by the square root of the number of years in an out-of-sample test to calculate a t-statistic. Adjustments must be made for the fact that the tests are not independent but are the result of multiple comparisons.

For example, a strict test involves dividing the required significance level by the number of uncorrelated rules, call that m. If the significance level is set at 5%, the selected algo must be significant at the 5%/m level. Given that some machine learning programs test billions or even trillions of trading rules, one can imagine that we are looking for a quite unusual result. There are ways of relaxing those tests but the idea is the same: corrections must be made and that limits the chances of a significant result.
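The strict test described above (commonly known as the Bonferroni correction) can be sketched as follows. This is a minimal illustration, not a complete testing procedure: it uses the naive t-statistic (Sharpe ratio times the square root of the number of years), a normal approximation for the p-value, and hypothetical function names and example numbers.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def t_stat_from_sharpe(annual_sharpe, years):
    """Naive t-statistic: annualized Sharpe times sqrt(years)."""
    return annual_sharpe * math.sqrt(years)


def one_sided_p_value(t_stat):
    """Upper-tail p-value under a normal approximation (an assumption)."""
    return _STD_NORMAL.cdf(-t_stat)


def passes_strict_test(annual_sharpe, years, m, alpha=0.05):
    """True if the result is significant at the alpha / m level."""
    return one_sided_p_value(t_stat_from_sharpe(annual_sharpe, years)) < alpha / m


def required_sharpe(years, m, alpha=0.05):
    """Smallest annual Sharpe whose naive t-statistic clears the alpha / m level."""
    return -_STD_NORMAL.inv_cdf(alpha / m) / math.sqrt(years)


# Hypothetical example: Sharpe 1.5 over a 4-year out-of-sample period.
print(passes_strict_test(1.5, 4, m=1))      # single hypothesis
print(passes_strict_test(1.5, 4, m=1000))   # 1,000 uncorrelated rules
for m in (1, 1_000, 10**9, 10**12):
    print(f"m = {m:.0e}: required annual Sharpe over 4 years = {required_sharpe(4, m):.2f}")
```

The same Sharpe ratio that looks significant for a single hypothesis fails once m is large, and the required Sharpe keeps rising as m grows.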

Mistake No. 3: Ignoring that machine learning is a method of the past

Machine learning was used extensively in the 1980s for predicting stock returns and exchange rates with dismal results. It was abandoned by many quants in favor of the process Leda Braga described above that involves independent hypothesis testing. However, even that process is not perfect as I note in the mentioned blog:

“…this is true only if the same data are not used again and the same analyst does not use the knowledge obtained from a failed test to come up with another idea. Otherwise, there is no difference between an algo that mines data and a human mind that mines the same data other than the speed at which this is achieved.”

The lack of creativity of machine learning

It is easy to get fooled by machine learning when developing trading algos. Only those who understand the impact of multiple comparisons on statistical significance and know the appropriate tests can use this process effectively. Even then, success is not guaranteed, because all tests are conditioned on past data and the future can be different from the past. However, a more important consideration is that most machine learning algorithms do not discover anything more fundamental than their inputs. If the indicators used are ineffective in capturing returns in the first place, it is quite likely that any higher-complexity algo that uses them will also be ineffective. In other words, machine learning lacks creativity, and that limits its effectiveness on top of the other problems related to multiple comparisons.

I abandoned machine learning in the mid-1990s after determining that it was impossible to assign any significance to performance results. Instead, I decided to resort to deterministic price action analysis. By deterministic I mean analysis that produces the exact same results each time it runs with the same data. Note that most machine learning algorithms have inherent randomness that is used to select initial conditions, and it is almost impossible to reproduce their results except in some special cases. Furthermore, I decided to use a simple set of metrics and deal only with one class of indicators, price patterns. That reduces data-mining bias because there is no parameter optimization and there are significantly fewer degrees of freedom. However, there is still the issue of selection bias that must be dealt with. Since statistical significance tests are useless for all practical purposes, I have even abandoned out-of-sample cross-validation and I rely on portfolio backtests. Although the evaluation of performance on portfolios of securities is a strict test, it is nevertheless the only one that in my opinion makes sense in the context of data-mining.
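The distinction between deterministic analysis and methods with random initial conditions can be illustrated with a toy sketch. Both rules below are hypothetical stand-ins invented for this example, not the patterns or algorithms actually used: pattern_signals is a fixed rule with no parameters to optimize, while randomized_signals mimics a learner whose output depends on a randomly drawn starting value.

```python
import random


def pattern_signals(closes):
    """Deterministic rule (hypothetical): flag day i after three consecutive higher closes."""
    return [
        i
        for i in range(3, len(closes))
        if closes[i] > closes[i - 1] > closes[i - 2] > closes[i - 3]
    ]


def randomized_signals(closes, rng):
    """Stand-in for a randomly initialized method: the return threshold is drawn at random."""
    threshold = rng.uniform(0.0, 0.02)
    return [
        i
        for i in range(1, len(closes))
        if closes[i] / closes[i - 1] - 1.0 > threshold
    ]


closes = [100, 101, 102, 103, 102, 104, 105, 107, 106, 108]

# Same data in, same signals out, on every run.
print(pattern_signals(closes))
print(pattern_signals(closes))

# Results depend on the seed; they repeat only if the seed is fixed.
for seed in (1, 2, 3):
    print(seed, randomized_signals(closes, random.Random(seed)))
```

Fixing the seed makes the randomized version repeatable, but the choice of seed then becomes one more degree of freedom.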

As for independent hypothesis testing, I also develop systems based on it, but the process is slow because uniqueness of the hypothesis is a prerequisite for success. I also wrote the following in my blog post on the BlueTrend fund:

“One may avoid getting fooled by randomness by making sure that the hypothesis to test is unique and could not have been obtained by a computerized data-mining process. This is the key here and proving that is half the battle and also half of any potential trading edge.”



© 2015 Michael Harris. All Rights Reserved. We grant a revocable permission to create a hyperlink to this blog subject to certain terms and conditions. Any unauthorized copy, reproduction, distribution, publication, display, modification, or transmission of any part of this blog is strictly prohibited without prior written permission. 
