Quantitative trading, Trader education

Fooled by Machine Learning Applied to Trading Algo Development

The chances of finding a profitable trading strategy via machine learning are extremely low. There is always a small probability of success, but one of the problems is that the results cannot be easily evaluated for significance because of multiple comparisons. Only in hindsight can one determine whether a strategy found by applying machine learning is profitable, and finding that out can be costly.

  • Mistake No. 1: Data-mining bias is not only due to curve-fitting (see related article)
  • Mistake No. 2: Conventional statistical tests cannot be used with machine learning
  • Mistake No. 3: Not being aware that machine learning effectiveness depends on quality of features

Leda Braga, a well-known fund manager and a trading system developer, made the following statement in an interview:

“There’s a creative moment when you think of a hypothesis, maybe it’s that interest rate data drives currency rates. So we think about that first before mining the data. We don’t mine the data to come up with ideas.”

Why did Leda Braga say that? Those who use machine learning to develop trading algos should pay close attention. She said it because using data-mining to generate algos is essentially a data-fishing process with a low probability of success. Relentlessly trying different combinations of technical and fundamental indicators is a process that involves multiple comparisons. As the number of trials increases, it becomes all but certain that some result will perform well in an out-of-sample and even pass all cross-validation tests, purely by chance.

In general, the probability of finding a trading algo by mining the data that also performs well in an out-of-sample, denoted here as P(algo), approaches 1 as the number of tests gets very large:

P(algo) = lim[1 – a(n)/n], as n goes to infinity

where n is the number of tests made and a(n) is a real function of n that grows more slowly than n. In the case that a(n) also tends to infinity, the limit cannot be determined without knowing how fast a(n) grows relative to n.
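
A minimal sketch of why this probability climbs toward 1, under a simplifying assumption that the article does not make: every test is independent and has the same small chance of passing both the in-sample and out-of-sample checks by luck alone. The function name and the per-test probability are illustrative only.

```python
def prob_at_least_one_fluke(p_single: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests passes by luck.

    p_single is the (assumed) probability that a single worthless algo
    passes every validation check purely by chance.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # Complement of "all n tests fail".
    return 1.0 - (1.0 - p_single) ** n_tests


if __name__ == "__main__":
    # Even a 0.1% per-test fluke rate becomes near-certainty with enough trials.
    for n in (1, 10, 100, 1_000, 10_000):
        print(n, round(prob_at_least_one_fluke(0.001, n), 4))
```

Real trading rules are correlated, so the true growth is slower than this sketch suggests, but the direction is the same: more trials, more flukes.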

Mistake No. 1: Data-mining bias is not only due to curve-fitting

Some trading system developers think that data-mining bias arises only from curve-fitting. Although the result of data-mining can be a curve-fitted strategy, in general a result that works by chance on an out-of-sample need not be curve-fitted in any particular sense. Data-mining bias arises from the practice of reusing data to test many different algos. Selecting an algo that cross-validates on an out-of-sample ignores the many other algos that did not pass the validation tests, and this increases the probability that the selected algo is a fluke, not necessarily fitted in some particular sense. It could be an algo that survives only until market conditions change.
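
The selection effect described above can be illustrated with a small simulation. This is only a sketch under assumed conditions: each "algo" is pure Gaussian noise with zero mean (so none has any edge), the sample sizes and the 1.645 threshold are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean return against zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def count_chance_survivors(n_algos=5000, n_days=60, t_threshold=1.645, seed=0):
    """Count zero-edge algos that pass both in-sample and out-of-sample tests."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(n_algos):
        # In-sample returns: pure noise, no fitting of any kind.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        if t_stat(in_sample) <= t_threshold:
            continue  # rejected in-sample, never reaches validation
        # Out-of-sample returns are drawn only for in-sample winners.
        out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        if t_stat(out_of_sample) > t_threshold:
            survivors += 1
    return survivors


if __name__ == "__main__":
    print("survivors with no edge:", count_chance_survivors())
```

Nothing here is curve-fitted, yet some algos pass both tests. Reporting only those survivors, without the count of rejected algos, is what produces the bias.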

Mistake No. 2: Conventional statistical tests cannot be used with machine learning

Most trading system developers who avoid machine learning do so not because they think it necessarily leads to flukes but because they know that conventional statistical tests cannot be used when multiple hypotheses are involved. One cannot simply take the Sharpe ratio and multiply it by the square root of the number of years in an out-of-sample test to calculate a t-statistic and test the observed results for significance. Serious adjustments must be made for the fact that the tests are not independent but are the result of multiple comparisons.
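
For reference, the single-hypothesis calculation that this paragraph warns against can be sketched as follows. It assumes an annualized Sharpe ratio and uses a normal approximation for the p-value; the function names are illustrative. The point is that this number is only meaningful when one strategy was tested, not when it was selected from many.

```python
import math


def naive_t_stat(annual_sharpe: float, years: float) -> float:
    """t-statistic from an annualized Sharpe ratio, ignoring multiple testing."""
    if years <= 0:
        raise ValueError("years must be positive")
    return annual_sharpe * math.sqrt(years)


def one_sided_p_value(t: float) -> float:
    """P(Z > t) for a standard normal Z (normal approximation)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    t = naive_t_stat(annual_sharpe=1.0, years=4)
    # Looks significant at 5%, but only if this was the sole strategy tested.
    print(t, one_sided_p_value(t))
```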

For example, a strict test involves dividing the required significance level by the number of uncorrelated rules, call that m. If the significance level is set at 5%, the selected algo must be significant at the 5%/m level. Given that some machine learning programs test billions or even trillions of trading rules, one can imagine that we are looking for a quite unusual result. There are ways of relaxing those tests but the idea is the same: corrections must be made and that limits the chances of a significant result.
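
Dividing the significance level by the number of tests is commonly known as the Bonferroni correction. The sketch below shows how the bar rises with m, assuming a one-sided test, a normal approximation, and the Sharpe-times-square-root-of-years relation mentioned earlier; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def adjusted_alpha(alpha: float, m: float) -> float:
    """Per-test significance level after dividing by m uncorrelated rules."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


def required_t_stat(alpha: float, m: float) -> float:
    """One-sided critical value at the adjusted level (normal approximation)."""
    # Use the lower tail and negate: more accurate for very small probabilities.
    return -NormalDist().inv_cdf(adjusted_alpha(alpha, m))


def required_sharpe(alpha: float, m: float, years: float) -> float:
    """Annualized Sharpe needed so that Sharpe * sqrt(years) clears the bar."""
    return required_t_stat(alpha, m) / math.sqrt(years)


if __name__ == "__main__":
    for m in (1, 1_000, 1_000_000_000, 1_000_000_000_000):
        print(m, round(required_t_stat(0.05, m), 2),
              round(required_sharpe(0.05, m, years=4), 2))
```

With m in the billions, the required t-statistic is above 6, which is the sense in which the selected algo must be "a quite unusual result".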

Mistake No. 3: Not being aware that machine learning effectiveness depends on quality of features

Machine learning was used extensively in the 1980s for predicting stock returns and exchange rates with dismal results. It was abandoned by quants in favor of the process Leda Braga described above that involves independent hypothesis testing. However, even that process is not perfect as I note in the mentioned blog:

The major factor that determines the results from machine learning, however, is feature engineering. This is an excerpt from another article:

Feature engineering is the hardest aspect of machine learning and algorithmic trading. If the features (predictors or factors) used do not have economic value, performance is unlikely to be satisfactory. Algorithmic trading and machine learning cannot find gold where there is none. The use of widely known features is unlikely to produce anything of value. Developing an algo and applying machine learning is the easy part of this process despite some common misconceptions. A few operators of platforms where aspiring traders gather to test their programming skills offer known features that are “tortured until they confess to anything”. These approaches will probably fail because of data-mining bias. Note that this bias is cumulative and at some point grows out of control.

The lack of creativity of machine learning

It is easy to get fooled by machine learning when developing trading strategies. Only those who understand the impact of multiple comparisons on statistical significance and know the appropriate tests can use this process effectively. Even then, success is not guaranteed, because all tests are conditioned on past data and the future can be different from the past. However, a more important consideration is that most machine learning algorithms do not discover anything more fundamental than their inputs. If the indicators/features used are ineffective in capturing returns in the first place, it is quite likely that any higher complexity algo that uses them will also be ineffective. In other words, machine learning lacks creativity, and that limits its effectiveness in addition to the problems related to multiple comparisons.


If you have any questions or comments, happy to connect on Twitter: @mikeharrisNY
