Selection and data-mining bias in trading system development often lead to random results. In a nutshell, selection bias fools the validation process into attributing significance to random results. The only chance of identifying robust trading systems that exploit genuine market anomalies and provide an edge is by minimizing selection and data-mining bias. Even then, there may be other factors, known or unknown, that limit system potential.
In the book Hedge Fund Market Wizards (May 2012), there is an interview with Jaffray Woodriff, the founder of QIM. When asked about data mining during the interview, he said, among other things, that
“…the human tendency is to take the models that continue to do well in the out-of-sample data and choose those models for trading… this process simply turns the out-of-sample data into part of the training data because it cherry-picks the models that did best in the out-of-sample period.”
He then argued that instead one must look for patterns
“…where, on average, all the models out-of-sample continue to do well… you are really getting somewhere if the out-of-sample results are more than 50 percent of the in-sample.”
The widely (ab)used practice of selection bias
Among other things, selection bias involves ranking results according to some metrics and then selecting the best performer (most stock rotation strategies fall under this category and thus suffer from bias). If the selection is made only from in-sample results, then this is just selection bias. But if the selection is also made from out-of-sample results, then data-snooping is introduced in addition. In both cases, selection is bad practice. In the former case, the process usually continues until a system selected from the in-sample happens to perform well in the out-of-sample. Trading system developers often get fooled by randomness through selection bias. If the trading system development process involves testing a large number of combinations of alternatives (securities, indicators, factors, etc.), then the probability that some random selection will validate successfully is, for all practical purposes, 1. In other words, this process guarantees that something will be found that satisfies the performance criteria defined by a set of metrics in both the in-sample and the out-of-sample.
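The claim that some random selection is practically guaranteed to validate can be illustrated with a small simulation. This is only a sketch under stated assumptions: every "system" is pure Gaussian noise with zero edge, the metric is a per-period mean-to-standard-deviation ratio, and the function names, the 0.1 threshold and the 50/50 split are hypothetical choices made for illustration, not anything prescribed above.

```python
import random
import statistics


def sharpe(returns):
    """Per-period ratio of mean return to standard deviation (not annualized)."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def count_lucky_survivors(n_systems, n_periods=250, threshold=0.1, seed=0):
    """Count zero-edge systems that pass BOTH the in-sample and out-of-sample test.

    Each system is a series of random returns with mean zero, so any system
    that passes does so by luck alone. All parameters are illustrative.
    """
    rng = random.Random(seed)
    half = n_periods // 2
    survivors = 0
    for _ in range(n_systems):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        in_sample, out_of_sample = returns[:half], returns[half:]
        if sharpe(in_sample) > threshold and sharpe(out_of_sample) > threshold:
            survivors += 1
    return survivors


def prob_at_least_one(p_single, n_systems):
    """Chance that at least one of n independent random systems validates."""
    return 1.0 - (1.0 - p_single) ** n_systems


if __name__ == "__main__":
    print(count_lucky_survivors(1000))
    print(prob_at_least_one(0.01, 1000))
```

Even if a single random system had only a 1% chance of passing both tests, the probability that at least one out of 1,000 independent systems passes is 1 - 0.99^1000, or roughly 0.99996. Real candidate systems are usually correlated rather than independent, so the formula is a rough guide and not an exact figure.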
Minimizing selection bias
Selection bias can be minimized by not selecting anything and accepting all results. This is the simplest and most straightforward solution, although there is still a possibility of curve-fitting. However, this is not a practical approach in most cases, and it is not possible with some processes that can internally generate billions or even trillions of combinations, such as machine learning algorithms. In those cases we can relax the strict condition of no selection bias by requiring that ALL systems that satisfy the development objectives are validated as a single system. Although this does not completely eliminate the effects of selection bias and curve-fitting, it may nevertheless reduce them.
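One possible reading of "validated as a single system" is to combine every qualifying system into an equal-weight composite and judge only the composite out-of-sample. The sketch below assumes that reading; the function name, the equal weighting and the `meets_objectives` callback are hypothetical, and each system is assumed to be a list of per-period returns on a common timeline.

```python
from statistics import fmean


def validate_as_single_system(systems, meets_objectives, split):
    """Return the out-of-sample returns of the equal-weight composite.

    systems          -- list of per-period return lists, one per system
    meets_objectives -- function applied to the in-sample part of each system
    split            -- index where the out-of-sample period starts

    Every system that meets the objectives in-sample is kept; none is
    cherry-picked on the basis of its out-of-sample results.
    """
    qualifying = [r for r in systems if meets_objectives(r[:split])]
    if not qualifying:
        return []
    n_periods = min(len(r) for r in qualifying)
    return [fmean([r[t] for r in qualifying]) for t in range(split, n_periods)]


if __name__ == "__main__":
    systems = [
        [0.01, 0.01, 0.02, -0.01],
        [-0.01, -0.01, 0.05, 0.05],  # fails in-sample, excluded despite strong OOS
        [0.02, 0.00, 0.00, 0.03],
    ]
    print(validate_as_single_system(systems, lambda r: sum(r) > 0, split=2))
```

Note that the second system is left out even though it has the best out-of-sample returns: membership in the composite is decided from the in-sample data alone.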
Selection bias is not involved when a specific system is conceived as a hypothesis without looking at the data. This must be a specific system with specific parameters that completely describe its entry and exit points. An example of such a system is the 50-200 SMA cross. Although the process that conceived the system is possibly subject to data-mining bias, there is no selection bias involved. Three important points must be made here: (A) Some confuse data-mining and selection bias, attributing the latter to a single, well-defined trading system hypothesis. However, selection bias is only involved when the same set of well-defined criteria gives rise to multiple hypotheses and a selection is made based on maximum performance, minimum drawdown, etc. (B) The assumption that all conceivable systems, even in the form of single arbitrary hypotheses, are the result of selection bias is unsound because (1) such a claim is a universally quantified proposition of the form “All swans are white”, which cannot be proven but only falsified, and (2) the process that gives rise to the hypotheses must be explicitly defined, and in the absence of such a definition no such claim can be made, i.e., it is a vacuous claim. (C) The data-mining bias involved in the case of a single hypothesis can be minimized using out-of-sample validation. According to some notable experts in the field, for example Jaffray Woodriff, the founder of QIM mentioned above, if the out-of-sample results are more than 50 percent of the in-sample, then there is a good chance that the hypothesis is not random. But this only holds in the case of a unique hypothesis and not in the case of multiple testing.
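The 50 percent rule of thumb for a single hypothesis can be written as a simple check. This is a minimal sketch: which metric counts as "results" (annual return, Sharpe ratio, and so on) is not specified in the quote and is left to the reader, and the function names are hypothetical.

```python
def retention_ratio(in_sample, out_of_sample):
    """Out-of-sample result as a fraction of the in-sample result.

    Returns None when the in-sample result is not positive, because the
    ratio is not meaningful in that case.
    """
    if in_sample <= 0:
        return None
    return out_of_sample / in_sample


def passes_half_rule(in_sample, out_of_sample, min_ratio=0.5):
    """True if the out-of-sample result exceeds min_ratio of the in-sample result."""
    ratio = retention_ratio(in_sample, out_of_sample)
    return ratio is not None and ratio > min_ratio


if __name__ == "__main__":
    # Hypothetical figures: 20% in-sample and 12% out-of-sample annual return.
    print(passes_half_rule(0.20, 0.12))  # ratio 0.6, above the 0.5 cutoff
    print(passes_half_rule(0.20, 0.05))  # ratio 0.25, below the 0.5 cutoff
```

The check is only informative when it is applied once to a hypothesis fixed in advance. Applying it to many candidates and keeping those that pass reintroduces the selection bias described earlier.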
It is also very important, in my opinion, that the data-mining process is deterministic in the sense that it always generates the same output when it is supplied with the same data. Otherwise, the results are random by design and any application of cross-validation may be affected by the randomness. If one tests trillions of combinations of patterns, it is guaranteed that several random combinations will pass cross-validation and even portfolio backtesting tests. Every random combination essentially amounts to a random selection and introduces bias. One way to minimize data-mining bias and eliminate selection bias involves identifying all possible combinations and applying cross-validation to the combined set. But this is an impossible task, especially in the case of evolutionary algorithms. A solution may involve limiting the available set to just a few indicators and patterns and trying to establish convergence of the algorithm. But this defeats the purpose for which some of these algorithms were developed, essentially rendering them useless, because something like that can be done either manually or by applying combinatorics.
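The determinism requirement can be tested directly by running a mining process several times on the same data and comparing the outputs. The sketch below is illustrative only: the toy momentum rule, both miner functions and all parameter values are hypothetical stand-ins, not real data-mining software.

```python
import random


def rule_pnl(prices, lookback):
    """P&L of a toy rule: hold one unit over the next bar whenever the
    current price is above the price `lookback` bars earlier."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        if prices[t] > prices[t - lookback]:
            pnl += prices[t + 1] - prices[t]
    return pnl


def exhaustive_miner(prices, max_lookback=5):
    """Deterministic: evaluates every lookback in a fixed order."""
    return [lb for lb in range(1, max_lookback + 1) if rule_pnl(prices, lb) > 0]


def sampling_miner(prices, max_lookback=5, n_trials=3):
    """Random by design: evaluates only a random, unseeded sample of lookbacks,
    so repeated runs on the same data may report different rules."""
    rng = random.Random()
    tried = sorted(rng.sample(range(1, max_lookback + 1), n_trials))
    return [lb for lb in tried if rule_pnl(prices, lb) > 0]


def is_deterministic(miner, data, runs=3):
    """True if `miner` returned identical output on every one of `runs` runs.

    Passing this check does not prove determinism; failing it disproves it.
    """
    first = miner(data)
    return all(miner(data) == first for _ in range(runs - 1))


if __name__ == "__main__":
    prices = [100, 101, 102, 101, 103, 104, 103, 105, 106, 107]
    print(is_deterministic(exhaustive_miner, prices))
    print(is_deterministic(sampling_miner, prices))
```

Fixing the seed of a stochastic search would also make repeated runs agree, but the output would still depend on an arbitrary seed; whether that satisfies the requirement stated above is a judgment call.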