Five Myths About Data-Mining Bias [Premium Articles]

Data mining is now widely used for developing trading algorithms. There are several myths about how to deal with data-mining bias. This post exposes five of them.

This entry was posted in Trader education.

8 Responses to Five Myths About Data-Mining Bias [Premium Articles]

  1. David says:

    Awesome post. I've had numerous arguments with peers along similar lines, especially regarding walk forward testing.

  2. Krzysztof says:


    You said about myth 5:

    "One of the most naive things one could do nowadays is to try to combine some indicators with exit rules using some machine learning algorithm hoping to find gold. Although some results may work for a period of time, inherently they are random because when one adjusts for the multiple comparisons, the significance of the results is diminished."

    So the question is: how can you prove it, and how can you tell that a system stopped working because of data-mining bias and not because of, e.g., a decreased signal/noise ratio in the market?


    • Hello Krzysztof,

      Remember what I wrote: data-mining bias refers to the quality of the process. Its adverse effect is that one may, with high probability, select a random model that will later fail for any number of reasons, including the low signal/noise ratio you mentioned. Therefore, what I'm trying to clarify here is that data-mining bias is not some specific cause of failure but an inherent flaw in the process of developing trading systems.
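
As a rough illustration of why selecting a random model becomes likely when many rules are tried, the sketch below computes the chance that at least one rule with no edge passes a significance test, together with the Bonferroni-adjusted threshold that accounts for the multiple comparisons. It is only a sketch under stated assumptions: the rules are treated as independent, trade returns are drawn from a zero-mean normal distribution, and the function names, the 0.05 level, and the 250-trade sample are hypothetical choices that do not come from the article.

```python
import random
import statistics


def family_wise_error(alpha: float, n_rules: int) -> float:
    """Chance that at least one of n_rules no-edge rules passes a test at level alpha.

    Assumes the rules are independent, which real trading rules rarely are.
    """
    return 1.0 - (1.0 - alpha) ** n_rules


def bonferroni_threshold(alpha: float, n_rules: int) -> float:
    """Per-rule significance level that keeps the family-wise error at or below alpha."""
    return alpha / n_rules


def best_random_rule_mean(n_rules: int, n_trades: int = 250, seed: int = 0) -> float:
    """Best average trade return among n_rules rules that have no edge at all.

    Each trade return is drawn from a zero-mean, unit-variance normal
    distribution (an assumption), so any positive average is pure luck.
    """
    rng = random.Random(seed)
    return max(
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        for _ in range(n_rules)
    )


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(
            f"rules tried: {n:5d}  "
            f"P(at least one false discovery): {family_wise_error(0.05, n):.3f}  "
            f"Bonferroni level: {bonferroni_threshold(0.05, n):.5f}  "
            f"best no-edge average: {best_random_rule_mean(n):.3f}"
        )
```

Under these assumptions, the best-looking rule among many no-edge candidates shows a positive average return by luck alone, and the per-rule significance level needed to compensate shrinks in proportion to the number of rules tried.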

  3. Nikita says:

    Great post, thanks! So, how can someone avoid data mining bias?

  4. Nikita says:

    Disregard my last comment, I didn't see the last part of the article.

    • Hello Nikita,

      Data-mining bias cannot be avoided. It is always present as an integral part of the process, and it increases continuously. One can only try to minimize it. For example, minimizing the number of different rules tried has a large positive impact, as does trying only rules of a similar "nature", for example moving averages. Price Action Lab, my software, also suffers from data-mining bias, but the effect is orders of magnitude smaller than in cases where many different indicators are available or where price patterns are identified by algorithms with more degrees of freedom, such as some genetic programming and neural network tools. See the first link in the article text for more details.

      • Robert Begley says:


        I too agree that DMB cannot be measured, even approximately.

        I also looked briefly at Price Action Lab and I'm impressed with it. It lacks some of the bells and whistles of other programs, but it’s based on a sound approach.

        Thanks for the interesting articles!
