The more features added or engineered, the higher the chance that some random strategy passes all validation tests. Some combinations of features will eventually fool even the most intelligent validation algo. Hundreds of features possibly means thousands of ways to be fooled.
I wrote “Fooled by Randomness, Over-fitting And Selection Bias” in June 2012, before the subject of over-fitting on market data started trending in social and financial media. In that article, I offered a simple definition of data-mining bias for practitioners:
This [data-mining] bias is the result of the dangerous practice of using data from the in-sample multiple times with many combinations of indicators and heuristic rules until an acceptable equity curve is obtained.
This should make sense even to those not familiar with the subject: the more options available, the easier it becomes to generate a model that can fool any validation test. In fact, it has been shown that out of 100 random features, about five on average pass validation tests even on random samples. In other words, if you want to set strategy developers up to fail, offer them many features to choose from. Eventually, some strategy will appear to generalize well, but that will be essentially due to chance.
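The "about five in 100" figure is just the expected Type-I error rate at the usual 5% significance level, and it is easy to reproduce with a simulation. The sketch below is illustrative only (it is not the validation procedure any particular platform uses): it builds strategies from purely random long/short signals on purely random returns, runs a t-test on each strategy's mean return, and counts how many "pass."

```python
import numpy as np

rng = np.random.default_rng(42)
n_features, n_days = 100, 2000

# Random daily "market" returns with no exploitable structure at all.
market = rng.normal(0.0, 0.01, n_days)

passed = 0
for _ in range(n_features):
    # Each random feature produces random long (+1) / short (-1) signals.
    signal = rng.choice([-1, 1], n_days)
    strat = signal * market  # daily returns of the random strategy
    # Two-sided t-test of the mean return against zero at the 5% level
    # (1.96 is the large-sample critical value).
    t = strat.mean() / (strat.std(ddof=1) / np.sqrt(n_days))
    if abs(t) > 1.96:
        passed += 1

print(passed, "of", n_features, "random strategies passed the test")
```

On average about five random strategies pass, despite containing no information whatsoever; any single run can produce a few more or a few less.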
The above probably describes what happened to a popular platform that set out to democratize trading strategy design by offering data and tools online to millions worldwide. Eventually, every indicator written by someone on that website and published in their forum turned into a feature used by thousands of others, until some were lucky enough to come up with strategies that passed all validation tests and even won money in contests organized by the platform. It looked like a ticket to major success.
Actually, it was the beginning of the end. The models that won contests were funded, but it soon turned out they weren’t making any money. This is puzzling to some, but here is why:
If you have a large population of strategy developers and thousands of features, then some strategies based on some of these features will pass all validation tests and even forward tests by chance alone.
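The scale of this effect can be quantified. If each of m independent strategy/feature combinations has only a small probability p of passing all tests by chance, the probability that at least one does is 1 − (1 − p)^m, which approaches certainty quickly. The numbers below are illustrative, not taken from any actual platform:

```python
# Probability that at least one of m independent random strategies
# passes validation purely by chance, given a per-strategy pass rate p.
def p_any_pass(p: float, m: int) -> float:
    return 1 - (1 - p) ** m

for m in (1, 10, 100, 1000):
    print(m, round(p_any_pass(0.05, m), 4))
# 1    0.05
# 10   0.4013
# 100  0.9941
# 1000 1.0
```

With a population of thousands of developers each trying many feature combinations, m is enormous, so chance "winners" are all but guaranteed.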
This is similar to what happens in forex and stock investing contests, and even in highly marketed forecasting competitions. There are too many participants trying to win prizes and too many tools to use. At the end of the day someone will win, but the winning models may not be profitable or applicable to other series. Some of those forecasting competitions claim to be raising the level of understanding in the field of forecasting, but they are also raising the data-mining bias. After every new competition based on new data, a different set of models works. This is because the models are fitted on, or extracted from, the data, whether by machine learning algos or by human trial-and-error, and are therefore data-dependent. In the end some lucky contestants win, and while they may also be capable in their field, there is rarely a gain for the forecasting field in general.
One possible solution for trading strategy development
It may make sense to use a minimalistic approach with a very small set of features that have economic value. How the economic value is determined is part of the edge. More importantly, only those who have a good understanding of the market and actual experience should be developing strategies. Letting amateurs try will rarely produce good results no matter how good their programming skills are. The combination of human experience and a small set of features may offer the best chances for success.
Furthermore, it may be a good idea to avoid tools or platforms that claim to offer hundreds of features. This could be an indication that they lack an understanding of data-mining bias. It is easy to engineer hundreds or even thousands of features. It is hard to identify features that have economic value and can serve as the foundation of successful trading strategies. Even in that case, there may be periods of underperformance or even negative performance. Developing trading strategies is hard and success is not guaranteed.
Finally, a brief reference to my own work. When I started developing DLPAL LS software, I decided to work with only a small set of features. There are actually only three: two of them provide a measure of the directional probability of long and short positions, and the third is a measure of significance. These, if significant, are all you need to forecast the sign of next period’s return (daily or weekly, although some customers have figured out ways to use it with intraday data) and to develop directional and long/short strategies. Adding more features would only increase data-mining bias. Note that there is no way to eliminate data-mining bias; the objective is to minimize it as much as possible. When trying many features, data-mining bias grows out of bounds and there is little that can be done to prevent Type-I errors (false positives) in validation samples.
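The point that bias can only be minimized, never eliminated, maps onto standard multiple-testing statistics: the more features you test, the stricter each individual test must be to keep the overall false-positive rate bounded. The Bonferroni rule below is textbook statistics offered as an illustration, not a description of how DLPAL LS works internally:

```python
# Bonferroni adjustment: if m features are tested, requiring each test
# to pass at alpha/m keeps the family-wise Type-I error rate <= alpha.
def bonferroni_threshold(alpha: float, m: int) -> float:
    return alpha / m

# With 3 features the per-test bar barely moves; with 300 it is brutal.
print(round(bonferroni_threshold(0.05, 3), 5))    # ~0.01667 per test
print(round(bonferroni_threshold(0.05, 300), 7))  # ~0.0001667 per test
```

The asymmetry is the practical lesson: a three-feature model can be validated at sensible significance levels, while a three-hundred-feature search demands thresholds so strict that almost nothing legitimate survives them.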