Last year, I devoted substantial time and effort to upgrading the capabilities of my data-mining algorithms. I thought that was necessary because the rate at which signals are exploited has increased due to the widespread availability and use of quantitative tools. This blog post provides an example of how unexploited opportunities are identified and evaluated in quantitative trading.
An unexploited opportunity in the market is a well-defined pattern of some kind that generates a profit. One way of identifying such opportunities is via backtesting, but such a process is in principle a dangerous practice because, among other things, when it is misused it leads to results plagued by data-mining bias. For example, if billions of combinations of indicators and exit signals are backtested on historical data, one will find that several, or even many, of them satisfy some strict performance criteria. However, besides the fact that the future can be different from the past, acceptance of a result based on this practice ignores the billions of other results that were not accepted, i.e., performance could be the result of chance, or even due to a fit on the data, something that is common when using genetic programming and neural network algorithms because they involve optimization of some objective function. However, there is also the possibility of a result that is significant and represents an actual unexploited opportunity. How do we determine whether that is the case?
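To make the data-mining bias described above concrete, here is a minimal Python sketch. It is not part of Price Action Lab; the function name, the 40-trade sample and the 10,000 strategies are illustrative assumptions. Every simulated strategy is a coin flip with no edge at all, yet the best of many looks impressive.

```python
import random

def best_random_win_rate(n_strategies, n_trades, seed=0):
    """Highest win rate among strategies with no edge (each trade is a fair coin flip)."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    best = 0.0
    for _ in range(n_strategies):
        wins = sum(rng.random() < 0.5 for _ in range(n_trades))
        best = max(best, wins / n_trades)
    return best

# Hypothetical experiment: one strategy versus the best of 10,000 mined strategies.
single = best_random_win_rate(n_strategies=1, n_trades=40)
mined = best_random_win_rate(n_strategies=10_000, n_trades=40)
print(f"one strategy: {single:.2%}, best of 10,000: {mined:.2%}")
```

Because both runs share the same seed, the mined result can never be lower than the single-strategy result. Selecting the best of many random strategies pushes the reported win rate well above 50% even though none of them has any edge.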
There are several approaches to determine whether a price pattern represents an unexploited opportunity. Here I show just one procedure for identifying patterns and then testing them to determine if they are significant.
To identify patterns, I use the program I have developed, Price Action Lab. Late last year, a new version was released that included a massive expansion of the data-mining capability of this program. Below is the output of the scan function of the program after the close of Monday, February 2, 2015. Daily unadjusted data since inception were used with the program to scan 12 popular ETFs.
In the screenshot above each line corresponds to an exact price pattern. P is the win rate, P1 is the 1-Bar win rate, Trades is the number of historical trades, CL is the maximum number of consecutive losers, and Target and Stop are the values of the profit target and stop-loss. C indicates the type of target and stop-loss; in this case it is a percentage added to the entry price, which is shown under Trade On as the open of the next bar.
Patterns were found in three ETFs: one long pattern in IWM, two short patterns in DBC and one short pattern in QQQ. However, when data-mining for patterns, the following problems are common:
- Data samples are usually small
- Patterns may be fitted to the data
- Data-mining bias may be large
One must deal with the above three issues before using a pattern. Unfortunately, most traders who use backtesting do not deal with such issues.
Evaluating the significance of patterns
The first step in the evaluation involves increasing the data sample. Normally, hundreds, or even thousands, of trades are required for statistical significance of trading signals. This is partly due to the fact that trade data samples may not be representative of the actual population. For example, 30 trades in 1-minute data do not represent a significant sample and thousands of trades are required to get a representative sample. In the example of this blog we use backtesting on a large group of comparable securities to increase the data sample. The idea is that if a pattern is significant, then this should be reflected in the form of a large bias in portfolio backtests. For this purpose the S&P 500 group of stocks was used with data since 01/2000 and the results are shown below.
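One way to see why small samples are unreliable is to put a confidence interval around the observed win rate. The sketch below uses the Wilson score interval, a standard statistical tool that is shown here for illustration only and is not part of the procedure in this post. The 60% win rate and the trade counts are hypothetical.

```python
import math

def wilson_interval(wins, trades, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (z=1.96)."""
    p = wins / trades
    denom = 1 + z**2 / trades
    centre = (p + z**2 / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z**2 / (4 * trades**2)) / denom
    return centre - half, centre + half

# Hypothetical samples with the same observed 60% win rate.
small_lo, small_hi = wilson_interval(wins=18, trades=30)
large_lo, large_hi = wilson_interval(wins=1800, trades=3000)
print(f"30 trades:   {small_lo:.3f} to {small_hi:.3f}")
print(f"3000 trades: {large_lo:.3f} to {large_hi:.3f}")
```

With 30 trades the interval spans more than 30 percentage points and includes 50%, so the sample cannot distinguish the pattern from a coin flip. With 3,000 trades the interval is only a few points wide.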
The Last and First Date columns in the original results were replaced by the portfolio expectation and success rate (the number of profitable tickers out of the 500) and the P1 column was replaced by the portfolio profit factor, i.e., the ratio of the sum of winners to the sum of losers. It may be seen from the above results that the pattern in IWM was profitable in 60.68% of the stocks and the profit factor is 1.11. The number of trades increased to 11,844. The results for the other patterns show a negative expectation and profit factor less than 1. Are the results for IWM significant?
We can never be 100% sure of significance. Even if the probability is high, the next trade can be a loser. However, if the original win rate of the pattern is high, then we maximize the chances of success. This is one reason that win rate is the most important performance metric of trading strategies, in spite of some popular misconceptions.
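The portfolio metrics used above (expectation, success rate and profit factor) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's actual code: expectation is taken as the average profit per trade, a ticker counts as profitable when its net profit is positive, and the tickers and trade results are made up.

```python
def profit_factor(trade_pnls):
    """Ratio of the sum of winners to the sum of losers."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss

def expectation(trade_pnls):
    """Average profit per trade (assumed definition)."""
    return sum(trade_pnls) / len(trade_pnls)

def portfolio_success_rate(pnl_by_ticker):
    """Percentage of tickers whose net profit is positive (assumed definition)."""
    profitable = sum(1 for pnls in pnl_by_ticker.values() if sum(pnls) > 0)
    return 100.0 * profitable / len(pnl_by_ticker)

# Hypothetical per-trade results in percent, 2% target and 2% stop.
pnl_by_ticker = {
    "AAA": [2.0, -2.0, 2.0],
    "BBB": [-2.0, -2.0, 2.0],
    "CCC": [2.0, 2.0, -2.0],
}
all_trades = [p for pnls in pnl_by_ticker.values() for p in pnls]

pf = profit_factor(all_trades)
rate = portfolio_success_rate(pnl_by_ticker)
exp = expectation(all_trades)
print(f"profit factor {pf:.2f}, success rate {rate:.2f}%, expectation {exp:.3f}%")
```

Pooling the trades of all tickers before computing the profit factor is what increases the sample size. The success rate, by contrast, treats each ticker as one observation.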
One way of minimizing the chances of a curve-fit, even on 500 securities, which is a result of large data-mining bias, is via a robustness test. There are many ways of performing a robustness test, but here we use one that varies the profit target and stop-loss of the pattern, in this case both are 2%, and then evaluates an index that is equal to the ratio of the number of positive expectations to the total number of tests. We will test only the IWM pattern because the others were already rejected:
The Robustness Index is 84.21. As is also shown on the generated graphs, expectation was negative only for profit target and stop-loss less than 0.63%. But what does this mean?
The above analysis means that the pattern may not be rejected due to curve-fitting, but this still does not allow us to accept the pattern based only on these results. The reason is that the curve-fitting may be so intelligent that the robustness test cannot identify it. As a matter of fact, over-optimized systems usually tend to be robust to parameter variation by their very nature. Actually, one may be easily fooled by robustness tests if they are not placed in proper perspective.
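As a rough illustration of the Robustness Index computation described above, the following Python sketch takes the expectation obtained at each profit target and stop-loss setting and returns the percentage of settings with positive expectation. The settings and expectation values are hypothetical and are not the results of the IWM test. Expressing the index as a percentage is an assumption based on the value reported above.

```python
def robustness_index(expectations):
    """Percentage of tests whose expectation is positive."""
    positive = sum(1 for e in expectations if e > 0)
    return 100.0 * positive / len(expectations)

# Hypothetical sweep: target/stop-loss in percent -> portfolio expectation.
expectation_by_setting = {
    0.25: -0.04, 0.50: -0.01, 0.75: 0.02, 1.00: 0.03, 1.50: 0.05,
    2.00: 0.06, 2.50: 0.05, 3.00: 0.07, 3.50: 0.06, 4.00: 0.08,
}

index = robustness_index(expectation_by_setting.values())
negative_settings = [s for s, e in expectation_by_setting.items() if e <= 0]
print(f"Robustness Index: {index:.2f}")
print(f"Non-positive expectation at settings: {negative_settings}")
```

Listing the settings where expectation turns negative is as informative as the index itself, since it shows whether the failures cluster at one end of the parameter range.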
We have identified some potential opportunities in the market as of the close of February 2, 2015 and we have performed analysis for the purpose of:
- increasing data samples
- testing robustness
We concluded that the pattern in IWM had some potential because it showed a high positive bias in portfolio backtests that increased the number of trades from 37 to 11,844. Also, due to its high win rate of 78.38%, it offered a high-probability setup. This long signal in IWM hit its profit target on Thursday, as shown on the chart below.
Again, the procedure followed above for identifying and testing trading opportunities is just one of several possible procedures. It involved algorithmic and deterministic data-mining, i.e., data-mining that always produces the same results on the same data. Reproducibility is important; otherwise, no testing can be done that conforms to scientific standards.
Disclosure: no relevant positions.
Charting program: Amibroker