An Example of the Perils of Backtesting

I have discussed the perils of backtesting numerous times on this blog. It is important that all traders realize that backtesting hides many pitfalls and that it is one of the three main contributing factors to massive losses by retail traders in the last 25 years.

I would like to thank Joshua Ulrich for taking the time to answer in detail, on his blog, my three notes about backtesting in R. Joshua is an expert in algorithmic trading and R, and his input is invaluable.

Backtesting in any computer language can be a dangerous practice due to many factors, including coding errors and data-snooping bias. Backtesting should be done only by qualified professionals who understand the complexity of the process. The idea that a retail investor or someone unskilled in this area can test algorithmic trading strategies and use them to profit is refuted almost every day by the numerous errors and unrealistic assumptions in backtests presented in quantitative trading blogs.

Below is an example of real discrepancies between hypothetical results. Note that I will not discuss in detail the reasons for these discrepancies. This is left as an exercise for the reader.

The example is about backtesting the WR2 system. Long positions are entered after two consecutive down days and exited after two consecutive up days. Below are the YTD backtest results in SPY for $100K initial equity, fully invested capital and $0.01/share commission.
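
The entry and exit rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the results reported here: a "down day" is taken to mean a close below the previous close, an "up day" a close above it, and the function name wr2_positions is hypothetical.

```python
def wr2_positions(closes):
    """Return one 0/1 flag per day: the position the strategy wants to hold
    after that day's close (1 = long, 0 = flat).

    Assumption: a down day is a close below the prior close, and an up day
    is a close above the prior close.
    """
    position = 0
    positions = []
    for i in range(len(closes)):
        if i >= 2:
            two_down = closes[i] < closes[i - 1] < closes[i - 2]
            two_up = closes[i] > closes[i - 1] > closes[i - 2]
            if position == 0 and two_down:
                position = 1  # enter long after two consecutive down days
            elif position == 1 and two_up:
                position = 0  # exit after two consecutive up days
        positions.append(position)
    return positions


# Made-up closing prices for illustration only.
sample_closes = [100.0, 99.0, 98.0, 99.0, 100.0, 101.0, 100.0]
sample_positions = wr2_positions(sample_closes)
```

The flags only say what the strategy wants to hold after each close. The price at which the change of position is actually filled (next open, same close, or something else) is a separate assumption of the backtest.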


The WR2 trading strategy has an annualized return of 3.41% (2.96% return) at -9.8% maximum drawdown. Sharpe is 0.60.

Next, these are the results if the S&P 500 index is used instead of SPY:


The WR2 trading strategy in this case has an annualized return of 9.84% (8.52% return) at -11% maximum drawdown. Sharpe is 1.44. The differences from the previous test case are significant.

Note that the difference in compound annual return (CAR) due to commissions is only about 50 basis points. However, the CAR for the S&P 500 (9.84%) is larger than that for SPY (3.41%) by about 640 basis points, and the Sharpe values differ significantly.

As I already said, I will not discuss in detail the reasons for these wide discrepancies, but it suffices to say that overnight gains in SPY are different from those in the S&P 500. Because the strategy buys at the open after a signal is generated, in SPY it misses any overnight gains. In the S&P 500 price data the open is set at the close of the previous day, so the same overnight gains are included in the index backtest. Overnight trading gains for SPY are about 5% year-to-date, and that nearly explains the discrepancy in the results.
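
The overnight effect described above can be checked by splitting the total return into an overnight leg (previous close to open) and an intraday leg (open to close). The sketch below uses made-up prices and a hypothetical function name, split_returns; it is not the calculation behind the 5% figure.

```python
def split_returns(opens, closes):
    """Compound close-to-open (overnight) and open-to-close (intraday)
    returns separately and return them as a pair of fractions."""
    overnight = 1.0
    intraday = 1.0
    for t in range(1, len(closes)):
        overnight *= opens[t] / closes[t - 1]  # previous close -> today's open
        intraday *= closes[t] / opens[t]       # today's open -> today's close
    return overnight - 1.0, intraday - 1.0


# Made-up prices for illustration only.
closes = [100.0, 102.0, 104.0]
etf_opens = [100.0, 101.0, 103.0]        # tradable instrument with real opening gaps
index_opens = [closes[0]] + closes[:-1]  # index-style data: open set to the previous close

etf_overnight, etf_intraday = split_returns(etf_opens, closes)
idx_overnight, idx_intraday = split_returns(index_opens, closes)
```

With index-style opens the overnight leg is zero by construction, so a backtest that "buys at the open" is effectively filled at the previous close and the whole close-to-close move is credited to the strategy. With real opening prices, part of that move belongs to the overnight leg, which a strategy entering at the open does not capture.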

So what about all these backtests of various strategies, including momentum, using non-tradable indexes and often going back hundreds of years?

The short answer is that they are unrealistic. They could also be quite misleading. In most cases there are no instruments we can use to measure the discrepancies due to non-tradability. These backtests should not be presented anyway. However, there are blogs and popular trading books that are based on such backtests. The examples above give an idea of the magnitude of the discrepancies that can arise from the use of non-tradable indexes with some strategies. The use of unrealistic trade entry points may make the difference between stellar performance and mediocre or even bad performance. All strategies should be analyzed on a case-by-case basis. No general conclusions can be drawn in advance.

