Inaccurate Backtest Results Are Common

The financial blogosphere is flooded with inaccurate backtests of trading strategies and portfolio allocations. The empirical rule is to never trust any backtests unless verified by expert consensus.

Inaccurate backtests are not only frequent in the financial blogosphere; they can also be found in peer-reviewed academic papers and on the websites of financial advisers. Below I offer one example for each category and then briefly discuss some of the sources of the errors. Note that we are not talking here about backtests of random strategies or allocations that usually fail once employed in actual trading or investing, but about backtests that are inaccurate because of basic errors in implementing the strategies or allocations.

Blog example

Yesterday I noticed a peculiar result in a blog article on a website that sells volatility signals and strategies. The article had a chart of a backtest of the trivial long-only strategy that trades XIV or VXX based on the VIX term structure. The article stated the strategy rules and had an equity chart indicating that the maximum drawdown was -77%.

However, the equity chart was wrong. In addition, the maximum drawdown of this strategy in the period of the backtests is about -56%, not -77%, as shown in the charts below.
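
For readers who want to check a maximum drawdown figure themselves, here is a minimal sketch of the arithmetic in Python. The function name `max_drawdown` and the equity values are made up for illustration; they are not the data of the strategy discussed above, and the sketch assumes a strictly positive equity curve.

```python
def max_drawdown(equity):
    """Return the largest peak-to-trough decline as a negative fraction."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                  # highest equity seen so far
        worst = min(worst, value / peak - 1.0)   # decline from that peak
    return worst

# Hypothetical equity curve: the deepest decline is from 130 to 65.
equity = [100.0, 120.0, 90.0, 130.0, 65.0, 80.0]
print(f"{max_drawdown(equity):.0%}")  # -50%
```

The drawdown is measured from the running peak, not from the starting equity. Mixing up the two is one simple way to end up with a wrong number on a chart.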

How widespread is this phenomenon of inaccurate backtests in the financial blogosphere? Based on my personal experience, one in every three backtests is inaccurate or completely wrong, quite apart from the fact that most backtests reflect over-fitting, selection bias and data-snooping, so that the end result is random. This is especially true of portfolio allocations where selections are made from a large universe until the desired performance criteria are satisfied in backtests. It is also true of data-mining applications that synthesize strategies but are not carefully designed.
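
The effect of selecting from a large universe can be illustrated with a small sketch, assuming purely random returns and purely random long/flat rules (everything here, including the names `total_return` and `strategies`, is hypothetical and chosen only for illustration). The rule that looks best on the first half of the data is picked and then evaluated on the second half.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
n_days, n_strategies = 500, 200
half = n_days // 2

# Synthetic daily returns with zero mean: no rule has a real edge here.
returns = [random.gauss(0.0, 0.01) for _ in range(n_days)]

def total_return(positions, rets):
    """Compound the returns of the days on which the position is 1."""
    total = 1.0
    for p, r in zip(positions, rets):
        total *= 1.0 + p * r
    return total - 1.0

# Each "strategy" is a random sequence of long (1) or flat (0) days.
strategies = [[random.choice((0, 1)) for _ in range(n_days)]
              for _ in range(n_strategies)]

in_sample = [total_return(s[:half], returns[:half]) for s in strategies]
best = max(range(n_strategies), key=lambda i: in_sample[i])
out_of_sample = total_return(strategies[best][half:], returns[half:])

print(f"best in-sample return:  {in_sample[best]:.1%}")
print(f"same rule out-of-sample: {out_of_sample:.1%}")
```

The in-sample figure is the maximum over 200 random rules by construction, so it says nothing about the rule itself. The out-of-sample figure has no reason to resemble it.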

Fund manager example

A fund manager who invests customer money in quantitative strategies wrote an article about a year and a half ago that applied a simple trend-following strategy to SPY data. This fund manager, who is assumed to be an expert in quantitative trading, claimed that the CAGR of the strategy was about 18%.

I have been backtesting trading strategies for over 25 years and I am aware of most potential errors and pitfalls, but I did not expect that a fund manager would fail to recognize look-ahead bias. Yet that is exactly what he did: look-ahead. I described this in an article last year. Basically, when you see a nice-looking exponential equity curve in a backtest, similar to the one below, you should always be suspicious about the presence of look-ahead bias. It is not always the case, but it is highly possible.
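
To show how little it takes, here is a minimal sketch of look-ahead bias on a synthetic random walk. This is not the fund manager's strategy; the long/flat rule, the function name `strategy_equity` and the `lag` parameter are assumptions made up for this illustration. The only difference between the two runs is whether the signal is read from yesterday's bar or from today's.

```python
import random

def strategy_equity(closes, lag):
    """Equity of a long/flat rule: hold when the signal bar closed up.

    lag=1 reads the signal from yesterday's bar (tradable).
    lag=0 reads it from today's close, which is not yet known
    when the position has to be taken (look-ahead).
    """
    equity = [1.0]
    for t in range(2, len(closes)):
        s = t - lag                              # bar the signal is read from
        signal = closes[s] > closes[s - 1]
        daily_return = closes[t] / closes[t - 1] - 1.0
        growth = 1.0 + daily_return if signal else 1.0
        equity.append(equity[-1] * growth)
    return equity

# Synthetic prices with no drift, so no rule should have an edge.
random.seed(0)
closes = [100.0]
for _ in range(1000):
    closes.append(closes[-1] * (1.0 + random.gauss(0.0, 0.01)))

honest = strategy_equity(closes, lag=1)
biased = strategy_equity(closes, lag=0)
print(f"tradable version final equity:   {honest[-1]:.2f}")
print(f"look-ahead version final equity: {biased[-1]:.2f}")
```

With `lag=0` the rule is long only on days that closed up, so its equity curve never declines. That is the smooth exponential shape that should raise suspicion.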

Academic paper example

In fact, look-ahead bias was present in the results of a paper by a professor that was later used by many analysts in the financial blogosphere as evidence of the high potential of price series momentum strategies. Another professor discovered the error, as I reported in this blog article.

I expected this to cause shockwaves in the academic community but no major reactions occurred. No one would accept that the peer review process failed. There are reports that results in 9 out of every 10 papers in social science that involve statistical analysis cannot be replicated. I see backtests of very complex strategies involving learners that cannot be replicated because the descriptions are vague. I suspect that in most cases the results are inaccurate or even completely wrong, but the papers suffice for the authors to get tenure or a job at a major firm. The end result is that most funds that rely on academic types fail. Since LTCM they have not learnt the lesson. More and more quantitative funds rely on complex results from academics when in fact in most cases these results are either inaccurate or completely wrong.

Backtesting: What can go wrong (besides data-mining bias)

1. Errors in backtesting software

These are not uncommon. In the late 1990s, two of the most widely used backtesters had certain flaws that caused inaccurate or even false results for a large class of strategies. I documented those in an extensive study of about 20 pages but I never published it for obvious reasons.

2. Failure to understand assumptions and specifications

Most backtesters make certain assumptions and conform to certain specifications. For example, in a very popular backtester there is no provision for certain order types and all orders must be at market or limit. More importantly, backtesters employ different conventions for handling arrays of prices and variables. Failure to understand the assumptions and specifications may lead to trivial but also very severe errors, such as look-ahead bias.
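
One way to test whether an indicator or signal respects the array conventions you think it does is a truncation check: compute it on the full series and again on the series cut off at some bar, and compare the values up to that bar. The sketch below is only an illustration under assumed names (`trailing_mean`, `centered_mean` and `uses_future_data` are hypothetical helpers, not functions of any backtester mentioned here).

```python
def trailing_mean(values, window):
    """Mean of the last `window` values up to and including bar i."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def centered_mean(values, window):
    """Mean centered on bar i (odd window): it reads bars after i."""
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)
        else:
            out.append(sum(values[i - half:i + half + 1]) / window)
    return out

def uses_future_data(indicator, values, window, cut):
    """True if bars after index `cut` change the indicator at or before `cut`."""
    full = indicator(values, window)[:cut + 1]
    truncated = indicator(values[:cut + 1], window)
    return full != truncated

prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0]
print(uses_future_data(trailing_mean, prices, 3, cut=4))  # False
print(uses_future_data(centered_mean, prices, 3, cut=4))  # True
```

If removing later bars changes an earlier value, that value depended on data that was not available at the time, whatever the documentation of the tool suggests.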

3. Use of languages developed for different purposes

An example is using R or Python to backtest. Python is a general purpose language and R is a statistical analysis package. I have no idea who thought these should be used for backtesting, especially in the case of R. The probability of errors when using libraries developed by third parties is high because all the assumptions made may not be known. Most experts in quantitative trading use languages like AFL by Amibroker.com, which was developed by a top quant team in Poland, or EasyLanguage by TradeStation. These have endured the test of time. However, the academic community is impressed mostly by R or Python backtests. This is probably because the academic community wants to differentiate itself from practitioners and offer a sense of superiority. In my opinion, the price paid for this arrogance is high and includes inaccuracies and errors.

The empirical rule is to be suspicious of all backtests no matter who has produced them. Besides data-mining bias there are many other sources of errors that can lead to inaccurate or even false results. The only robust validation of backtests is forward trading.

If you have any questions or comments, happy to connect on Twitter: @mikeharrisNY

Charting and backtesting program: Amibroker
