Is there a general procedure, applied either before or after a backtest, to ensure that a quantitative trading strategy has genuine predictive power and is not simply one of those things that worked by luck in the past? Of course, if we search long enough for a working strategy, we will eventually find one. Even a walk-forward approach, by itself, does not tell us anything about the strategy.

People talk about White's Reality Check, but there is no consensus on this matter.

37


backtesting, strategy

18

Building an effective backtest is not significantly different than building any other kind of predictive model. The goal is to have similar behavior *out of sample* as you have *in sample*. As such, there are methodologies developed in statistics and machine learning that can be useful:

- Understand the bias/variance tradeoff. This is covered in many places. For a technical discussion, see lecture 9 of Andrew Ng's machine learning class at Stanford.
- You can certainly use a training and test dataset. But there are also other kinds of approaches that can be used. To list two common options: cross-validation (similar to having segmented data, but can help with parameter selection) and ensemble methods (using multiple models can outperform just one and further reduce the curve-fitting problem).
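A minimal sketch of the train/test idea in a time-series setting, using synthetic returns and a toy momentum rule (all names, parameters, and data here are illustrative, not taken from any referenced source). The key point is that each candidate parameter is scored only on data that comes *after* the data used to pick it:

```python
import numpy as np

def walk_forward_splits(n, n_folds=5, min_train=50):
    """Yield (train_idx, test_idx) pairs that respect time order:
    each test block strictly follows its training block, so no
    future data leaks into parameter selection."""
    fold = (n - min_train) // n_folds
    for k in range(n_folds):
        end_train = min_train + k * fold
        yield np.arange(end_train), np.arange(end_train, end_train + fold)

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.01, 500)   # synthetic daily returns

def strategy_pnl(returns, lookback):
    """Toy momentum rule: long when the trailing mean return over
    `lookback` days is positive, flat otherwise."""
    pnl = np.zeros_like(returns)
    for t in range(lookback, len(returns)):
        pnl[t] = returns[t] if returns[t - lookback:t].mean() > 0 else 0.0
    return pnl

# Score each candidate lookback only on its out-of-sample folds.
scores = {}
for lookback in (5, 20, 60):
    oos = [strategy_pnl(returns[np.r_[tr, te]], lookback)[len(tr):].sum()
           for tr, te in walk_forward_splits(len(returns))]
    scores[lookback] = np.mean(oos)
best = max(scores, key=scores.get)
```

Note that even this walk-forward scheme does not eliminate snooping if many parameter values are tried; it only prevents the most direct form of look-ahead fitting.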

So a few general recommendations:

- Your guiding principle should be **Einstein's razor**: 'Everything should be kept as simple as possible, but no simpler.' In other words, fewer degrees of freedom in your model means less chance of overfitting. In the statistics world, this can involve eliminating unnecessary parameters through a selection or regularization method.
- **Robustness** (in every respect) is also critical. Parameters that result in sharp changes in expected prediction error are more exposed to the risk of overfitting. Similarly, if the model has no fundamental basis, then it should be applicable to a wide range of assets.
- Lastly, and this applies to any kind of model: understand your data, your model, your objectives, your assumptions, etc. Countless mistakes have been made over time by people who did not understand the meaning, implications, and risks of their models. This includes things like execution assumptions and transaction costs, so make sure you take everything into account. Lead by being skeptical of your data, constantly asking what can go wrong or how the future could differ from the past. Is there any survivorship bias in your data, and if so, how can you control for it? Have you introduced any look-ahead bias?
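One way to operationalise the robustness point above is a simple parameter-sensitivity scan: evaluate the strategy over a grid of parameter values and flag any value whose performance differs sharply from its neighbours. This sketch uses the same kind of synthetic data and toy moving-average rule as before; the 0.5-Sharpe threshold is an arbitrary illustration, not a standard:

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0003, 0.01, 1000)   # synthetic daily returns

def sharpe(pnl):
    sd = pnl.std()
    return 0.0 if sd == 0 else float(pnl.mean() / sd * np.sqrt(252))

def ma_strategy(returns, lookback):
    # Long when the trailing mean return is positive, flat otherwise.
    pnl = np.zeros_like(returns)
    for t in range(lookback, len(returns)):
        pnl[t] = returns[t] if returns[t - lookback:t].mean() > 0 else 0.0
    return pnl

grid = np.arange(5, 105, 5)                # candidate lookbacks 5..100
perf = np.array([sharpe(ma_strategy(returns, lb)) for lb in grid])

# A robust optimum should sit on a plateau: compare each parameter's
# Sharpe with the average of its two immediate neighbours.
neighbour_mean = np.convolve(perf, [0.5, 0.0, 0.5], mode="same")
fragile = np.abs(perf - neighbour_mean) > 0.5   # arbitrary threshold
```

A parameter that only performs well in isolation, surrounded by poor neighbours, is exactly the kind of sharp-edged optimum the answer warns about.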

7

The output of your model will be a realization of your assumptions. Shane's given you a great answer. Besides doing out of sample testing (i.e., calibrating on period X then testing in period Y only using info available at the time of each trade), I would add that you should test it in sub-periods. If you have a big chunk of data, break it up and see how it works on each subset of the data.
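The sub-period suggestion is easy to implement: break the P&L history into contiguous chunks and check that performance holds up in most of them. A short sketch on hypothetical daily strategy P&L (the data and the choice of four sub-periods are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
pnl = rng.normal(0.0004, 0.01, 1200)   # hypothetical daily strategy P&L

# Break the history into four contiguous sub-periods and compare
# annualised Sharpe ratios; a real edge should show up in most of them.
chunks = np.array_split(pnl, 4)
sharpes = [float(c.mean() / c.std() * np.sqrt(252)) for c in chunks]
consistent = sum(s > 0 for s in sharpes)   # sub-periods with positive Sharpe
```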

-2

Thanks for the answer, as it tackles a lot of backtesting flaws: model parsimony, overfitting, survivorship bias, look-ahead bias... But one can look at thousands of technical trading rules and other more sophisticated strategies and perhaps find the few that answer all of these problems. Nevertheless, we would still be left with data snooping, i.e., we have reused our data set until we found a satisfactory result.

9

I have seen Hansen's SPA ('Superior Predictive Ability') test and stepwise variants used for this purpose. Hansen's test is a Studentized version of White's Reality Check. The stepwise variants allow one to accept or reject the null of no predictive ability on a subset of some tested strategies while maintaining a familywise error rate.
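To give a sense of the mechanics, here is a stripped-down sketch of the Reality-Check idea on synthetic data: take the best average excess performance across many candidate rules, then compare it with the bootstrap distribution of that same maximum under the null of no predictive ability. White's actual test uses a stationary (block) bootstrap and Hansen's SPA test Studentizes the statistic; the plain i.i.d. bootstrap below only conveys the flavour:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical daily excess returns of K candidate rules over a benchmark.
K, T = 50, 750
d = rng.normal(0.0, 0.01, (K, T))

# Reality-Check statistic: best scaled average excess performance.
stat = np.sqrt(T) * d.mean(axis=1).max()

# Recentre each rule's series so the null (no predictive ability) holds,
# then bootstrap the distribution of the maximum.
d_centred = d - d.mean(axis=1, keepdims=True)
boot = np.empty(500)
for b in range(500):
    idx = rng.integers(0, T, T)              # i.i.d. resample of days
    boot[b] = np.sqrt(T) * d_centred[:, idx].mean(axis=1).max()

p_value = float((boot >= stat).mean())       # large => best rule looks lucky
```

The point of taking the maximum *inside* the bootstrap loop is what distinguishes this from testing one rule at a time: the null distribution already accounts for having searched over all K rules.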

In his book, 'Evidence-Based Technical Analysis,' David Aronson discusses the overfit bias very well, although I believe his techniques for minimizing the bias may only apply to technical strategies, because they rely on Monte Carlo simulations.

**References**

- P. R. Hansen, 'A Test for Superior Predictive Ability,' Journal of Business & Economic Statistics, vol. 23, no. 4, 2005, http://pubs.amstat.org/doi/abs/10.1198/073500105000000063
- SPA google group
- Hsu, Po-Hsuan, Hsu, Yu-Chin and Kuan, Chung-Ming, 'Testing the Predictive Ability of Technical Analysis Using a New Stepwise Test Without Data Snooping Bias,' 2008, http://ssrn.com/abstract=1087044
- Hsu, Po-Hsuan and Hsu, Yu-Chin, 'A Stepwise SPA Test for Data Snooping and its Application on Fund Performance Evaluation,' 2006, http://ssrn.com/abstract=885364
- David Aronson, 'Evidence-Based Technical Analysis.'

23

Strictly speaking, data snooping is *not* the same thing as in-sample vs. out-of-sample model selection and testing; rather, it concerns *sequential* or *multiple* tests of hypothesis based on the *same* data set. To quote Halbert White:

> Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results.

Let me provide an example. Suppose that you have a time series of returns for a single asset, and that you have a large number of candidate model families. You fit each of these models on a training data set, and then check the performance of each model's predictions on a hold-out sample. If the number of models is high enough, there is a non-negligible probability that the predictions provided by one model will look good purely by chance. This has nothing to do with bias-variance trade-offs. In fact, each model may have been fitted using cross-validation on the training set, or other in-sample criteria like AIC, BIC, Mallows' Cp, etc. For examples of a typical protocol and criteria, check Ch. 7 of Hastie, Tibshirani, and Friedman's "The Elements of Statistical Learning".

Rather, the problem is that multiple tests of hypothesis are implicitly being run *at the same time*. Intuitively, the criterion used to evaluate multiple models should be more stringent, and a naive approach would be to apply a Bonferroni correction; it turns out that this correction is too stringent. That's where Benjamini-Hochberg, White, and Romano-Wolf kick in: they provide efficient criteria for model selection. The papers are too involved to describe here, but to get a sense of the problem, I recommend Benjamini-Hochberg first, which is both easier to read and truly seminal.
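The Benjamini-Hochberg step-up procedure is short enough to state in code. Given the p-values from testing each candidate model, it rejects the largest set of hypotheses consistent with a chosen false discovery rate. The example p-values at the bottom are synthetic:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Return a boolean mask of rejected hypotheses, controlling the
    false discovery rate at level q (Benjamini-Hochberg 1995 step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                       # indices of sorted p-values
    thresholds = q * np.arange(1, m + 1) / m    # q * i / m for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True           # reject all up to rank k
    return reject

# Example: 95 null p-values (uniform) plus 5 genuinely small ones.
rng = np.random.default_rng(4)
pvals = np.concatenate([rng.uniform(size=95), np.full(5, 1e-4)])
rejected = benjamini_hochberg(pvals, q=0.10)
```

Unlike Bonferroni, which controls the chance of even one false rejection, BH controls the expected *fraction* of false rejections, which is why it retains power when many strategies are tested at once.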

7

This blog post points to a presentation about backtesting and data snooping: http://www.portfolioprobe.com/2010/11/05/backtesting-almost-wordless/

I think the only non-data-snooping method is to trade live. But the problem of data snooping can be reduced by checking how significant the backtest result is compared to what would have happened if the trades were random. Using this technique also makes it clear that backtest results can easily be deceiving.
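The random-trades comparison can be sketched as a permutation test: keep the asset's return series fixed, shuffle the position series so the timing of the trades is destroyed but the total time in the market is preserved, and see where the actual backtest result falls in that null distribution. The data and the position rule below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
returns = rng.normal(0.0002, 0.01, 1000)              # asset daily returns
positions = (np.arange(1000) % 3 != 0).astype(float)  # hypothetical signal,
                                                      # long two days in three

realised = float((positions * returns).sum())         # actual backtest P&L

# Null distribution: random trade timing with the same market exposure.
null = np.empty(2000)
for b in range(2000):
    null[b] = (rng.permutation(positions) * returns).sum()

p_value = float((null >= realised).mean())  # large => result looks like luck
```

If the realised P&L sits well inside the shuffled distribution, the backtest result is indistinguishable from random trade timing.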