Machine learning vs. regression, and/or why is the latter still used?

PS: I have been lurking for a while. Thank you for all the awesome questions, answers and discussions.

I was going to comment but it turned out to be quite elaborate.

My experience with certain AI/ML methods is that they're not deterministic. Take RBMs (restricted Boltzmann machines) for instance, a very widespread paradigm. To train such a machine you have two approaches, backpropagation or Kullback-Leibler divergence. Both require you to initialise the machine to a random state. And that makes them non-deterministic.

Even more problematic for instance are methods like Simulated Annealing or Genetic Programming to find maxima or minima in a landscape where every single step contains a non-deterministic component.

In general there's nothing wrong with a hint of non-determinism, especially in our racket where nearly everything is a stochastic process. It just becomes a problem when you need a reproducible result, for instance when comparing a live session with some backtest. It's just very hard to decompose tracking errors (e.g. divergence between paper session and live session) into systemic errors and the divergence of, say, the random generators (or, more generally, the sources of non-determinism).
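To make the reproducibility point concrete, here is a minimal sketch (not part of the original answer) of a toy simulated annealing run. The objective function, the cooling schedule and the helper name `anneal` are assumptions chosen purely for illustration. The only thing it is meant to show is that the end point depends on the random stream: the same seed reproduces the run exactly, while a different seed generally does not.

```python
import math
import random

def objective(x):
    # Toy landscape with two local minima (illustrative assumption).
    return x**4 - 3 * x**2 + x

def anneal(seed, steps=2000):
    """Toy simulated annealing; every step draws from the random generator."""
    rng = random.Random(seed)                 # private generator: the seed fixes the whole run
    x = rng.uniform(-2.0, 2.0)                # random initial state
    for k in range(steps):
        temp = 1.0 * (1 - k / steps) + 1e-9   # linear cooling schedule
        cand = x + rng.gauss(0.0, 0.3)        # random proposal
        delta = objective(cand) - objective(x)
        # Always accept improvements; accept deteriorations with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand
    return x

run_a = anneal(seed=42)
run_b = anneal(seed=42)   # same seed: identical result
run_c = anneal(seed=7)    # different seed: generally a different end point
print(run_a == run_b, run_a, run_c)
```

In a backtest-versus-live comparison, the difference between `run_a` and `run_c` is the part of the tracking error that comes from the random generator alone.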

Edit:
Conversely, if you try to turn a non-deterministic algorithm into a deterministic one by using a fixed (or deterministic) sequence of "non-determinism", the algorithm will collapse to a variant that's probably less efficient than a good deterministic one (e.g. deterministic SA vs. greedy search). Of course I can't prove that here, but if I had to, I'd show that the probability of picking, out of all possible sequences of randomness, one that reflects exactly the invariants before or after each step of a greedy round is abysmally small.

Because of:

• The (extreme) dominance of noise over signal
• The prevalence of non-repeating patterns (many of which we know are not going to repeat)
• A pathetic sample size for cross-validation
• Regime changes due to exogenous events. These typically fall in the cross-validation window, which makes it even worse (GFC, financial integration, trade law changes, interest rate adjustments by central banks, some idiot in a bank hiding trades and losing 5 billion dollars, etc.).
• It is well known that non-linear relationships are generally just artefacts of the in sample dataset

There is also the following:

• Many price changes are driven by news such as a plane crashing or a merger announcement. Are you trying to forecast news (!?) by getting your model to learn non-linear relationships on price data? It should be clear that, if the American Airlines price falls due to a terrorist hijacking, it is not going to be useful to have a random forest learn any patterns that result, since they will not repeat.

Because of these factors many (econometricians and practitioners) will try to use a priori knowledge to select features and impose constraints on the model in an attempt to improve generalization. This is perceived as necessary by econometricians since the data is too thin, noisy and nonstationary (i.e., the above reasons).

This is not to say that "machine learning" methods such as Lasso, NNG, Elastic Nets or Ridge can't be applied. They result in essentially linear models and you can impose whatever a priori constraints on it through the metaparameters in the loss function or by using a variant that preserves hierarchies when using indicator function interactions (Tibshirani 2013...). Edit: You will still need to select which features go into the algorithm (as a prior imposition) but you can use these to achieve slightly more sparsity than you would otherwise have and introduce some bias into your conditional expectation (or state probability if you're doing multinomial categorical GLM) for improvement in variance of sampling distribution.
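As a rough illustration of the bias-for-variance trade mentioned above, the following sketch (my own, not from the answer) works out ridge and lasso for a single coefficient with no intercept, where both have closed forms. The function names, the toy data and the penalty values are assumptions; the penalty scalings are stated in the docstrings because conventions differ between libraries.

```python
def ols_1d(x, y):
    """Least-squares slope through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ridge_1d(x, y, lam):
    """Minimiser of 0.5*sum((y - b*x)^2) + 0.5*lam*b^2: shrinks towards zero."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def lasso_1d(x, y, lam):
    """Minimiser of 0.5*sum((y - b*x)^2) + lam*|b|: soft-thresholding."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam, 0.0)          # exactly zero once lam >= |sxy|
    return (shrunk if sxy >= 0 else -shrunk) / sxx

# Toy data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [0.9, 2.1, 2.9, 4.2]

b_ols = ols_1d(x, y)
b_ridge = ridge_1d(x, y, lam=10.0)    # biased towards zero, never exactly zero
b_lasso = lasso_1d(x, y, lam=3.0)     # biased towards zero
b_sparse = lasso_1d(x, y, lam=40.0)   # penalty large enough to drop the feature
print(b_ols, b_ridge, b_lasso, b_sparse)
```

Both penalised estimates are deliberately biased relative to OLS; the lasso can additionally set a coefficient to exactly zero, which is where the extra sparsity comes from.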

I am however open to random forests with the right a priori constraints in place.

There are indeed hundreds of papers that use machine learning to forecast financial markets. Just google something silly like "fuzzy bayesian expert adaptive learners with PSO training S&P 500" and you will get a lesson in the file-drawer effect, publication bias and substandard research methodologies (e.g. selecting 3 of 50 algorithms and 2 of 50 indices and hoping it convinces people).

However, the above is an optimist's view of the industry. From those I've spoken to at low frequency funds they are simply ignorant of machine learning and couldn't apply it because they lack the knowledge and skills. If they were actually interested in being true quants, who knows how much damage they could do with deep learning or something.

If you want to do real machine learning in finance and actually do something that is meritocratic/skill/scientific instead of almost completely random and full of people who practice nonsense, go to a HFT firm (not that most people practice nonsense in low frequency funds, just that many do, and this is something that is absolutely impossible to get away with in HFT). That said, I am continually and consistently underwhelmed when I hear of the research methods of low frequency quant funds.

It's probably because of the strong, long-standing statistical underpinnings in economics and econometrics and, overall, risk prediction. For example, look at current research with fat-tail distributions and calculations for Expected Tail Loss (ETL), etc. These studies fit Student's t, Normal, Stable, and Pareto probability distributions to data and report that e.g. the Kolmogorov or Anderson-Darling goodness-of-fit distance is greater for the normal distribution, that is, the normal distribution doesn't recover the area in the tails as well as the Stable and Student's t (for varying d.f.). Next, moving into time-series analyses, there is still tremendous merit in using ARIMA, ARMA, ARCH, and GARCH-type fitting to explain conditional means and variances, and autocorrelation. You can simply use a handful of the methods listed above and perform an incredible amount of risk prediction that has scientific merit in financial risk management.
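For readers unfamiliar with ETL, here is a small sketch of the plain historical (empirical) estimator. This is a simplification I am assuming for illustration; the studies mentioned above fit parametric distributions instead. The function name `var_etl`, the confidence level and the return series are all made up.

```python
def var_etl(returns, level=0.95):
    """Historical VaR and Expected Tail Loss, both reported as positive losses."""
    losses = sorted((-r for r in returns), reverse=True)    # largest loss first
    n_tail = max(1, round(len(losses) * (1 - level)))       # number of tail observations
    tail = losses[:n_tail]
    value_at_risk = tail[-1]                                # smallest loss inside the tail
    expected_tail_loss = sum(tail) / n_tail                 # average loss inside the tail
    return value_at_risk, expected_tail_loss

# Hypothetical return series, 20 observations.
returns = [0.01, -0.02, 0.005, -0.05, 0.015, -0.01, 0.02, -0.08, 0.0, 0.01,
           -0.005, 0.03, -0.015, 0.007, -0.03, 0.012, 0.004, -0.001, 0.009, -0.012]

var_90, etl_90 = var_etl(returns, level=0.90)
print(var_90, etl_90)
```

By construction ETL is never smaller than VaR at the same level, which is why it is the more informative number when the tails are fat.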

Next, for ML, the more you delve into non-linear manifold learning (ISOMAP, Laplacian eigenmaps, etc), metaheuristics (evolutionary algorithms and evolutionary strategies such as genetic algorithms, covariance matrix self-adaptation, ant colony optimization, particle swarm optimization), neural adaptive learning (ANNs), and many other ML and AI supervised methods you are essentially getting further away from deterministic and stochastic derivative-based gradient descent and Newton-Raphson methods -- which provide parameter uncertainty.

Recently, I started using particle swarm optimization for almost everything I optimize. Sure, Newton-Raphson is faster, is more consistent, and can provide parameter uncertainty, but as I see it, to correctly model risk one needs to combine realizations from many different distributions to build up a final uncertainty distribution. For this reason, most of the time when I am not interested in Type I and II errors, power, AIC, etc., or for that matter the standard errors of coefficients, I typically spew out point estimates and then use these as inputs to Monte Carlo analysis.
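The workflow described here (derivative-free point estimates first, Monte Carlo afterwards) might look roughly like the sketch below. It is a bare-bones global-best PSO written for illustration, not the author's implementation; the inertia and acceleration constants, the straight-line model, the toy data and the standard normal risk factor are all assumptions.

```python
import random

def pso_minimize(f, bounds, n_particles=20, n_iter=300, seed=0,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best particle swarm optimiser (no derivatives needed)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]      # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c2 * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val

# Toy data generated from y = 0.5 + 2.0 * x (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.5 + 2.0 * x for x in xs]

def sse(params):
    a, b = params
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

(a_hat, b_hat), _ = pso_minimize(sse, bounds=[(-10, 10), (-10, 10)])

# Point estimates become inputs to a Monte Carlo run over an uncertain risk factor.
mc_rng = random.Random(1)
draws = [a_hat + b_hat * mc_rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(a_hat, b_hat, sum(draws) / len(draws))
```

Note that the optimiser returns only point estimates; any uncertainty in `draws` comes from the simulated risk factor, not from the parameters.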

Regarding random forests (RF), it is one of the best unsupervised and supervised classifiers out there. RF simultaneously combines bootstrapping of the training samples and random feature selection to train each tree. Unknown test objects not in the bootstrapped training sample (called out-of-bag, OOB) are then "dropped" down each trained tree. Feature importance is then determined by comparing classification accuracy between permuted and unpermuted values of feature $$j$$ for all objects in the OOB via dropping them down the same trained tree. Breiman suggested using at least 5000 trees ("don't be stingy") and in fact his main ML paper on RF used 50,000 trees per run.
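The permutation step can be sketched in a few lines. To keep the example self-contained, a hand-written rule stands in for a single trained tree and random draws stand in for its out-of-bag objects; both are assumptions, as are all function names. A real forest would repeat this per tree and average.

```python
import random

def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, j, rng):
    """Accuracy drop when feature j is shuffled across the held-out objects."""
    base = accuracy(predict, X, y)
    column = [row[j] for row in X]
    rng.shuffle(column)                                   # break the link between feature j and y
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return base - accuracy(predict, X_perm, y)

def toy_tree(row):
    # Hypothetical stand-in for one trained tree: it splits on feature 0 only.
    return int(row[0] > 0)

rng = random.Random(0)
X_oob = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]   # made-up OOB objects
y_oob = [int(row[0] > 0) for row in X_oob]                         # label depends on feature 0

importance = [permutation_importance(toy_tree, X_oob, y_oob, j, rng) for j in range(2)]
print(importance)
```

Feature 1 is never used by the stand-in tree, so shuffling it costs nothing, whereas shuffling feature 0 pushes accuracy towards coin-flipping.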

Artificial neural networks (ANNs) have been shown to be universal approximators; however, they become quite expensive when compared with Newton-Raphson, provided you know the objective function you are fitting (optimizing). If you don't know the objective equation, then an ANN can be tremendously beneficial. Like I have said to other colleagues, in these cases, "the ANN is the equation." For this latter framework, where the ANN is the equation, if you draw random quantiles from distributions for various risk factors, clamp each set of quantiles to the input nodes of the ANN, and train using CV, the utility of an ANN combined with Monte Carlo will be immense.

Since it takes a decade or more to learn ML and AI at a level where you can tackle most any problem, there would be no expectation that someone with an extensive QF background would be able to pick it up quickly, and no expectation that the results would be better. ML and AI are best implemented for near NP-hard problems such as timetabling, shift and airline scheduling, and logistics, where there is not a closed form analytic solution (equation) that describes your model.

The main reason to use traditional methods is interpretability, especially when you are dealing with portfolios. Portfolios are nothing more than a linear combination of assets. Many machine learning methods are highly non-linear and therefore are hard to replicate with a real portfolio. For example, if you want to minimize the volatility of your emerging markets stock portfolio with a currency hedge portfolio, a traditional optimization through a regression problem would give you weights on individual currencies, but a machine learning method like SVM, ANN, etc. gives you no clue about what to buy or sell.
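A minimal sketch of the regression route, assuming two hedge currencies and no intercept so that the normal equations can be solved by hand. The return series are invented and constructed so that the portfolio is an exact linear combination of the two currencies; real data would of course leave a residual.

```python
def hedge_weights(port, fx1, fx2):
    """OLS of portfolio returns on two currency returns (no intercept).

    Returns the hedge weights, i.e. minus the regression betas."""
    s11 = sum(a * a for a in fx1)
    s22 = sum(b * b for b in fx2)
    s12 = sum(a * b for a, b in zip(fx1, fx2))
    s1y = sum(a * y for a, y in zip(fx1, port))
    s2y = sum(b * y for b, y in zip(fx2, port))
    det = s11 * s22 - s12 * s12              # determinant of the 2x2 normal equations
    beta1 = (s1y * s22 - s2y * s12) / det    # Cramer's rule
    beta2 = (s2y * s11 - s1y * s12) / det
    return -beta1, -beta2

# Hypothetical currency returns and a portfolio with exposures 0.5 and -0.3.
fx1 = [0.01, -0.02, 0.015, 0.0, -0.01]
fx2 = [-0.005, 0.01, 0.02, -0.015, 0.005]
port = [0.5 * a - 0.3 * b for a, b in zip(fx1, fx2)]

w1, w2 = hedge_weights(port, fx1, fx2)
hedged = [p + w1 * a + w2 * b for p, a, b in zip(port, fx1, fx2)]
print(w1, w2, max(abs(h) for h in hedged))
```

The output is directly tradable: `w1` and `w2` say how much of each currency to hold against the portfolio, which is exactly the interpretability a black-box model lacks.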

Having said that, I think non-traditional methods could work very well for feature selection, outlier or anomaly detection, or classification. I see no problem with combining them as long as you can still interpret the model.

I was just like you when I started out: I had learned a lot about machine learning (mainly neural networks and genetic algorithms/programming) and used it heavily. I also had learned about classic statistics but not nearly as much as about ML.

The problem with ML is, as I see it today, that you are often taking a sledgehammer to crack a nut. Because financial markets are so highly stochastic, you get a lot of overfitting and often confuse noise for signal.

Why is that such a problem with ML? Because you just have so many parameters to tinker with! Take ANNs as an example: How many layers? How many neurons? Which learning algorithm (each with many different parameters)? Which stopping criterion? How to find the best combination of these parameters - with genetic algorithms? Or just as an art? Combine different models with an ensemble approach? And so on...
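The danger of having that many knobs can be shown without any model at all. In the sketch below (an illustration of mine, with made-up sizes and volatility), every "configuration" is pure noise with zero expected return, and we simply keep the one that looked best in-sample, which is what uncontrolled tuning amounts to.

```python
import random
import statistics

rng = random.Random(123)
n_configs, n_obs = 50, 250      # hypothetical: 50 tuned variants, about one year of daily data

# Every configuration produces pure noise: the true expected return is zero.
in_sample = [[rng.gauss(0.0, 0.01) for _ in range(n_obs)] for _ in range(n_configs)]
out_sample = [[rng.gauss(0.0, 0.01) for _ in range(n_obs)] for _ in range(n_configs)]

means_in = [statistics.fmean(r) for r in in_sample]
best = max(range(n_configs), key=means_in.__getitem__)   # the "winning" configuration

print("best in-sample mean      :", means_in[best])
print("same config out-of-sample:", statistics.fmean(out_sample[best]))
print("average over all configs :", statistics.fmean(means_in))
```

The winner looks good in-sample only because it was selected; its out-of-sample return is just another draw around zero.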

Then comes interpretability because ANNs are a black box. Even if the results are promising you don't know what the model has actually learned.

Over the years I came to appreciate classic statistics because they are still the benchmark. I agree that they are mainly about linear models and Gaussians - something that is clearly wrong in financial markets, but it is a starting point!

Today I try to use modelling methods that represent a sweet spot:

• They must be simple enough to be interpretable,
• they must be complex enough to reproduce the most important stylized facts of financial markets,
• there must be some kind of economic intuition involved why they should work.

So my humble conclusion is that it would be wrong to just use linear regression and the normal distribution, but it would be equally wrong to just use some kind of super ML algorithm to find the right solution for you. If I have learned one thing over the years, it is that modeling the stock market is first and foremost an ongoing lesson in humility... and you still have to do the thinking yourself.

I pretty much agree with what everyone is saying above, and just want to add one more comment. The sad truth behind the lack of advocacy for ML in asset management is that it is difficult to market. Most pitches for quant portfolios try to tell a systematic fundamental (these days called quantamental) story. ML methods apparently do not fit the quantamental story.

One last comment: acting in a dual role of quant researcher and portfolio manager, I personally like ML a lot. ML signals can have low correlation with traditional ones and can point turnover in the right direction in extreme periods when no diversification can be found among most of the traditional signals.