It's probably because of the strong, long-standing statistical underpinnings of economics, econometrics, and risk prediction overall. For example, look at current research with fat-tailed distributions and calculations of Expected Tail Loss (ETL), etc. These studies fit Student's t, Normal, Stable, and Pareto probability distributions to data and report that, e.g., the Kolmogorov or Anderson-Darling goodness-of-fit distance is greater for the normal distribution -- that is, the normal distribution doesn't recover the area in the tails as well as the Stable and Student's t (for varying d.f.). Next, moving into time-series analysis, there is still tremendous merit in using ARMA-, ARIMA-, ARCH-, and GARCH-type fitting to explain conditional means, conditional variances, and autocorrelation. With just a handful of the methods listed above, you can perform an incredible amount of risk prediction that has scientific merit in financial risk management.
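As a minimal sketch of the goodness-of-fit comparison described above (using simulated fat-tailed "returns" rather than real market data, and the Kolmogorov-Smirnov distance as implemented in SciPy):

```python
# Compare Kolmogorov-Smirnov goodness-of-fit distances for normal vs.
# Student's t fits to a heavy-tailed sample. The data are synthetic,
# purely to illustrate the method -- not real financial returns.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
returns = stats.t.rvs(df=3, size=5000, random_state=rng)  # fat-tailed sample

# Fit both candidate distributions by maximum likelihood.
t_params = stats.t.fit(returns)
norm_params = stats.norm.fit(returns)

# One-sample KS distance of the data against each fitted distribution.
ks_t = stats.kstest(returns, 't', args=t_params).statistic
ks_norm = stats.kstest(returns, 'norm', args=norm_params).statistic

print(f"KS distance, Student's t: {ks_t:.4f}")
print(f"KS distance, normal:      {ks_norm:.4f}")
```

On fat-tailed data, the normal fit typically shows the larger KS distance, which is exactly the pattern those studies report.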

Next, for ML: the deeper you delve into non-linear manifold learning (ISOMAP, Laplacian eigenmaps, etc.), metaheuristics (evolutionary algorithms and evolutionary strategies such as genetic algorithms, covariance matrix self-adaptation, ant colony optimization, and particle swarm optimization), neural adaptive learning (ANNs), and the many other supervised ML and AI methods, the further you move away from deterministic and stochastic derivative-based gradient descent and Newton-Raphson methods -- which do provide parameter uncertainty.
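To make concrete what derivative-based methods give you "for free," here is a sketch of parameter uncertainty from the inverse Hessian of a negative log-likelihood (the data, the log-sigma parameterization, and the use of BFGS's Hessian approximation are all my illustrative choices):

```python
# Maximum likelihood fit of a normal model; asymptotic standard errors
# come from the (approximate) inverse Hessian at the optimum. Simulated
# data, so the true values (mu=2.0, sigma=1.5) are known for comparison.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=2000)

def negloglik(theta):
    mu, log_sigma = theta            # log-parameterize sigma to keep it positive
    sigma = np.exp(log_sigma)
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

res = optimize.minimize(negloglik, x0=[0.0, 0.0], method='BFGS')
se = np.sqrt(np.diag(res.hess_inv))  # asymptotic std. errors (BFGS approximation)
print("MLE:", res.x, "std. errors:", se)
```

Gradient-free metaheuristics like PSO return only the point estimate; this inverse-Hessian step is exactly what they lose.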

Recently, I started using particle swarm optimization for almost everything I optimize. Sure, Newton-Raphson is faster, more consistent, and can provide parameter uncertainty, but as I see it, to model risk correctly one needs to combine realizations from many different distributions to build up a final uncertainty distribution. For this reason, the majority of the time, when I am not interested in Type I and II errors, power, AIC, etc. (or, for that matter, the standard errors of coefficients), I typically spew out point estimates and then use these as inputs to Monte Carlo analysis.
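For readers unfamiliar with PSO, a minimal global-best implementation looks like the following (the inertia/cognitive/social constants and the sphere test function are my illustrative choices, not anything from my actual workflow):

```python
# Minimal "global best" particle swarm optimizer: each particle is pulled
# toward its own best position (c1) and the swarm's best position (c2),
# with inertia w damping the velocity. No gradients required.
import numpy as np

def pso(f, bounds, n_particles=40, n_iters=300, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))  # positions
    v = np.zeros_like(x)                                  # velocities
    pbest = x.copy()                                      # per-particle bests
    pbest_f = np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)]                         # global best
    for _ in range(n_iters):
        r1, r2 = rng.random((2,) + x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)]
    return g, pbest_f.min()

sphere = lambda z: float(np.sum(z ** 2))        # simple convex test function
best_x, best_f = pso(sphere, ([-5, -5], [5, 5]))
print(best_x, best_f)
```

The output of a run like this is exactly the kind of point estimate I then feed into Monte Carlo analysis.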

Regarding random forests (RF): it is one of the best classifiers out there, usable in both supervised and (via proximities) unsupervised modes. RF simultaneously combines bootstrapping of the training samples and random feature selection to train each tree. Objects not in a tree's bootstrapped training sample (called out-of-bag, or OOB) are then "dropped" down that trained tree. Feature importance is then determined by comparing classification accuracy between permuted and unpermuted values of feature $j$ for all OOB objects dropped down the same trained tree. Breiman suggested using at least 5,000 trees ("don't be stingy"), and in fact his main ML paper on RF used 50,000 trees per run.
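The mechanics above can be sketched with scikit-learn (rather than Breiman's original Fortran code; note scikit-learn's `permutation_importance` permutes over a supplied dataset rather than strictly the per-tree OOB objects, and the tree count here is kept small for speed, against the "don't be stingy" advice):

```python
# Random forest with OOB accuracy and permutation-based feature importance
# on a synthetic classification problem (3 informative features out of 8).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=600, n_features=8, n_informative=3,
                           random_state=0)

# Bootstrapped trees + random feature selection; oob_score=True scores each
# object only with the trees that did not see it during training.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)

# Permutation importance: shuffle feature j, measure the drop in accuracy.
imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
for j, m in enumerate(imp.importances_mean):
    print(f"feature {j}: {m:.4f}")
```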

Artificial neural networks (ANNs) have been shown to be universal approximators; however, they become quite expensive compared with Newton-Raphson, provided you know the objective function you are fitting (optimizing). If you don't know the objective equation, then an ANN can be tremendously beneficial. As I have said to colleagues, in these cases "the ANN is the equation." In this latter framework, where the ANN is the equation, if you draw random quantiles from distributions for various risk factors, clamp each set of quantiles to the input nodes of the ANN, and train using CV, the utility of an ANN combined with Monte Carlo will be immense.
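A minimal sketch of "the ANN is the equation" combined with Monte Carlo might look like this (the loss function, factor distributions, and network size are invented for illustration, and CV-based training is omitted for brevity):

```python
# Train an MLP surrogate on sampled (risk factors -> loss) pairs, then push
# Monte Carlo draws of the risk factors through the trained net to build an
# output uncertainty distribution. Everything here is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def true_loss(f):                    # "unknown" system, observed only via samples
    return f[:, 0] ** 2 + 0.5 * np.exp(f[:, 1])

F = rng.normal(size=(4000, 2))       # observed risk-factor realizations
L = true_loss(F)

ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                   random_state=0).fit(F, L)

# Monte Carlo: draw risk factors from their assumed distributions, evaluate
# the surrogate, and read tail risk off the simulated loss distribution.
draws = np.column_stack([rng.normal(0, 1, 50000),
                         rng.normal(0, 0.5, 50000)])
sim = ann.predict(draws)
var95 = np.quantile(sim, 0.95)       # e.g. a 95% quantile of simulated loss
print("simulated 95% quantile:", round(var95, 3))
```

Once trained, the surrogate evaluates in microseconds per draw, which is what makes the Monte Carlo step cheap.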

Since it takes a decade or more to learn ML and AI at a level where you can tackle almost any problem, there should be no expectation that someone with an extensive QF background could pick it up quickly, nor that the results would be better. ML and AI are best applied to near NP-hard problems such as timetabling, shift and airline scheduling, and logistics, where there is no closed-form analytic solution (equation) that describes your model.