Are the estimated SVM probabilities for most negative test examples too low?


I am using LIBSVM (and Matlab's fitcsvm and fitSVMPosterior) to train an SVM model and obtain probability estimates. I have noticed that the estimated probabilities for the vast majority of negative test examples are very low (e.g. < 0.01). How should I interpret this?

One fact that could explain this is that there is good reason to believe some of the negative training examples are actually positive. This would push the decision boundary away from the negative examples, which would make the estimated probabilities too low. The relatively low recall also points in this direction.

[Note that the test data set is more accurate and has less label noise.]

Does this explanation make sense? If not, what could be the reason?

The first thing you want to do is look at the raw outputs (scores) of your trained SVM (the fitcsvm model), not the posterior probabilities. fitSVMPosterior tries to fit a sigmoid through the scores to generate the posterior probabilities. Plot the scores versus the class labels: if the scores do not follow a roughly sigmoid-shaped relationship with the labels, you know you are in trouble, because fitSVMPosterior will not be able to fit them well. The best way to evaluate this is to train your classifier on a training set and plot the predicted scores on a test set against their class labels.
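A minimal sketch of that check in Matlab, assuming Xtrain/ytrain and Xtest/ytest already exist and labels are 0/1 (variable names and kernel settings are placeholders, not from the question):

```matlab
% Train the SVM and inspect the raw scores on a held-out test set.
mdl = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'rbf', 'Standardize', true);
[~, score] = predict(mdl, Xtest);   % score(:, 2): raw SVM score for the positive class

% A sigmoid-shaped relation between score and the fraction of positives is
% roughly what fitSVMPosterior's Platt scaling assumes.
figure;
scatter(score(:, 2), ytest, 10, 'filled');
xlabel('SVM score (positive class)');
ylabel('true class label');
```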

Furthermore, you mention you use oversampling to address the imbalance issue. You could also try using weights to train your SVM instead. Apparently Matlab's behaviour is to set the weights in such a way that they sum up to the prior probabilities (see here). So definitely also try a regular (unresampled) sample of your data to evaluate your posteriors.
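For example, a hedged sketch of class weighting with fitcsvm's 'Weights' option, assuming 0/1 labels (the inverse-frequency scheme is my choice, not something from the question):

```matlab
% Weight each observation inversely proportional to its class frequency,
% instead of oversampling the minority class.
w = zeros(size(ytrain));
w(ytrain == 1) = numel(ytrain) / sum(ytrain == 1);   % positive class
w(ytrain == 0) = numel(ytrain) / sum(ytrain == 0);   % negative class

mdlWeighted = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'rbf', ...
                      'Standardize', true, 'Weights', w);
```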

What could be happening is that the SVM model you train (with fitcsvm) is evaluated using the accuracy during cross validation. Furthermore, the SVM is trained with the hinge loss. Neither of these performance measures says anything about the quality of the posteriors. So the problem is that the model tries to maximize the accuracy, and because of this the posterior quality might not be good. I'm going a bit out on a limb here, but I'm going to assume you want a model that outputs proper posterior probabilities. So in my answer I'll detail how to do just that.

There are four options: (1) if you really prefer the SVM model, you can change the cross-validation procedure to measure the quality of the posteriors, and select hyperparameters with this performance measure to obtain better posteriors. (2) If you plotted the scores against the class labels and saw a pattern that could be mapped to probabilities somehow, but not by a sigmoid, you can write your own function to fit the posteriors using a different model, but this could be a lot of work. (3) You can use a kernelized penalized logistic regression model, which directly optimizes the quality of the posteriors during training (my recommended solution). (4) You can use a Gaussian process classification model, but these are quite hard to train in practice.

(1) To do this you would: train your model with some hyperparameters (the cost, and the sigma of the kernel if you use a Gaussian kernel) on the training folds, fit the SVM posterior model on the training folds, and predict the posteriors on the test fold. Then compute the log likelihood on the test fold using the posteriors and the class labels. Repeat this for all folds, and choose the hyperparameters that give the best log likelihood. Why? The log likelihood measures the quality of the posteriors, so if it is optimized by cross validation, you will possibly get better posteriors. However, it might be the case that this does not work very well, since fitcsvm itself does not aim to give accurate posteriors.
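A rough sketch of that procedure, assuming a Gaussian kernel and binary labels in {0,1} (the parameter grids, fold count, and variable names are just placeholders):

```matlab
% Grid search over (BoxConstraint, KernelScale), scoring each pair by the
% held-out log likelihood of the calibrated posteriors.
costs  = [0.1 1 10];
sigmas = [0.5 1 2];
k      = 5;
cvp    = cvpartition(ytrain, 'KFold', k);

bestLL = -inf;
for c = costs
    for s = sigmas
        ll = 0;
        for fold = 1:k
            tr = training(cvp, fold);  te = test(cvp, fold);
            mdl = fitcsvm(Xtrain(tr, :), ytrain(tr), ...
                          'BoxConstraint', c, 'KernelScale', s, ...
                          'KernelFunction', 'rbf', 'Standardize', true);
            mdl = fitSVMPosterior(mdl);                % sigmoid fit on the training folds
            [~, post] = predict(mdl, Xtrain(te, :));   % post(:, 2) = estimated P(y = 1 | x)
            p = min(max(post(:, 2), eps), 1 - eps);    % avoid log(0)
            y = ytrain(te);
            ll = ll + sum(y .* log(p) + (1 - y) .* log(1 - p));
        end
        if ll > bestLL
            bestLL = ll;  bestC = c;  bestSigma = s;
        end
    end
end
```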

(2) Take a look at the Platt scaling used by fitSVMPosterior. What you can do instead of fitting a logistic transformation is to use binning: bin the scores and compute the posterior for each bin. You can find some details here. Possibly this will give you better results, but it will likely be a pain to implement and is not used that often...
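A toy sketch of score binning, purely as an illustration (the bin count and the calibScores/calibLabels/testScores variables are assumptions; calibration scores should come from held-out data):

```matlab
% Calibrate by binning: estimate P(y = 1 | score in bin) from held-out scores,
% then look up the bin of each new score.
nBins = 10;
edges = quantile(calibScores, linspace(0, 1, nBins + 1));  % equal-frequency bins
edges(1) = -inf;  edges(end) = inf;

binPost = zeros(nBins, 1);
for b = 1:nBins
    inBin = calibScores >= edges(b) & calibScores < edges(b + 1);
    binPost(b) = mean(calibLabels(inBin));   % fraction of positives in the bin
end

% Posterior for new scores: index into the bin table.
[~, ~, binIdx] = histcounts(testScores, edges);
testPost = binPost(binIdx);
```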

(3) Penalized kernel logistic regression is similar to SVMs: it uses regularization (which corresponds to the cost parameter of the SVM) and can be used with kernels, like the SVM model. Mark Schmidt has a nice Matlab implementation here. Take a look at the file minFunc_examples.m and then look for Kernel logistic regression. This model performs quite well for classification in terms of accuracy and can be used to get proper probability estimates.
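To make the idea concrete, here is a bare-bones sketch of penalized kernel logistic regression with an RBF kernel and plain gradient descent (this is not Mark Schmidt's code; sigma, lambda, the step size, and the {0,1} labels are all assumptions):

```matlab
% Penalized kernel logistic regression: f(x) = sum_i alpha_i * k(x, x_i),
% fitted by minimizing the penalized negative log likelihood.
rbf    = @(A, B, sigma) exp(-pdist2(A, B).^2 / (2 * sigma^2));
sigma  = 1;      % kernel width
lambda = 1e-2;   % regularization strength (plays the role of the SVM cost)
lr     = 1e-3;   % gradient descent step size
nIter  = 1000;

K     = rbf(Xtrain, Xtrain, sigma);   % n-by-n kernel matrix
alpha = zeros(size(Xtrain, 1), 1);    % dual coefficients
y     = double(ytrain);               % labels assumed to be in {0,1}

for it = 1:nIter
    p     = 1 ./ (1 + exp(-K * alpha));           % current posterior estimates
    grad  = K * (p - y) + lambda * (K * alpha);   % gradient of the penalized log loss
    alpha = alpha - lr * grad;
end

% Posterior probabilities on the test set come straight out of the model.
pTest = 1 ./ (1 + exp(-rbf(Xtest, Xtrain, sigma) * alpha));
```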

(4) Gaussian processes naturally produce posterior probabilities. If you want to know more, I definitely recommend reading this free book. The website also contains code samples which you can use (though it will take quite some reading to understand what to use when).

Finally, it is possible that all of these models estimate small posterior probabilities. Maybe that is simply optimal? Therefore, if you want the best posteriors, be sure to compare the models using some performance measure. As I said before, you can use the log likelihood to evaluate the quality of your posteriors.

But perhaps you have some costs for false positives and false negatives? Try to use the performance measure that you are interested in to evaluate your model. If you are going to compare your models, be sure to use proper cross validation, otherwise you will not be able to tell which model is better ;).