MNIST softmax regression hitting a wall at 70% accuracy


As part of an Android machine learning app, I've implemented the softmax regression algorithm in Java. However, no matter how long I let it run, the accuracy reaches about 70.5% and then plateaus indefinitely, even though the site I got the data from states that I should be getting close to 90%. I've gone over my code again and again and can't find the source of the problem, so I'm hoping someone here can help. I don't know why it successfully climbs to about 70% and then stops. This is the softmax formula that I'm using, and this is the dataset I'm using.
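
For reference, the probability and gradient that the code below is meant to implement (reconstructed from my code comments; Θ_i is the weight vector for class i, X the feature vector, y the true label, and λ the regularization constant) are:

$$P(y = i \mid X) = \frac{\exp(\Theta_i \cdot X)}{\sum_{k=1}^{K} \exp(\Theta_k \cdot X)}, \qquad \nabla_{\Theta_i} = -X\left(\mathbf{1}\{i = y\} - P(y = i \mid X)\right) + 2\lambda\,\Theta_i$$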

Here is my softmax algorithm:

public List<Double> gradient(List<Double> weights, double[] features, int type) {
    int D = Constants.featureSize;
    int K = Constants.numberOfClasses;

    List<Double> grad = new ArrayList<Double>(D * K);
    for (int i = 0; i < D * K; i++) {
        grad.add(i, 0.0);
    }

    // denom = Σ(k=1..K) exp(Θ_k · X)
    double dot = 0;
    double denom = 0;
    for (int i = 0; i < K; i++) {
        // dot product Θ_i · X
        dot = 0;
        for (int j = 0; j < D; j++) {
            dot += features[j] * weights.get(j + (D * i));
        }
        denom += Math.exp(dot);
    }

    // regularization terms: the derivative of λ‖Θ‖² is 2λΘ
    double[] regular = new double[D * K];
    for (int i = 0; i < D * K; i++) {
        regular[i] = 2 * weights.get(i) * Constants.regularizationConstant;
    }

    double prob;
    // prob_i = exp(Θ_i · X) / denom
    for (int i = 0; i < K; i++) {
        // dot product Θ_i · X
        dot = 0;
        for (int j = 0; j < D; j++) {
            dot += features[j] * weights.get(j + (D * i));
        }
        prob = Math.exp(dot) / denom;

        // ∇_Θ_i = -X(1{i = y} - prob_i)
        int match = (i == type) ? 1 : 0;
        for (int j = 0; j < D; j++) {
            grad.set(j + (D * i), -1 * features[j] * (match - prob));
        }
    }

    // apply regularization
    for (int i = 0; i < D * K; i++) {
        grad.set(i, grad.get(i) + regular[i]);
    }

    return grad;
}
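
(A side note on softmax implementations like this, not necessarily the cause of the plateau: with raw MNIST pixel values in 0–255, Θ_i · X can grow large enough that Math.exp(dot) overflows to Infinity, which turns every prob into NaN. The usual guard is to subtract the maximum dot product before exponentiating, which leaves the probabilities unchanged since exp(a − m)/Σ exp(b − m) = exp(a)/Σ exp(b). A minimal sketch, where the helper name stableSoftmax is just illustrative:)

// Softmax with the max logit subtracted first, so Math.exp() never sees
// a large positive argument (helper name is illustrative).
public static double[] stableSoftmax(double[] logits) {
    double max = Double.NEGATIVE_INFINITY;
    for (double logit : logits) {
        max = Math.max(max, logit);
    }
    double denom = 0;
    double[] probs = new double[logits.length];
    for (int i = 0; i < logits.length; i++) {
        probs[i] = Math.exp(logits[i] - max); // argument is always <= 0
        denom += probs[i];
    }
    for (int i = 0; i < probs.length; i++) {
        probs[i] /= denom;
    }
    return probs;
}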

Here is where I apply the gradient to the weights (this part is written in JavaScript and runs on the server):

for (i = 0; i < length; i++) {
    adaG[i] += gradient[i] * gradient[i];
    newWeight[i] = currentWeight[i] - ((c / Math.sqrt(adaG[i] + eps)) * gradient[i]);
}

weight = newWeight;
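
In other words, each weight gets the standard AdaGrad update, where c is the initial learning rate, g is the current gradient, and G accumulates the squared gradients:

$$G_i \leftarrow G_i + g_i^2, \qquad w_i \leftarrow w_i - \frac{c}{\sqrt{G_i + \epsilon}}\, g_i$$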

Finally, in case you think the bug might be in my accuracy test instead, here is that code:

var correct = 0;
var error = 0;
var labels = require('fs').readFileSync(testLabels).toString().split('\n');
var features = require('fs').readFileSync(testFeatures).toString().split('\n');

function valid(str) {
    return str != "";
}

for (i = 0; i < N; i++) {
    var label = parseFloat(labels[i]);

    // parse the comma/space-separated feature line into numbers
    var featureStr = features[i].split(/,| /);
    var featureClean = featureStr.filter(valid);
    var featureArray = [];
    for (var j = 0; j < featureClean.length; j++) {
        featureArray[j] = parseFloat(featureClean[j]);
    }

    // score each class: dot product of its weights with the features
    var classResults = [];
    for (h = 0; h < K; h++) {
        dot = 0;
        for (j = 0; j < D; j++) {
            dot += featureArray[j] * testWeight[j + (h * D)];
        }
        classResults[h] = dot;
    }

    // argmax over the class scores
    var bestGuess = 0;
    for (h = 0; h < K; h++) {
        if (classResults[h] > classResults[bestGuess]) {
            bestGuess = h;
        }
    }

    if (bestGuess == label) {
        correct++;
    }
    //else{
    //    console.log(classResults);
    //    console.log('correct: ', label);
    //}
}

var accuracy = correct / N;
console.log(accuracy);

If anyone can see where I've gone wrong, please let me know.

After running more tests with different parameters, I found that the issue was that the initial learning rate was set too low. If I'm perfectly honest, I'm pretty frustrated with myself that the solution was that simple, but that's all it was.
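
For anyone else who hits the same wall: in the AdaGrad update above, the accumulator adaG only ever grows, so the effective step size c / sqrt(adaG[i] + eps) only ever shrinks. If the initial rate c is too small, the steps become negligible long before the weights are anywhere near converged, and the accuracy curve flattens out early. Here is a sketch of the same update, written in Java for consistency with the gradient code (the method signature is mine; the variable names mirror the JavaScript above):

// One AdaGrad step: adaG accumulates squared gradients, so the effective
// learning rate c / sqrt(adaG[i] + eps) shrinks monotonically over time.
// If c starts too low, updates vanish early and accuracy plateaus.
public static void adagradStep(double[] weights, double[] gradient,
                               double[] adaG, double c, double eps) {
    for (int i = 0; i < weights.length; i++) {
        adaG[i] += gradient[i] * gradient[i];
        weights[i] -= (c / Math.sqrt(adaG[i] + eps)) * gradient[i];
    }
}

In my case, simply raising c was enough to push past the 70% plateau.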