AdaGrad - sparse parameters


I read on Wikipedia:

AdaGrad (for adaptive gradient algorithm) is a modified stochastic gradient descent algorithm with per-parameter learning rate, first published in 2011. Informally, this increases the learning rate for sparser parameters and decreases the learning rate for ones that are less sparse.

What does "sparse parameters" mean? I read that AdaGrad accumulates/averages gradients over time. Does parameter sparsity refer to how frequently (densely?) these gradients accumulate? Or something else?


Yes, by "sparse parameters" they mean parameters that do not receive frequent or large updates. The naming is a little confusing, because this often happens when the data itself is sparse, so that some features appear only a few times.
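A minimal sketch can make this concrete. The snippet below is an illustrative AdaGrad update (not the exact published algorithm, which also covers composite objectives): each parameter accumulates its own squared gradients, and the step is divided by the square root of that accumulator. A "sparse" parameter that is rarely updated accumulates less, so its effective learning rate stays larger.

```python
import numpy as np

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One AdaGrad step: per-parameter accumulation of squared gradients,
    then a step scaled inversely by the accumulator's square root."""
    accum += grads ** 2
    params -= lr * grads / (np.sqrt(accum) + eps)
    return params, accum

params = np.zeros(2)
accum = np.zeros(2)

# Parameter 0 is "dense": it gets a gradient every step.
# Parameter 1 is "sparse": it gets a gradient only every 10th step
# (e.g. its feature appears in only 1 of 10 examples).
for t in range(100):
    g = np.array([1.0, 1.0 if t % 10 == 0 else 0.0])
    params, accum = adagrad_step(params, g, accum)

# The sparse parameter's accumulator is much smaller, so each of its
# (rare) updates uses a larger effective learning rate lr / sqrt(accum).
print(accum)                      # accum[0] is 10x accum[1]
print(0.1 / np.sqrt(accum))      # effective rate is larger for the sparse one
```

So "sparse" here is a statement about a parameter's gradient history, which in practice usually tracks how rarely its feature occurs in the data.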