I read on Wikipedia:

AdaGrad (for adaptive gradient algorithm) is a modified stochastic gradient descent algorithm with per-parameter learning rate, first published in 2011. Informally, this increases the learning rate for sparser parameters and decreases the learning rate for ones that are less sparse.

Yes, by "sparse parameters" they mean parameters that do not receive frequent or large updates. The naming is a little confusing, because this often coincides with the data itself being sparse, so that some features appear only a few times.
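To make the mechanism concrete, here is a minimal NumPy sketch of the AdaGrad update rule. The function name and the numbers are illustrative, not from the Wikipedia article: each parameter accumulates its own squared-gradient history, and that history divides the step size.

```python
import numpy as np

def adagrad_update(params, grads, accum, lr=0.1, eps=1e-8):
    # Accumulate squared gradients separately for each parameter.
    accum += grads ** 2
    # The effective learning rate lr / sqrt(accum) shrinks for
    # parameters with a large accumulated gradient history.
    params -= lr * grads / (np.sqrt(accum) + eps)
    return params, accum

# Two parameters receive the same gradient now, but parameter 0 has a
# large accumulated history (it was updated often in the past), while
# parameter 1 has none (a "sparse" parameter).
params = np.zeros(2)
accum = np.array([100.0, 0.0])
grads = np.array([1.0, 1.0])
params, accum = adagrad_update(params, grads, accum)
# The sparse parameter 1 takes a much larger step than parameter 0.
```

With `accum = 100` versus `0`, the denominators after this step are roughly `sqrt(101)` and `1`, so the rarely-updated parameter moves about ten times as far for the same gradient, which is exactly the "larger learning rate for sparser parameters" behavior described above.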