Convergence rate of value iteration



Value iteration is one of the most commonly used methods for solving Markov decision processes. Its convergence rate obviously depends on the number of states and actions. However, the convergence rate also varies considerably between different MDPs with similar numbers of states/actions.

Are there specific features that indicate slow convergence? Can such systems be remodelled to speed up the convergence of value iteration? For example, in my experience, systems with sparse transition matrices often converge much more slowly than systems with dense transition matrices. This is especially true when there are so-called cycles in the transition matrix.
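To make the cycle observation concrete, here is a minimal sketch (my own illustration, not taken from any paper): value iteration on a dense, rapidly mixing random MDP versus a deterministic ring of states with a single rewarding state. Both MDPs are toy constructions; the stopping rule uses the span (max minus min) of the Bellman residual, a standard criterion. On the ring, information propagates one state per sweep, so many more sweeps are needed:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for an MDP given as P[a, s, s'] (transition
    probabilities) and R[s, a] (rewards). Stops when the span
    (max - min) of the Bellman residual falls below tol and returns
    (V, number of sweeps)."""
    V = np.zeros(P.shape[1])
    for sweep in range(1, max_iter + 1):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        diff = V_new - V
        if diff.max() - diff.min() < tol:
            return V_new, sweep
        V = V_new
    return V, max_iter

rng = np.random.default_rng(0)
n = 20

# Dense MDP: 4 actions, fully mixing random transition rows.
P_dense = rng.random((4, n, n))
P_dense /= P_dense.sum(axis=2, keepdims=True)
R_dense = rng.random((n, 4))
V_dense, it_dense = value_iteration(P_dense, R_dense)

# Sparse cyclic MDP: one action, deterministic ring s -> s+1 (mod n),
# reward 1 only in state 0. Values propagate one state per sweep.
P_cycle = np.zeros((1, n, n))
P_cycle[0, np.arange(n), (np.arange(n) + 1) % n] = 1.0
R_cycle = np.zeros((n, 1))
R_cycle[0, 0] = 1.0
V_cycle, it_cycle = value_iteration(P_cycle, R_cycle)

print(f"dense: {it_dense} sweeps, cyclic: {it_cycle} sweeps")
```

In runs of this sketch, the dense MDP converges in an order of magnitude fewer sweeps than the cycle, even though the cycle has fewer states and actions.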


I will answer on the basis of one particular class of MDPs. A few recent articles discuss the issue of diversification in MDPs extensively and introduce a new model known as MDP-DIV (Markov Decision Process for Diversifying); for that I refer you to Xia et al. (2017).

The main causes identified for slow convergence are data scarcity/sparse transition matrices, as you have pointed out, and a large action space.

Liu et al. (2018) detail two new approaches to accelerating the convergence rate. Here, I will quote the relevant parts of the abstract.

The MDP-DIV-kNN method adopts a $k$ nearest neighbor strategy, i.e., discarding the $k$ nearest neighbors of the recently-selected action (document), to reduce the diversification searching space.

The MDP-DIV-NTN employs a pre-trained diversification neural tensor network (NTN-DIV) as the evaluation model, and combines the results with the MDP to produce the final ranking solution.

The experimental results demonstrate that the two proposed methods indeed accelerate the convergence rate of the MDP-DIV by about 3x, while the resulting accuracy barely degrades, or even improves.
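The $k$-nearest-neighbor pruning idea from the first quoted method can be sketched in isolation. This is not the MDP-DIV implementation; the embeddings, relevance scores, and greedy selection loop below are all hypothetical placeholders. It only illustrates the mechanic of discarding the $k$ nearest neighbors of each selected action (document) so that every later selection searches a smaller action space:

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, dim, k = 50, 8, 5

# Hypothetical stand-ins: unit-norm document embeddings, relevance scores.
emb = rng.normal(size=(n_docs, dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
scores = rng.random(n_docs)

def rank_with_knn_pruning(emb, scores, n_select, k):
    """Greedy ranking: after selecting a document, discard its k nearest
    neighbors (by cosine similarity) from the candidate pool, shrinking
    the diversification search space at every step."""
    candidates = list(range(len(scores)))
    ranking = []
    while candidates and len(ranking) < n_select:
        best = max(candidates, key=lambda d: scores[d])
        ranking.append(best)
        candidates.remove(best)
        if not candidates:
            break
        # Cosine similarity of the chosen document to remaining candidates.
        sims = emb[candidates] @ emb[best]
        nearest = {candidates[i] for i in np.argsort(sims)[-k:]}
        candidates = [d for d in candidates if d not in nearest]
    return ranking

ranking = rank_with_knn_pruning(emb, scores, n_select=5, k=k)
print(ranking)
```

Each round removes $k + 1$ documents from the pool (the selection plus its $k$ neighbors), which is where the speed-up over searching the full action space at every step comes from.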


References

[1] Xia, L., Xu, J., Lan, Y., Guo, J., Zeng, W., Cheng, X. (2017). Adapting Markov decision process for search result diversification. In: Proceedings of SIGIR ’17, pp. 535–544. ACM, New York, NY, USA.

[2] Liu, F., Tang, R., Li, X., Ye, Y., Guo, H., He, X. (2018). Novel approaches to accelerating the convergence rate of Markov decision process for search result diversification. arXiv preprint. https://arxiv.org/pdf/1802.08401.pdf