On the metric used to compare the convergence speed of VI, PI, and truncated PI #36
Replies: 1 comment
Hi Professor Zhao,

PI: each outer iteration requires solving the Bellman equation to convergence (or equivalently, a matrix inversion), which is very expensive.

So my question is: if we use total computational cost (e.g., total number of Bellman backups across both inner and outer loops) as the x-axis instead, is there a known theoretical result on which algorithm converges faster?

I understand the main purpose of the unified framework in the book is to show that VI, PI, and TPI are special cases of the same algorithm (with the inner-loop iteration count j ...
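Hello Professor Zhao,

In Chapter 4 of the book, the figure comparing the convergence speed of VI, PI, and truncated PI uses the number of outer iterations (i.e., the number of policy-improvement rounds) as the horizontal axis. Under this metric, PI converges fastest, truncated PI is in the middle, and VI is slowest.
However, I don't think this comparison is quite fair, because the computational cost of one outer iteration differs greatly between the algorithms: PI has to run policy evaluation to convergence in every round, which is expensive, whereas VI performs only a single Bellman optimality backup per round, which is cheap.
So I would like to ask: if total computation (for example, the total number of Bellman backups) is used as the metric, is there a general theoretical result on which of the three converges faster? Or is this inherently problem-dependent (depending on the size of the state space, the discount factor γ, and so on), with no universal conclusion?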
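To make the two metrics concrete, here is a minimal sketch of how one might count both outer iterations and total backups on a toy problem. Everything in it is an assumption for illustration and not taken from the book: the random MDP, the helper names (`make_random_mdp`, `backup`, `reference_values`, `truncated_pi`), the cost accounting (one "backup" is one expected-value computation r(s,a) + γ Σ p(s'|s,a) v(s')), and approximating PI by running the inner evaluation loop until the values stop changing.

```python
import random

def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical test MDP: random transition rows, rewards in [0, 1]."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])  # normalise to a distribution
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R

def backup(P, R, gamma, v, s, a):
    """One expected-value computation for the pair (s, a)."""
    return R[s][a] + gamma * sum(p * x for p, x in zip(P[s][a], v))

def reference_values(P, R, gamma, tol=1e-10):
    """Approximate optimal values via plain value iteration (reference only)."""
    n, m = len(P), len(P[0])
    v = [0.0] * n
    while True:
        new_v = [max(backup(P, R, gamma, v, s, a) for a in range(m))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v

def truncated_pi(P, R, gamma, j, v_star, eps=1e-6,
                 inner_tol=1e-12, max_outer=100_000):
    """Truncated policy iteration with at most j evaluation sweeps per round.

    j = 1 behaves like VI; a very large j approximates PI.
    Returns (outer_iterations, total_backups) needed to get within eps
    of v_star in the max norm.
    """
    n, m = len(P), len(P[0])
    v = [0.0] * n
    outer = backups = 0
    while outer < max_outer:
        # Policy improvement: greedy policy w.r.t. the current v.
        pi = [max(range(m), key=lambda a: backup(P, R, gamma, v, s, a))
              for s in range(n)]
        backups += n * m
        # Truncated policy evaluation: up to j sweeps starting from v.
        for _ in range(j):
            new_v = [backup(P, R, gamma, v, s, pi[s]) for s in range(n)]
            backups += n
            change = max(abs(x - y) for x, y in zip(new_v, v))
            v = new_v
            if change < inner_tol:
                break
        outer += 1
        if max(abs(x - y) for x, y in zip(v, v_star)) < eps:
            break
    return outer, backups

gamma = 0.9
P, R = make_random_mdp(n_states=20, n_actions=4, seed=0)
v_star = reference_values(P, R, gamma)
# j = 1 (VI), j = 5 (truncated PI), j = 10**6 (PI, evaluation run to convergence)
results = {j: truncated_pi(P, R, gamma, j, v_star) for j in (1, 5, 10**6)}
for j, (outer, backups) in results.items():
    print(f"j={j}: outer iterations={outer}, total backups={backups}")
```

The ranking by total backups in such an experiment depends on the accounting and on the problem instance (number of states and actions, γ, the tolerance), so a single run like this does not settle the theoretical question.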