Model-based value expansion

Model-based value expansion (MVE) was first proposed by Feinberg et al. (2018). It expands the critic's target value by rolling a learned deterministic dynamics model forward for a few steps and bootstrapping with the target critic at the final imagined state.
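
As a rough sketch (my own notation, not Feinberg et al.'s code), the H-step expanded critic target looks something like the following, assuming a deterministic dynamics model `f`, a reward model `r`, a policy `pi`, and a target critic `q`:

```python
def mve_target(s0, f, r, pi, q, horizon=3, gamma=0.99):
    """H-step model-based value expansion target for the critic.

    Rolls the policy through the (deterministic) learned dynamics model
    for `horizon` steps, sums the discounted model rewards, and bootstraps
    with the target critic at the final imagined state.
    """
    s, target, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = pi(s)                      # action from the current policy
        target += discount * r(s, a)   # reward predicted by the model
        s = f(s, a)                    # next state from the dynamics model
        discount *= gamma              # after the loop, discount = gamma**horizon
    target += discount * q(s, pi(s))   # bootstrap with the target Q-function
    return target
```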

Stochastic ensemble value expansion (STEVE) (Buckman et al. 2018) extends MVE by learning the dynamics with an ensemble of probabilistic NNs. It combines value expansions of different rollout lengths, weighting each by its inverse variance under the ensemble, so that horizons the model is uncertain about contribute less to the target (a sketch of the weighting follows below).
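
A minimal sketch of the inverse-variance weighting, under my own simplified setup (the actual STEVE implementation also ensembles Q-functions and reward models): given a set of candidate targets for each rollout length, one per ensemble member, the combined target is the variance-weighted average of the per-horizon means.

```python
import numpy as np

def steve_target(candidate_targets):
    """Combine value-expansion targets of different rollout lengths.

    `candidate_targets[h]` holds the ensemble's target estimates for
    rollout length h (one entry per ensemble member). Each length is
    weighted by the inverse of its ensemble variance, so horizons the
    model predicts poorly contribute less.
    """
    means = np.array([np.mean(t) for t in candidate_targets])
    variances = np.array([np.var(t) + 1e-8 for t in candidate_targets])
    weights = 1.0 / variances
    weights /= weights.sum()
    return float(np.dot(weights, means))

# e.g. three horizons, five ensemble members each
targets = [np.random.randn(5) + h for h in range(3)]
print(steve_target(targets))
```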

The stochastic value gradient (SVG) method of Amos et al. (2021) uses a deterministic dynamics model for value expansion of the policy (actor) objective, but not of the critic. It builds on SAC, relying on the entropy term for exploration.
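
Roughly, the actor objective becomes a short model rollout with a SAC-style entropy bonus, and the policy gradient flows back through the dynamics model. The sketch below is my own reading under those assumptions (names and structure are mine, not Amos et al.'s code); in practice it would be written in an autodiff framework so gradients pass through `f` and `r` into the policy parameters.

```python
def svg_actor_objective(s0, f, r, pi, q, entropy, horizon=3, gamma=0.99, alpha=0.2):
    """Model-expanded actor objective (to be maximized).

    Same rollout structure as the MVE critic target, but applied to the
    policy loss: differentiating this through the deterministic model f
    and reward r gives the model-based policy gradient. The critic q is
    trained model-free. `entropy(s)` stands for the policy entropy at s,
    scaled by the SAC temperature alpha.
    """
    s, objective, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = pi(s)
        objective += discount * (r(s, a) + alpha * entropy(s))  # reward + entropy bonus
        s = f(s, a)
        discount *= gamma
    objective += discount * q(s, pi(s))  # terminal value from the critic
    return objective
```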

Palenicek et al. (2022; see also Palenicek, Lutter, and Peters 2022) tried using the oracle dynamics (i.e. the simulator itself) for model-based value expansion. They found that even with oracle dynamics, model-based value expansion did not improve sample efficiency, and they suggest that model-free value expansion (e.g. Retrace) is a strong baseline without the computational overhead of model-based methods. I can't help but think that increasing the update-to-data (UTD) ratio when using MVE could still yield an improvement in sample efficiency.

References

Amos, Brandon, Samuel Stanton, Denis Yarats, and Andrew Gordon Wilson. 2021. “On the Model-Based Stochastic Value Gradient for Continuous Reinforcement Learning.” In Proceedings of the 3rd Conference on Learning for Dynamics and Control, 6–20. PMLR.
Buckman, Jacob, Danijar Hafner, George Tucker, Eugene Brevdo, and Honglak Lee. 2018. “Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion.” In Advances in Neural Information Processing Systems. Vol. 31. Curran Associates, Inc.
Feinberg, Vladimir, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, and Sergey Levine. 2018. “Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning.” In Proceedings of the 35th International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1803.00101.
Palenicek, Daniel, Michael Lutter, Joao Carvalho, and Jan Peters. 2022. “Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning.” In The Eleventh International Conference on Learning Representations.
Palenicek, Daniel, Michael Lutter, and Jan Peters. 2022. “Revisiting Model-based Value Expansion.” arXiv. https://doi.org/10.48550/arXiv.2203.14660.