Model-based value expansion
Model-based value expansion (MVE) was first proposed by Feinberg et al. (2018). It expands the critic’s target value by rolling out a deterministic dynamics model for a fixed number of steps and bootstrapping from the critic at the end of the rollout.
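As a rough illustration of the expanded target, here is a minimal sketch of an H-step value expansion under a deterministic model. The names `policy`, `dynamics`, `reward` and `q_target` are hypothetical callables introduced for this example, not the interface of Feinberg et al.; the sketch only shows the basic target, i.e. the discounted sum of model-predicted rewards plus a discounted bootstrap from the critic.

```python
def mve_target(state, policy, dynamics, reward, q_target, horizon, gamma=0.99):
    """H-step value-expansion target from a deterministic model rollout.

    All callables are assumptions made for this sketch:
      policy(s) -> a, dynamics(s, a) -> s', reward(s, a) -> float,
      q_target(s, a) -> float (the critic used for bootstrapping).
    """
    ret, discount = 0.0, 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)
        ret += discount * reward(s, a)  # model-predicted reward at this step
        s = dynamics(s, a)              # deterministic one-step prediction
        discount *= gamma
    # Bootstrap from the critic at the final imagined state.
    return ret + discount * q_target(s, policy(s))
```

With `horizon=0` the function reduces to the usual model-free bootstrap `q_target(s, policy(s))`, so the horizon controls how much of the target comes from the model rather than from the critic.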
Stochastic ensemble value expansion (STEVE) (Buckman et al. 2018) extends MVE by learning the dynamics model as an ensemble of probabilistic NNs. It combines value expansions of different lengths, weighting each one by the inverse of its variance under the dynamics ensemble.
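The inverse-variance weighting can be sketched as follows. This is a simplified illustration, not the STEVE implementation: it assumes that for each expansion length we already have a list of candidate target values, one per ensemble member, and the name `steve_style_target` and the `eps` stabiliser are choices made for this example.

```python
from statistics import fmean, pvariance


def steve_style_target(candidates, eps=1e-8):
    """Combine value expansions of different lengths by inverse variance.

    candidates[h] is a list of target estimates for expansion length h,
    one per ensemble member (an assumed input format for this sketch).
    """
    means = [fmean(c) for c in candidates]
    # Low disagreement across the ensemble gives a large weight.
    inv_vars = [1.0 / (pvariance(c) + eps) for c in candidates]
    total = sum(inv_vars)
    return sum(w * m for w, m in zip(inv_vars, means)) / total
```

Expansion lengths on which the ensemble members agree dominate the combined target, while lengths where the model predictions diverge are down-weighted.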
Stochastic value gradient (SVG) (Amos et al. 2021) uses a deterministic dynamics model for value expansion of the policy (actor) objective but not of the critic’s target. It then uses SAC, so exploration comes from the entropy term.
Palenicek et al. (2022) and Palenicek, Lutter, and Peters (2022) tried using the oracle dynamics (i.e. the simulator) for model-based value expansion. They found that even with the oracle dynamics, model-based value expansion could not improve sample efficiency. They suggest that model-free value expansion (e.g. Retrace) is a strong baseline without the computational overhead of model-based methods. I can’t help but think that increasing the update-to-data (UTD) ratio when using MVE would reveal an improvement in sample efficiency.