Chris Nota, Philip S. Thomas, Bruno C. da Silva: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods. ICML 2021: 8238-8247