Conditional Importance Sampling for Off-Policy Learning

Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

June 2020

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
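As a rough illustration of the idea named in the abstract, the toy sketch below compares an ordinary importance sampling ratio with its conditional expectation given a coarser statistic. It is not taken from the paper: the four-action bandit, the policies `mu` and `pi`, the `group` mapping, and the rewards are all hypothetical, chosen only so that the reward depends on the action through `group` alone. Under that assumption both estimators have the same mean, and the conditioned ratio has lower variance.

```python
from fractions import Fraction as F

# Hypothetical one-step problem (not from the paper): 4 actions,
# behaviour policy mu, target policy pi, reward depends only on the group.
mu = [F(1, 4)] * 4
pi = [F(7, 10), F(1, 10), F(1, 10), F(1, 10)]
group = [0, 0, 1, 1]           # coarse statistic of the action
reward = [F(1), F(0)]          # reward indexed by group
actions = range(4)

def group_mass(policy, g):
    """Total probability a policy assigns to actions in group g."""
    return sum(policy[a] for a in actions if group[a] == g)

def moments(weights):
    """Exact mean and variance of weights[a] * reward under mu."""
    mean = sum(mu[a] * weights[a] * reward[group[a]] for a in actions)
    second = sum(mu[a] * (weights[a] * reward[group[a]]) ** 2 for a in actions)
    return mean, second - mean ** 2

# Ordinary importance sampling ratio pi(a) / mu(a).
rho = [pi[a] / mu[a] for a in actions]

# Conditional expectation of the ratio given the group, under mu:
# E_mu[rho | group] = pi(group) / mu(group).
cond_rho = [group_mass(pi, group[a]) / group_mass(mu, group[a]) for a in actions]

target_value = sum(pi[a] * reward[group[a]] for a in actions)
is_mean, is_var = moments(rho)
cond_mean, cond_var = moments(cond_rho)

print("target value:", target_value)
print("ordinary IS:  mean", is_mean, "variance", is_var)
print("conditional:  mean", cond_mean, "variance", cond_var)
```

The moments are computed exactly by enumeration rather than by sampling, so the comparison does not depend on a random seed. The variance reduction here follows from the law of total variance and relies on the reward being a function of the conditioning statistic.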

Type

Conference paper

Publication

In International Conference on Artificial Intelligence and Statistics (AISTATS)

off policy
