Conditional Importance Sampling for Off-Policy Learning

Abstract

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
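To make the abstract's central idea concrete: replacing an importance sampling ratio with a conditional expectation of that ratio leaves the estimator's mean unchanged (by the tower rule) while reducing its variance (a Rao-Blackwell-style argument). The sketch below is not the paper's algorithm; it is a minimal toy illustration in which the behaviour policy mu, target policy pi, transition matrix P, and reward vector r are made-up quantities chosen only to show the effect, and the conditional ratio is computed from the known model rather than learned.

```python
# Minimal sketch (illustrative, not the paper's method): compare an ordinary
# importance sampling estimator with one that uses the conditional expectation
# of the ratio given the next state. All quantities are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

# One-step problem: A ~ mu, S' ~ P(.|A), reward r(S').
# We estimate the expected reward under the target policy pi.
mu = np.array([0.7, 0.3])            # behaviour policy over 2 actions
pi = np.array([0.2, 0.8])            # target policy over 2 actions
P = np.array([[0.9, 0.1],            # P[a, s']: next-state distribution per action
              [0.2, 0.8]])
r = np.array([0.0, 1.0])             # reward as a function of the next state

rho = pi / mu                        # ordinary importance sampling ratios

# Conditional ratio w(s') = E_mu[rho(A) | S' = s'], computed from the model.
p_sprime = mu @ P                    # marginal distribution of S' under mu
w = (mu * rho) @ P / p_sprime        # Bayes' rule plus the tower property

def simulate(n):
    a = rng.choice(2, size=n, p=mu)
    s_next = np.array([rng.choice(2, p=P[ai]) for ai in a])
    rew = r[s_next]
    ordinary = rho[a] * rew          # standard per-step IS estimator
    conditional = w[s_next] * rew    # conditional IS estimator
    return ordinary, conditional

ordinary, conditional = simulate(100_000)
true_value = pi @ P @ r
print(f"true value     : {true_value:.4f}")
print(f"ordinary IS    : mean {ordinary.mean():.4f}, var {ordinary.var():.4f}")
print(f"conditional IS : mean {conditional.mean():.4f}, var {conditional.var():.4f}")
```

Both estimators are unbiased for the same target value; the conditional variant has lower variance because it averages the ratio over the randomness in the action given the next state. Different choices of conditioning variable give different estimators, which is the space of algorithms the paper studies.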

Publication
In International Conference on Artificial Intelligence and Statistics (AISTATS)
