Will Dabney

Research Scientist



Currently, I am a senior staff research scientist at DeepMind, where I study reinforcement learning with forays into other topics in machine learning and neuroscience.

My research agenda focuses on finding the critical path to human-level AI. I believe we are only a handful of great papers away from the most significant breakthrough in human history. With the help of my collaborators, I hope to move us closer, one paper, experiment, or conversation at a time.


Interests

  • Distributional RL
  • Foundations of deep RL
  • Connecting Neuroscience & AI
  • Learning from sparse rewards


Education

  • PhD in Computer Science, 2014
    University of Massachusetts, Amherst
  • BSc in Computer Science, 2007
    University of Oklahoma
  • BSc in Mathematics, 2007
    University of Oklahoma

Everything is theoretically impossible, until it's done.
— Robert A. Heinlein

Recent Posts

How do we search for discovery?

Imagined for centuries, with unimaginable consequences and only examples from nature effortlessly demonstrating what is possible. The invention of flight bears numerous similarities to our ongoing search for human-level artificial intelligence (AI). Even today, as some of the most brilliant and successful people of our time recede into a type of paranoid fever dream about its dangers, others continue to argue that true AI is simply impossible. We have been here before, and come …

Recent Publications


Adaptive Trade-Offs in Off-Policy Learning

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives of existing methods, and also naturally yields …

Conditional Importance Sampling for Off-Policy Learning

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

Fast Task Inference with Variational Intrinsic Successor Features

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies. However, one limitation of this formulation is the difficulty of generalizing beyond the finite set of behaviors being explicitly learned, as may be needed in subsequent tasks. Successor features provide an appealing solution to this generalization problem, but require defining the reward function as …

A Geometric Perspective on Optimal Representations for Reinforcement Learning

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate …

Hindsight Credit Assignment

We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of …