Will Dabney

Research Scientist



Currently, I am a senior staff research scientist at DeepMind, where I study reinforcement learning with forays into other topics in machine learning and neuroscience.

My research agenda focuses on finding the critical path to human-level AI. I believe we are, in fact, only a handful of great papers away from the most significant breakthrough in human history. With the help of my collaborators, I hope to move us closer, one paper, experiment, or conversation at a time.


Interests

  • Distributional RL
  • Foundations of deep RL
  • Connecting Neuroscience & AI
  • Learning from sparse rewards


Education

  • PhD in Computer Science, 2014

    University of Massachusetts, Amherst

  • BSc in Computer Science, 2007

    University of Oklahoma

  • BSc in Mathematics, 2007

    University of Oklahoma

Everything is theoretically impossible, until it's done.
— Robert A. Heinlein

Recent Posts

Giants all the way down

Just after the frenetic sprint leading up to a conference deadline, I often end up chastising myself for not being more single-minded in my research. This weekend, while recovering from the ICML deadline, I found myself replaying the same familiar thought patterns: “I want to work deeply on only a couple of projects”, then “What would those be, now that the slate is clean?”, and finally realizing that I had listed a half-dozen projects that should start immediately!

How do we search for discovery?

Imagined for centuries, with unimaginable consequences, and with only examples from nature effortlessly demonstrating what is possible. The invention of flight bears numerous similarities to our ongoing search for human-level artificial intelligence (AI). Even today, as some of the most brilliant and successful people of our time recede into a type of paranoid fever dream about its dangers, others continue to argue true AI is simply impossible. We have been here before, and come …

Recent Publications


Adaptive Trade-Offs in Off-Policy Learning

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider their trade-offs of three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives of existing methods, and also naturally yields …

Conditional Importance Sampling for Off-Policy Learning

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

Fast Task Inference with Variational Intrinsic Successor Features

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies. However, one limitation of this formulation is the difficulty of generalizing beyond the finite set of behaviors being explicitly learned, as may be needed in subsequent tasks. Successor features provide an appealing solution to this generalization problem, but require defining the reward function as …
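The successor-feature idea referenced here can be sketched as follows (a hedged illustration with an invented three-state chain, not the paper's method): once the successor features ψ(s) = E[Σ_t γ^t φ(s_t)] of a policy are known, the value of any task whose reward is linear in the features, r(s) = φ(s)·w, is just a dot product.

```python
import numpy as np

# Tiny deterministic chain MDP (0 -> 1 -> 2 -> 2) with feature vectors
# phi(s); rewards are assumed linear in the features: r(s) = phi(s) @ w.
gamma = 0.9
phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])          # one feature row per state
next_state = np.array([1, 2, 2])      # the fixed policy's transitions

# Successor features satisfy psi(s) = phi(s) + gamma * psi(next(s));
# iterate this fixed-point equation to convergence.
psi = np.zeros_like(phi)
for _ in range(200):
    psi = phi + gamma * psi[next_state]

# A new task is specified only by a weight vector w; its value function
# under the same policy requires no further learning.
w = np.array([0.0, 1.0])
values = psi @ w
```

The appeal is exactly this reuse: ψ is learned once per policy, and fast task inference amounts to identifying w.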

A distributional code for value in dopamine-based reinforcement learning

Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial …
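The artificial-RL mechanism this account draws on can be sketched in a few lines (a toy illustration under assumed parameters, not the biological model itself): a population of value predictors, each scaling positive and negative prediction errors asymmetrically, converges to different expectiles of the reward distribution and so jointly encodes more than its mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each predictor has its own asymmetry tau: positive prediction errors
# are scaled by tau, negative ones by (1 - tau). Optimistic predictors
# (tau near 1) settle above the mean reward, pessimistic ones below it.
taus = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
values = np.zeros_like(taus)
lr = 0.01

for _ in range(50_000):
    reward = rng.choice([0.0, 1.0])   # stochastic, bimodal reward
    delta = reward - values           # per-predictor prediction errors
    scale = np.where(delta > 0, taus, 1.0 - taus)
    values += lr * scale * delta

# For a Bernoulli(0.5) reward, the tau-expectile equals tau itself, so
# the learned values spread out across the reward distribution.
```

The spread of the learned values, rather than any single one, is what carries the distributional information.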

A Geometric Perspective on Optimal Representations for Reinforcement Learning

We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation error of the value functions of all stationary policies for a given environment. We show that this optimization reduces to making accurate …