Safe and Efficient Exploration in Reinforcement Learning
At the heart of Reinforcement Learning lies the challenge of trading off exploration (collecting data to identify better models) against exploitation (using the current estimates to make decisions). In simulated environments such as games, exploration is primarily a computational concern. In real-world settings, exploration is costly and potentially dangerous, as it requires experimenting with actions whose consequences are unknown. In this talk, I will present our work towards improving the efficiency of exploration in reinforcement learning and rigorously reasoning about its safety. I will discuss approaches in which we learn about unknown system dynamics through exploration, yet must verify the safety of the estimated policy. Our approaches use Bayesian inference over the objective, constraints, and dynamics, and, under some regularity conditions, are guaranteed to be both safe and complete, i.e., to converge to a natural notion of reachable optimum. I will also present recent results on harnessing uncertainty to improve the efficiency of exploration in model-based deep reinforcement learning, and on meta-learning suitable probabilistic models from related tasks.
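To give a flavor of the idea of safe exploration under Bayesian uncertainty, the following is a minimal sketch, not the algorithm presented in the talk: a Gaussian-process model of an unknown objective and an unknown safety constraint, where new evaluations are restricted to points whose constraint value is, with high confidence, above a safety threshold. The specific functions, kernel, safety threshold h, and confidence multiplier beta are illustrative assumptions.

```python
# Sketch of GP-based safe exploration: only evaluate points whose constraint
# lower confidence bound clears the safety threshold, and among those pick the
# point with the most optimistic objective estimate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def objective(x):            # unknown reward to maximize (assumed for illustration)
    return np.sin(3.0 * x) + 0.5 * x


def constraint(x):           # unknown safety function; must stay >= h (assumed)
    return 1.0 - x ** 2


h = 0.0                      # safety threshold (assumed)
beta = 2.0                   # confidence multiplier for the GP bounds (assumed)
candidates = np.linspace(-1.5, 1.5, 301).reshape(-1, 1)

# Start from a seed point that is known to be safe.
X = np.array([[0.0]])
y_obj = objective(X).ravel()
y_con = constraint(X).ravel()

for step in range(15):
    gp_obj = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6).fit(X, y_obj)
    gp_con = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6).fit(X, y_con)

    mu_o, sd_o = gp_obj.predict(candidates, return_std=True)
    mu_c, sd_c = gp_con.predict(candidates, return_std=True)

    # Safe set: points whose constraint lower confidence bound is above the threshold.
    safe = (mu_c - beta * sd_c) >= h
    if not safe.any():
        break

    # Optimistic pick within the safe set: highest upper confidence bound on the objective.
    ucb = np.where(safe, mu_o + beta * sd_o, -np.inf)
    x_next = candidates[np.argmax(ucb)].reshape(1, -1)

    X = np.vstack([X, x_next])
    y_obj = np.append(y_obj, objective(x_next).ravel())
    y_con = np.append(y_con, constraint(x_next).ravel())

best = X[np.argmax(y_obj)]
print(f"best safe point found: x = {best[0]:.3f}, f(x) = {y_obj.max():.3f}")
```

The design choice to illustrate is that the safe set grows as the model becomes more confident near its boundary, so exploration can expand beyond the initial seed without ever evaluating a point the model cannot certify as safe.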