Gaussian Exploration

Speaker:

Xunyu Zhou

Date and Time:

Tuesday, April 30, 2019 - 9:15am to 10:00am

Location:

Fields Institute, Room 230

Abstract:

We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black-box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory formulation for the feature dynamics that captures repetitive learning under exploration. We carry out a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This in turn interprets and justifies the widely adopted Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively and mutually exclusively, by the mean and the variance of the Gaussian distribution. We also find that a more random environment contains more learning opportunities, in the sense that less exploration is needed. Finally, we illustrate the results with a continuous-time mean-variance portfolio selection problem.
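
A minimal sketch of the ideas in the abstract, not the speaker's formulation: actions are drawn from a Gaussian whose mean is a linear feedback of the state (exploitation) and whose variance sets the amount of exploration, and the reward is augmented with the differential entropy of that Gaussian. The quadratic reward, the names `gain`, `variance`, and `temperature`, and all numeric values are assumptions chosen for illustration only.

```python
import math
import random


def gaussian_entropy(variance):
    """Differential entropy of a 1-D Gaussian: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def sample_action(state, gain, variance, rng):
    """Draw an action from N(gain * state, variance).

    The mean (gain * state) plays the role of exploitation;
    the variance controls how much the policy explores.
    """
    return rng.gauss(gain * state, math.sqrt(variance))


def regularized_reward(state, action, variance, temperature):
    """Hypothetical quadratic reward plus a temperature-weighted entropy bonus."""
    reward = -(state ** 2 + action ** 2)  # assumed LQ-style running reward
    return reward + temperature * gaussian_entropy(variance)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    state = 1.0
    action = sample_action(state, gain=-0.5, variance=0.25, rng=rng)
    print(regularized_reward(state, action, variance=0.25, temperature=0.1))
```

In this sketch, raising `variance` increases the entropy bonus while leaving the mean action unchanged, which mirrors the separation between exploitation (mean) and exploration (variance) described in the abstract.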
