Offline reinforcement learning for optimal trading
We consider an offline learning problem for an agent who first estimates an unknown price impact kernel from a static dataset, and then designs strategies to liquidate a risky asset while her trades create transient price impact. We propose a novel approach for the nonparametric estimation of the price impact kernel from a dataset containing correlated price trajectories, trading signals and metaorders. We quantify the accuracy of the estimated kernel using a metric that depends explicitly on the dataset. We show that a trader who tries to minimise her execution costs with a greedy strategy based purely on the estimated model will incur suboptimality, due to spurious correlation between the trading strategy and the estimator. Adopting an offline reinforcement learning approach, we introduce a pessimistic loss functional that takes the uncertainty of the estimated model into account, and we derive an asymptotically optimal bound on the execution costs even without precise information on the true price impact kernel. Numerical experiments demonstrate the effectiveness of the proposed price impact estimator and of the pessimistic trading strategy.
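To illustrate the estimation step, here is a minimal sketch, assuming a discrete-time propagator model dP_t = sum_k G(k) q_{t-k} + eps_t in which the kernel values G(0), ..., G(K-1) can be recovered by least squares from observed trades and price increments. This is not the paper's estimator; all function names and the synthetic data are illustrative.

```python
# Minimal sketch of nonparametric kernel estimation by least squares,
# under an assumed discrete-time propagator model (not the paper's method).
import numpy as np

def estimate_kernel(trades, price_increments, K):
    """Least-squares estimate of a propagator kernel G(0..K-1).

    trades:           array of shape (T,), signed trade sizes q_t
    price_increments: array of shape (T,), observed price changes dP_t
    K:                number of kernel lags to estimate
    """
    T = len(trades)
    # Design matrix: column k holds the trade series lagged by k periods,
    # so that (X @ G)_t = sum_k G(k) q_{t-k}.
    X = np.zeros((T, K))
    for k in range(K):
        X[k:, k] = trades[: T - k]
    # Ordinary least squares: G_hat = argmin ||dP - X G||^2.
    G_hat, *_ = np.linalg.lstsq(X, price_increments, rcond=None)
    return G_hat

# Synthetic check: recover a known exponentially decaying kernel.
rng = np.random.default_rng(0)
K = 10
G_true = 0.5 * np.exp(-0.3 * np.arange(K))
q = rng.normal(size=5_000)
X = np.zeros((len(q), K))
for k in range(K):
    X[k:, k] = q[: len(q) - k]
dP = X @ G_true + 0.1 * rng.normal(size=len(q))
print(np.round(estimate_kernel(q, dP, K), 3))  # close to G_true
```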
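The pessimistic step can be sketched in the same setting: for a fixed liquidation schedule x the propagator execution cost is linear in the kernel, cost(x, G) = c(x)^T G with c(x)_k = sum_t x_t x_{t-k}, so one can penalise the estimated cost by a data-dependent confidence term built from the covariance of the kernel estimate. The penalty form below is an assumed illustration of the pessimism principle, not the paper's exact loss functional.

```python
# Minimal sketch of a pessimistic objective: estimated cost plus an
# uncertainty penalty sqrt(c^T Sigma c). Names and the penalty form are
# assumptions for illustration, not the paper's loss functional.
import numpy as np

def cost_features(x, K):
    """c(x)_k = sum_t x_t x_{t-k}: loading of the cost on kernel lag k."""
    c = np.zeros(K)
    for k in range(K):
        c[k] = np.dot(x[k:], x[: len(x) - k])
    return c

def pessimistic_cost(x, G_hat, Sigma, lam):
    """Estimated execution cost plus a confidence-width penalty."""
    c = cost_features(x, len(G_hat))
    return c @ G_hat + lam * np.sqrt(c @ Sigma @ c)

# Compare candidate schedules that each liquidate Q shares over T periods.
T, Q, K = 20, 1.0, 10
G_hat = 0.5 * np.exp(-0.3 * np.arange(K))  # stand-in kernel estimate
Sigma = 0.01 * np.eye(K)                   # stand-in estimator covariance
twap = np.full(T, Q / T)
front = np.linspace(2 * Q / T, 0.0, T)     # front-loaded, also sums to Q
for name, x in [("TWAP", twap), ("front-loaded", front)]:
    print(name, round(pessimistic_cost(x, G_hat, Sigma, lam=1.0), 4))
```

Schedules that load heavily on poorly estimated lags incur a larger penalty, which is how pessimism guards against the spurious correlation between the strategy and the estimator mentioned above.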
The talk is based on joint work with Eyal Neuman and Wolfgang Stockinger.