Bandits for Algorithmic Trading with Signals
We propose a class of execution algorithms that consists of a strategic layer and a speculative layer. The strategic layer is an optimal trading schedule that encodes the trader's objective, her risk tolerance, and the impact of her own trades in the market; the schedule consists of child orders that are executed in the market over a trading horizon. The speculative layer, in turn, relies on online learning from market features and on a set of trading decisions to optimise the execution of each child order of the strategic layer. Specifically, the speculative layer uses a new contextual bandit algorithm, which we call MTGP-LR, to learn the reward functions that map features to trading performance for each action. In our approach, the set of reward functions is a sample from a multi-task Gaussian process, and our algorithm employs a new online change-point detection test to learn in non-stationary environments. As an application in optimal execution, we use MTGP-LR to incorporate short-term predictive signals, which the speculative layer uses to decide the timing of child orders. We use limit order book data from Nasdaq to showcase the performance of MTGP-LR for several shares, where the speculative layer includes price trend and price pressure signals.
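To fix ideas, the contextual-bandit component can be sketched in simplified form. The sketch below is not the authors' MTGP-LR: it models each action's reward function with an independent single-task Gaussian process (exact inference, squared-exponential kernel) and selects actions with an upper-confidence-bound rule, omitting both the multi-task coupling across actions and the online change-point detection test described in the abstract. All names and parameters are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


class GPBanditArm:
    """Exact GP posterior over one action's reward function.

    A stand-in for one task of a multi-task GP: here each arm is
    an independent GP, so no information is shared across actions.
    """

    def __init__(self, noise=0.1):
        self.X, self.y, self.noise = [], [], noise

    def update(self, x, r):
        # Store the observed (feature, reward) pair for this action.
        self.X.append(x)
        self.y.append(r)

    def posterior(self, x):
        # Posterior mean and variance of the reward at feature x.
        if not self.X:
            return 0.0, 1.0  # prior: zero mean, unit variance
        X, y = np.array(self.X), np.array(self.y)
        K = rbf_kernel(X, X) + self.noise**2 * np.eye(len(X))
        k = rbf_kernel(X, x[None, :])[:, 0]
        mu = k @ np.linalg.solve(K, y)
        var = rbf_kernel(x[None, :], x[None, :])[0, 0] - k @ np.linalg.solve(K, k)
        return mu, max(var, 1e-12)


def select_action(arms, x, beta=2.0):
    """UCB rule: pick the arm maximising posterior mean + beta * std."""
    scores = [mu + beta * np.sqrt(var) for mu, var in (a.posterior(x) for a in arms)]
    return int(np.argmax(scores))
```

In a toy loop, the agent observes a feature vector (a proxy for the short-term signals), selects an action, receives a noisy reward, and updates the chosen arm's posterior; the exploration bonus shrinks as each arm accumulates data near the current feature. Handling non-stationarity would additionally require a mechanism, such as the change-point test in the paper, to discard observations from a past regime.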