Continuous Control with Deep Dynamic Recurrent Reinforcement Learning for Portfolio Optimization
Recurrent Reinforcement Learning (RRL) techniques have been used to optimize asset trading systems and have achieved outstanding results. However, previous work has been limited to systems with discrete action spaces. To address the challenge of continuous action and state spaces, we propose a Deep Dynamic Recurrent Reinforcement Learning (DDRRL) architecture that constructs an optimal portfolio in real time. The model captures up-to-date market conditions and rebalances the portfolio accordingly. Within this framework, the Sharpe ratio, one of the most widely accepted measures of risk-adjusted return, is used as the performance metric. Additionally, since the performance of most machine learning algorithms depends heavily on their hyperparameter settings, we equip the agent with the ability to find the best possible architecture topology through an automated search based on a Gaussian Process with Expected Improvement as the acquisition function. Furthermore, we perturb the architecture to measure the robustness of the agent's investment decisions and report the resulting risk impact. Finally, the system is trained and tested in an online manner over 20 successive rounds, using data for ten selected stocks from different sectors of the S&P 500 covering January 1, 2013 to July 31, 2017. The experiments reveal that maximizing the Sharpe ratio achieves superior performance over a buy-and-hold strategy.
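For concreteness, one standard (ex-post) definition of the Sharpe ratio over $T$ portfolio returns $R_t$ is given below; the exact convention used in the paper (annualization, choice of risk-free rate $R_f$) is an assumption here:

$$
S_T \;=\; \frac{\mathbb{E}[R_t - R_f]}{\sigma(R_t)}
\;\approx\; \frac{\tfrac{1}{T}\sum_{t=1}^{T}\,(R_t - R_f)}{\sqrt{\tfrac{1}{T}\sum_{t=1}^{T}\bigl(R_t - \bar{R}\bigr)^{2}}},
\qquad \bar{R} = \frac{1}{T}\sum_{t=1}^{T} R_t .
$$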
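The hyperparameter search can be sketched as follows. This is a minimal illustration of Gaussian-Process Bayesian optimization with Expected Improvement using scikit-optimize; the search space, the `train_and_evaluate` stub, and the library choice are assumptions for illustration, not the paper's actual implementation:

```python
# Minimal sketch: tune architecture hyperparameters by maximizing the
# validation Sharpe ratio with GP-based Bayesian optimization and the
# Expected Improvement (EI) acquisition function.
from skopt import gp_minimize
from skopt.space import Integer, Real

# Hypothetical architecture hyperparameters for the recurrent agent.
space = [
    Integer(1, 4, name="num_layers"),
    Integer(16, 256, name="hidden_units"),
    Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
]

def train_and_evaluate(num_layers, hidden_units, learning_rate):
    # Stub standing in for training the DDRRL agent and computing the
    # Sharpe ratio on a validation window; returns a dummy value here.
    return 1.0 / (1.0 + learning_rate * num_layers)

def objective(params):
    num_layers, hidden_units, learning_rate = params
    # gp_minimize minimizes, so return the negative Sharpe ratio.
    return -train_and_evaluate(num_layers, hidden_units, learning_rate)

result = gp_minimize(
    objective,
    space,
    acq_func="EI",    # Expected Improvement acquisition function
    n_calls=30,       # total evaluations of the objective
    random_state=0,
)
print("Best hyperparameters:", result.x, "best Sharpe:", -result.fun)
```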