Forecasting ETFs using Machine Learning
In this work, we apply cutting-edge machine learning algorithms to one of the oldest challenges in finance: predicting returns. For simplicity, we focus on predicting the direction (i.e., up or down) of several liquid ETFs and do not attempt to predict the magnitude of price changes. The ETFs we use serve as asset class proxies. We employ approximately five years of historical daily data, from January 2011 to January 2016, obtained through Yahoo Finance. Using supervised learning classification algorithms readily available in Python’s Scikit-Learn, we employ three powerful techniques:
(1) Deep Neural Networks,
(2) Random Forests, and
(3) Support Vector Machines (linear and radial basis function).
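As a rough illustration of how these classifiers can be set up in Scikit-Learn, the sketch below fits all of them on synthetic stand-in data. The data, the hyperparameters (layer sizes, number of trees), the feature scaling step, and the names `classifiers` and `accuracy` are assumptions made for illustration and are not taken from this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 400 "days", 6 features,
# and a binary label where 1 = up and 0 = down.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
noise = rng.normal(scale=1.0, size=400)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Chronological split: fit on the first 300 rows, evaluate on the last 100.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Hyperparameters are illustrative guesses, not the settings used in the paper.
classifiers = {
    "deep_neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0),
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each classifier and record its out-of-sample directional accuracy.
accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracy[name] = clf.score(X_test, y_test)

for name, acc in accuracy.items():
    print(f"{name}: {acc:.3f}")
```

The split is chronological rather than shuffled because, with time-series data, shuffling would let the model train on observations that come after the ones it is tested on.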
We document the performance of our three algorithms across four information sets:
(A) past returns,
(B) past volume,
(C) dummies for days/months, and
(D) a combination of all three.
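To make the information sets concrete, the following sketch builds one feature vector per day for each set, together with an up/down label at a chosen horizon. The number of lags, the use of day-over-day percentage changes for volume, the one-hot calendar encoding, the toy prices, and the helper name `build_information_sets` are all assumptions made for illustration; this work may construct its features differently.

```python
from datetime import date, timedelta


def build_information_sets(dates, closes, volumes, n_lags=2, horizon=1):
    """Return one dict per usable day with feature lists and a 0/1 direction label."""
    rows = []
    # Start where n_lags past changes exist; stop where the future price exists.
    for t in range(n_lags, len(closes) - horizon):
        # (A) past returns: close-to-close changes over the previous n_lags days.
        past_returns = [closes[t - k] / closes[t - k - 1] - 1.0 for k in range(n_lags)]
        # (B) past volume: day-over-day volume changes over the same window.
        past_volume = [volumes[t - k] / volumes[t - k - 1] - 1.0 for k in range(n_lags)]
        # (C) calendar dummies: one-hot weekday (Mon-Fri) and one-hot month.
        weekday = [1 if dates[t].weekday() == d else 0 for d in range(5)]
        month = [1 if dates[t].month == m else 0 for m in range(1, 13)]
        calendar = weekday + month
        # Label: 1 if the close `horizon` days ahead is higher than today's, else 0.
        label = 1 if closes[t + horizon] > closes[t] else 0
        rows.append({
            "A": past_returns,
            "B": past_volume,
            "C": calendar,
            "ABC": past_returns + past_volume + calendar,  # combination of all three
            "y": label,
        })
    return rows


# Toy data (assumption): ten consecutive weekdays starting Monday, 3 January 2011.
dates = []
d = date(2011, 1, 3)
while len(dates) < 10:
    if d.weekday() < 5:
        dates.append(d)
    d += timedelta(days=1)

closes = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0, 104.0, 103.5, 105.0, 106.0]
volumes = [1000, 1200, 900, 1500, 1100, 1300, 1250, 1400, 1600, 1550]

rows = build_information_sets(dates, closes, volumes, n_lags=2, horizon=1)
print(len(rows), len(rows[0]["ABC"]), rows[0]["y"])
```

Only information available at day `t` enters the features, while the label looks `horizon` days ahead, so longer horizons can be explored by changing a single argument.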
We introduce a “gain criterion” to aid in comparing classifiers’ performance. First, we find that these algorithms work well over one-month to three-month horizons. Short-horizon predictability, over days, is extremely difficult; our results thus support the short-term random walk hypothesis. Second, we document the importance of cross-sectional and intertemporal volume as a powerful information set. Third, we show that many features are needed for predictability, as each feature makes only a very small contribution. We conclude, therefore, that ETFs can be predicted with machine learning algorithms, but practitioners should incorporate prior knowledge of markets and intuition about asset class behavior.