Mixing alternative data and machine learning on financial markets
The last 10 years have seen an increase in the use of alternative data by market participants. These data are not financial data: examples include satellite images, texts, credit card receipts, and geolocation of mobile phones. They can give insight into the health of economic entities such as companies or cities, or into the production of commodities. I will explain how to handle these new types of data, whose characteristics are very different from those of the data previously used to inform financial decisions: unlike prices or traded quantities, they are not neatly structured as collections of time series; unlike financial statements, they may be available for only a small subset of entities, and linking these entities to tradable instruments can be a challenge.
I will review them and explain how the tools provided by machine learning can be used to process them and to build nowcasting (as opposed to forecasting) indicators. Enhancing financial decisions with nowcasting provides a better connection with the real economy, which alternative data describe well. Last but not least, if I have time, I will focus on the use of texts in financial datasets.
For more details, please have a look at "Do Word Embeddings Really Understand Loughran-McDonald's Polarities?" (https://arxiv.org/abs/2103.09813), and look out over the next few months for two forthcoming books: "The Financial Ecosystem in Practice: From Post-Crisis Intermediation To FinTechs" (World Scientific 2022, C-A. L and Amine Raboun) and "Machine Learning And Data Sciences For Financial Markets: A Guide To Contemporary Practices" (Cambridge University Press 2022, Agostino Capponi and C-A. L, editors).