A tutorial on Manifold Learning for real data
Dimension reduction is used to compress large, high-dimensional data, to discover predictive features, or simply to understand the data-generating process. Manifold learning (ML) is the most natural approach for the last of these goals whenever the data can be well described by a small number of parameters. Scientists use ML for analysis and discovery in data obtained by both observation and simulation.
This course will describe core algorithms for manifold learning, underscoring the importance of adopting the statistical know-how associated with these algorithms. Black-box use of these algorithms can present the user with distorted views of the data, also known as algorithmic artefacts.
Therefore, the course will outline a complete framework for performing ML on real data, with in-depth discussions of the modeling choices, as well as of the post-processing choices that influence the quality of the results.
These include selection of the local scale, choices of kernel function and graph Laplacian for preprocessing, and postprocessing methods to refine and correct distortions in the obtained data maps. Extensions to learning of vector fields will also be presented. The methods will be illustrated with examples from chemistry, astronomy, and the social sciences.
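As a rough illustration of how the local scale, the kernel, and the graph Laplacian fit together, here is a minimal sketch; it is not the course's own implementation. It assumes a Gaussian kernel on a dense distance matrix and the density-renormalized construction familiar from diffusion maps. The function name `laplacian_embedding` and the parameter `eps` are illustrative choices, and the value of `eps` in the usage lines is picked by hand rather than by a principled scale-selection method.

```python
import numpy as np


def laplacian_embedding(X, eps, n_components=2):
    """Embed the rows of X with eigenvectors of a renormalized graph Laplacian.

    Assumptions (illustrative, not from the source): Gaussian kernel,
    dense matrices, and degree renormalization to reduce density effects.
    """
    # Pairwise squared Euclidean distances (clipped to avoid tiny negatives).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian kernel; eps sets the local scale (a squared bandwidth).
    K = np.exp(-d2 / eps)

    # Renormalize by the degrees to reduce the influence of sampling density.
    deg = K.sum(axis=1)
    K1 = K / np.outer(deg, deg)

    # Symmetric normalization S = D1^{-1/2} K1 D1^{-1/2}. S shares its
    # eigenvalues with the random-walk matrix P = D1^{-1} K1, and the
    # graph Laplacian is proportional to (I - P) / eps.
    deg1 = K1.sum(axis=1)
    S = K1 / np.sqrt(np.outer(deg1, deg1))

    # eigh returns eigenvalues in ascending order; sort them descending.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Map eigenvectors of S to eigenvectors of P.
    psi = vecs / np.sqrt(deg1)[:, None]

    # Drop the first eigenvector (constant, eigenvalue 1) and keep the next ones.
    return psi[:, 1:n_components + 1], vals[:n_components + 1]


# Usage: a noisy circle in 3D, which has one intrinsic parameter (the angle).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta), 0.01 * rng.normal(size=200)])
emb, vals = laplacian_embedding(X, eps=0.1)
print(emb.shape)
```

Changing `eps` by an order of magnitude in either direction is a quick way to see how strongly the embedding depends on the local scale.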
Joint work with Dominique Perrault-Joncas, James McQueen, Yu-chia Chen, Samson Koellev, and Hanyu Zhang.