Summer School on Data Science Tools and Techniques in Modelling Complex Networks
Description
Simply stated, data mining is the process of answering questions by analyzing data sets from different perspectives using algorithms which run on a mathematical representation of the data. The most commonly used data representation is the data matrix, where each row corresponds to an observation and each column represents a feature. In the special case where all features are numerical, this is called a vector space representation. A large number of algorithms can be applied to such data, including sampling and dimension reduction methods.
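As a rough illustration of the data matrix idea, the sketch below stores a few observations as rows of numerical features, draws a random sample of rows, and applies variance-based feature selection, one simple form of dimension reduction. The feature names, the values, and the helper functions `sample_rows` and `drop_low_variance_columns` are hypothetical and chosen only for this example; the course itself may use different tools and methods.

```python
import random
from statistics import pvariance

# Hypothetical data matrix: each row is an observation,
# each column is a numerical feature (a vector space representation).
feature_names = ["height_cm", "weight_kg", "constant_flag"]
data_matrix = [
    [170.0, 65.0, 1.0],
    [182.0, 80.0, 1.0],
    [158.0, 52.0, 1.0],
    [175.0, 72.0, 1.0],
]


def sample_rows(matrix, k, seed=0):
    """Return k observations drawn uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(matrix, k)


def drop_low_variance_columns(matrix, names, threshold=0.0):
    """Keep only the features whose population variance exceeds threshold."""
    columns = list(zip(*matrix))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    reduced = [[row[j] for j in keep] for row in matrix]
    return reduced, [names[j] for j in keep]


subset = sample_rows(data_matrix, 2)
reduced, kept_names = drop_low_variance_columns(data_matrix, feature_names)
print(kept_names)
```

Because the rows of a data matrix are independent observations, sampling them leaves each remaining row meaningful on its own.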
Not all data types fit naturally into the data matrix representation. In a relational data set, an observation involves two or more entities. Such data sets are often modeled via graphs or hypergraphs. A graph is a collection of vertices representing the entities, connected by edges, each of which represents a relationship between two vertices. Hypergraphs are used to model relations involving an arbitrary number of entities. Exploratory data analysis over relational data can be challenging. Slicing or sampling a relational data set tends to destroy its structure, so little can be learned from the resulting subset. Missing or noisy data is also a more serious problem with relational data. For example, the addition or removal of a single edge in a graph can considerably change properties such as the diameter.
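The sensitivity of the diameter to a single edge can be seen in a small sketch. The code below builds a 6-vertex cycle as an adjacency dictionary, computes its diameter by breadth-first search, and then repeats the computation after removing one edge. The graph, the hyperedges, and the helper names (`build_graph`, `eccentricity`, `diameter`) are assumptions made for this illustration, not material taken from the course.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected graph as a dict mapping vertex -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def eccentricity(adjacency, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    if len(distance) < len(adjacency):
        raise ValueError("graph is disconnected")
    return max(distance.values())


def diameter(adjacency):
    """Diameter of a connected graph: the maximum eccentricity over all vertices."""
    return max(eccentricity(adjacency, v) for v in adjacency)


# A cycle on vertices 0..5, and the same graph with the edge (5, 0) removed.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
cycle = build_graph(cycle_edges)
path = build_graph(cycle_edges[:-1])

print(diameter(cycle), diameter(path))

# A hypergraph can be sketched as a list of vertex sets of arbitrary size,
# for instance hypothetical co-authorship groups.
hyperedges = [{"alice", "bob", "carol"}, {"carol", "dave"}]
```

Removing one of the six edges turns the cycle into a path, and the diameter grows from 3 to 5, even though every vertex is still present.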
We will explore various theoretical and practical aspects of relational data representation and mining. The format of the course will be a mix of lectures and demonstrations of various techniques over relational datasets using Python, Julia and Jupyter Notebooks.
Speaker Bios
François Théberge holds a B.Sc. in applied mathematics and computer science from the University of Ottawa, an M.Sc. in telecommunications from INRS and a Ph.D. in electrical engineering from McGill University. He has been employed by CSE since 1996, where he was involved in the creation of the data science team as well as the research group now known as the Tutte Institute for Mathematics and Computing. He also holds an adjunct professorial position in the Department of Mathematics and Statistics at the University of Ottawa. His current interests include relational-data mining and deep learning.
Pawel Pralat (http://www.math.ryerson.ca/pralat/) is an Associate Professor at Ryerson University and the Director of the Fields-CQAM Lab on Computational Methods in Industrial Mathematics at The Fields Institute for Research in Mathematical Sciences. His main research interests are in modelling and mining complex networks. Since 2006, he has written 150+ papers with 100+ collaborators. He is trained in both (theoretical and applied) computer science and mathematics (M.Eng. and M.A.Sc. in CS, Ph.D. in Mathematics and CS), and has strong programming and applied research skills gained through collaboration with the private sector (such as Microsoft Research, Google Research, NXM, Motorola, The Globe and Mail, BlackBerry, Alcatel-Lucent, Environics Analytics) as well as the Government of Canada.
Bogumił Kamiński is the Head of the Decision Analysis and Support Unit at Warsaw School of Economics. He is a member of the Management Committee of the European Social Simulation Association (ESSA), and Vice President of the Polish Chapter of the Institute for Operations Research and the Management Sciences (INFORMS). His field of expertise is operations research, with a special focus on industrial applications of forecasting, optimization and simulation. He has 15 years of experience teaching data science topics in undergraduate, graduate, and MBA courses. He has been involved in the development of the core Julia language and of its packages for data science workflows. He is also one of the top answerers of all time for Julia questions on Stack Overflow.
Przemyslaw Szufel is an Assistant Professor in the Decision Support and Analysis Unit at Warsaw School of Economics. His current research focuses on methods for executing large-scale simulations for numerical experiments and optimization. He is working on asynchronous algorithms for parallel execution of large-scale simulations in cloud and distributed computing environments. He is an author or co-author of several open-source tools for high-performance and numerical simulation (such as KissCluster, D-MASON, Isislab SOF, SilverDecisions, PyCX), and actively participates in their development. He is also a co-author of various algorithms for distributed simulation models (such as AKG, AOCBA).
Schedule
09:00 to 12:00
François Théberge, Tutte Institute for Mathematics and Computing
12:00 to 13:00
Lunch
13:00 to 16:00
Bogumił Kamiński, SGH Warsaw School of Economics
09:00 to 12:00
François Théberge, Tutte Institute for Mathematics and Computing
12:00 to 13:00
Lunch
13:00 to 16:00
Bogumił Kamiński, SGH Warsaw School of Economics
09:00 to 12:00
François Théberge, Tutte Institute for Mathematics and Computing; Pawel Pralat, Toronto Metropolitan University
12:00 to 13:00
Lunch
13:00 to 16:00
Przemyslaw Szufel, Warsaw School of Economics; Bogumił Kamiński, SGH Warsaw School of Economics
09:00 to 12:00
Modelling complex networks with random graphs - standard models and tools
Pawel Pralat, Toronto Metropolitan University
12:00 to 13:00
Lunch
13:00 to 16:00
Graph-based analysis of spatial data and transportation system modeling in Julia
Przemyslaw Szufel, Warsaw School of Economics
09:00 to 12:00
Modelling complex networks with random graphs - advanced models and tools
Pawel Pralat, Toronto Metropolitan University
12:00 to 13:00
Lunch
13:00 to 16:00
Large scale hyper-graph analysis with parallel and distributed computing tools in Julia
Przemyslaw Szufel, Warsaw School of Economics