CIM PROGRAMS AND ACTIVITIES
December 21, 2024
Overview:
Problem 1:
Presenters - DBRS
Corporate Credit Estimates Problem Description
The big-picture project that we are working on is to take a large set of financial data from all tax-filing companies in Europe over a limited time period and, from this, develop an entirely quantitative credit rating methodology. Analyst-developed credit ratings are very time consuming, and often all that is needed is an estimate. We are working to create a method that is either instant or at least takes a minimal amount of an analyst's time.
Credit ratings follow a scale of A (safe), B (investment grade), C (junk) and D (imminent default). Each of A to C is divided into triple, double and single (for example AAA, AA and A), from highest to lowest. Finally, these are sometimes further divided into high, stable and low (AA-high, AA-stable and AA-low), for a total of up to 28 possible ratings. Every rating category has an expected default rate, but companies with higher ratings have very few defaults that occur while still in that category. Usually companies get downgraded before defaulting. We are looking for a model that can correctly rank companies against each other; then, if analyst ratings are known for some of the companies, we can apply categorical ratings to the others.
Our dataset consists of company information and related accounting information for all European companies. The company information is a company identifier, industry, size, country, and a series detailing the company's legal status over time. The accounting information is all income statement (earnings, expenses ...) and balance sheet (assets, liabilities ...) data, as well as some share price and market data for public companies. Most of this data is only available yearly, and there are at most 10 years of data for each company. We refer to one observation as the accounting data for one firm in one year.
For most of the types of statistical models that we have looked at, there are assumptions required on the data that we must test.
First, we need independent observations, but our observations are grouped by company and by year, and then by industry and country. To add to the complexity, our observations are high dimensional. We should also know the distribution of our data, which requires a way to test the distribution of multidimensional data.
We derive our default data from the companies' legal statuses over time. Some statuses make it very clear whether the company has defaulted, but others are fairly ambiguous. We hope to find a way to include these ambiguous statuses in our dataset as either healthy or defaulted companies. Can this be done without creating bias in our data?
Finally, in accounting it is common knowledge that more information can be drawn from ratios of accounting variables than from the accounting variables themselves. These ratios are simple multivariate functions of two to three accounting variables. In creating our model we will need to look for ratios that are better predictors of default than others. Some intuition is involved, but we need to find a way to create, or at least choose, these ratios statistically.
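The ratio-selection question above can be illustrated with a minimal sketch. It is not the presenters' method: it assumes two-variable ratios only, uses the area under the ROC curve (AUC) as one possible rank-based measure of how well a ratio separates defaulted from healthy observations, and treats observations as independent, which the text notes is not true for firm-year data. The variable names (`debt`, `assets`, `ebit`) and the toy numbers are hypothetical.

```python
from itertools import permutations


def auc(scores, labels):
    """Share of (defaulted, healthy) pairs in which the defaulted
    observation has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulted and one healthy observation")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_ratios(observations, defaulted, variables):
    """Build every ordered ratio num/den and rank ratios by how far their
    AUC is from 0.5 (0.5 means no ranking power in either direction)."""
    results = []
    for num, den in permutations(variables, 2):
        # Drop observations where the denominator is zero.
        rows = [(obs[num] / obs[den], y)
                for obs, y in zip(observations, defaulted) if obs[den] != 0]
        labels = [y for _, y in rows]
        if len(set(labels)) < 2:
            continue  # cannot score a ratio without both classes present
        a = auc([r for r, _ in rows], labels)
        results.append((abs(a - 0.5), f"{num}/{den}", a))
    results.sort(reverse=True)
    return [(name, a) for _, name, a in results]


# Hypothetical toy data: one dict per firm-year observation.
observations = [
    {"debt": 90, "assets": 100, "ebit": 2},
    {"debt": 80, "assets": 100, "ebit": 5},
    {"debt": 30, "assets": 100, "ebit": 20},
    {"debt": 40, "assets": 200, "ebit": 30},
    {"debt": 50, "assets": 100, "ebit": 15},
    {"debt": 20, "assets": 50, "ebit": 8},
]
defaulted = [1, 1, 0, 0, 0, 0]  # 1 = defaulted, 0 = healthy

for name, a in screen_ratios(observations, defaulted, ["debt", "assets", "ebit"]):
    print(f"{name}: AUC = {a:.2f}")
```

Because AUC depends only on the ordering of scores, it fits the stated goal of ranking companies against each other. A real screen would also need to handle the grouping by company, year, industry and country described above, for example by evaluating on held-out firms rather than held-out observations.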
Problem 2:
Presenters - The TMX Group
Within the capital markets ecosystem, volatility is defined as a measure of the variation in the price of a stock over time. While volatility correlates with how frequently a stock is traded, little research exists on how volatility comes to exist in the first place. Using multiple data sources, including Canadian stock market data, the Fields Institute Problem Solving Workshop in Big Data will partner with the TMX Group to identify causal factors that create differences in price volatility between highly traded stocks and those that trade less frequently.
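The problem statement does not say which volatility measure is intended. As a minimal sketch, the code below assumes one common choice: the sample standard deviation of log returns, scaled to a yearly figure. The function names, the price series, and the 252-periods-per-year scaling are assumptions made for illustration.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def annualized_volatility(prices, periods_per_year=252):
    """Sample standard deviation of log returns, scaled by the square root
    of the number of periods per year (252 trading days is an assumption)."""
    returns = log_returns(prices)
    if len(returns) < 2:
        raise ValueError("need at least three prices to estimate volatility")
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical daily closing prices for two stocks.
calm_stock = [100.0, 100.5, 100.2, 100.8, 100.6, 101.0]
jumpy_stock = [100.0, 104.0, 97.0, 103.0, 95.0, 102.0]

print(annualized_volatility(calm_stock))
print(annualized_volatility(jumpy_stock))
```

For thinly traded stocks, closing prices can stay unchanged for several days, which pulls this estimate toward zero; any comparison between highly and less frequently traded stocks would need to account for that.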
Problem 3:
Presenters - GlaxoSmithKline
Pharmacovigilance in small sample sizes, rare adverse drug events, and low drug exposure prevalence
Although early detection and assessment of drug safety signals are important, post-approval drug safety studies often face challenges such as small sample size, rare incidence of adverse outcomes, and low exposure prevalence after the launch of a new drug or vaccine. In addition, nonrandomized studies of treatment effects in healthcare data are vulnerable to confounding bias. Propensity Score (PS) methods are increasingly used to control for measured potential confounders, especially in pharmacoepidemiologic studies of rare outcomes in the presence of many covariates from different data dimensions of large administrative healthcare and electronic health records databases.
The High-Dimensional Propensity Score (hd-PS) algorithm is a semi-automated procedure that selects and adjusts for different baseline characteristics of patients in drug and vaccine safety studies. It is used by investigators, including those at FDA Sentinel and the European Medicines Agency (EMA), to monitor drug and vaccine safety. The hd-PS algorithm prioritizes variables within each data dimension (e.g., inpatient diagnoses, inpatient procedures, outpatient diagnoses, outpatient procedures, dispensed prescription drugs) by their potential for confounding control, based on their prevalence and on bivariate associations with the treatment and with the study outcome. Once variables have been prioritized, a predefined number of variables with the highest potential for confounding per dimension is chosen for inclusion in the PS.
However, the hd-PS may face challenges in situations such as small sample sizes, rare adverse drug events, and low drug exposure prevalence. Our proposed solution, aggregating medical codes using hierarchical coding systems, improved the performance of the hd-PS in controlling for confounders, reducing bias by up to 19% in an empirical example. We will share the study findings and discuss further research to demonstrate the benefits of this aggregation method.
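The prioritization step described above can be sketched as follows. The text does not give a formula, so the sketch assumes a Bross-type multiplicative bias term, which combines a covariate's prevalence among exposed and unexposed patients with its association with the outcome, and ranks covariates by the absolute log of that term. The covariate codes, field names, and numbers are hypothetical, and the reciprocal handling of protective associations is likewise an assumption of this sketch.

```python
import math


def bross_bias(p_exposed, p_unexposed, rr_outcome):
    """Multiplicative bias attributable to one binary covariate.

    p_exposed / p_unexposed: covariate prevalence among exposed / unexposed.
    rr_outcome: relative risk of the outcome given the covariate.
    """
    if rr_outcome <= 0:
        raise ValueError("relative risk must be positive")
    # Assumption: protective associations (RR < 1) are inverted so that
    # the strength of association is measured on the same scale.
    rr = rr_outcome if rr_outcome >= 1 else 1.0 / rr_outcome
    return (p_exposed * (rr - 1) + 1) / (p_unexposed * (rr - 1) + 1)


def prioritize(covariates, k):
    """Return the codes of the k covariates with the largest |log(bias)|."""
    ranked = sorted(
        covariates,
        key=lambda c: abs(math.log(
            bross_bias(c["p_exposed"], c["p_unexposed"], c["rr_outcome"]))),
        reverse=True,
    )
    return [c["code"] for c in ranked[:k]]


# Hypothetical covariates from a single data dimension.
covariates = [
    {"code": "A", "p_exposed": 0.40, "p_unexposed": 0.10, "rr_outcome": 2.0},
    {"code": "B", "p_exposed": 0.20, "p_unexposed": 0.20, "rr_outcome": 5.0},
    {"code": "C", "p_exposed": 0.05, "p_unexposed": 0.30, "rr_outcome": 3.0},
]

print(prioritize(covariates, k=2))
```

Covariate B illustrates why both ingredients matter: it is strongly associated with the outcome but equally common in both treatment groups, so its bias term is 1 and it ranks last. With small samples and rare events, the prevalence and relative-risk estimates for fine-grained codes become unstable, which is the situation the aggregation of codes is meant to address.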
Problem 4:
Women's Rugby Sevens
The use of sports analytics is a rapidly growing field that is changing teams' approaches to training, preparation, and competition tactics. Recognizing the benefits of incorporating sports analytics to gain a competitive advantage over other teams, the Rugby Canada Women's Sevens program has already been collecting a host of physiological, anthropometric, medical, positional, movement, technical and tactical data on its players and teams. Examining the relationships between the various data streams will address a number of gaps both on and off the field. Tactical analysis will allow coaches to better target game strategies, training approaches, and player selection. Identification of key tactical indicators will allow coaches to adjust game strategy at optimal times. Targeted training programs based on key game tactical indicators will improve performance by improving tactical indices, leading to better game understanding and decision-making. Understanding how the datasets relate to tactical outcomes will create tactical performance profiles that further our understanding of top player characteristics, fostering talent and helping with player selection procedures. We currently need expert support in effectively integrating results to assist the team in:
Monday May 25
9:00 - 10:30 Problem presentations and discussions
10:30 Coffee break
11:00 - 12:30 Problem presentations and discussions (continued)
12:30 Lunch onsite
1:30 - 3:00 Group discussions
3:00 Coffee break
3:30 - 5:00 Group discussions
5:00 Summary session
Tuesday May 26
9:00 - 10:30 Problem presentations and discussions
10:30 Coffee break
11:00 - 12:30 Problem presentations and discussions (continued)
12:30 Lunch onsite
1:30 - 3:00 Group discussions
3:00 Coffee break
3:30 - 5:00 Group discussions
5:00 Summary session
Wednesday May 27
9:00 - 10:30 Problem presentations and discussions
10:30 Coffee break
11:00 - 12:30 Problem presentations and discussions (continued)
12:30 Lunch onsite
1:30 - 3:00 Group discussions
3:00 Coffee break
3:30 - 5:00 Group discussions
5:00 Summary session
Thursday May 28
9:00 - 10:30 Problem presentations and discussions
10:30 Coffee break
11:00 - 12:30 Problem presentations and discussions (continued)
12:30 Lunch onsite
1:30 - 3:00 Group discussions
3:00 Coffee break
3:30 - 5:00 Group discussions
5:00 Summary session
Friday May 29
9:00 - 10:30 Final presentation
10:30 Coffee break
11:00 - 12:30 Final presentation
12:30 Lunch onsite
August 11-14, 2014
Fields-MPrime Industrial Problem-Solving Workshop
August 20-24, 2012
Industrial Problem-Solving Workshop on Medical Imaging
June 22-26, 2009
OCCAM-Fields-MITACS, Math-in-Medicine Study Group
August 11-15, 2008
Fields-MITACS Industrial Problem-Solving Workshop
August 14-18, 2006
Fields-MITACS Industrial Problem-Solving Workshop