NPCDS Workshop on Data Mining

Poster Titles and Abstracts

Huarong Chen, McMaster University
Pattern Recognition and Its Application in Detection of Customers' Children Information

In today's competitive market, a company wants to know what it should do to meet the demands of its customers or to attract new customers from a certain region, and which strategies it should employ to maintain a profitable market share. Important information about customers, such as their expenditure, family income, and family composition, helps the company considerably in its decision making.

In this project, sponsored by MITACS and Rogers Communication Inc., we extract patterns to predict information about customers' children based on their buying behavior. Although the model focuses on customers of video stores in the Calgary region, the data mining methodologies applied in it can easily be extended to similar models for detecting other customer information.

Ahmed Hossain, University of Toronto
Selecting Differentially Expressed Genes in a Multifactorial Microarray Experiment

Microarrays are part of a new class of biotechnologies that allow the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in DNA microarray experiments is the identification of differentially expressed genes, that is, genes whose expression levels are associated with a response or covariate of interest. The primary goal of this study is to identify genes that are differentially expressed between male and female groups in a designed microarray experiment. To this end, we characterize the data with a statistical model that accounts for the relevant sources of variation and then consider test statistics for the model parameters using appropriate contrasts. The case study includes reading in the data, data display and exploration, normalization, and differential expression analysis. Keywords: DNA microarrays, Empirical Bayes, linear models, differential expression, Multiple Testing, False Discovery Rate.
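
The keywords above mention multiple testing and the False Discovery Rate. As an illustration only, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the FDR across many per-gene tests. The abstract does not say which FDR procedure was used, so the choice of Benjamini-Hochberg, the function name `benjamini_hochberg`, and the example p-values are all assumptions rather than the author's method.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Hypothetical helper for illustration; controls the FDR at level alpha
    under the usual independence (or positive dependence) assumptions.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Made-up p-values standing in for per-gene test results.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
print([i for i, flag in enumerate(flags) if flag])
```

In a microarray setting, each p-value would come from the test of one gene's contrast, and the flagged indices would be the genes declared differentially expressed.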

Gatot Ilhamto, University of Guelph
Neural Networks Application in Fertility Prediction

We evaluate the performance of two predictive models for the waiting time to first birth among Indonesian women: logistic regression and neural networks. There is no significant difference between the two models, although the neural network tends to give a lower misclassification error.

Tanguy Pallaver, Laval University
Data classification from improved self organizing map

We present recent results on applying an improved unsupervised Kohonen network to the problem of ordering and classifying data structures. We address the question of whether the Kohonen network map can reflect the small-world architecture inherent in the data set.

Xu Wang, University of Waterloo
A New Mixture Discriminant model for Drug Discovery Data

In drug discovery, statistical models are a powerful tool for predicting the activity of compounds against biological targets. In this supervised learning problem, descriptors of molecular structure (e.g. atomic weight, types of bonds, and many other exotic characteristics) are used to predict activity. The features of drug discovery data include the rareness of active compounds, multiple mechanisms, and high-dimensional descriptor spaces. Conventional mixture discriminant methods have difficulty finding the best model for drug discovery data because of the complexity of the data sets and the number of parameters. It is believed that the biological activity of compounds depends on only a few descriptors, so we introduce a new mixture model, which has fewer parameters and seeks to predict using multiple subspaces (i.e., multiple mechanisms). The EM algorithm is used to estimate the parameters, in conjunction with carefully chosen initial values and some other tuning parameters.
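
To illustrate the role that the EM algorithm and its initial values play in mixture models, here is a minimal sketch that fits a generic two-component, one-dimensional Gaussian mixture. This is not the mixture discriminant model proposed in the abstract; the function names, the starting variances, and the toy data are assumptions made purely for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, mean_a, mean_b, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative sketch).

    mean_a and mean_b are the user-supplied initial means; the result can
    depend on them, which is why initial values need care.
    """
    weight_a, var_a, var_b = 0.5, 1.0, 1.0  # assumed starting values
    n = len(data)
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component A.
        resp = []
        for x in data:
            pa = weight_a * normal_pdf(x, mean_a, var_a)
            pb = (1.0 - weight_a) * normal_pdf(x, mean_b, var_b)
            total = pa + pb
            resp.append(pa / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, means, and variances from the responsibilities.
        n_a = max(sum(resp), 1e-12)
        n_b = max(n - sum(resp), 1e-12)
        mean_a = sum(r * x for r, x in zip(resp, data)) / n_a
        mean_b = sum((1.0 - r) * x for r, x in zip(resp, data)) / n_b
        # Floor the variances so a component cannot collapse onto a single point.
        var_a = max(sum(r * (x - mean_a) ** 2 for r, x in zip(resp, data)) / n_a, 1e-6)
        var_b = max(sum((1.0 - r) * (x - mean_b) ** 2 for r, x in zip(resp, data)) / n_b, 1e-6)
        weight_a = min(max(n_a / n, 1e-12), 1.0 - 1e-12)
    return weight_a, (mean_a, var_a), (mean_b, var_b)


# Toy data with two well-separated clusters.
toy = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
print(em_two_gaussians(toy, mean_a=0.0, mean_b=5.0))
```

With well-separated clusters and sensible starting means the estimates settle quickly; poorly chosen starting means can lead EM to a different local optimum, which is the motivation for choosing initial values carefully.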

Rob Warren, University of Waterloo
Dynamic analysis of social networks

An interesting problem in Social Network Analysis (SNA) is the resilience of social networks to interference and how information flows from one person to another. In the past, we have always approached these problems from a static or 'snapshot' perspective: all available data was lumped into the same analysis and a conclusion derived.

Our hypothesis is that, since the world is a dynamic system, the analysis should either be dynamic itself or, at a minimum, conclusions based on static SNA metrics should be revisited.

We test our assumptions on GNU Privacy Guard key trust databases, discuss several examples where the static assumption is counter-productive, and suggest possible alternatives.

Li Xu, Montreal Neurological Institute
Improved method for analyzing MR spectroscopy imaging

Multiple sclerosis (MS) is a chronic disease of the central nervous system. Proton magnetic resonance spectroscopy (MRS) can non-invasively measure the metabolites in the human brain and is helpful in research on the progression of MS. We applied multivariate mixed-effect statistical models for repeated measurements to analyze the MRS data and modeled the covariance matrix to take into account the spatial correlations within MRS. The method was applied in a series of studies, which demonstrated that the distribution of brain metabolites differed among MS patients in different disease phases. These studies also showed correlations between the brain metabolites and clinical data such as disease duration and clinical disability.

The National Program on
Complex Data Structures

November 10-12, 2005
Workshop on Data Mining
at the Fields Institute, Toronto
