Discovering Deep Knowledge from Relational and Sequence Data
This talk presents a novel method, P2K (Pattern-to-Knowledge), with the surprising finding that deep knowledge can be discovered from mixed-mode relational and sequence data without reliance on explicit prior knowledge. By deep knowledge, we mean the hidden and entangled associations governed by different underlying factors, which traditional methods cannot discover but which P2K can unravel in different transformed statistical spaces. From relational mixed-mode and biological sequence datasets, P2K is able to discover the “what” and “where” of crucial associations that are implicitly or explicitly related to the source environments. It reveals subgroups of associations, with or without class labels, that might be masked or entangled due to multiple sources or underlying entangled factors. In bioinformatics, P2K is able to discover deep protein interaction knowledge down to the residue-to-residue (amino acid) interaction (R2R-I) level, purely from R2R contact data procured from a complex 3D physicochemical interaction environment. It reveals subtle yet specific interacting functions between residues and their neighbors. It can use the knowledge discovered to direct the acquisition of additional, deeper statistical features from massive data in the cloud via machine learning to achieve a much higher prediction rate. In relational datasets, it is able to discover and disentangle attribute-value associations, with or without class labels, in different orthogonal statistical residual spaces. The deep knowledge discovered can be used to enhance experts’ insights and efficiency by shortening the search process and improving predictive analysis. P2K will open an avenue to discover deep knowledge for biology, drug discovery and medical research, as well as engineering and business practice. It meets a new challenge in the era of big data.