Network Connectivity Based Filtering of Microbiome Data
The Human Microbiome Project (HMP) is a large-scale nationwide study that uses next-generation sequencing (NGS) technology to investigate the relationships between human microbiota composition, diet, and health status. Fragments of DNA sequences obtained in these experiments are classified at the species level and are typically referred to as species or taxa. A particular characteristic of these studies is that the data are often quite sparse yet collected on a large number of variables, many of which are possible contaminants. To remove possible contaminants, a data normalization step, known in the microbiome literature as filtering, is applied prior to analysis. Currently there is no consensus on which filtering criteria to use, nor is the loss due to filtering typically evaluated. We propose a data normalization method based on a taxa co-presence network that removes extremely rare taxa and evaluates the loss due to filtering.
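To make the idea of co-presence network filtering concrete, the following is a minimal illustrative sketch and not the method proposed here. It assumes a samples-by-taxa count table, links two taxa when both are present in at least `min_shared` samples, drops taxa whose network degree falls below `min_degree`, and reports loss as the fraction of total reads removed. The function names, both thresholds, the loss measure, and the toy counts are all hypothetical choices made for illustration.

```python
from itertools import combinations


def co_presence_degrees(counts, min_shared=2):
    """Degree of each taxon in a co-presence network (illustrative rule).

    counts: list of samples, each a list of per-taxon read counts.
    Two taxa are linked if both have a nonzero count in at least
    min_shared samples (an assumed linking rule).
    """
    n_taxa = len(counts[0])
    # For every taxon, the set of sample indices where it is present.
    present = [
        {s for s, row in enumerate(counts) if row[t] > 0}
        for t in range(n_taxa)
    ]
    degree = [0] * n_taxa
    for a, b in combinations(range(n_taxa), 2):
        if len(present[a] & present[b]) >= min_shared:
            degree[a] += 1
            degree[b] += 1
    return degree


def filter_by_connectivity(counts, min_shared=2, min_degree=1):
    """Drop weakly connected taxa and report the fraction of reads lost."""
    degree = co_presence_degrees(counts, min_shared)
    keep = [t for t, d in enumerate(degree) if d >= min_degree]
    filtered = [[row[t] for t in keep] for row in counts]
    total = sum(sum(row) for row in counts)
    kept = sum(sum(row) for row in filtered)
    # Assumed loss measure: share of all reads that belonged to removed taxa.
    loss = 1 - kept / total if total else 0.0
    return keep, filtered, loss


# Toy table: 4 samples (rows) by 4 taxa (columns); the last taxon is rare.
toy_counts = [
    [10, 5, 0, 1],
    [8, 0, 3, 0],
    [12, 7, 4, 0],
    [9, 6, 2, 0],
]
keep, filtered, loss = filter_by_connectivity(toy_counts)
print("kept taxa:", keep)
print("fraction of reads lost:", round(loss, 4))
```

In this toy table the last taxon appears in a single sample, so it shares fewer than two samples with every other taxon, ends up with degree zero, and is removed. Reporting the fraction of reads removed alongside the filtered table is one simple way to make the cost of a filtering choice visible.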