Factors affecting the distribution of viral genomes in bacterial DNA: which models are consistent with existing data?
Viruses share complex relationships with bacteria, spanning the full spectrum from pathogen to obligate symbiont. Viruses and bacteria also share genetic sequences: bacterial genomes often include long sequences attributed to viruses ("proviral DNA"), ranging from complete viral genomes to partial genomes or isolated proviral genes. Datasets cataloguing proviral DNA in sequenced bacterial genomes have recently become available from several sources, revealing a characteristic bimodal distribution in the length of proviral sequences. We develop a PDE model with an underlying stochastic component to describe the influx and decay of proviral DNA within a population of bacterial genomes. Fitting this model to three available datasets, we ask which underlying processes are consistent with the data and statistically justified by the fit. Preliminary results suggest that three factors are essential in maintaining proviral DNA: the integration of viral sequences into bacterial DNA through active infection; the loss of viral sequences from bacterial DNA as viruses kill their hosts; and the probability that the viral genome contains one or more gene sequences that benefit the bacterial host.
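To make the idea of "influx and decay" concrete, the following is a minimal illustrative sketch, not the model developed in this work. It assumes a deterministic transport equation for the density n(x, t) of proviral sequences of length x: complete genomes enter at full length, sequences shrink at a constant speed, and near-complete sequences are additionally lost when the virus kills its host. The function name, every parameter name and value, the constant decay speed, and the sharp "intact" cutoff are all hypothetical choices made for illustration; the stochastic component and the host-benefit effect mentioned above are omitted.

```python
def simulate_length_density(n_bins=100, full_length=1.0, influx=1.0,
                            decay_speed=0.5, kill_rate=2.0, intact_cutoff=0.8,
                            dt=0.001, n_steps=5000):
    """Explicit upwind scheme for the assumed toy equation

        dn/dt = decay_speed * dn/dx - kill(x) * n,

    with complete genomes entering at x = full_length at rate `influx`.
    All names and values are illustrative assumptions.
    """
    dx = full_length / n_bins
    centres = [(i + 0.5) * dx for i in range(n_bins)]  # bin midpoints (lengths)
    # Assumed: only near-complete sequences can still kill the host.
    kill = [kill_rate if x >= intact_cutoff else 0.0 for x in centres]
    n = [0.0] * n_bins  # start with no proviral DNA
    for _ in range(n_steps):
        new = []
        for i in range(n_bins):
            # Sequences shrink, so mass flows from longer bins to shorter ones;
            # the longest bin instead receives newly integrated genomes.
            inflow = decay_speed * n[i + 1] if i + 1 < n_bins else influx
            outflow = decay_speed * n[i]  # mass leaving bin 0 is lost entirely
            new.append(n[i] + dt * ((inflow - outflow) / dx - kill[i] * n[i]))
        n = new
    return centres, n


if __name__ == "__main__":
    lengths, density = simulate_length_density()
    for x, value in list(zip(lengths, density))[::10]:
        print(f"length {x:.3f}  density {value:.4f}")
```

With these assumptions the density settles to a profile that falls off below the full length while host killing is active and is flat below the cutoff. This toy version is not expected to reproduce the bimodal length distribution seen in the data; it only shows how integration, decay, and host killing can be combined in one equation.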