Wednesday, August 27, 2014

Bacterial Genome Size and Ecology

I often find myself wondering about the general evolutionary pressures that shape bacterial genome sizes, and I'm going to use this space to try to crystallize some thoughts. Part of this is motivated by my interest in understanding how horizontal gene transfer affects adaptive trajectories (see here), and part is motivated by trying to understand how to define (and what actually structures) bacterial populations in the context of ecology and selection (see here). In the latter case, Monod's famous quote ("what is true for E. coli is true for the elephant") doesn't necessarily hold true...if you wanted to define an elephant population, you could go out and count them. This is sadly getting easier and easier every day. For bacteria, you can't just go out and count the total number of cells, because micro-environments matter and not all cells experience the same selection pressures. Chemical, geological, and biological gradients are much coarser for elephants than for E. coli, and this can be reflected in population subdivision. These kinds of questions don't really matter if you care solely about the presence/absence of organisms...but if you want to try to predict evolutionary dynamics (strength of genetic drift, etc.), you have to understand what defines population size. This whole introduction is just a long-winded way of introducing an interesting idea that has popped up across a couple of papers and, lately, in discussions I had with Steven Nayfach (from Katie Pollard's lab) over sushi. Can we use differences in average bacterial genome size across environments to say something about microbial ecology?

Bacterial genome size vs. number of annotated genes, from Wikipedia

Small Population Size + Host Association = Small Genome

Obligate microbial symbionts often have tiny genomes compared to their free-living ancestors. This is due to the absence of purifying selection on genes that are no longer necessary in a symbiotic lifestyle, an increase in the effects of genetic drift due to small population sizes, and a slight deletion bias in mutations throughout the genome. Basically, in small populations, genes that are no longer necessary can accumulate and fix mutations randomly, and these mutations tend to be biased towards deletions. When vertical transmission is assured, all genes necessary for survival outside of this transmission cycle become superfluous. We also see increased rates of gene loss and inactivation (and overall smaller genome sizes) in some free-living bacterial pathogens, with similar population size / relaxed selection explanations. In these cases, genetic drift is a key factor.
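To make the population-size argument a bit more concrete, here is a minimal Python sketch of the textbook diffusion approximation (Kimura) for the probability that a single new mutation fixes in a haploid population of effective size Ne with selection coefficient s. The function name and the example numbers (s = -0.001, Ne from 100 to 100,000) are hypothetical and chosen only for illustration; real bacterial Ne values and the fitness costs of losing a given gene vary enormously.

```python
import math


def fixation_probability(s: float, n_e: int) -> float:
    """Approximate fixation probability of one new mutant in a haploid
    Wright-Fisher population (Kimura's diffusion approximation):

        u = (1 - exp(-2s)) / (1 - exp(-2 * Ne * s))

    s < 0 models a slightly deleterious change, e.g. inactivating a gene
    that still provides a small benefit.
    """
    if s == 0:
        # Neutral mutation: fixes with probability equal to its starting frequency.
        return 1.0 / n_e
    x = -2.0 * s          # numerator:   1 - exp(-2s)     = -expm1(x)
    y = -2.0 * n_e * s    # denominator: 1 - exp(-2 Ne s) = -expm1(y)
    if y > 700:
        # exp(y) would overflow a float; for large y, 1 / (e^y - 1) ~ e^-y.
        return math.expm1(x) * math.exp(-y)
    return math.expm1(x) / math.expm1(y)


# Hypothetical example: a gene loss carrying a small fitness cost (s = -0.001).
for n_e in (100, 1_000, 100_000):
    u = fixation_probability(-0.001, n_e)
    print(f"Ne={n_e:>7}: P(fix)={u:.3e}  (neutral: {1 / n_e:.3e})")
```

With these made-up numbers, the slightly costly loss fixes at nearly the neutral rate when Ne = 100 but is effectively purged when Ne = 100,000, which is the sense in which drift "wins" in small symbiont populations.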

In terms of defining ecology, if you find a particularly small genome in your sequences with lots of pseudogenes, you might be able to guess a priori that this bacterium has a particularly low effective population size and may be a parasite.

More DNA is Costly = Selection for Small Genome

Although patterns of genome evolution in symbiotic bacteria are likely driven by genetic drift, there are cases where selection appears to directly drive genome minimization. The best-known example of this is referred to as "genome streamlining", and is seen in a wide variety of oceanic bacteria including the notorious SAR11 clade. These genomes are typified by a reduced but highly conserved core gene repertoire, fewer paralogs, and shorter intergenic spacer regions. Non-mutually exclusive explanations for such selective pressures include low nitrogen and phosphorus levels in the ocean (making extra DNA costly to build) as well as optimization of cell surface-to-volume ratios. The surface-to-volume theory is particularly interesting because it parallels discussions of genome size evolution across all of life (termed the C-value paradox). How are cell size and DNA content related? Well, DNA takes up space, and the more DNA in a genome, the larger the cell. An aside: outside of such nutrient-limited settings, there's scarce evidence that DNA replication is costly for bacteria; in most other environments, transcription and translation are thought to be the most costly processes.
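Both streamlining explanations boil down to simple arithmetic: every base pair carries phosphorus in its backbone and nitrogen in its bases, so the nutrient cost scales linearly with genome length, while surface-to-volume ratio falls as cells get bigger. A minimal Python sketch, assuming a circular double-stranded genome and (for the surface-to-volume part) an idealized spherical cell; the genome sizes and GC contents in the example are hypothetical round numbers, not measurements of any particular organism.

```python
import math


def genome_element_cost(genome_bp: int, gc_fraction: float) -> dict[str, float]:
    """Count P and N atoms in one copy of a circular double-stranded genome.

    Phosphorus: one phosphate per nucleotide, i.e. 2 per base pair.
    Nitrogen: only the bases contain N. An A:T pair has 5 + 2 = 7 N atoms
    and a G:C pair has 5 + 3 = 8, so N per bp = 7 + gc_fraction.
    """
    return {
        "P_atoms": 2.0 * genome_bp,
        "N_atoms": genome_bp * (7.0 + gc_fraction),
    }


def sphere_surface_to_volume(radius_um: float) -> float:
    """Surface-to-volume ratio (per um) of a sphere: 4*pi*r^2 / (4/3*pi*r^3) = 3/r."""
    return (4 * math.pi * radius_um**2) / ((4 / 3) * math.pi * radius_um**3)


# Hypothetical genomes, for illustration only.
for label, bp, gc in [("small, AT-rich", 1_300_000, 0.30),
                      ("large, GC-rich", 6_500_000, 0.60)]:
    cost = genome_element_cost(bp, gc)
    print(f"{label}: {cost['P_atoms']:.2e} P atoms, {cost['N_atoms']:.2e} N atoms")

for r in (0.2, 0.5, 1.0):
    print(f"radius {r} um -> S/V = {sphere_surface_to_volume(r):.1f} per um")
```

Halving a cell's radius doubles its surface-to-volume ratio, which is one way to see why packing less DNA (and so needing less volume) could pay off when uptake across the membrane is limiting.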

So if you find a particularly small genome in your sequences (regardless of environment) with little evidence of genetic drift (a low number of pseudogenes and a low mutation fixation rate among core genes), it might be evidence of selection acting on genome size. This could in turn indicate competition for a scarce nutrient that makes up DNA, or a need for efficient transport across cell membranes.
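The two paragraphs on small genomes amount to a rough decision rule: small and full of pseudogenes points towards drift, small and tidy points towards selection. A minimal Python sketch of that logic, assuming only genome size and pseudogene counts are available; the cutoffs, the `GenomeSummary` type, and the function name are all hypothetical, not taken from any published method, and would need calibrating against real genomes. It also leaves out the core-gene fixation-rate evidence mentioned above.

```python
from dataclasses import dataclass

# Hypothetical cutoffs chosen only for illustration; not from any study.
SMALL_GENOME_MB = 2.0
HIGH_PSEUDOGENE_FRACTION = 0.05


@dataclass
class GenomeSummary:
    """Hypothetical per-genome summary derived from an annotation."""
    size_mb: float
    n_genes: int
    n_pseudogenes: int

    @property
    def pseudogene_fraction(self) -> float:
        return self.n_pseudogenes / self.n_genes if self.n_genes else 0.0


def guess_genome_size_driver(g: GenomeSummary) -> str:
    """Rough reading of what might be shaping a genome's size."""
    if g.size_mb >= SMALL_GENOME_MB:
        return "not small: size alone says little"
    if g.pseudogene_fraction >= HIGH_PSEUDOGENE_FRACTION:
        # Many decaying genes suggest relaxed selection plus strong drift.
        return "drift: low Ne, possibly host-associated"
    # Small with few pseudogenes: consistent with selection for a minimal genome.
    return "selection: possible streamlining"


print(guess_genome_size_driver(GenomeSummary(0.8, 700, 120)))
print(guess_genome_size_driver(GenomeSummary(1.3, 1400, 10)))
print(guess_genome_size_driver(GenomeSummary(6.5, 5800, 40)))
```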

Evolutionary Correlates of Larger Genomes

It's possible that increased genome size can be selected as a correlate of cell size. I don't know of any cases where such selective pressures have been directly demonstrated in bacteria, but the correlation between DNA content and cell size certainly appears to hold true. That's not to say that there aren't other emergent ecological properties that could also select for larger genome sizes. As long as DNA isn't too costly (an important caveat), in variable environments where cells must be capable of metabolizing a wide range of compounds, genome size can increase as additional metabolic pathways are acquired through horizontal gene transfer (see here, and an example here). These extra pathways can keep accumulating as long as they aren't selected against too strongly (which, you guessed it, is going to depend on population size). It's just a correlation at this point as far as I know, but many "soil" bacteria have relatively large genomes: pseudomonads, Burkholderia, assorted Rhizobia, etc.*

It's also possible that emergent evolutionary properties will arise as genome size passes a specific threshold. The success of long-distance horizontal gene transfer increases with genome size (that's a bit circular, but them's the data...it could also be confounded by observation bias and correlated with environmental proximity), so it's possible that free-living bacteria with larger genomes undergo fundamentally different evolutionary dynamics than free-living cells with smaller genomes. Likewise, cells with larger genomes appear to grow more rapidly than those with smaller genomes. This might be due to the number of ribosomal operons, but also to the presence of multiple large secondary replicons in bacteria with larger genomes (the more replication forks there are, the faster total genomic content is replicated). Bacteria with larger genomes might also be better able to tolerate secondary replicons like megaplasmids, which may again fundamentally and qualitatively shift phenotypic and genotypic evolution (see here and here). We like to think that everything that is true for E. coli is true for Pseudomonas, but I'm not so sure, given possible evolutionary feedback loops that are correlated with genome size. For instance, you see a lot more megaplasmids in Pseudomonas.*
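The replication-fork point is back-of-the-envelope arithmetic. A minimal Python sketch, assuming each replicon has a single origin with two forks moving at the same speed, all replicons replicate at the same time, and a fork speed of 1,000 bp/s (an assumed ballpark figure, not a measured value for any particular species). Under those assumptions the largest replicon sets the minimum time. The replicon sizes in the example are hypothetical.

```python
def replication_time_min(replicon_sizes_bp: list[int],
                         fork_speed_bp_per_s: float = 1000.0) -> float:
    """Minimum time (minutes) to copy every replicon once.

    Assumes one origin per replicon with two forks at equal speed, and
    that all replicons replicate in parallel, so the largest one sets the
    pace. Ignores overlapping rounds of replication.
    """
    largest = max(replicon_sizes_bp)
    seconds = largest / (2 * fork_speed_bp_per_s)
    return seconds / 60.0


# Same hypothetical 6 Mb of DNA, packaged two different ways.
one_chromosome = [6_000_000]
split_genome = [3_000_000, 2_000_000, 1_000_000]
print(replication_time_min(one_chromosome))  # 50.0
print(replication_time_min(split_genome))    # 25.0
```

Splitting the same amount of DNA across three replicons doubles the number of active forks on the longest path and halves the minimum replication time in this toy model. It deliberately ignores multifork replication, which fast-growing bacteria such as E. coli use to divide faster than a single round of chromosome replication would allow.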


I'm definitely missing some citations and angles on this, so please feel free to point me in any relevant research direction. It's an interesting idea to imagine extrapolating ecological data and evolutionary trends from differences in average genome size across microbial populations. There are a couple of papers I've stumbled into that try to do just that. There are probably a lot more out there...
