Thursday, November 21, 2013

So you want to do "experimental evolution"

Rich Lenski and his lab are getting a lot of well-deserved publicity lately because they have published yet another awesome paper from their long-term evolution experiment (LTEE). The success of the LTEE has no doubt inspired a bunch of researchers out there to think "hmm...I can do that!". I'm guessing that I was in third grade or so when Rich started the LTEE, and I have only been tangentially associated with the Lenski research lineage (who in my own experience are as smart and helpful as their mentor), but I've set up long-ish term lab passage experiments a couple of different times with different systems. There are a few things I've learned along the way that I think would be helpful to share with others jumping into the experimental evolution game, and hence this post. Please feel free to add suggestions to this list, or to contact me off-blog if you'd like to talk shop. The best tribute I can offer Rich is to provide as much help to the community as he and his students have provided me over the years. I say this every time, but thank you very much!

1. Let the question guide your experiment.  We all have our favorite microbes (OFM), and the reaction that I've seen time and time again is the urge to perform an evolution experiment with OFM just to see what would happen. I can assure you that OFM will evolve and adapt to passage conditions, and will do so quickly, but what does this really tell you? My first piece of advice colors everything from here on out: focus on finding a question to ask, and only then find the best microbial system to work with. E. coli works great for understanding general evolutionary principles, and in fact one of the most important questions to ask yourself should be "why not do this with E. coli?", but it would be a terrible system in which to study sporulation. Find the question that excites you and then find the system; it's easy enough to set one up if you know what to look for.

2. Once you've got the system, make sure you can measure fitness. A major piece of the LTEE is the ability to compare phenotypes and genotypes of cells from one generation vs. all others. For any evolution experiment to work, however, you need to be able to demonstrate that evolution takes place. Competitive fitness assays are just one way to do this, but they are a very powerful test because they enable direct comparisons between strains. In order to carry out competitive fitness experiments, you need to be able to distinguish two strains from one another within a single culture under conditions that closely approximate passage. Rich's experiment directly competes strains that differ in arabinose utilization (Ara+/Ara-), which under the correct plating conditions enables you to visualize different strains by color (red/white). In many cases, such a simple phenotypic comparison isn't easily accomplished. In my first stab at an evolution experiment I was investigating the effect of natural transformation in Helicobacter pylori. Out of necessity, I designed my competitive fitness experiments slightly differently from Lenski's because I was using antibiotic markers. Instead of directly competing evolved strains against each other, I would compete evolved strains against an ancestral "control" strain which was doubly marked with kanamycin and chloramphenicol. This isn't quite as elegant as I'd like, but I wanted to avoid confounding my evolution results with compensation for these phenotypic markers (in Rich's case, he spent a lot of time demonstrating that the Ara marker is a neutral change under his passage conditions; this often isn't the case for antibiotic resistance). At first I simply tried to plate out the same competition onto non-selective media and kan/cam media, but found that the variance in the ratio of evolved/control strains was way too high to be reliable for fitness estimates. For instance, in some cases there would be more colonies on the kan/cam plates than on the non-selective media. To get around this issue and control for such plating variance, I decided to first plate the competition out on non-selective media and then to replica plate to the kan/cam selective conditions. This change allowed me to actually measure fitness using antibiotic markers (a sketch of the calculation follows below), and all was happy and good for the time being. It completely sucked to replica plate everything, but it was the only way to get reliable numbers.
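
For what it's worth, here's a minimal sketch of the standard fitness calculation from a head-to-head competition: the ratio of realized Malthusian parameters, as used in the LTEE. All the colony counts here are made up, and the replica-plating bookkeeping is just one way to do it.

```python
import math

def relative_fitness(evolved_start, evolved_end, reference_start, reference_end):
    """Relative fitness as the ratio of realized Malthusian parameters.
    Inputs are CFU/ml of each competitor at the start and end of the
    competition culture."""
    m_evolved = math.log(evolved_end / evolved_start)
    m_reference = math.log(reference_end / reference_start)
    return m_evolved / m_reference

# Hypothetical counts: total CFU from non-selective plates, reference CFU
# from the kan/cam replica plates (evolved = total - reference).
total_start, ref_start = 1.0e6, 5.0e5
total_end, ref_end = 9.0e8, 3.5e8
print(relative_fitness(total_start - ref_start, total_end - ref_end,
                       ref_start, ref_end))  # > 1 means the evolved strain won
```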

3. Carefully think about your passage conditions.  When you are performing a passage experiment, EVERYTHING MATTERS. Are you going to passage under batch culture conditions, where there are such things as lag/log/stationary phase? In a chemostat? In vivo? Every change you make to your passage conditions can affect the results in subtle or not-so-subtle ways, as selection will operate differently under different conditions. If you are passaging in vivo (mouse, plants, whatever), how do you control interactions between other microbes and your targets of interest, or sample your focal microbe for freezing? Even the way that you passage your microbes in vivo can change selection pressures. For instance, motility will be a target of selection if you simply place your microbes on a plant leaf and select for infection, BUT if you inoculate a leaf with a syringe (bypassing the need for microbes to invade), motility likely doesn't matter at all for infection and my guess is that you'll quickly get non-motile mutants. Along these lines, always try to set up cultures using defined media even if you aren't quite sure that all components are necessary (plus, if you carry out an LTEE long enough, cool things happen with the "unnecessary components"). With my H. pylori cultures, as with passage experiments for many pathogenic microbes, I was forced to use media containing fetal bovine serum (FBS). The problem here is that every batch of FBS is different because every calf is different! I no doubt missed out on some fine-scale evolutionary events simply because my H. pylori populations adapted to growth in different batches of FBS. LB is a little bit better, but remember that a major component of LB is actually yeast extract, which can differ significantly from batch to batch and company to company. Something else to keep in mind is that LB and other types of rich media provide a wider range of niches than defined media, which can promote crazy scenarios of dependence between microbes (such as acetate cross-feeding).

What is your dilution factor going to be each passage? Even though effective population sizes are calculated based on harmonic means, differences in dilution can change evolutionary dynamics within cultures. Passage too densely and your cultures will spend more time at stationary phase than if you passage less densely (unless you time things perfectly). I always try to find the dilution scheme that allows me to catch ancestral populations just after they've started to hit stationary phase at some multiple of 24 hours. For H. pylori a 1:50 dilution achieved this every other day; for Pseudomonas stutzeri (in my experiment) a 1:1000 dilution achieves this every other day (see the back-of-the-envelope sketch below). I can't emphasize this enough: for your own sanity you want to design the conditions so that you can come in and passage at regular intervals!
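
For the arithmetic behind those choices, here's a quick sketch. The generations-per-transfer formula just assumes regrowth to the same final density each cycle; the Ne approximation follows the one used in the LTEE papers (Lenski et al. 1991) and is an assumption on my part, not gospel.

```python
import math

def generations_per_transfer(dilution_factor):
    # A 1:D dilution allows log2(D) doublings before the culture
    # returns to its pre-transfer density.
    return math.log2(dilution_factor)

def effective_population_size(bottleneck_size, dilution_factor):
    # Common serial-transfer approximation: Ne ~ N0 * g, where N0 is the
    # post-dilution (bottleneck) census size and g the generations per
    # transfer; the harmonic mean is dominated by the smallest sizes.
    return bottleneck_size * generations_per_transfer(dilution_factor)

for d in (50, 100, 1000):  # my H. pylori, the LTEE, and my P. stutzeri schemes
    print(f"1:{d} dilution -> {generations_per_transfer(d):.2f} generations per transfer")

print(f"Ne ~ {effective_population_size(5e6, 100):.1e}")  # with a hypothetical 5e6-cell bottleneck
```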

4. How will you archive your populations? Another powerful characteristic of the LTEE is the ability to freeze populations to create a "fossil record". Carefully consider how frequently you want to freeze, and how much of a population you will freeze. The answers here will depend on the hypothesis you are testing. For frequency, consider that frozen cultures take up space that your PI can't allocate to other projects. One of my graduate school advisors still (maybe) has my H. pylori populations frozen down in her freezer (Sorry Karen! We're BSL2 now and I can finally take them off your hands!) even though she is not working with these lines anymore. As the generations pile up, you have to allocate more and more space. As for how much of the population you'd like to freeze, just remember that unless you freeze the whole culture you will be losing some of the population. This doesn't necessarily matter for high-frequency genotypes, but it does for the low-frequency variants (the sketch below puts numbers on this). Think of this as a good example of human-influenced genetic drift, just like an actual passage.
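
To put numbers on that drift, a tiny sketch under the simplest possible assumptions (binomial sampling from a well-mixed culture; the numbers are illustrative):

```python
import math

def prob_variant_archived(frequency, cells_frozen):
    # Probability that at least one cell carrying a variant at the given
    # frequency ends up in a frozen sample of cells_frozen cells:
    # 1 - (1 - f)^n, computed stably via log1p.
    return 1.0 - math.exp(cells_frozen * math.log1p(-frequency))

# A variant at 1-in-a-million is safe if you freeze 10^8 cells,
# but a brand-new mutant at 1-in-10^9 will usually be lost:
print(prob_variant_archived(1e-6, 1e8))  # ~1.0
print(prob_variant_archived(1e-9, 1e8))  # ~0.10
```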

5. Catastrophes will happen. You can have the best planned experiment in the world, but that doesn't prevent your lab mates from "accidentally" (shifty eyes) knocking over your cultures. Before you start, make a plan for what happens if you lose a passage or if your freezer melts. For me, I always keep the previous passage in the fridge until the next passage is complete. Sure, it's a slightly different selection pressure than constant passage...but so is going into the freezer stocks. Also remember that catastrophes happen to everyone, even Rich Lenski, and it's a part of science. It sucks at the time, but exhale and move on. Trust me, you'll be much happier in the end.

6. Can you tell if you've cross-contaminated your experimental lines? Trust me again on this: cross-contamination happens, so figure out ways to identify it. I always try to alternate phenotypically different strains when pipetting and passaging. For H. pylori this meant having one set of strains be kanamycin resistant while the other set was not (I had to perform an extra experiment after the fact to control for this difference). Because of this, I was able to spot one instance where one of the lines had a low frequency of kanamycin-resistant colonies. In the final analysis I threw out this line, which is why there are only 5 competent lineages in my Evolution paper. You might say "well Dave, I'm not that sloppy in the lab". That could be a very true statement, but I guarantee that if you run the experiment long enough you will have other people perform the passages. People make mistakes when they aren't as invested, haven't designed the experiments, and are reading from a written protocol. They don't mean to, but it's a fact of life.

7. Be curious. I suppose this works for every single experiment ever done...but curiosity is one of the most important characteristics for research. You will grow to love your cultures, to see them flourish and change. If you understand what to expect from your cultures, you can identify interesting yet unexpected events. Know what to look for and note any changes from this search image. That's where you find really cool results.

Wednesday, November 13, 2013

Should I go to Grad School?

Given that I live in a desert which -- for the most part -- lacks colorful deciduous trees, the one way that I know it's fall is the flurry of activity concerning grad school applications. Since I teach an upper-division core class for microbiology majors, I often get questions from students about what to do after undergrad. The first thing I tell them is this: the one burning memory that I have from graduate school is from sometime in the spring of 2004. It was my third year, and I distinctly remember getting hit with the combination of relationship problems (long-distance girlfriend and I finally broke up) and the third-year grad school treat of having a bunch of experiments with no hope of any successful results. Everything was so confusing. It was 2am, I was in the lab on a Saturday, the only car in any of the parking lots outside was my own; what the hell was I doing with my life? I sat there on the floor of the lab and cried. Seriously...even went fetal position a couple of times. With the perspective I have now, and looking back on all of my 5 years in graduate school, I can honestly say that getting a PhD sucked. It was a slog, a war of attrition. There were so many times I wanted to quit...BUT it was also one of the greatest experiences of my life. I don't regret any moment of it, and would do it again and again and not change a thing.

Why did I stay in graduate school? I had other options; I was a decently compensated intern at a pharmaceutical company all throughout undergrad and had gotten offers to stay on, but turned them down. The 9-to-5 life and a daily routine weren't for me. Sure, I was turning down a good job, but I knew deep down that I'd be much happier as a university researcher. I just always knew that I got bored with routines, with dealing with the same problems over and over again. Industry jobs seemed like scenes from the movie Groundhog Day (I'm not entirely right or wrong about this). It seemed as though a job in academia would bring different challenges every day (and it certainly does). I wanted to be challenged, constantly, always from different angles. I knew that that kind of changing landscape of problems is what satisfies my brain.

It was during my time as an intern that I realized I really enjoyed asking questions, finding out how the world worked. I knew I didn't want to go to medical school, and graduate school just seemed like a good way to continue learning about the world. I remember being amazed that I could actually get paid (not a lot by comparison to other things, but enough) to go to school!!! I still can't believe that there are actual jobs that pay me to learn about the world and share what I learn with others. During my first or second year in grad school, my view of life solidified completely. It was at this point that one of the experiments I had thought of and designed actually worked. There I was, the only person at that moment in time who knew a new fact about how the world worked. It was thrilling, it was addictive...there is simply nothing like the rush you get from new experimental results. Sure, the paper that came of this experiment was pretty niche, but I was hooked. It's the combination of all of those feelings that helped me stay the research course even when things looked incredibly bleak.

So should you go to grad school? It's definitely not for everyone, and as I say above, it really really sucks sometimes. It's simply a personal decision that I can only provide one perspective on. Every department and lab is different, and it's up to you to find a place to thrive. You have to find ways to motivate yourself to keep putting one foot in front of the other, to continue performing experiments even though 95% of them fail. Starting in grad school -- and continuing throughout academic careers -- you are surrounded by rejection. Rejection is never fun or easy, but over time it becomes easier to deal with.

I didn't think I'd make a ton of money with a PhD; I didn't even know if I'd eventually have a job. On this point there are a couple of things I can say now that I didn't know before: 1) it's much easier to get an industry job with a BS or Masters than with a PhD (companies can hire people and train them the way they want), and 2) it's easy to start out as a Masters student (or PhD) and upgrade to PhD (or downgrade to Masters), so your path isn't set the moment you start grad school. I didn't know what I wanted to do with my PhD when I started grad school (in the beginning I didn't think I'd actually be good enough at research to be a PI), but I knew that I enjoyed learning. My love of learning kept me motivated.

You don't finish grad school, you survive grad school. Your job as a graduate student is to make mistakes and to learn how to avoid making those mistakes in the future. Your job as a graduate student is to consume every possible piece of information you can and learn to filter the good from the bad. Grades really shouldn't matter to you anymore (in fact, if you can, take every class Pass/Fail). Classes are there not to prove that you can get an A, but to give you an opportunity to truly internalize relevant information. As a grad student you are much more likely to figure out some very small thing about the world that only a handful of people really care about (leaving your mom to question why you aren't a REAL doctor) than you are to actually make a difference to human health. That's OK; it's all about building a foundation for the future, wherever that may lead.

Looking back, there is one extra, unexpected bonus that made graduate school worthwhile. Apart from the rush of science and research, grad school happened at a time in my life when I was truly becoming who I actually am as a person. I had moved across the country from NY to Oregon and had started a life completely on my own, away from the training wheels that undergrad life can bring. Some of my best friends to this day are people from my grad school cohort. People who were always up for a beer or pizza, people who shared experiences similar to mine growing up as a bit of a science nerd. People from all walks of life, with very different perspectives, who nonetheless all found themselves diving headfirst into research. I would be a very different person if I had done something other than graduate school, because that was the moment in time when I really ventured out from the nest.

Grad school is one of the most difficult things I've ever done, and it's not for everyone, but for me it was completely worth it.



Friday, November 1, 2013

Replication and Studies of Host-Pathogen Relationships

There has been a buzz around the interwebs (and on actual paper too, so I guess it must be real!) lately about how difficult it can be to replicate published results. Much of the popular press has focused on a couple of articles from The Economist called "How Science Goes Wrong" and "Trouble at the Lab". There have also been a variety of well-thought-out posts from the likes of Jerry Coyne, Ian Dworkin, and Chris Waters, among others.

Some of the chatter has been along the lines of "BUT...REPLICATION IS A PILLAR OF THE SCIENTIFIC METHOD. THERE IS A SERIOUS PROBLEM IF MOST STUDIES CAN'T BE REPLICATED. WASTE OF THE MONEYZ!!! GRUMBLE GRUMBLE..."

I'm hoping to add a slightly more nuanced opinion at the top of this post, followed by some unpublished results at the bottom to serve as a cautionary tale. I don't really disagree with worries about the state of science. Replication is of the utmost importance for research, and if results aren't robust there must be a way to keep track. Perhaps post-publication peer review and comments will fill this particular niche. Experiments now are built on a foundation of experiments and models pioneered over years and decades. If you are interested in getting involved in a new research direction, one of the most important things to do is see whether you can replicate foundational results in your own hands in your own lab. That being said, biology is hard. Replication of single experiments under well-controlled conditions can easily be thrown off by Rumsfeldian unknown unknowns. I remember hearing from someone (I want to say it was Paco Moore, and that there is a paper somewhere on this which I can't find with quick Google searches) that measurement of fitness in the context of Rich Lenski's long-term E. coli experiment can be slightly altered by university water quality. In grad school I remember Patrick Phillips describing an experiment with nematodes where the assay would only work for about two weeks a year, because the stars and sun and temperature aligned to yield the perfect experimental environment. It turns out that the physiology and behavior of living organisms can be extremely sensitive to just about everything if you measure closely enough.

This problem is compounded even more when you are dealing with multiple living organisms, for instance when your research area is host-pathogen (host-symbiont, same diff) relationships. I can't speak for anyone who works with animal models, but I can definitely attest that plant immune responses are EXTREMELY sensitive to pretty much every stimulus you can think of. Since plant immune responses are dependent on cross-regulation across multiple hormonal pathways, even the slightest change in some environmental factors can completely shift the likelihood of infection. This is exacerbated by having to grow plants for multiple weeks before you can actually do the experiments, all the while worrying that some random lab malfunction (3am growth chamber overheating, anyone?) will render batches of host plants unreliable. Different labs will have different water, soil, temperatures, humidity (low humidity in Tucson is the bane of my lab existence sometimes!), etc. When I started working on P. syringae and plants as a postdoc, I would get very frustrated at my inability to replicate other people's published experiments. The more time I spent in the lab, the more I realized that that's just the way it is sometimes. Don't get me wrong, there are a variety of other reasons that replication may fail, but when you're crying into your lab notebook at 3am, keep in mind that it's incredibly hard to control both host and pathogen growth in exactly the way the published experiments were performed.

I'm guessing that every PI who works with phytopathogens and plants has a story where there was an interesting phenotype which couldn't be replicated when they moved to a different lab/university. As a postdoc I remember screening through 50 or so very closely related isolates of P. syringae pv. phaseolicola to look for subtle differences in virulence on green (French) bean. The goal here was to minimize random genomic variability between strains, by choosing very closely related strains, so that I could hopefully quickly pin down genotypic differences underlying interesting phenotypic differences simply by looking at the genomes. Basically GWAS for microbes, to use the term loosely. This was one of the experimental directions I started as a postdoc and was hoping to continue as PI in my own lab. One of the most solid results I had was a subtle difference in growth between two strains on the French bean cultivar Canadian Wonder. Canadian Wonder is the universal susceptible cultivar for P. syringae pv. phaseolicola, which basically means that this plant was thought to be highly susceptible to all flavors of this particular pathogen. I had actually found that one strain (Pph 2708) grew 10-fold less than a very closely related strain (Pph 1516) in this cultivar (Fig. 1).

[Figure 1: In planta growth of Pph 2708 vs. Pph 1516 in Canadian Wonder]
When I did pod inoculations, although the response was somewhat variable, there did seem to be some immune recognition of Pph 2708 compared to other strains (Fig. 2).

[Figure 2: Pod inoculations of Canadian Wonder with Pph strains]
You can tell that there is something different about this inoculation because the water-soaked halo is smaller for Pph 2708 than for the other strains, except for the avirulent mutant that lacks a functioning type III secretion system (Pph 1448a hrcC-).

So there it is: I've got two very closely related strains of P. syringae that differ slightly in pathogenicity. I have genome sequences for these (will link when I've stored up the strength to navigate the GenBank submission). There aren't many differences between them, on the order of hundreds of SNPs and tens of gene presence/absence differences. I had everything set up and ready to go to finish off the story once I got to Tucson and set up shop.

Here's where the problem arises...even though the result was solidly replicated under North Carolina conditions, there is no growth difference between Pph 1516 and Pph 2708 in Tucson. A lot of strains I've worked with behave differently here in the desert compared to the land of tobacco and barbecue, and my guess is that it's because there is practically no humidity in the air. Since plant immune responses are linked to abscisic acid, I'm guessing that humidity really annoys the plants when I take them out of the growth chamber to perform inoculations. Not necessarily the lack of humidity per se, but the abrupt change in humidity that accompanies taking plants out of the growth chamber. Yes, there are ways to Rube-Goldberg my way around this problem, and I have thought about a walk-in growth chamber, but the truth is other things worked better and I've concentrated on them. On top of that I'm using slightly different soil (what I could get my hands on), it's a different growth chamber, etc. Point is, I have a result that I would not have thought twice about publishing if only I hadn't tried to replicate the experiment in a different place. This happens a lot.

Monday, September 2, 2013

Fear and Reviewing in Academia

I've got at least two things going against me. For one, most of human communication is non-verbal. Whatever I say or write in critique of a paper is always more easily misinterpreted than if I were to say the exact same words to the authors in person. Second, it's likely that inherent biases in our brains will always influence how we read and interpret a critique. Malcolm Gladwell sold a lot of books on this premise.

Within the last year I was a reviewer on a paper for a journal where the technical soundness of the experiments within the manuscript is the most important factor in acceptance. As a reviewer I had absolutely no problem with the technical aspects of the manuscript, but I personally thought that the introduction and discussion should be completely rewritten to de-emphasize what ends up being the take-home story. I wrote that I didn't think the manuscript should be accepted in that state and suggested a variety of other ways to report and analyze the data which would allow the paper to be received by a larger percentage of the relevant audience. I was essentially arguing over subjective differences between the authors and me, even though the paper was technically OK. Ultimately the paper was published without the changes. This is how the system works, and I'm OK with this outcome (again, the paper is technically OK).

I want to be able to describe the specifics of this experience in a blog post, and maybe even a manuscript, because I think it highlights one major downside of the "publish if experiments are technically OK" suite of journals. I want to write a post-publication critique of this article and include my actual review. I'm motivated enough to write a paper highlighting the dangers of crystallizing subjective interpretations in the form of a manuscript that glosses over this subjectivity. All this being said, I am currently an assistant professor on the tenure track. I don't want to make enemies, even though (as anybody who knows me will attest) nothing I say is ever meant as an ad hominem attack. I can be direct, and this is off-putting to some, but I do this for the sake of making the story better (it's the New Yorker in me). I think that science advances much further without in-fighting and with collegiality. I simply want science and research to progress in an efficient way, with self-correction of confusing statements. We can disagree, but let's do this over a beer and shake hands at the end.

Since I'm currently untenured, I'm absolutely terrified of inadvertently pissing the wrong people off and thereby tanking my career (and my family's well-being). There are always camps in science which disagree with one another. Some of the best examples are described in Provine's "The Origins of Theoretical Population Genetics" and Hull's "Science as a Process". Ultimately, I'm OK if I'm lumped into a camp in some way or another, but I want this to be for strictly scientific reasons, not personal ones. In order to get tenure in the US, I must have outside letter-writers from peer institutions (some chosen by me, some by the college). These letters will hopefully describe how I make worthwhile contributions and further research in my area of expertise. One bad letter can tank my career. It's possible that someone may read a blog post (or critique of a paper) and simply take it the wrong way. Since I'm reviewing these papers, they are definitely within my realm of expertise, and so the authors have a chance of being selected by my higher-ups as letter writers. I worry that critiquing a paper I've reviewed will be looked down upon by the editor, who in many cases is within the ballpark of potential letter writers. If I critique a paper over subjective and controversial interpretations, there are others out there who may hold the same viewpoints (who aren't authors on the manuscript) and could be put off by my critique. Letters are just one aspect of tenure. What if these critiques limit my chances of being asked to speak about my work at conferences? What if these critiques make it more difficult for me to publish my own papers or get grants, due simply to psychology? Is that risk worth it, even if post-publication review might make a difference or open up an important discussion?

There are a bunch of folks who describe a utopian world where post-pub review is the norm, reviewers are always named, and reviews are made public. I want to live in a world where I can sign my name to reviews and comment on and critique papers in blog form or in a comment box next to the article. I want to be able to write papers with an opposing viewpoint. Oftentimes fears of this world are stated hypothetically. I'm aching for a real and open discussion about the topics I raised in my original review; I think this would hugely benefit the field. But I'm terrified, at least at this point in my young career, of what happens if I become the dog that catches the car. Maybe anonymity and pseudonyms are best for some things...

I don't know that there is a fix, because of the way human brains work.

Update: For some the comment box works, for others not so much. Feel free to email me comments and I'll post (I'm pretty easy to find).

Comment from Rich Lenski (http://telliamedrevisited.wordpress.com):

+++

I think there are three issues here that I’ll try to unpack.  Issue #1 is the worry over potential repercussions for your career from the authors of the paper. That’s obviously important, but let’s set it aside and look at the other two issues.

Issue #2 is that the authors of this paper ignored your useful suggestions.  Nonetheless, the paper was accepted and published.  That’s annoying.  But from what you wrote, it seems you don’t think that particular paper is a very important one in the grand scheme of science.  So I think you can let it go with respect to #2, and focus on the interesting and important work that you yourself are doing.

Issue #3 is your broader concern that journals that require only technical correctness may be weakening or diluting the scientific literature.  In that case, if you feel strongly about it, then I suggest you look for an outlet where you could write a short editorial or perspective on this issue.  You could mention that you were involved in such a situation, but there's no need to name authors or even the journal (or you might mention several journals where this is the policy).  To illustrate what you’re talking about, you could construct a strictly hypothetical case where: a paper is technically correct but ignores some issue; a reviewer asks that issue to be explicitly noted; the authors ignore the advice; and, because the paper is technically correct, the editor gives the go-ahead and it’s published.  Given all the subtleties and complexities of real science, it will probably be easier for you to construct and explain a hypothetical case than to explain the actual case that bothers you.  Plus, notice that issue #1 has gone away!


+++

Tuesday, August 27, 2013

So you want to be a postdoc

As with many of these posts so far, I was slightly involved in a Twitter conversation last week that touched on a topic I've been meaning to write about: what makes for a good postdoc experience? Keep in mind that I completely understand that everyone is different, and so the following certainly doesn't apply universally. At the very least this should provide some insight into how I run my lab and what I expect from people within the lab (including PDs), so if you're considering working with me in the future, take these words as a brief intro to my style.

1) Be able to say "No" to your PI

As a PI, it's very easy to come up with ideas when reading papers and seeing talks. There are all sorts of new projects in every direction, and this can be kind of overwhelming. Keep in mind that the goal as a PD is to write grants and papers, and generally to be productive by seeing experiments through to the end. It is very easy for your PI to say "why don't you try this" or "maybe this is something we should think about" without having to actually do the experiments. One of the most important skills as a postdoc is to be able to say no to your PI. If you can't say this simple two-letter word without anxiety, you will simply run out of time in the lab and be swamped. Extra bonus: this skill often comes in handy later, after you've landed that tenure track job and you're asked to be on every committee possible.

2) Don't take everything your PI says as gospel

Your PI is a researcher just like you...the difference is that they're more experienced at the job. They've likely interpreted more data sets, read more papers, dealt with more rejection, etc...  Simply stated, your PI has had more practice than you at your job. However, within this context, realize that PIs are wrong all the time. If we mention/cite a paper we may be misremembering it. There may be some new and better paper (which we haven't read because, trust me, it's hard to keep completely up on the literature in real time) that has disproved the first. There's a very real chance that the data we remember is more nuanced than we think it is. Always read the primary literature and interpret the data for yourself.

3) It's OK if your PI disagrees with you, but know when their evidence is overwhelmingly good

That being said, your PI isn't wrong all the time. There will be times when you want to argue over interpretation, and that's OK, but learn to know when you've lost the argument. Trust me, this will save you much time and effort in the end.

4) Help your PI be a better mentor

I am very good at being me as a researcher. I understand my own body rhythms and know when my most efficient working hours are. I know exactly what type of mentorship and interactions I needed to succeed. I understand myself reasonably well, but everyone is different. One of the most difficult parts of mentorship at any level is understanding what the other person needs from you in terms of opinions, information, and interaction. How do you motivate someone else? You will have a much more successful postdoc (I think) if you can discuss with your PI exactly what kinds of feedback and interaction you need and expect. Think about what kinds of feedback you require in order to succeed. Have an open discussion; in the end this is the best possible situation for both of you.

5) Don't be afraid to start small pilot side projects

Never be scared to start small projects on the side (for me, small projects require less than about $100 of new supplies). If money's an issue, your PI will let you know. If you read about a new technique, try it and see what happens. Screen a bunch of isolates for the presence of a PCR product. Mix two strains together to see who wins. This will give you added experience designing experiments and interpreting data in a new framework. At the very least you will learn the hugely important skill of cutting bait when things aren't working. In the best case scenario you will develop projects that you can take with you to your new lab.

6) Have continuing and open discussions with your PI about which projects you can take

Data sets change. Some experiments work and others don't. The most tension I've seen between PIs and their PDs always seems to be over ownership of projects. Be clear with your PI about what you want to take with you, even before you start applying for jobs. If some of your small side projects have worked, tell your PI and have the discussion about who "owns" what. The more open you are, the clearer the limits will be when you are starting your own lab.

7) Don't be afraid to apply for independent fellowships

I've seen some cases where PIs don't want their PDs applying for fellowships because the time invested could be better spent on experiments; I strongly disagree. If you land a tenure track job, you will have to write grants for a living, and the more practice the better. Even if you have a paycheck through your PI's grants, independently earned fellowships are a huge CV boost that can help you land a job. It's worth the effort, just make sure you don't drop the ball on your experiments.

8) You are not hired as a technician

You aren't there to have your PI feed you experiments to do; you're hired as a PD to be an independent thinker. To design new experiments, to read papers, to try and figure out new directions for the project to go. It's a bad situation if your PI is hovering over you and giving you precise direction at every step. You will not develop the skills needed as a tenure track researcher, and your PI will have missed a golden opportunity to push their research program forward.

9) Take every opportunity to speak, teach, and mentor

It's very likely that you will have to do these things when you are a PI, and (as I've said a bunch of times above) the more practice you have the better. If you have the chance to give guest lectures or teach a course (so long as your PI is OK with this), go for it. You will never understand a topic better than when you have to explain it to someone from first principles. You may even see the problem in a new light or make new connections. Practice can only help you out later when you're doing these things continuously.

10) Enjoy your life as a postdoc

Your postdoc is likely the last time, for a while, that you will get to decide where you want to live. The job of being a PD is about performing experiments and writing papers, but you are a person outside of the lab too. A researcher's life is stressful, so use the time outside the lab to enjoy the world around you. Feel free to go to lab X to work on an awesome project and completely disregard the outside world, but I'm just saying that there is more to life. I often find that some of my best thinking gets done while I'm out running...There might not be as awesome a project in lab Y, but if the quality of life is better you may end up with a more fruitful and fulfilling postdoc experience.



Thursday, August 15, 2013

Is "Ecological Epistasis" a Good Term?


I've been inspired by a couple of recent twitter conversations I've had to write a post and basically lay out why using the phrase "ecological epistasis" triggers my population genetics spidey-sense.

The first conversation happened about a month ago while I was sitting through previews for the completely satisfying movie Pacific Rim.

The second happened yesterday (Ian was live-tweeting a bunch of talks at BEACON, and Maren Friesen (@symbiomics) was talking about her Medicago research).

I'm using this space as a way to crystallize my thoughts and to try and solicit other opinions. I'm also going to try and be involved in Seth Bordenstein's G+ chat on the hologenome in a couple of weeks, so consider this a bit of a warmup.

Epistasis is a tricky word. Problems arise when different (yet highly related and somewhat overlapping) groups start to use the same word yet mean different things. One of the people responsible for making me the scientist I am today has written on this topic (here and here), so I won't go too much into it. To summarize, though: you can define epistasis in the quantitative genetics sense (multiple loci interacting in a non-additive way), in the narrow genetic sense (two proteins physically interact or function in the same pathway, as would be found by a genetic screen), or in the broader genetic sense (interactions between multiple genes). None of these is wrong per se, but use of the same term can get confusing depending on your audience. I don't have a problem with any of these definitions, but I just think that making headway in biology always becomes more difficult when you have to start referencing quotes from Justice Potter Stewart.

So here are a couple of my problems with "ecological epistasis". The phrase uses the last definition of the term I mention above: gene interactions writ large. If you have two co-evolving organisms, genes from one organism interact with genes from the other organism in the population genetics sense; the fitness of one organism is dependent on the other organism. All well and good (if I'm misinterpreting, please let me know). We already have language to describe these circumstances, though, in terms of gene-by-environment (GxE) interactions, without the need to invoke the "e" word. In this case each organism is the co-evolving organism's environment variable. Why not modify this term a little bit and call it GxEG (environment/genetic) interactions (a toy numerical example follows below)? There is nuance in this specificity that isn't captured by using the term epistasis. This isn't really my area of expertise, but I'm guessing that the dynamics that apply to interactions between genes residing in different genomes may be inherently different from those between genes linked together and vertically inherited.
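
To make "non-additive" concrete, here's a toy example with made-up fitness values. It's the same deviation-from-additivity arithmetic you'd use for two loci in one genome, except the second "locus" lives in the co-evolving partner's genome:

```python
# Fitness of a focal microbe across two of its own genotypes (g1/g2)
# and two host genotypes (h1/h2) -- all numbers invented for illustration.
fitness = {("g1", "h1"): 1.00, ("g1", "h2"): 1.10,
           ("g2", "h1"): 1.20, ("g2", "h2"): 0.90}

# Deviation from additivity: zero means the microbial allele's effect
# doesn't depend on the host genotype; anything else is a GxEG interaction.
interaction = (fitness[("g1", "h1")] - fitness[("g1", "h2")]
               - fitness[("g2", "h1")] + fitness[("g2", "h2")])
print(interaction)  # -0.40, i.e. strongly non-additive
```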

The second idea that needs clarifying, IMHO, is what the limits on interactions between organisms are when speaking in population genetic terms. When I've seen the phrase "ecological epistasis" used, it's in reference to interactions between intimately co-evolving organisms. However, if you are going to define the term as interactions between genes in different organisms without specificity, you could extend the definition ad absurdum. Much of my work focuses on plant pathogenic bacteria, and the genomes of both the host and the pathogen encode proteins that mediate interactions between the two. Is this "ecological epistasis"? Stepping further back: this morning I killed a cricket that tormented my household last night (keeping my 8-months-pregnant wife awake more than usual...I have no regrets about my actions). In effect this is no different from a pathogen killing a host, just that co-evolutionary interactions are weaker between me and the cricket population at large than in typical pathogen/host dynamics. My foe and I both have genomes that encode proteins that ultimately mediated our interactions this morning. Is this "ecological epistasis"? Ian Ziering managed to fight off a shark with a chainsaw:

Is this "ecological epistasis"?

I'll save my hologenome critiques (great term, needs limits on the definition) for a future blog/G+ chat. My point is simply that if you start defining epistasis as interactions between organisms, these interactions can take a wide variety of forms that you may not initially consider. More specific wording could save me from having to make Sharknado references.

It's not that I think using "epistasis" in the context of interacting organisms is improper. I think the term is muddled enough as it is, and it doesn't make sense to use it just for the sake of latching onto an already established (and muddled) term. Using "ecological epistasis" doesn't clarify things in the way that a more nuanced term could, at least to me, but maybe I'm just missing something?

Update: Maren Friesen has clarified what she was referencing in her talk:


So...4) non-additive interactions between species (not genes)

Update 2: Great response by Maren Friesen

Monday, August 12, 2013

What if Diet Soda Wasn't Diet?

Ideas are cheap; actually pulling off the experiments is the difficult part. Sometimes these experiments aren't even possible to do at the present time. I'm probably not the only one who has a running list of experiment ideas in a text document, many of which will never see the light of day. I'm going to start something new around here by posting about research/experiment ideas that I think would be interesting and informative, but which I have absolutely no time to carry out right now (however, if you're up for collaborating, definitely shoot me an email!). I'm naturally curious, so it would give me great pleasure to see SOMEONE figure out the answers to these observations or actually carry out the experiments. Hell, someone might have even already done the experiments (if so, please send me a link in the comments!). Use these posts for inspiration, or even just to get a feel for how I think about science, especially if you're keen on being my grad student or postdoc in the future. Point is that ideas are cheap, but my mind keeps grinding. So without further delay, here's where it goes sometimes...

Since my undergraduate days I've had a thing for "diet" drinks. Soda, fruit juice, etc...I always go for the "light" version. First it was the deliciously aspartame-filled Diet Coke (I definitely don't have phenylketonuria), and I've since transitioned to deliciously sucralose-filled products. Supposedly, drinking diet products can help you shed weight (see here, but also here). Diet sodas et al. have no calories because they contain artificial sweeteners that can't be metabolized by your body. I've always believed this; I could be completely wrong, but it seems right. Relevant to this story, it does seem as though drinking diet soda can actually make you gain weight and can increase the incidence of type II diabetes (Hmmmm...).

Here's the thing. Your body is also teeming with microbes, especially in your digestive tract, billions of them. Some of these can even aid digestion by breaking down products. If there is one thing I know that microbes are good at, it's adapting to use novel resources. Unexploited potential energy sources are just another niche that microbes can thrive in. I don't see why microbes can't break down, or easily evolve to break down, aspartame, sucralose, and Truvia.

So here are a couple of potential experiments. I'd like to take some gnotobiotic mice, as gut flora may influence their weight. In the lab, I'd adapt a suite of common gut microbes to growing on one of the artificial sweeteners. Then I'd transplant these bacteria back into the gnotobiotic mice in one group, and "ancestral" bacteria that can't break down the sweetener into another group. Next I'd feed different groups of mice a diet supplemented with one of the three sweeteners (as well as a regular control diet). The null hypothesis in this case would be that there is no difference in weight gain attributable to evolved vs. un-evolved microbes (see the analysis sketch below). A second experiment is really just the converse of the first. Basically, I'd feed mice with "normal" gut flora a diet supplemented with one of the three sweeteners, or the control diet with none. Then I'd measure whether the ability of gut microbes to digest the artificial sweeteners changes over time. The null hypothesis here is that there would be no change in the microbes' abilities to break down artificial sweeteners over time.
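
If anyone actually runs this, the first-pass analysis for that null hypothesis is simple. Here's a sketch with simulated numbers standing in for the hypothetical mouse weight gains (group means, sample sizes, and variances are all invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated weight gains (grams) for experiment 1: mice colonized with
# sweetener-adapted microbes vs. ancestral microbes, both groups on a
# sucralose-supplemented diet. Ten mice per group.
gain_evolved = rng.normal(loc=6.0, scale=1.5, size=10)
gain_ancestral = rng.normal(loc=4.5, scale=1.5, size=10)

# Two-sample t-test of the null hypothesis (no difference in weight gain).
t, p = stats.ttest_ind(gain_evolved, gain_ancestral)
print(f"t = {t:.2f}, p = {p:.3f}")
```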

So that's the outline. Thoughts? Has this been done? If someone does this will the artificial sweetener industry put a hit out on them?

UPDATE: Thanks for the input, folks! I definitely understand now that there is much less artificial sweetener in a diet soda than there is sugar in a regular one. Maybe not the best example, but it doesn't change the thought experiment. I know people who replace regular sugar with sucralose or Truvia in coffee and baking. They use the exact same amounts, so, plus or minus differences in the molecular formulas, there's roughly the same potential mass going in.

Monday, August 5, 2013

Yes Mom, I do study GMOs

Reading Amy Harmon's great piece on GMOs and citrus greening inspired me to write this post. What follows is a slightly fictionalized account of a conversation I had with my mom. Don't worry, these conversations actually happened pretty much how I describe. I recently stopped home for a couple of days (my favorite conference to attend is but 2 hours away from my parents' house in VT) and eventually found myself arguing with her about the benefits of genetically modified organisms (GMOs). Her main comment was something along the lines of "How do you know what happens when you stick a lemon gene into corn? There could be horrible side effects". I found myself making the case that substantial scientific evidence exists concerning the safety of GMOs for human health, as well as describing how corn is completely different from its non-domesticated (and hence non-genetically modified) ancestor teosinte. Standard stuff really, and the conversation ended in rhetorical standstill, as is par for the course when I disagree with my parents.

A few hours later my mom asked me about my own research program. I started to tell her about horizontal gene transfer (HGT) in microbes, how the transfer of such genes is a driving force for microbial evolution, and finished by describing how little we know about the side effects of HGT. Then it hit me: my research links up perfectly with the discussion about the side effects of GMOs. HGT is a natural process that is effectively indistinguishable from the creation of GMOs. At a forest-through-the-trees level, specific genes start out in species A and are transferred to species B. In the case of HGT, the vector for transfer can be a plasmid/phage/transposon/etc., whereas for GMOs the vector can be a plasmid/phage/transposon/etc. In the former, random chance (and many other factors, such as environmental proximity) determines which HGT events occur, whereas in the latter it's humans who determine which occur. The only (arguably subtle) difference between HGT and GMOs is what structures the selection pressures. In the case of HGT, natural selection culls out unproductive combinations of genes and backgrounds, whereas with GMOs humans directly select and screen for the most "productive" combinations. You could even argue, thinking about the selection pressures on the movement of antibiotic resistance genes in microbial pathogens, that there is substantial overlap even in selection pressures. If you just focus on the movement of genes and don't worry about the how, the natural process of HGT and the artificial process of GMO creation are exactly the same. What we learn about the side effects of HGT will be directly applicable to understanding the side effects of GMOs, i.e. to figuring out how badly a single lemon gene would screw up your tasty corn. My research might actually be able to address my mom's original question.

"Ahh...but Dave", you might say, "microbes are different from corn". Well, it turns out that HGT occurs much more frequently in multicellular eukaryotes (like corn) than we previously thought. Aphids come in different colors because they have acquired carotenoid genes from fungi. A substantial portion of the genome that codes for your steak is potentially derived from snakes (arguing about the precise percentage can get a little hand-wavy, since this may reflect only one HGT event). Michael Douglas may have gotten oral cancer because of viral HGT. Perhaps most relevant to this discussion, there is a gene in sorghum and rice that has been acquired from a parasitic plant. The list goes on and on, and will only grow as more genomes are sequenced. Yes mom, even though I study the transfer of microbial genes, I'm still studying nature's GMOs.

Tuesday, July 9, 2013

Just because you can do something doesn't mean you should...on using dN/dS with horizontally transferred regions

(Disclaimer: I'm writing this post because I've recently seen a couple of papers that measure dN/dS within obviously horizontally transferred genes and I view these analyses as sketchy for the reasons described below. If I'm making obvious mistakes or there is a body of literature that goes over how these analyses are OK, please point me in that direction!)

One morning during my senior year in college, I was driving my car around town and had the sneaking suspicion that my car's lighter wasn't working. I pushed the lighter in for a couple of minutes and then examined the end. Since the coils weren't red, it only fueled my suspicion. What I did next stands firmly in the pantheon of stupid things I've done in my life...I placed my thumb over the coils to see if they were hot. I honestly can't tell you why I did this, but every sensation from the next couple of seconds has been burned into my memory. It took about four months for my thumbprint to grow back. Long story short: just because you can do something doesn't mean you should.

Molecular evolutionary analyses have gotten extremely easy to perform in the last couple of decades. There exists a variety of plug-and-play, freely accessible computational tools and programs that will chew up any DNA sequences you input and spit out numerical results related to evolutionary scenarios. These programs will work even if you input completely fake sequences and made-up taxa. Like many programs (looking at you, t-test in Excel), they are agnostic to the underlying assumptions. It's up to the user to make sure comparisons are valid.

dN/dS and its cousin Ka/Ks are often used to measure whether positive selection has acted on a set of nucleotide sequences. Explained very briefly: for a given nucleotide sequence, dN is the number of non-synonymous nucleotide changes in the focal sequence compared to a reference sequence, and dS is the number of synonymous changes in those same sequences. For these analyses, synonymous mutations are assumed to be selectively neutral and thus represent the background rate of evolution. If there is no selection acting on a protein sequence, it is assumed that non-synonymous changes should occur as frequently as synonymous ones. Therefore, all else equal (in a very unrealistic world), proteins under no selective pressure should have a dN/dS ratio around 1. A dN/dS ratio > 1 indicates positive selection, because in this case non-synonymous changes occur more frequently than the baseline synonymous rate and selection must therefore be increasing dN. A dN/dS ratio < 1 indicates purifying selection, because non-synonymous changes are being selected against and are therefore found in lower numbers than expected. I'm glossing over a lot of nuance in these analyses (see the crude sketch below), but that's the 30,000-foot view.
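
Since the counting itself is simple, here's a deliberately crude sketch of the idea using Biopython and two made-up 9-bp "genes". Real implementations (Nei-Gojobori, PAML's codeml) also normalize by the number of synonymous and non-synonymous sites and correct for multiple hits; this sketch skips all of that, so treat it as a cartoon of the concept, not a usable method:

```python
from Bio.Seq import Seq  # Biopython

def crude_dn_ds(seq_a, seq_b):
    """Walk two aligned, in-frame coding sequences codon by codon and
    count synonymous vs. non-synonymous differences, considering only
    codons that differ by a single base."""
    dn = ds = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue  # skip identical codons and multi-hit codons
        if str(Seq(codon_a).translate()) == str(Seq(codon_b).translate()):
            ds += 1  # same amino acid: synonymous
        else:
            dn += 1  # different amino acid: non-synonymous
    return dn / ds if ds else float("inf")

# One synonymous (CTG->CTA, both Leu) and one non-synonymous
# (AAA->GAA, Lys->Glu) difference:
print(crude_dn_ds("ATGCTGAAA", "ATGCTAGAA"))  # 1.0
```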

While you can calculate dN/dS for a bunch of sequences blind to their evolutionary relationships, this number doesn't really give you much insight into the evolutionary process. Alternatively, if you have a set of sequences for comparison and a phylogenetic tree showing the evolutionary relationships between these sequences, you can orient where and at what rates these nucleotide changes have occurred. If you use a program that uses statistics to infer evolutionary models (like PAML), you will be asked to input a phylogeny for this very reason. To avoid confounding yourself with circular calculations, it's important to use phylogenies estimated from loci outside the scope of the analysis, even though in many cases phylogenies built from your loci of interest will (for the most part) match those built from the rest of the genome.

Genes undergoing horizontal gene transfer (by definition) have evolutionary histories and phylogenies that differ from the rest of the genome. You can certainly build a phylogeny using these sequences, you can use this as your tree for the calculation of evolutionary rates, and PAML will accept your input files without questioning them. You will be given a numerical answer when calculating dN/dS (or more specifically omega). In the best case scenario you will then go on and publish papers, generate press releases, and maybe even sit in on a podcast to describe your awesome results. The problem that arises, though, is that you have no clue as to the true evolutionary relationships between the genes of interest. The phylogeny just tells you how similar a given set of sequences are, but it doesn't necessarily tell you the evolutionary history of horizontally transferred genes. Take for instance a specific gene present on a highly transmissible plasmid (Fig. 1).

Figure 1: If Your Gene of Interest Undergoes Horizontal Gene Transfer, Good Luck Guessing Its Evolutionary History

This plasmid can be passed around from strain to strain, and may experience a number of different environments. In some environments/backgrounds your gene of interest may be under positive selection, in some cases negative, in others no selection at all. Unless you know the complete transmission history of this plasmid, you are stuck sampling gene sequences from the last strain it's found in. In the example in Fig. 1, you might assume that your gene of interest is under positive selection in Species C compared to A, but in reality it's neutral in both cases. Moreover, if you were to compare this gene from species B and C, you would think that there was no difference in selection between the two. When you build a phylogeny with your genes of interest and sit down to calculate dN/dS (or, even worse, omega), all you can see is that non-synonymous changes have occurred at some point since the two sequences diverged.

Alternatively, bacterial loci may recombine with divergent orthologues from closely or distantly related strains through processes like natural transformation. In this case only a small fraction of the codons in a gene may be replaced with divergent sequences, but such events can totally skew dN/dS ratios (Fig. 2).



Figure 2: Site Specific Recombination Disrupts Interpretation of dN/dS

Although the neutral theory provides a decent null hypothesis for understanding selection on mutational events in natural populations, there is no comparable model for bacterial recombination. Are a majority of homologous recombination events neutral? Are a majority subject to positive selection? We have very little understanding of how selection and homologous recombination interact in natural bacterial populations, but we do know that recombination is very frequent within some lineages, even across "housekeeping" genes. Along these lines, a recent study investigating Staphylococcus aureus and Clostridium difficile suggests that frequent recombination events bring with them many more synonymous polymorphisms than non-synonymous ones. Offhand, I can think of many scenarios where dN/dS ratios wouldn't match the underlying evolutionary dynamics because of horizontal transfer, but the fact of the matter is that we as a community have no clue as to what to expect.
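As a toy illustration of that last point, here's a hypothetical continuation of the counting sketch from above: splice a tract of synonymously diverged codons (the sort of thing a recombination event might import) into an otherwise identical gene, and the counts mimic purifying selection even though no selection acted on the recipient at all.

# Hypothetical demo: both repeat units encode the same amino acids
# (M L R A L K) using different codons.
gene = "ATGCTTCGTGCACTGAAA" * 20          # 120 identical codons
donor_tract = "ATGCTGCGCGCCCTGAAG" * 3    # synonymously diverged donor block

# Replace 18 codons starting at codon 21 (splice at a codon boundary).
recombinant = gene[:60] + donor_tract + gene[60 + len(donor_tract):]

dn, ds = count_differences(gene, recombinant)
print(dn, ds)   # 0 12 -> looks like strong purifying selection, but isn't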

All this brings me to the ultimate take home message of this post. If there's a hint that horizontal gene transfer may be affecting your analyses of selection, probably best to avoid having me as a reviewer because it's going to take a heck of a lot to convince me that the methods are sound.

Update: Matt Barber (@MattFBarber) reminded me of a great review paper from Jesse Shapiro relevant to this post, but which goes much more in depth into a variety of analyses. I also came across an awesome recent paper from Sheppard et al. (h/t Mark Pallen, who refers to the "et al." as the Sheppard, Didelot, Falush posse) on how to use bacterial recombination to your advantage for GWAS studies.

Thursday, May 30, 2013

The story behind "Exploring the costs of horizontal transfer"

My new review in Trends in Ecology and Evolution went live last week. This paper, while not really experimental, took a little bit of a circuitous route and a bit of luck. For all of you out there sitting on ideas for reviews/opinions but not knowing how to get these published, here's how it happened (with a bit of philosophy of how to be a researcher thrown in).

One of the important skills I first learned in grad school was how to delve headfirst into a completely new topic and figure out the salient and relevant points. Sure, you might get sent a paper of interest by a colleague or have one picked for journal club, and these are great for getting a Cliff's Notes version of an area of research, but to understand a topic you need to know the background. One of the best ways for me to figure out the background (i.e., why is the question interesting, what has been done in the past, what are the directions for future research) was to find a review article and use it to point towards new references. These references can be found in the article itself, but it's also quite helpful to search for other articles that cite the review. It doesn't even have to be a new article, because the rabbit hole of references eventually leads to the present day.

Somewhere along the way I developed a soft spot for the Trends family of journals (I know...Elsevier is evil...fully acknowledged; Frontiers is turning into a great place for reviews). Trends articles were clear, concise, opinionated, and a great foothold for jumping into new areas. While there are other equally great resources for reviews, like the Annual Reviews family, Bioessays, MMBR, etc., I set it as a goal early on in grad school to publish a first-author article in Trends.

There are two different paths to write a Trends article: 1) you can receive an invitation from one of the editors or 2) you can submit a short pre-proposal and see if an Editor likes the idea. I was part of an invited review once, but this new article was the product of submitting and resubmitting pre-proposals.

When I started my lab I wanted to break away from what my postdoctoral advisor (Jeff Dangl) is known for and get back to my roots in microbial evolution. Sure, I still do a lot of phytopathology work in my lab, but I try to do this in the context of understanding how microbial pathogens evolve and adapt to new environments. This strategy has its positives and negatives, but long story short, I found myself writing grant applications where the first page was devoted to explaining why the questions I was asking were interesting and important. That was too much space devoted to justifying my questions. Part of this is my lack of grant writing experience, but part of it was that I felt I had to explain why I was asking these questions because, while there were many other articles preceding my ideas, there was no article (*as far as I can tell; point me towards these if they exist and I missed them, please!) that laid these questions out in the context I was thinking about. I was simply using too many words to say something that could be more easily said in a long review article and then cited.

My first stab at getting this review published was a pre-proposal to Trends in Microbiology. Obviously, this didn't work. I took a little bit of time, re-jiggered the ideas and foci, and submitted another pre-proposal to Trends in Ecology and Evolution (TREE). This time a very kind editor thought enough of what I wrote to give me a chance at a full article. I hadn't actually written the article yet, but the ideas were circulating in my head and most of the references I ended up using were gleaned from iterations of grants I was writing. I had about 3 months to write the full article, which is both plenty of time and not close to enough time, but I was able to pull together a draft and circulate it amongst my colleagues before submitting for formal peer review. This piece actually started out as an opinion, simply because I wasn't really sure if people were thinking about things the way I was. There is a feeling in science where you are both terrified and excited in the same moment. Either you are an idiot who is seeing things that other people have seen before and simply don't recognize this, or you are actually seeing things in a new way. The same feeling occurs when dealing with great new experimental results, except there is an extra option...either your experiment is 1) awesome sauce! 2) a trivial result that other people have seen before but which you haven't realized for one reason or another, or 3) a lab mistake.

I got the first reviews back and realized that I was onto something that other people were thinking about already (so...no controversial opinion needed), but that there was definitely a place for what I was writing. It's always difficult to read criticism, but the reviewers actually did a great job pointing me towards other papers and helping me discover/emphasize other interesting research findings and directions. Specifically, the first iteration was too heavy on history and specifics and too light on the evolutionary implications of horizontal gene transfer. I made these changes, added a figure (in retrospect, I should have changed the font and cut down on whitespace, but I'm very happy with the information it conveys), and sent it back in for a second review. This time it went through without a hitch, and I celebrated accordingly. Side note: I'm a slightly large person and have a tendency to break things in the house and the lab (my wife calls me Shrek). The sequence of the disruptive protein in figure 1 is an homage to my tendencies. A slightly less than subtle Easter egg, but I like trying to put those in my papers (there are plenty more subtle ones in other papers...). I'm also not alone in doing this.

The take home message is not to worry about whether something is good or not; just submit and see what happens. For a long time I was scared/worried about writing pre-proposals to the Trends editors, but it was a fairly seamless process once I went through it. You don't have to wait for the magical email invitation, just take the initiative and see if your idea flies. I'm writing my first grant since the review came out, and it's definitely much easier to write the background now. The last couple of paragraphs of the review also nicely set up some other manuscripts I'll be submitting this summer.


Tuesday, May 7, 2013

Follow the biology (pt. 5) Season Finale

Yesterday I described assembling a reference genome for Pseudomonas stutzeri strain 28a24, in order to identify causative mutations for a brown colony/culture phenotype which spontaneously popped up while I was playing around with this strain. It's a pretty striking phenotype:


In addition to Illumina data for the reference parent strain, I also received sequencing reads for this brown colony phenotype strain as well as a couple of other independent derivatives of strain 28a24 (which will also be useful in this mutation hunt). There are a variety of programs to align reads to the draft genome assembly, like Bowtie2 and Maq. However, because it's a bit tricky to align reads back to draft genomes, and because I'd like to be able to quickly and visually inspect the alignments, I'm going to use a program called Geneious for this post.

The first step is to import the draft assembly into Geneious, and in this case I've collapsed all of the contigs into one fasta record so that they are separated by 100 N's (that way I can tell junctions I've artificially collapsed apart from real scaffolded contigs). Next I import both trimmed Illumina paired read files and perform a reference assembly vs. the draft genome. In Geneious it's pretty easy to extract the relevant variant information, like places where coverage is 0 (indicating a deletion) or small variants like single nucleotide changes or insertions/deletions. Here's what the output looks like:


For the reads from the brown phenotype strain there are basically 167 regions where there are no reads mapped back to the draft genome. A quick bit of further inspection shows that these are all just places where I inserted the 100 N's to link together contigs. There are also 327 smaller variants that are backed up with sufficient coverage levels. My arbitrary threshold here was 10 reads per variant, but my coverage levels are way over that across the board (between 70 and 100x).
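As an aside, the contig-collapsing step is easy to script yourself. Here's a minimal sketch in plain Python (file names are hypothetical) that concatenates contigs with 100 N spacers and records where each spacer lands, so that zero-coverage calls at those coordinates can be flagged as artifacts rather than real deletions:

def read_fasta(path):
    """Yield (header, sequence) tuples from a fasta file."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

SPACER = "N" * 100
parts, gaps, offset = [], [], 0
for name, seq in read_fasta("draft_contigs.fasta"):
    if parts:                                        # spacer between contigs
        gaps.append((offset, offset + len(SPACER)))  # 0-based spacer coordinates
        parts.append(SPACER)
        offset += len(SPACER)
    parts.append(seq)
    offset += len(seq)

with open("draft_pseudomolecule.fasta", "w") as out:
    out.write(">draft_pseudomolecule\n" + "".join(parts) + "\n")
print(len(gaps), "spacers inserted")  # one fewer than the number of contigs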

Here is where those other independent derivatives of the parent strain come into play. There are inevitably going to be assembly errors in the draft genome, and there are going to be places where reads are improperly mapped back to it. Aligning the independent non-brown isolate reads against the draft genome and comparing lists of variants allows me to cull the list of variants I need to look at more deeply, by disregarding variants shared by both strains. After this step I'm left with only 3 changes in the brown strain vs. the reference strain.
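The culling itself is just a set operation once the variant tables are exported. A minimal sketch, assuming each variant table has been dumped to a tab-delimited file with position, reference base, and alternate base in the first three columns (file names hypothetical):

def load_variants(path):
    """Read (position, ref, alt) tuples from a tab-delimited export."""
    with open(path) as fh:
        return {tuple(line.rstrip("\n").split("\t")[:3])
                for line in fh if line.strip()}

brown = load_variants("brown_variants.tsv")
control = load_variants("control_variants.tsv")   # independent derivative

# Variants shared with the control are likely assembly errors or
# systematic mis-mapping; only the brown-specific calls are candidates.
for pos, ref, alt in sorted(brown - control, key=lambda v: int(v[0])):
    print(pos, ref, alt)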

The first change is at position 2,184,445 in the draft genome, but remember that the number here is arbitrary because I've linked everything together. This variant is a deletion of a G in the brown genome.


The next step is to extract ~1000 bp from around the variant and use blastx to give me an idea of what this protein is.
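Pulling out the window is nearly a one-liner once the pseudomolecule is loaded (a sketch reusing the read_fasta helper from the collapsing sketch above; the output can be pasted straight into NCBI's web blastx):

records = dict(read_fasta("draft_pseudomolecule.fasta"))
seq = records["draft_pseudomolecule"]

position = 2184445   # 1-based coordinate from the variant table
flank = 500
window = seq[max(0, position - 1 - flank):position - 1 + flank]
print(">variant_%d_window\n%s" % (position, window))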

Basically it's a chemotaxis signaling gene. Not the best candidate for the brown phenotype, but an indication that the brown strain probably isn't as motile as 28a24. The next variant up is at position 3,555,964. It's a G->T transversion in a gene potentially involved in choline transport...still not the best candidate for the brown phenotype.


The last variant is the most interesting. It's a T->G transversion in a gene that codes for homogentisate 1,2-dioxygenase (hgmA).



To illustrate the function of this gene, I'm going to pull up the tyrosine metabolism pathway from the KEGG server (hgmA is highlighted in red).


The function of HgmA is to convert homogentisate to 4-maleylacetoacetate. Innocuous enough, and I'm no biochemist, so in a pre-Internet world I'd be somewhat lost right now. Luckily I have the power of Google and knowledge of the brown phenotype, so voila (LMGTFY). Apparently brown pigment accumulation in a wide variety of bacteria is due to a buildup of homogentisic acid. There is even this paper in Pseudomonas putida. Oxidation of these compounds leads to quinoid derivatives, which spontaneously polymerize to yield melanin-like compounds. Without doing any more genetics, I'm pretty sure this is what I've been looking for. It's a glutamate to aspartate change at position 338 in the protein sequence, a fairly innocuous change but one that (if this truly is the causal variant) sits in a very important part of the protein. From here, if I were interested, I would clone the wild type version of hgmA and naturally transform it back into the brown variant of 28a24 to complement the mutation and demonstrate direct causality. I might also try to set up media without tyrosine to see if this strain is auxotrophic, which is a quality of other brown variants (see the P. putida paper above). For now I'll just leave it at that and move on to another interesting project that I can blog about.

Monday, May 6, 2013

Follow the biology (pt. 4). Now we are getting somewhere...

First off, apologies for the complete lack of updates. In all honesty, there hasn't been much going on with this project since last summer, but now I'm finally at the point of figuring out what kinds of genes are behind this mysterious brown colony phenotype in P. stutzeri.

Picking up where I left off, I've been through many unsuccessful attempts to disrupt the brown phenotype with transposon mutagenesis. The other straightforward option for figuring out the genetic basis of this effect is to sequence the whole genome of the mutant strain and find differences between the mutant and the wild type. Luckily this is 2013 and that's easily possible. Confession time: while I know my way around genome scale data and can handle these types of analyses, I am by no means a full-fledged bioinformaticist. I'm a microbial geneticist who can run command line unix programs and program a little bit in perl and python, out of necessity. If you have advice or ideas or a better way to carry out the analyses I'm going to describe below, please leave a comment or contact me outside of the blog. I'm always up for learning new and better ways to analyze data!

Long story short, I decided to sequence a variety of genomes using Illumina 100bp PE libraries on a HiSeq. For a single bacterial genome, one lane of Illumina HiSeq is complete and utter overkill; when you can multiplex and sequence 24 at a time it's still overkill, but less so. In case you are wondering, my upper limit is 24 for this run because of money, not because of lack of want. It's a blunt decision, but one of many cost-benefit types of decisions you have to weigh when running a lab on a time dependent budget.

Before I even start to look into the brown mutant genome (next post), the first thing that needs to be done is sequencing and assembly of the reference, "wild type" strain. The bacterium I'm working with here is Pseudomonas stutzeri, specifically strain 28a24 from this paper by Johannes Sikorski. I acquired a murder (not the right collective noun, but any microbiologist can sympathize) of strains from Johannes a few years ago because one of my interests is the evolutionary effects of natural transformation in natural populations. At the time I had just started a postdoc in Jeff Dangl's lab working with P. syringae, saw the power of Pseudomonads as a system, and wanted to get my hands on naturally transformable strains. I didn't quite know what I was going to do with them at the time, but since I started my lab in Tucson this 28a24 strain has come in very handy as an evolutionary model system.

I received my Illumina read files last week from our sequencing center, and began the slog that can be assembly. While bacterial genome assembly is going to get much easier and more exact in the next 5 years (see here), right now working with Illumina reads is kind of like cooking, in that you start with a base dish that is seasoned to taste. My dish of choice for working with Illumina reads is SOAPdenovo. Why, you may ask? Frankly, most of the short read assemblers perform equally well on bacterial genomes with Illumina PE data. Way back when, I started out using Velvet as an assembler, but over time became frustrated with the implementation. I can't quite recall when it happened, but it might have been when Illumina came out with their GAII platform and the computer cluster I was working with at the time didn't have enough memory to actually assemble genomes with Velvet. Regardless...I mean no slight to Daniel Zerbino and the EMBL team, but at that point I went with SOAPdenovo and have been working with it since, due to what I'd like to euphemistically call "research momentum".

Every time I've performed assemblies with 100bp PE reads, the best kmer size for the assemblies always falls around 47, so I always start around that number and modify as necessary (in case you are curious, helpful tutorials on how De Bruijn graphs work can be found here). One more thing to mention: these reads have already been trimmed for quality. The first thing I noticed with this recent batch of data was that there was A HUGE AMOUNT of it, even when split into 24 different samples. The thing about bacterial genome assemblies is that they start to crap out somewhere above 70 or 80x coverage, basically because depth and random errors confuse the assemblers. If I run the assembler including all of the data, this is the result:


2251 Contigs of 2251 are over  bp -> 4674087 total bp
218 Contigs above 5000bp -> 1612110 total bp
30 Contigs above 10000 bp -> 352858 total bp
0 Contigs above 50000 bp -> 0 total bp
0 Contigs above 100,000 bp -> 0 total bp
0 Contigs above 200,000 bp -> 0 total bp
Mean Contig size = 2076.44913371835 bp
Largest Contig = 20517

The output is from a program Josie Reinhardt wrote when she was a grad student in Corbin Jones' lab (used again because of research momentum). The first line gives the total number of contigs and scaffolds in the assembly, as well as the total assembly size. In this case, including all of the data yields 2251 total contigs, a total genome size of 4.67Mb, and a mean/largest contig of 2,076/20,517 bp. Not great, but I warned you about including everything.
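If you don't have a script like Josie's handy, the same summary takes a few lines of Python (a sketch; the input file name is hypothetical, and would be whatever your assembler calls its contig output):

# Contig length summary in the same style as the output above.
lengths = []
with open("assembly.contig") as fh:
    for line in fh:
        if line.startswith(">"):
            lengths.append(0)
        elif lengths:
            lengths[-1] += len(line.strip())
lengths.sort(reverse=True)

print("%d contigs -> %d total bp" % (len(lengths), sum(lengths)))
for cutoff in (5000, 10000, 50000, 100000, 200000):
    above = [l for l in lengths if l > cutoff]
    print("%d contigs above %d bp -> %d total bp" % (len(above), cutoff, sum(above)))
print("Mean contig size = %.2f bp" % (sum(lengths) / len(lengths)))
print("Largest contig =", lengths[0])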

Next, I'm going to lower the coverage. Judging by this assembly and other P. stutzeri genome sizes, 4.5Mb is right where this genome should be. For 70x or so coverage, I'm only going to want to include approximately 1.6 million reads from each PE file ((4,500,000 * 70) / 200, since each 100bp read pair contributes 200bp). When I run the assembly this time, the output gets significantly better:


467 Contigs of 467 are over  bp -> 4695960 total bp
224 Contigs above 5000bp -> 4375593 total bp
158 Contigs above 10000 bp -> 3890697 total bp
11 Contigs above 50000 bp -> 709014 total bp
0 Contigs above 100,000 bp -> 0 total bp
0 Contigs above 200,000 bp -> 0 total bp
Mean Contig size = 10055.5888650964 bp
Largest Contig = 81785
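For reference, both the coverage arithmetic and the subsampling are trivial to script (a sketch with hypothetical file names; taking read pairs from the top of each file keeps the two PE files in sync, though a tool like seqtk can sample randomly if you'd rather not trust the start of the run):

# How many read pairs give ~70x coverage of a ~4.5 Mb genome with 100 bp PE?
genome_size = 4500000
target_coverage = 70
bases_per_pair = 200   # two 100 bp reads per pair
n_pairs = genome_size * target_coverage // bases_per_pair   # ~1.6 million

def head_fastq(src, dst, n_records):
    """Copy the first n_records fastq records (4 lines each) from src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for i, line in enumerate(fin):
            if i >= 4 * n_records:
                break
            fout.write(line)

for end in ("1", "2"):
    head_fastq("reads_%s.fq" % end, "reads_70x_%s.fq" % end, n_pairs)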

I can show you more of this type of data with different amounts of coverage, but it's not going to change the outcome. The only other parameter to change a bit is kmer size. The numbers above are for a kmer of 47, but what happens if I use 49 with this reduced coverage?


382 Contigs of 382 are over  bp -> 4694856 total bp
178 Contigs above 5000bp -> 4429753 total bp
130 Contigs above 10000 bp -> 4065927 total bp
18 Contigs above 50000 bp -> 1317727 total bp
2 Contigs above 100,000 bp -> 211768 total bp
0 Contigs above 200,000 bp -> 0 total bp
Mean Contig size = 12290.1989528796 bp
Largest Contig = 110167

Even better (and the best assemblies I've gotten so far). Increasing kmer size to 51 makes it incrementally worse:


479 Contigs of 479 are over  bp -> 4633694 total bp
164 Contigs above 5000bp -> 4361629 total bp
124 Contigs above 10000 bp -> 4057109 total bp
19 Contigs above 50000 bp -> 1389519 total bp
1 Contigs above 100,000 bp -> 120161 total bp
0 Contigs above 200,000 bp -> 0 total bp
Mean Contig size = 9673.68267223382 bp
Largest Contig = 120161

The last question you might have, since there are other P. stutzeri genomes publicly available, is why not use those and carry out reference-guided assembly? I haven't really looked into this much yet, but from some quick analyses it doesn't seem like these genomes are really that similar to one another (~85% nucleotide identity, aside from many presence/absence polymorphisms), so it's not that easy to actually line them up by nucleotide sequence. Maybe it will work better when I've got protein sequences, but that's for another post.

So now I'm ready to compare the brown phenotype genome vs. this assembled draft of 28a24. I know what the answer is, but I'm going to leave that until the next post.




