Following up on distribution of genome wide SNPs versus 1982 SNPs
So it turns out I made a mistake when making the genome wide graph in our meeting yesterday: it wasn't, in fact, genome wide. Here are the corrected graphs. First, the graph using only SNPs that are fixed between A and B,F,K. Then, the graph with all SNPs genome wide. DBarb, DOak, and DMud are not included in the second graph (to speed up computation); sorry that this makes things a bit more confusing.
Things to take away from this:
1) The pattern of prop alt for SNPs that are fixed between A and B,F,K is different from the pattern of genome wide prop alt, mostly for the Dorset ponds (compare the top panel of each graph). The SNPs fixed between A and B,F,K tend to be more modal.
2) Overall the distribution of the 1982 SNPs looks distinctly different (more modal) than the distribution of the genome wide SNPs for the Dorset ponds, supporting the idea that something is keeping these 1982 haplotypes in balance and/or from recombining.
3) In DBunk and DOil, the 1982 SNPs that are fixed between A and B,F,K (lower panel, top graph) are heterozygous (centered on 0.5). But when looking at all 1982 SNPs (lower panel, bottom graph), we see additional peaks. For DOil, there is a similar high peak at 0.05-0.10 in both the genome wide and 1982 SNPs. I looked at these SNPs, and it turns out they are mostly DBarb/Oak SNPs. Thus, my DOil pooled sequencing sample must contain a small proportion (5-10%?) of the DBarb/Oak species. So it appears that the true 1982 SNPs are still centered on 0.5 in DOil, regardless of whether they are fixed between A and B,F,K or not. However, I am still trying to figure out what the second (lower) peak is that is popping up in DBunk.
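To make the reasoning about the low DOil peak concrete, here is a minimal sketch of how prop alt could be binned from pooled read counts and the modal bin located. It assumes prop alt is simply alt reads divided by total reads at each SNP; the read counts, the function names, and the 0.05 bin width are made up for illustration and are not the pipeline used for the graphs above.

```python
from collections import Counter

def prop_alt(ref_reads, alt_reads):
    """Proportion of reads carrying the alternate allele at one SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no coverage at this site
    return alt_reads / total

def histogram(values, n_bins=20):
    """Count values per bin on [0, 1]; keys are the lower edge of each bin."""
    counts = Counter()
    for v in values:
        # A value of exactly 1.0 is clamped into the last bin.
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return {round(i / n_bins, 2): counts.get(i, 0) for i in range(n_bins)}

# Hypothetical (ref, alt) read counts for a few SNPs in one pooled sample.
toy_counts = [(50, 50), (48, 52), (93, 7), (94, 6), (92, 8), (0, 0)]

values = [p for p in (prop_alt(r, a) for r, a in toy_counts) if p is not None]
hist = histogram(values)
modal_bin = max(hist, key=hist.get)  # lower edge of the most populated bin
```

If the SNPs in the low bin are ones where the contaminating species carries an allele that the rest of the pool lacks, then the prop alt of that bin is a rough estimate of the contaminating fraction of the pool, which is where the 5-10% guess comes from.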
Other things to note:
1) We see a peak at prop alt of 1 in most of the non-Dorset ponds. I think that peak is just a result of those northern populations being divergent from our reference genome. I think those ponds are likely all D. pulex, not some other species.
2) It looks like NM2 may be a mix of D. pulex and another species. I am thinking that because of the weird second peak. Although could that also be a signature of asexuals? I know NM2 was the pond that you found had the higher prevalence of asexual SNPs.
Ok, so getting back to that second peak in the DBunk 1982 SNPs. I pulled out those SNPs and ran a PCA on them. Here is the result (Each super clone included only once per population). EV1 explains 39% of the variance, EV2 explains 19%.
Same graph but now colored by super clone.
Here we can see that several of the DBunk super clones (D, H, and L), along with one other DBunk clone, all of the DLily and DRamps clones, and three D10 clones, fall out separately from the rest of the Dorset clones. These clones appear to have input to 1982 from yet a third, unknown lineage? Interestingly, these three D10 clones are the same three that fall out as being divergent from the rest of D10 in the genome wide PCA (see below). If you look at the graphs above, you can also see a lower peak, almost a shoulder (in addition to the 0.5 peak), in the D10 1982 SNPs when you include all SNPs and not just the ones fixed between A and B,F,K, similar to what we see in DBunk. Whatever this other contribution is, in DBunk/DRamps/DLily these SNPs are almost exclusively heterozygous; less than 6% are homozygous alt. In the three D10 clones, however, 32-57% of the SNPs are homozygous alt.
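As a rough sketch of how the homozygous alt percentages could be tallied per clone: the snippet below assumes genotypes are coded as alt allele dosage (0, 1, 2) with None for missing calls, and that the denominator is all called genotypes at the selected SNPs. The clone names and genotypes are invented, and the denominator choice is an assumption, not necessarily what was used for the numbers above.

```python
def hom_alt_fraction(genotypes):
    """Fraction of called genotypes that are homozygous alt (dosage 2).

    Missing calls (None) are excluded from the denominator.
    Returns None when no genotype was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for g in called if g == 2) / len(called)

# Hypothetical per-clone genotype vectors at the second-peak SNPs.
toy_genotypes = {
    "clone_mostly_het": [1, 1, 1, 2, None],
    "clone_half_hom_alt": [2, 2, 1, 0],
    "clone_no_calls": [None, None],
}

fractions = {name: hom_alt_fraction(g) for name, g in toy_genotypes.items()}
for name, frac in fractions.items():
    label = "NA" if frac is None else f"{frac:.0%}"
    print(name, label)
```

Restricting the denominator to non-reference calls (dosage 1 or 2) would be an equally reasonable choice and would shift the percentages upward, so it is worth stating which one is being reported.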