Posts

Identifying AxB F1 hybrids in the D8 2018 individuals

I was interested in identifying putative AxB F1 hybrids among the 2018 D8 individuals. First I took all A and B individuals from D8, regardless of year. I then found all SNPs that were fixed reference in A and fixed alternate in B, or vice versa. Next I pulled out all the 2018 D8 individuals, called those same SNPs, and polarized the genotypes as B (dos0), A (dos2), or heterozygous. Not surprisingly, all 7 D8 2018 A individuals were 100% A. Another 8 individuals had a proportion heterozygous (prop het) between 0 and 0.9. These could be recombinant hybrids between A and B, or something else. March20_2018_D8_16 and March20_2018_D8_33 look a bit like backcrosses, for example. March20_2018_D8_37 could be an F1 hybrid with a high error rate in calling heterozygotes. Another 43 individuals look like they could be F1 hybrids between A and B: they are over 90% heterozygous for SNPs fixed between A and B. That is more F1 hybrids than you had on your pie charts, Alan. Why the discrepancy? I was worried about al...
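The fixed-SNP and polarization steps described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: it assumes genotypes are stored as alternate-allele dosages (0, 1, 2, or `None` for a missing call), and the function names `fixed_sites`, `polarize`, and `prop_het` are hypothetical.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # alt-allele dosage per site: 0, 1, 2, or None (missing)


def fixed_sites(a_inds: List[Genotypes], b_inds: List[Genotypes]) -> Dict[int, int]:
    """Find sites fixed for opposite alleles in A and B.

    Returns {site_index: dosage that marks the A allele (0 or 2)}.
    Assumes both lists are non-empty and all individuals cover the same sites.
    """
    fixed = {}
    for site in range(len(a_inds[0])):
        a_calls = {ind[site] for ind in a_inds if ind[site] is not None}
        b_calls = {ind[site] for ind in b_inds if ind[site] is not None}
        if a_calls == {0} and b_calls == {2}:
            fixed[site] = 0  # A is fixed reference, B is fixed alternate
        elif a_calls == {2} and b_calls == {0}:
            fixed[site] = 2  # A is fixed alternate, B is fixed reference
    return fixed


def polarize(ind: Genotypes, fixed: Dict[int, int]) -> List[int]:
    """Recode calls at fixed sites as 0 (B), 1 (heterozygous), or 2 (A)."""
    polarized = []
    for site, a_state in fixed.items():
        call = ind[site]
        if call is None:
            continue  # skip missing calls
        if call == 1:
            polarized.append(1)
        elif call == a_state:
            polarized.append(2)
        else:
            polarized.append(0)
    return polarized


def prop_het(polarized: List[int]) -> float:
    """Proportion of called fixed sites that are heterozygous."""
    if not polarized:
        return float("nan")
    return polarized.count(1) / len(polarized)
```

Under this coding, an individual with prop het above 0.9 would fall in the putative F1 group described above, while a value near 0 with all sites coded 2 corresponds to a pure A individual.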

Figuring out how to map outgroups

I am trying to decide how to deal with mapping the more distant species, specifically Obtusa and Simocephalus. There are a couple of considerations. First, Obtusa is not that divergent from Pulex, whereas Simocephalus is VERY divergent, so maybe divergent mapping and the choice of a different mapper are more of a concern for Simocephalus than for Obtusa. Second, what are we using the outgroups for? Why do we want to map them? For Obtusa, which is less divergent, we have two reasons: (a) we want to use it as an outgroup to polarize SNPs in the Pulex dataset, and (b) we want to construct a pseudoreference genome for Obtusa to use for competitive mapping with pooled data. Where I am right now: I looked at the input fastq file size, the final bam file size, and the number of reads mapped in that final bam (using samtools flagstat), and compared 12 Obtusa samples and 16 Pulex samples from the same plate of libraries. As you can see in the graph below, there is a positive correlation between incoming fastq ...

Mesocosm 2019 Hatchlings 20190813

Looking at the hatching data for weeks 4 through 7. First, I looked at the number of hatchlings that emerged before and after vernalization, broken down by the week the ephippia were sampled and faceted by clone. Then I looked at the same data as proportions. Overall it looks like the proportion of hatchlings emerging after vernalization is decreasing over time, because more individuals are emerging prior to vernalization. Next I wanted to look at how survival to reproduction varies between vernalized and non-vernalized hatchlings. Here we are only looking at data from weeks 4-6, because week 7 hatchlings have only been around for 1 week. This is the number of hatchlings that died prior to reproduction (1, green) versus those that survived to reproduction (0, red), broken down by sample week and clone. Sorry if this is counterintuitive. Now let's look at the interaction of vernalization and mortality prior to reproduction. The x-axis is vernalizat...

Vernalized versus Non-vernalized Hatchlings Mesocosms 2019

We collected ephippia at sample weeks 4, 5, 6, 7, and 8. So far the plates from weeks 4, 5, and 6 have come out of the fridge, so we have both vernalized and non-vernalized hatchling counts for these three weeks. The plates from sample weeks 7 and 8 will come out of the fridge a week from today (August 2nd), so we only have non-vernalized hatchling counts for those two weeks. I took an initial look at vernalized versus non-vernalized hatchling counts for weeks 4, 5, and 6. First, I looked at the overall counts of vernalized versus non-vernalized hatchlings by sample week (summing across all tanks/clones). Here is what I get. Hmm, resolution is awful. But the green bars are vernalized hatchlings, and the red bars are non-vernalized hatchlings. The y-axis scale goes from 0 to 400, with ticks at 100. The x-axis is weeks 4, 5, and 6. From this it looks like the ratio of vernalized to non-vernalized hatchlings decreases with sample week (i.e. fewer individuals appear to require vernal...
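Summing counts across tanks/clones and comparing vernalized to non-vernalized hatchlings by sample week could be sketched like this. It is only an illustration under assumptions: the record layout `(week, treatment, count)`, the treatment labels, and the function names are hypothetical, and any numbers passed in would be placeholders rather than the mesocosm data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per tank/clone/week/treatment: (sample_week, treatment, hatchling_count).
# The treatment labels "vernalized" and "non_vernalized" are assumed for this sketch.
Record = Tuple[int, str, int]


def counts_by_week(records: Iterable[Record]) -> Dict[int, Dict[str, int]]:
    """Sum hatchling counts per sample week and treatment across all tanks/clones."""
    totals = defaultdict(lambda: {"vernalized": 0, "non_vernalized": 0})
    for week, treatment, count in records:
        totals[week][treatment] += count
    return dict(totals)


def proportion_vernalized(totals: Dict[int, Dict[str, int]]) -> Dict[int, float]:
    """Fraction of each week's hatchlings that emerged after vernalization."""
    props = {}
    for week, counts in sorted(totals.items()):
        n = counts["vernalized"] + counts["non_vernalized"]
        props[week] = counts["vernalized"] / n if n else float("nan")
    return props
```

A proportion that drops from one sample week to the next would match the pattern described above, where fewer hatchlings appear to need vernalization later in the season.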

Initial look at total data

Here is an initial look at the total data set. First of all, I looked at median read depth. I am disappointed that the median read depth for my newest set of libraries seems so low (green), despite the PCR duplicate rate also being low. I don't quite understand what is going on. Maybe we get fewer reads out of HiSeq X lanes when doing duplex barcode sequencing? I also looked at IBS and super clone assignment. Here is the corrplot: Sequenced a good many As. In total, we sequenced 83 super clone individuals. 30 B super clone individuals, and 26 super clone C individuals (D10 clones, bottom right corner in figure above). For designating super clone assignment, I did two things. First of all, I looked at the distances in the IBS matrix between single moms and pooled moms for the 6 clones where I did libraries using both approaches. Most of the 6 grouped together tightly, but one set had an identity of 0.945. I then looked at a histogram of identity distances: You can see ...

Divergent blocks in D8 2017 - intermediate frequency in other ponds?

Driving question: Are blocks of the genome that show elevated divergence between SCA and SCB in D8 2017, in terms of fixed SNPs, more likely to be maintained at intermediate frequencies in other ponds? I took the windows that showed the highest divergence between SCA and SCB, and then graphed the distribution of the alternate allele frequency for all SNPs within those same windows versus a randomly chosen set of windows (matched sample size, but not matched beyond that). Here is the result for D8 2012. We do see a shift in the expected direction (towards 0.5) in the divergent set of windows. Here are the results for DBunk and DOily 2017. Again, we see a shift in the right direction, though both sets are farther from 0.5 to start with, maybe because there are more lineages (or species) contributing to this pooled data. [Figures: DBunk, Doily] However, once I look at D10, or more distant ponds (Bag, B1, W1, W6 so far), I don't see any signal. However, I think that it is ...
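One rough way to put a number on the "shift towards 0.5" is to compare the mean distance of alternate allele frequencies from 0.5 in the divergent windows against a random set of the same size. The sketch below is an assumption-laden illustration: the post compares distributions graphically, the summary statistic here is my own choice, drawing the random windows only from non-divergent windows is an assumption, and `compare_windows` is a hypothetical name.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def mean_distance_from_half(freqs: List[float]) -> float:
    """Mean absolute distance of allele frequencies from 0.5."""
    return mean(abs(f - 0.5) for f in freqs)


def compare_windows(
    window_freqs: Dict[str, List[float]],
    divergent_ids: List[str],
    seed: int = 0,
) -> Tuple[float, float]:
    """Compare divergent windows with a size-matched random set of windows.

    window_freqs maps a window id to the alt allele frequencies of its SNPs.
    Random windows are drawn from the non-divergent windows (an assumption),
    matched only on the number of windows.
    """
    rng = random.Random(seed)  # fixed seed so the random set is reproducible
    divergent = set(divergent_ids)
    background = sorted(w for w in window_freqs if w not in divergent)
    random_ids = rng.sample(background, k=len(divergent_ids))
    div_freqs = [f for w in divergent_ids for f in window_freqs[w]]
    rnd_freqs = [f for w in random_ids for f in window_freqs[w]]
    return mean_distance_from_half(div_freqs), mean_distance_from_half(rnd_freqs)
```

A smaller first value than second value would correspond to the divergent windows sitting closer to intermediate frequency than the random set. Repeating the draw with several seeds would give a sense of how variable the random baseline is.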

SCB SCA DBunk Doily

Updated prop SCA and prop SCB figure for DBunk (top) and DOily (bottom), using a window size of 100,000 and a step size of 10,000. Graphing prop SCA and prop SCB against the number of fixed SNPs between A and B. [Figure panels: DBunk geom_point(); DBunk geom_count(); Doily geom_point(); Doily geom_count()] Maybe there is a narrowing towards intermediate frequencies in DBunk? Not so much in Doily? Looking at correlations between DBunk and Doily for prop SCA and prop SCB. [Figure panels: prop SCA; prop SCB]
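For reference, the windowing scheme (100,000 window, 10,000 step) and the per-window count of fixed SNPs could be sketched as below. The assumptions are mine: coordinates are treated as 0-based half-open base-pair intervals, only full-length windows are emitted, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Iterator, List, Tuple


def sliding_windows(
    chrom_length: int, window: int = 100_000, step: int = 10_000
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each full-length window along a chromosome.

    Windows are half-open [start, end); a trailing partial window is skipped.
    """
    start = 0
    while start + window <= chrom_length:
        yield start, start + window
        start += step


def count_fixed_snps(sorted_positions: List[int], start: int, end: int) -> int:
    """Count SNP positions falling in [start, end); positions must be sorted."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)


def fixed_snps_per_window(
    sorted_positions: List[int], chrom_length: int
) -> List[Tuple[int, int, int]]:
    """Return (start, end, number of fixed SNPs) for every window."""
    return [
        (start, end, count_fixed_snps(sorted_positions, start, end))
        for start, end in sliding_windows(chrom_length)
    ]
```

Because the step is a tenth of the window size, neighbouring windows share 90% of their sequence, so points in the scatter plots are far from independent.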