Looking at fixed SNPs between super clones A and B

Here I am going to look at SNPs that are fixed between super clones A and B (the two main super clones in D8 in April 2017).

First, I looked for SNPs that are always homozygous reference in one super clone and always homozygous alternate in the other.

I found 5,616 SNPs that are always homozygous reference in super clone A and always homozygous alternate in super clone B, and 2,817 SNPs that are always homozygous reference in super clone B and always homozygous alternate in super clone A. This gives a total of 8,433 fixed SNPs.
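The fixed-SNP test above can be sketched in Python. This is a minimal illustration, not the code used to produce these counts. It assumes genotypes are stored as per-variant lists of dosages for the individuals in each super clone, with dosage counting reference alleles (2 = homozygous reference, 0 = homozygous alternate), which matches the convention described further down. The function name `fixed_snps` and the toy data are hypothetical.

```python
# Hypothetical sketch. Assumption: dosage = number of REFERENCE alleles
# (2 = homozygous reference, 0 = homozygous alternate); None = missing call.
HOM_REF, HOM_ALT = 2, 0


def fixed_snps(geno_x, geno_y):
    """Return (homref_x_homalt_y, homref_y_homalt_x) as lists of variant indices.

    geno_x, geno_y: one row per variant (same variant order in both), each row
    holding one dosage per individual assigned to that super clone.
    """
    ref_x_alt_y, ref_y_alt_x = [], []
    for i, (row_x, row_y) in enumerate(zip(geno_x, geno_y)):
        # A missing call (None) never equals 0 or 2, so such sites are not counted as fixed.
        if all(d == HOM_REF for d in row_x) and all(d == HOM_ALT for d in row_y):
            ref_x_alt_y.append(i)
        elif all(d == HOM_REF for d in row_y) and all(d == HOM_ALT for d in row_x):
            ref_y_alt_x.append(i)
    return ref_x_alt_y, ref_y_alt_x


# Toy data: 4 variants, 3 individuals in super clone A, 2 in super clone B.
geno_A = [[2, 2, 2], [0, 0, 0], [2, 1, 2], [2, 2, None]]
geno_B = [[0, 0], [2, 2], [0, 0], [0, 0]]
a_ref, b_ref = fixed_snps(geno_A, geno_B)
print(len(a_ref), len(b_ref), len(a_ref) + len(b_ref))  # 1 1 2
```

The same function applies unchanged to the A vs. C and B vs. C comparisons below; only the two genotype tables change.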

For comparison, I did a similar calculation between super clones A and C, and B and C (C was a super clone in D10 in 2016).

For A and C, there were 5,882 SNPs that were homozygous reference in A and homozygous alternate in C, and 3,313 SNPs that were homozygous reference in C and homozygous alternate in A, for a total of 9,195 fixed SNPs.

For B and C, there were 6,213 SNPs that were homozygous reference in B and homozygous alternate in C, and 5,980 SNPs that were homozygous reference in C and homozygous alternate in B, for a total of 12,193 fixed SNPs.

So the number of fixed SNPs is similar for A vs. B (8,433) and A vs. C (9,195), but B vs. C has considerably more (12,193, roughly 3,000 to 3,800 more than either of the other comparisons).

OK. Next I wanted to look at the frequency of these SNPs in our pooled population samples, and also to get some idea of their frequency in our individual sequencing. To do this, I made "pooled" allele frequencies from the individual sequencing by assuming each individual would contribute equally to the pool. For each individual at each variant, I divided the total coverage of that variant by 125 (the number of individuals in the VCF), then multiplied that value by the dosage to assign the number of reference reads, and by (2 - dosage) to assign the number of alternate reads. I then summed the reference and alternate reads across all individuals within a population for each variant, and calculated the proportion of alternate reads per variant for each population.

With these frequencies, I can look at the distribution of allele frequencies at the SNPs fixed between super clones A and B in each pond. I first look at all of the fixed SNPs together, and then divide them up by whether they are homozygous reference in A or homozygous reference in B.
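The artificial-pool calculation described above can be sketched for a single variant as follows. The function name `artificial_pool_freq`, the list-based inputs, and the skipping of missing values are hypothetical choices for illustration. Dosage is again assumed to count reference alleles, as the description implies (scaled coverage times dosage gives reference reads).

```python
from collections import defaultdict

N_INDIVIDUALS = 125  # number of individuals in the VCF, per the notes


def artificial_pool_freq(coverage, dosage, pops, n_total=N_INDIVIDUALS):
    """Alternate-read proportion per population for ONE variant.

    coverage, dosage, pops: parallel lists with one entry per individual.
    Assumption: dosage counts reference alleles (0, 1 or 2).
    Individuals with a missing coverage or dosage (None) are skipped.
    Returns {population: alternate proportion, or None if it has no reads}.
    """
    ref = defaultdict(float)
    alt = defaultdict(float)
    for cov, dos, pop in zip(coverage, dosage, pops):
        if cov is None or dos is None:
            continue
        scaled = cov / n_total            # coverage divided by 125
        ref[pop] += scaled * dos          # reference reads
        alt[pop] += scaled * (2 - dos)    # alternate reads
    freqs = {}
    for pop in ref:
        total = ref[pop] + alt[pop]
        freqs[pop] = alt[pop] / total if total > 0 else None
    return freqs


freqs = artificial_pool_freq(
    coverage=[10, 20, 30, 40],
    dosage=[2, 0, 1, 2],
    pops=["D8_2016", "D8_2016", "DBunk", "DBunk"],
)
print({k: round(v, 3) for k, v in freqs.items()})  # {'D8_2016': 0.667, 'DBunk': 0.214}
```

Because every individual's coverage is divided by the same constant (125), that constant cancels out of the final proportion: the result equals the coverage-weighted mean of each individual's alternate allele fraction, (2 - dosage) / 2.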

D8_2012 (from pool)

D8_2016 (from ind seq - artificial pool)

D8_2017_April (from ind seq - artificial pool)

D8_2017_May (from pool)

DBunk_2017_April (from ind seq - artificial pool)

DBunk_2017_May (from pool)

Doily_2017_May (from pool)

Now I break these out by whether the SNP is homozygous reference in super clone A or in super clone B.

D8_2012_HomRefA
D8_2012_HomRefB

D8_2016_HomRefA
D8_2016_HomRefB

D8_2017_April_HomRefA
D8_2017_April_HomRefB

D8_2017_May_HomRefA
D8_2017_May_HomRefB

DBunk_2017_April_HomRefA
DBunk_2017_April_HomRefB

DBunk_2017_May_HomRefA
DBunk_2017_May_HomRefB

Doily_2017_May_HomRefA
Doily_2017_May_HomRefB

What do we get from this? I don't think it looks like either super clone A or B was present in D8 in 2012. Agreed? In D8 in 2016, there appears to be a mix of super clone B and heterozygotes between A and B, but no (or very little) A. In April 2017, there is a mix of A and B, with A represented more than B in our individual sequencing. By May 2017, there is mostly super clone B. Agreed? I think the May 2017 sample probably depicts the frequencies in the actual pond fairly accurately. For the April 2017 data, however, clones had to stay in the lab long enough to be expanded and then sequenced, so there could have been some selection for the lab environment, and the frequencies in our data may not reflect those in the pond. Still, the data do suggest that D8 went from having two dominant super clones in April to having only one three weeks later in May.
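The reading of these distributions rests on a simple expectation. Under a hypothetical model in which a pond contains only super clone A, super clone B, and A x B hybrids that are heterozygous at every A/B-fixed site (at frequencies f_A, f_B and f_AB summing to 1), the expected alternate allele frequency at a SNP that is homozygous reference in A is f_B + f_AB/2, and at a SNP that is homozygous reference in B it is f_A + f_AB/2. A minimal sketch, with a hypothetical function name:

```python
def expected_alt_freq(f_A, f_B, f_AB, homref_in="A"):
    """Expected pooled alternate allele frequency at an A/B-fixed SNP.

    Hypothetical model: the pond holds only super clone A, super clone B and
    A x B hybrids (heterozygous at every A/B-fixed site), at frequencies
    f_A, f_B, f_AB that sum to 1.
    homref_in: which super clone is homozygous reference at this SNP.
    """
    if abs(f_A + f_B + f_AB - 1) > 1e-9:
        raise ValueError("f_A + f_B + f_AB must sum to 1")
    if homref_in == "A":   # A carries 0 alt alleles, B carries 2, hybrids 1
        return f_B + 0.5 * f_AB
    if homref_in == "B":   # B carries 0 alt alleles, A carries 2, hybrids 1
        return f_A + 0.5 * f_AB
    raise ValueError("homref_in must be 'A' or 'B'")


print(expected_alt_freq(0.0, 1.0, 0.0, "A"), expected_alt_freq(0.0, 1.0, 0.0, "B"))  # 1.0 0.0
print(expected_alt_freq(0.0, 0.0, 1.0, "A"), expected_alt_freq(0.0, 0.0, 1.0, "B"))  # 0.5 0.5
```

So a pond of only B puts the HomRefA SNPs near 1 and the HomRefB SNPs near 0, while a pond of only A x B hybrids puts both sets near 0.5. Other clones (as in D8 in 2012) would also contribute, so this is only a guide to reading the plots.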

The DBunk and Doily distributions suggest that, at these SNPs, both ponds are dominated by hybrids between super clones A and B. I wanted to look at this a bit more. So, again using only the variants fixed between super clones A and B, I looked at the distribution of average heterozygosity per clone for each pond at these sites. For each individual, I counted the sites with a dosage of 1 (heterozygous) and divided that count by the total number of sites examined. I then graphed the distribution of this per-clone heterozygosity for each pond. I expected DBunk to have individuals with higher heterozygosity than D8_2017, which is what I see. In addition, D8_2016 should also have individuals with higher heterozygosity than D8_2017, which I also see.
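The per-clone heterozygosity calculation can be sketched as follows. The function name and the input layout (one list of dosages per individual, already restricted to the A/B-fixed sites and tagged with its pond) are hypothetical. As described above, the denominator is the total number of sites examined, so a missing call counts as not heterozygous.

```python
from collections import defaultdict


def heterozygosity_by_pond(individuals):
    """Per-clone heterozygosity at the A/B-fixed sites, grouped by pond.

    individuals: iterable of (pond, dosages) pairs, where dosages lists one
    value per A/B-fixed site for that clone (1 = heterozygous).
    Returns {pond: [heterozygosity of each clone in that pond]}.
    """
    by_pond = defaultdict(list)
    for pond, dosages in individuals:
        if not dosages:
            continue  # no sites examined, nothing to divide by
        n_het = sum(1 for d in dosages if d == 1)
        # Denominator is every site examined, as in the notes.
        by_pond[pond].append(n_het / len(dosages))
    return dict(by_pond)


individuals = [
    ("D8_2017", [2, 0, 2, 0]),
    ("D8_2017", [2, 0, 1, 0]),
    ("DBunk_2017", [1, 1, 1, 0]),
    ("DBunk_2017", [1, 1, 1, 1]),
]
print(heterozygosity_by_pond(individuals))
# {'D8_2017': [0.0, 0.25], 'DBunk_2017': [0.75, 1.0]}
```

Each list can then be passed to any histogram or density plot to compare the ponds.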

D8_2016

D8_2017

DBunk_2017










