Is there an excess of divergence on a portion of chromosome 10 between super clones?

I have been trying to work out what differs between super clones A and B. In doing this, I noticed three scaffolds that initially looked like they might have an excess of fixed SNPs between A and B relative to their length, and all three map to chromosome 10: scaffolds 2175, 1982, and 1927. To look at the divergence between these super clones more closely, I used a type of sliding window approach. First I identified all SNPs that were fixed between different D8 super clones (homozygous reference in one super clone and homozygous alternate in another). I then scanned along the scaffolds that carry fixed differences in 2,500 bp chunks, counted the fixed SNPs within each window, and divided that count by 2,500 bp. As I am writing this I realize there is a problem with this approach: if a scaffold is 3,000 bp long, it gets one 2,500 bp chunk and then a 500 bp chunk whose count is still divided by 2,500, so the estimates for these short final windows will be too low. Most windows should be fine. For now I will go ahead and show the graphs I have so far, and I will try to fix this problem later.
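The windowing step described above can be sketched roughly as follows. This is only an illustration, not the script I actually ran: the function names, the genotype strings ("0/0", "1/1"), and the 1-based coordinates are all assumptions. It also shows one possible way around the short-last-window problem, which is to divide each count by the real length of its window.

```python
from collections import Counter


def is_fixed_difference(gt_a, gt_b):
    """True when one super clone is homozygous ref and the other homozygous alt.

    Genotypes are assumed to be VCF-style strings such as "0/0" or "1/1".
    """
    return {gt_a, gt_b} == {"0/0", "1/1"}


def window_density(positions, scaffold_length, window=2500):
    """Return (start, end, snps_per_bp) for consecutive windows on one scaffold.

    positions: 1-based coordinates of fixed SNPs on the scaffold (assumed).
    The last window may be shorter than `window`; its count is divided by
    its real length rather than by the nominal window size.
    """
    # Window index of each SNP: positions 1..2500 -> 0, 2501..5000 -> 1, ...
    counts = Counter((pos - 1) // window for pos in positions)
    results = []
    for start in range(0, scaffold_length, window):
        end = min(start + window, scaffold_length)
        n_snps = counts.get(start // window, 0)
        results.append((start + 1, end, n_snps / (end - start)))
    return results


# Hypothetical 3,000 bp scaffold with three fixed SNPs.
for row in window_density([10, 20, 2600], scaffold_length=3000):
    print(row)
```

With this version the 500 bp tail of a 3,000 bp scaffold is scaled by 500 rather than 2,500, although estimates from very short windows will be noisy and may be better dropped altogether.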

Distribution of SNPs per bp for fixed differences between super clones A and B.
In the graph above, the window that has the highest SNPs per bp is a window on scaffold 1982.

Distribution of SNPs per bp for fixed differences between super clones A and K.
Similarly, when we compare super clones A and K (K is related to B), a window on 1982 is among the most divergent windows (> 0.03), though other windows are popping up now too.
When we compare the AB and AK divergences, a 1982 window stands out as highly divergent in both (the far-right point, at about 0.038 in both comparisons).

Distribution of SNPs per bp for fixed differences between super clones A and F.
We again see something similar when comparing super clones A and F, where the most divergent window is a 1982 window.

Distribution of SNPs per bp for fixed differences between super clones A and B,F,K.
In fact, if we look at the SNPs that are identical within B, F, and K but are fixed differences between these clones and super clone A, we see the same thing. The most divergent window is on scaffold 1982, and two of the three windows with more than 0.02 SNPs per bp are consecutive 1982 windows. The B, F, and K super clones are related. B is mostly found in D8 in spring 2017, though there was one clone in D8 in spring 2016. F was only seen in D8 in spring 2016, and K was only seen in D8 in spring 2017. It seems that this divergence on scaffold 1982 is older than the split of B, F, and K, and may be common to clones within this larger lineage.
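The site selection for this comparison (same genotype in B, F, and K, opposite homozygote in A) could be written as a small predicate like the one below. It is a sketch under assumptions: the dictionary layout, the function name, and the "0/0" / "1/1" genotype strings are hypothetical, with one consensus genotype per super clone per site.

```python
def fixed_between_focal_and_group(genotypes, focal="A", group=("B", "F", "K")):
    """True when every super clone in `group` shares one homozygous genotype
    and `focal` carries the opposite homozygote.

    genotypes: dict mapping super clone name -> genotype string at one site.
    """
    group_genotypes = {genotypes[name] for name in group}
    if len(group_genotypes) != 1:
        return False  # B, F, and K do not all agree at this site
    (shared,) = group_genotypes
    return {genotypes[focal], shared} == {"0/0", "1/1"}


# Hypothetical sites.
site_kept = {"A": "0/0", "B": "1/1", "F": "1/1", "K": "1/1"}
site_dropped = {"A": "0/0", "B": "1/1", "F": "0/1", "K": "1/1"}
print(fixed_between_focal_and_group(site_kept))
print(fixed_between_focal_and_group(site_dropped))
```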

I also filtered my VCF file for SNPs surrounding indels, removing SNPs within 15 bp of an indel and keeping only one indel when groups of indels were 10 bp apart or closer. After this filtering there are fewer SNPs per bp. But when I look at the same set of SNPs as just above (fixed between super clone A and B, F, K), of the four 2,500 bp windows above 0.015 SNPs per bp, two are those same consecutive windows on 1982, and these are the two most divergent windows.
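For anyone wanting to reproduce the indel filter, here is a rough position-based sketch of the two steps. It is not the tool I used. It assumes positions on a single scaffold, treats each indel as a single coordinate (ignoring indel length), and arbitrarily keeps the first indel of each cluster, since which indel gets kept is a choice; both function names are made up for this example.

```python
import bisect


def thin_indels(indel_positions, max_gap=10):
    """Keep one indel (the first, by assumption) from each run of indels
    in which neighbours are `max_gap` bp apart or closer."""
    kept = []
    previous = None
    for pos in sorted(indel_positions):
        if previous is None or pos - previous > max_gap:
            kept.append(pos)  # starts a new cluster
        previous = pos
    return kept


def drop_snps_near_indels(snp_positions, indel_positions, flank=15):
    """Return the SNP positions that are more than `flank` bp from every indel."""
    indels = sorted(indel_positions)
    kept = []
    for pos in snp_positions:
        # First indel at or beyond pos - flank; if it is also <= pos + flank,
        # the SNP sits inside the flanking region and is dropped.
        i = bisect.bisect_left(indels, pos - flank)
        if i < len(indels) and indels[i] <= pos + flank:
            continue
        kept.append(pos)
    return kept


# Hypothetical coordinates.
print(thin_indels([100, 105, 112, 200]))
print(drop_snps_near_indels([100, 120, 200], [110]))
```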

I also looked at differences between B, F, and K. Looking at B versus K, we find overall lower levels of SNPs per bp. In this case, the more divergent windows seem to be overrepresented on scaffold 1429, which is on chromosome 9. Nine of the 14 windows that are over 0.02 are on scaffold 1429.
I find something similar when I look at B versus F, in that the more divergent windows again cluster on 1429. Eight of the 12 windows above 0.02 are on scaffold 1429, which maps to chromosome 9.
In terms of fixed sites, K and F actually look pretty similar: I only find two windows with SNPs per bp over 0.01.
However, the story changes once we also look at heterozygous sites (heterozygous in one super clone, but homozygous reference in the other). Here there are many more SNPs per bp between these two super clones.




