Areas of high heterozygosity may be duplications/deletions
In my initial attempts at chromosome painting, I was interested not only in the apparent deletion on scaffold 1982 but also in the areas of high heterozygosity.
Notice the block of high heterozygosity immediately following the deletion. I decided to look at this block in more detail. In this section there is an average of 1 SNP every 25 bp (276 SNPs over 7048 bp). I ran this section through NCBI's BLASTX, and it appears to cover most of HEAT repeat-containing protein 1.
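To make the SNP-density figure easier to reproduce for other blocks, here is a minimal sketch that counts SNPs in fixed, non-overlapping windows along one scaffold and reports bp per SNP. The function name and the 1000 bp default window are hypothetical choices for illustration, not the settings behind the painting plots.

```python
def snp_density_windows(positions, scaffold_length, window=1000):
    """Count SNPs in non-overlapping windows along one scaffold.

    positions: 1-based SNP positions on a single scaffold.
    Returns a list of (window_start, window_end, snp_count, bp_per_snp).
    """
    n_windows = (scaffold_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window] += 1

    result = []
    for i, count in enumerate(counts):
        start = i * window + 1
        end = min((i + 1) * window, scaffold_length)  # last window may be short
        span = end - start + 1
        bp_per_snp = span / count if count else float("inf")
        result.append((start, end, count, bp_per_snp))
    return result


# The block above: 276 SNPs over 7048 bp is roughly one SNP every 25.5 bp.
print(round(7048 / 276, 1))
```

Scanning the output for windows with a small `bp_per_snp` is one simple way to pick out candidate blocks like this one.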
Next I looked at this region of the scaffold in samtools tview and noticed that the read depth appeared higher there. So I plotted the distribution of read depth along scaffold 1982. Here is that distribution for a super clone B individual, D8_125:
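A depth profile like this can be summarized by averaging per-base depth in windows. The sketch below assumes input in the three-column, tab-separated format written by `samtools depth` (reference name, 1-based position, depth), ideally run with `-a` so that zero-depth positions are included; the function name and the 500 bp window are hypothetical.

```python
from collections import defaultdict


def windowed_mean_depth(lines, window=500):
    """Average per-base read depth in non-overlapping windows.

    lines: iterable of 'ref<TAB>pos<TAB>depth' strings.
    Returns {window_start: mean_depth}, keyed by 1-based window start.
    """
    totals = defaultdict(int)
    n_sites = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _ref, pos, depth = line.split("\t")[:3]
        start = ((int(pos) - 1) // window) * window + 1
        totals[start] += int(depth)
        n_sites[start] += 1
    return {s: totals[s] / n_sites[s] for s in sorted(totals)}
```

For example, `windowed_mean_depth(open("D8_125_scaffold1982.depth"))` (hypothetical file name) gives one value per window that can be plotted against window start.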
The very high peak toward the beginning of the scaffold corresponds to the block of high heterozygosity, and there is another peak farther down the scaffold. So it looks like this region of high heterozygosity corresponds to a duplication (or a triplication? and is the second peak a duplication?) in super clone B, or perhaps to a deletion in the clone used for the reference genome.
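One rough way to tell a duplication from a triplication is to divide each window's mean depth by the median window depth on the scaffold: a ratio near 2 or 3 would be consistent with two or three times as many copies as the surrounding sequence. This is a sketch that assumes most of the scaffold is single-copy (so the median approximates one copy) and that mapping bias is negligible; the function names and the 1.5 cutoff are hypothetical.

```python
import statistics


def depth_ratios(window_depths):
    """Ratio of each window's mean depth to the median window depth.

    window_depths: {window_start: mean_depth}.
    Assumes most windows are single-copy, so the median is a baseline.
    """
    baseline = statistics.median(window_depths.values())
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    return {start: d / baseline for start, d in window_depths.items()}


def flag_elevated(ratios, min_ratio=1.5):
    """Window starts whose depth ratio suggests extra copies."""
    return [start for start, r in sorted(ratios.items()) if r >= min_ratio]
```

Rounding a flagged ratio to the nearest integer gives a crude copy-number guess relative to background; a dedicated CNV caller would model the noise properly.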
Here is the same plot for an individual from super clone A, D8_103:
You can see that the first peak is missing, and super clone A also doesn't have this region of high heterozygosity. The second peak is still present though.
Notice in the chromosome painting graph that super clone C (which occurs in the D10 pond) also has the region of high heterozygosity. Here is the graph for a super clone C individual, D10_49:
This individual also has the high peak at the beginning of the scaffold, and again it looks like a triplication. So maybe this is the more ancestral state?
Anyway, it appears that heterozygosity is being inflated by duplications, which could bias heterozygosity estimates across the genome. I wonder how common deletion/duplication variation is among the clones; these clones appear to have dynamic genomes. We do know that Daphnia tend to duplicate genes (I think this was discussed in the original Nature paper?).
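If duplications are inflating heterozygosity, one quick check is to recompute SNP density after dropping SNPs that fall in windows with elevated read depth. The sketch below assumes SNP positions on one scaffold, a list of flagged 1-based window starts on a fixed grid, and SNPs per kb as the heterozygosity proxy; the function name and window size are hypothetical, and this crude filter is not a substitute for a proper CNV caller.

```python
def heterozygosity_with_mask(snp_positions, scaffold_length,
                             flagged_starts, window=500):
    """SNPs per kb before and after excluding flagged windows.

    Windows are non-overlapping and 1-based; a position's window starts
    at ((pos - 1) // window) * window + 1. flagged_starts would
    typically be the windows with elevated read depth.
    """
    flagged = set(flagged_starts)

    def window_start(pos):
        return ((pos - 1) // window) * window + 1

    kept = [p for p in snp_positions if window_start(p) not in flagged]
    # Base pairs removed: each flagged window, truncated at the scaffold end.
    masked_bp = sum(min(s + window - 1, scaffold_length) - s + 1
                    for s in flagged if s <= scaffold_length)
    callable_bp = scaffold_length - masked_bp
    before = 1000 * len(snp_positions) / scaffold_length
    after = 1000 * len(kept) / callable_bp if callable_bp else float("nan")
    return before, after
```

A large drop from `before` to `after` would suggest that much of the apparent heterozygosity on that scaffold sits in high-copy regions.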

I think that your inference is spot on re: the copy number variants. There is a ton of good software out there to detect CNVs, and we will need to run it on the dataset to parse them out. As you suggest, any pop. gen analysis needs to take the CNVs into account, or else we run the risk of mistaking CNVs for heterozygosity.
Also, it makes me wonder about the read depth distributions that you were plotting. Those clones that had an excess of zeros might simply have a lot of deletions.
Awesome result!