What happens if we leave PCR duplicates in?
I reran the gVCFs and VCF without removing PCR duplicates to see whether, and how, this changes anything. The PCA looks similar, and the super clone assignment also didn't change. I then looked at the distribution of prop alt when individuals are artificially pooled within D8 2016, D8 2017 (April), DBunk 2017 (April), and D10 2016. For the D8 and D Bunk ponds, it doesn't change much.
D8 2016 original prop alt
D8 2016 no PCR dups removed
D8 2017 April original prop alt
D8 2017 April no PCR dups removed
D Bunk 2017 April original prop alt
D Bunk 2017 April no PCR dups removed
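As a rough illustration of what "artificially pooling" individuals could look like computationally, here is a minimal Python sketch. It assumes, hypothetically, that pooled prop alt is computed per SNP as the summed alt read depth divided by the summed total read depth across all individuals in one pond and sampling date (for example, from per-sample allelic depths in the VCF); the actual pipeline may instead work from genotype calls. The names `pooled_prop_alt` and `histogram` and the input format are illustrative assumptions, not part of the original analysis.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical input format: for each SNP ID, one (ref_depth, alt_depth) pair
# per individual in the pooled group (e.g. all D8 2016 clones).
AlleleDepths = List[Tuple[int, int]]


def pooled_prop_alt(snps: Dict[str, AlleleDepths], min_depth: int = 1) -> Dict[str, float]:
    """Pool individuals in silico: sum ref and alt reads across samples,
    then return alt / (ref + alt) for each SNP with enough pooled depth."""
    result: Dict[str, float] = {}
    for snp_id, depths in snps.items():
        ref_total = sum(ref for ref, _ in depths)
        alt_total = sum(alt for _, alt in depths)
        total = ref_total + alt_total
        # Skip SNPs with no reads or too little pooled coverage.
        if total > 0 and total >= min_depth:
            result[snp_id] = alt_total / total
    return result


def histogram(values: Iterable[float], n_bins: int = 10) -> List[int]:
    """Count prop alt values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # put v == 1.0 in the last bin
        counts[idx] += 1
    return counts
```

Running this once on depths from the deduplicated BAMs and once on depths from the non-deduplicated BAMs for the same pond gives the two distributions to compare. Summing reads rather than averaging per-individual proportions weights each individual by its coverage, which is one reasonable choice among several.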
We do see something slightly different when it comes to D10.
D10 2016 original prop alt
D10 2016 no PCR dups removed
When PCR duplicates are retained, more SNPs appear to be called as heterozygous. Perhaps this is happening in D10, but not in D8 and the surrounding ponds, because the reference clone is from D8. Perhaps the duplicate-removal program keeps the duplicate with the highest mapping quality; if so, retaining all duplicates could introduce bias in D10, where divergence from the D8 reference may already cause some mapping bias.
I also reran the program that uses polymorphisms to check whether flagged PCR duplicates are actually duplicates. The results were similar to before: most of them really are PCR duplicates.
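The polymorphism-based check can be sketched roughly as follows. This is not the program that was actually run; it is a hypothetical Python illustration assuming that reads flagged as duplicates have already been grouped, and that each read is reduced to the bases it carries at SNPs where the individual is heterozygous (homozygous sites cannot distinguish duplicates from independent reads). The names `group_is_consistent` and `classify_groups` are made up for this example. The idea is that true PCR duplicates are copies of a single molecule and should carry the same allele at every shared heterozygous site, whereas independent reads from the two homologous chromosomes can carry different alleles.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical representation: each read maps a SNP position (where this
# individual is heterozygous) to the base the read carries there.
Read = Dict[int, str]


def group_is_consistent(reads: List[Read]) -> bool:
    """True if no two reads in a duplicate group disagree at a shared SNP."""
    first_seen: Dict[int, str] = {}
    for read in reads:
        for pos, base in read.items():
            if pos in first_seen and first_seen[pos] != base:
                return False  # different alleles: likely independent molecules
            first_seen.setdefault(pos, base)
    return True


def classify_groups(groups: List[List[Read]]) -> Dict[str, int]:
    """Tally duplicate groups as consistent (true PCR duplicates),
    inconsistent (probably distinct molecules), or uninformative."""
    tally = {"consistent": 0, "inconsistent": 0, "uninformative": 0}
    for reads in groups:
        coverage = Counter(pos for read in reads for pos in read)
        if not any(n >= 2 for n in coverage.values()):
            tally["uninformative"] += 1  # no SNP covered by two or more reads
        elif group_is_consistent(reads):
            tally["consistent"] += 1
        else:
            tally["inconsistent"] += 1
    return tally
```

A group that is consistent is only weak evidence of true duplication, since two independent reads have roughly a 50% chance of carrying the same allele at a single het site; sequencing errors can also make true duplicates look inconsistent, so a real tool would likely need a base-quality filter.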
Overall, including or excluding PCR duplicates does not seem to change much in D8 and the surrounding ponds, though retaining them could introduce bias in more divergent ponds such as D10.
Thoughts?
Also means that my most recent set of libraries (which have an even higher PCR dup rate than the ones before) really are crummy. :(
Well, let's wait and see what is crummy or not once we look at the data...