Per Site Distributions of Read Depth

Now I am going to look at per site distributions of read depth. To start with, I am looking at two Sp2017 clones, which minimizes any confounding effects of microbes. I chose one clone, DBunk 147, that was estimated to have a relatively low rate of PCR duplicates (~12%), and another, DBunk 132, that was estimated to have a relatively high rate (~21%). First I just graphed the log10 read depth per site.
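A minimal sketch of how a log10 read depth histogram like this could be tabulated. It assumes the per site depths are in a three-column, tab-separated text format (scaffold, position, depth), which is what a tool such as `samtools depth` writes; the function name and the bin width are hypothetical and are not taken from the analysis here.

```python
import math
from collections import Counter


def log10_depth_histogram(depth_lines, bin_width=0.25):
    """Count sites per log10(depth) bin.

    depth_lines: iterable of "scaffold<TAB>position<TAB>depth" strings
    (assumed input format). Sites with zero depth are skipped because
    log10(0) is undefined. Returns {bin_lower_edge: site_count}.
    """
    counts = Counter()
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _scaffold, _position, depth = line.split("\t")
        depth = int(depth)
        if depth == 0:
            continue
        # Lower edge of the bin that log10(depth) falls into.
        bin_index = math.floor(math.log10(depth) / bin_width)
        counts[bin_index * bin_width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy input, not real data from these clones.
    example = ["s1\t1\t1", "s1\t2\t5", "s1\t3\t12", "s1\t4\t0", "s1\t5\t150"]
    print(log10_depth_histogram(example, bin_width=1.0))
```

Binning on the log scale keeps the few very high depth sites from stretching the x-axis, at the cost of dropping zero-depth sites from the plot.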

Here is DBunk 147:

I graphed log10 read depth because the distribution has quite a long tail.

Here is DBunk 132:


I then wanted to look at the read depth distribution without the log10 transformation, so I chose to look only at sites with read depths of 50 or less.
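As a rough sketch of this truncation step, assuming the per site depths are already loaded as a list of integers: sites above the cutoff are dropped and the rest are counted per depth value. The function name is hypothetical, and counting zero-depth sites in the first bin is an assumption.

```python
from collections import Counter


def depth_counts_up_to(depths, max_depth=50):
    """Return a list where index d holds the number of sites with depth d,
    for d from 0 to max_depth. Sites deeper than max_depth are ignored."""
    counts = Counter(d for d in depths if 0 <= d <= max_depth)
    return [counts.get(d, 0) for d in range(max_depth + 1)]


if __name__ == "__main__":
    # Toy depths, not real data from these clones.
    toy_depths = [0, 1, 1, 3, 51, 200]
    print(depth_counts_up_to(toy_depths, max_depth=3))
```

The resulting list can be passed straight to a bar plot, with the list index as the read depth on the x-axis.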

Here is DBunk 147:

And here is DBunk 132:

From this, it looks like DBunk 147 is the better library, since DBunk 132 does not have as much coverage. Maybe DBunk 132 being a lower quality library is what leads to its higher rate of PCR duplicates? But with an N of 2, it is hard to conclude that right now.

OK, I went back and looked at two more Sp2017 clones. DBunk_133 has 25.63% PCR dups, the highest, while D8_175 has 10% PCR dups, the lowest.




Now I am going to look at some 2016 samples. I am going to look at four clones: D8_31 (8.67% PCR dups), D8_8 (12.09% PCR dups), D8_36 (30.09% PCR dups), and D10_63 (44.13% PCR dups).

Here is the per site distribution for these clones including all scaffolds.




Here are the distributions again, but only looking at "good" D84A scaffolds.
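A minimal sketch of how per site depths could be restricted to a chosen set of scaffolds before plotting. It assumes three-column, tab-separated depth lines (scaffold, position, depth) and a user-supplied list of scaffold names; the function name and the scaffold names in the example are hypothetical, and the criteria for what counts as a "good" scaffold are not defined here.

```python
def filter_to_scaffolds(depth_lines, keep_scaffolds):
    """Yield (scaffold, position, depth) only for scaffolds in keep_scaffolds.

    depth_lines: iterable of "scaffold<TAB>position<TAB>depth" strings
    (assumed input format). Malformed lines are skipped.
    """
    keep = set(keep_scaffolds)
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3 and fields[0] in keep:
            yield fields[0], int(fields[1]), int(fields[2])


if __name__ == "__main__":
    # Toy input with made-up scaffold names.
    lines = ["scaffold_1\t1\t7", "scaffold_9\t1\t300", "scaffold_1\t2\t9"]
    for record in filter_to_scaffolds(lines, ["scaffold_1"]):
        print(record)
```

Because the function is a generator, it can stream over a whole-genome depth file without holding every site in memory.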




