Initial look at total data

Here is an initial look at the total data set. First of all, I looked at median read depth. I am disappointed that the median read depth for my newest set of libraries (green) seems so low, even though the PCR duplicate rate is also low. I don't quite understand what is going on. Maybe we get fewer reads out of HiSeq X lanes when doing duplex barcode sequencing?


Also looked at IBS and super clone assignment. Here is the corrplot:

Sequenced a good many As. In total, we sequenced 83 super clone A individuals, 30 super clone B individuals, and 26 super clone C individuals (D10 clones, bottom right corner in figure above).

For designating super clone assignment, I did two things. First of all, I looked at the values in the IBS matrix between single moms and pooled moms for the 6 clones where I did libraries using both approaches. Most of the 6 grouped together tightly, but one pair had an identity of 0.945. I then looked at a histogram of the pairwise identity values:
You can see the distribution is multimodal, with the farthest right peak corresponding to pairs of individuals from the same super clone (I am interpreting it that way). You can also see that the valley between the farthest right peak and the next peak falls at around 0.94, similar to the lowest identity I found between the single mom and pooled mom libraries. So, I decided that everything with an IBS identity of 0.94 or above should be grouped into super clones.
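
To make the grouping rule concrete, here is a minimal sketch of how a 0.94 cutoff could turn an IBS matrix into super clone groups. It is not the code I actually ran: the function name `assign_super_clones`, the toy sample names, and the toy matrix are all made up for illustration, and it assumes single-linkage grouping (any two samples joined by a chain of pairs at or above the cutoff end up in the same group), which is one possible reading of the rule.

```python
def assign_super_clones(names, ibs, threshold=0.94):
    """Group samples whose pairwise IBS identity is >= threshold.

    Assumption: single-linkage grouping via union-find, so groups are the
    connected components of the "identity >= threshold" graph.
    """
    parent = list(range(len(names)))

    def find(i):
        # Follow parent pointers to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if ibs[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Largest groups first.
    return sorted(groups.values(), key=len, reverse=True)


# Toy data (hypothetical): s1/s2 are near-identical, as are s3/s4.
names = ["s1", "s2", "s3", "s4"]
ibs = [
    [1.00, 0.97, 0.80, 0.78],
    [0.97, 1.00, 0.81, 0.79],
    [0.80, 0.81, 1.00, 0.95],
    [0.78, 0.79, 0.95, 1.00],
]
clusters = assign_super_clones(names, ibs)
print(clusters)
```

With single linkage, one borderline pair can merge two otherwise distinct groups, so it would be worth checking that the groups do not change much when the cutoff is moved slightly.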

One interesting thing I noticed is that while DBunk has 5 dominant super clones among my sequenced individuals collected in April, two of those super clones were not found among my sequenced individuals collected in May. So they seem to have dropped in frequency.

For D8 April samples, there were 63A (71%), 20B (22%), 3K, and 3 other super clone individuals. In May there were 16A (53%), 8B (27%), 2K, and 4 other super clone individuals. This May composition clearly doesn't mirror the May pooled sample, where super clone B appears to be about 89%.
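
As a quick check on the arithmetic, the percentages above can be recomputed from the counts. This is only a sketch: the helper name `composition` is made up, and the dictionaries simply restate the D8 counts given in the text.

```python
def composition(counts):
    """Return each category's share of the total as a rounded percentage."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}


# D8 counts as stated in the text.
april = {"A": 63, "B": 20, "K": 3, "other": 3}
may = {"A": 16, "B": 8, "K": 2, "other": 4}

april_pct = composition(april)
may_pct = composition(may)
print(sum(april.values()), april_pct)
print(sum(may.values()), may_pct)
```

The April total is 89 individuals and the May total is 30, so the May percentages rest on a much smaller sample.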



