Distribution of Read Depth Per Site when all individuals are combined

Just wanted to go ahead and put this graph up because I think it looks really nice! This is looking at the distribution of read depth per site using the VCF file for all clones sequenced so far and just combining all clones. Since it is from the VCF file there should not be PCR duplicates, right? This is also only looking at the "good" scaffolds. Here is the code in case you want to check what I did.

genofile <- seqOpen("/mnt/spicy_3/Karen/Sp2017/SpFall2016Sp2017B.seq.gds")

snps <- data.table(variant.ids = seqGetData(genofile, "variant.id"),
chr = seqGetData(genofile, "chromosome"),
pos = seqGetData(genofile, "position"),
dp = seqGetData(genofile, "annotation/info/DP"))

TCOPA42 <- fread("TCOPA42merge.csv")
colnames(TCOPA42) <- c("Num","chr","RefContigx","RefLengthx","Evaluex","MatchLenx","NumMatchx","PA42","RefContigy","RefLeny","Evaluey","MatchLeny","NumMatchy","TCO","D84Alength","PA42good","TCOgood")

TCOPA42[, chr:=as.character(chr)]

setkey(TCOPA42, chr)
setkey(snps, chr)

m <- merge(snps, TCOPA42, all.x=T, all.y=T)

m[,dapBlast:=paste(PA42good, TCOgood, sep=" ")]

mgood <- m[dapBlast=="TRUE TRUE"]

hist(mgood$dp, breaks=100)

mgoodunder6000 <- mgood[dp < 6000]

hist(mgoodunder6000$dp, breaks=100)

Next I took the median read depth per "good" scaffold, and then graphed the histogram again. So now we are looking at median read depth per scaffold instead of read depth per site.

This does somewhat resemble the shape of the true/true read depth per scaffold I had graphed earlier.

Search This Blog

Karen's Lab Notebook

Distribution of Read Depth Per Site when all individuals are combined

Comments

Post a Comment