Distribution of Read Depth Per Site when all individuals are combined
Just wanted to go ahead and put this graph up because I think it looks really nice! This is looking at the distribution of read depth per site using the VCF file for all clones sequenced so far and just combining all clones. Since it is from the VCF file there should not be PCR duplicates, right? This is also only looking at the "good" scaffolds. Here is the code in case you want to check what I did.
genofile <- seqOpen("/mnt/spicy_3/Karen/Sp2017/SpFall2016Sp2017B.seq.gds")
snps <- data.table(variant.ids = seqGetData(genofile, "variant.id"),
chr = seqGetData(genofile, "chromosome"),
pos = seqGetData(genofile, "position"),
dp = seqGetData(genofile, "annotation/info/DP"))
TCOPA42 <- fread("TCOPA42merge.csv")
colnames(TCOPA42) <- c("Num","chr","RefContigx","RefLengthx","Evaluex","MatchLenx","NumMatchx","PA42","RefContigy","RefLeny","Evaluey","MatchLeny","NumMatchy","TCO","D84Alength","PA42good","TCOgood")
TCOPA42[, chr:=as.character(chr)]
setkey(TCOPA42, chr)
setkey(snps, chr)
m <- merge(snps, TCOPA42, all.x=T, all.y=T)
m[,dapBlast:=paste(PA42good, TCOgood, sep=" ")]
mgood <- m[dapBlast=="TRUE TRUE"]
hist(mgood$dp, breaks=100)
mgoodunder6000 <- mgood[dp < 6000]
hist(mgoodunder6000$dp, breaks=100)
genofile <- seqOpen("/mnt/spicy_3/Karen/Sp2017/SpFall2016Sp2017B.seq.gds")
snps <- data.table(variant.ids = seqGetData(genofile, "variant.id"),
chr = seqGetData(genofile, "chromosome"),
pos = seqGetData(genofile, "position"),
dp = seqGetData(genofile, "annotation/info/DP"))
TCOPA42 <- fread("TCOPA42merge.csv")
colnames(TCOPA42) <- c("Num","chr","RefContigx","RefLengthx","Evaluex","MatchLenx","NumMatchx","PA42","RefContigy","RefLeny","Evaluey","MatchLeny","NumMatchy","TCO","D84Alength","PA42good","TCOgood")
TCOPA42[, chr:=as.character(chr)]
setkey(TCOPA42, chr)
setkey(snps, chr)
m <- merge(snps, TCOPA42, all.x=T, all.y=T)
m[,dapBlast:=paste(PA42good, TCOgood, sep=" ")]
mgood <- m[dapBlast=="TRUE TRUE"]
hist(mgood$dp, breaks=100)
mgoodunder6000 <- mgood[dp < 6000]
hist(mgoodunder6000$dp, breaks=100)
Next I took the median read depth per "good" scaffold, and then graphed the histogram again. So now we are looking at median read depth per scaffold instead of read depth per site.
This does somewhat resemble the shape of the true/true read depth per scaffold I had graphed earlier.
Comments
Post a Comment