Assigning D84A scaffolds to chromosomes

Working to assign D84A scaffolds to chromosomes. I first took the D84A scaffolds that were over 10kb and blasted to both TCO and PA42. For each D84A scaffold I asked which PA42 scaffolds it blasted to (match length of 5,000 bp or greater), and then which chromosomes those PA42 scaffolds were anchored to. This was at times slightly complicated, as many of the PA42 scaffolds map to more than one chromosome. In some cases a D84A scaffold blasted to multiple PA42 scaffolds that mapped to the same chromosome, which increased my confidence in the assignment. In other cases a D84A scaffold blasted to a single PA42 scaffold that mapped to one chromosome; these were easy to call. But in some cases D84A scaffolds blasted to one or more PA42 scaffolds that mapped to multiple chromosomes. In those cases I looked at which part of the PA42 scaffold the D84A scaffold mapped to, and then which chromosome that part of the PA42 scaffold mapped to. When the portion of the PA42 scaffold hit by the D84A scaffold lay between the two portions that mapped to different chromosomes, I could not assign it to a chromosome. Likewise, when a D84A scaffold blasted to multiple PA42 scaffolds that mapped to different chromosomes, I could not assign it. Finally, I was able to assign a few more D84A scaffolds to chromosomes using my network analysis. Here are the results.
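The decision rules above can be sketched in Python. This is a minimal sketch, not the pipeline actually used: the input shapes (a list of BLAST hits per D84A scaffold, and a table of PA42 scaffold segments anchored to chromosomes) and the function names `chromosome_for_hit` and `assign_scaffold` are hypothetical. It also assumes a scaffold is left unassigned if any one of its hits is ambiguous, and it does not model the network-analysis step.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shapes (assumptions, not the actual files used):
#   Hit:     (pa42_scaffold, hit_start, hit_end, match_length) for one D84A scaffold
#   Segment: (seg_start, seg_end, chromosome) for a PA42 scaffold anchored to a chromosome
Hit = Tuple[str, int, int, int]
Segment = Tuple[int, int, int]


def chromosome_for_hit(hit: Hit, anchors: Dict[str, List[Segment]]) -> Optional[int]:
    """Chromosome the hit region falls on, or None if it cannot be called."""
    scaffold, start, end, _ = hit
    segments = anchors.get(scaffold, [])
    chroms = {chrom for _, _, chrom in segments}
    if len(chroms) == 1:
        # PA42 scaffold anchored to a single chromosome: easy call.
        return chroms.pop()
    # PA42 scaffold spans several chromosomes: keep only the segments the hit overlaps.
    overlapping = {chrom for s, e, chrom in segments if start <= e and end >= s}
    if len(overlapping) == 1:
        return overlapping.pop()
    # Hit lies between segments, straddles two chromosomes, or scaffold is unanchored.
    return None


def assign_scaffold(hits: List[Hit], anchors: Dict[str, List[Segment]],
                    min_match: int = 5000) -> Tuple[Optional[int], int]:
    """Return (chromosome or None, number of distinct PA42 scaffolds supporting it)."""
    kept = [h for h in hits if h[3] >= min_match]
    calls = [chromosome_for_hit(h, anchors) for h in kept]
    # Assumption: any ambiguous hit, or hits to different chromosomes, blocks the call.
    if not calls or None in calls or len(set(calls)) != 1:
        return None, 0
    return calls[0], len({h[0] for h in kept})


# Toy anchors (not real data): pa42_c is split between chromosomes 5 and 7.
anchors = {
    "pa42_a": [(0, 100_000, 3)],
    "pa42_b": [(0, 50_000, 3)],
    "pa42_c": [(0, 40_000, 5), (60_000, 120_000, 7)],
}
# (3, 2): two PA42 scaffolds agree on chromosome 3
print(assign_scaffold([("pa42_a", 100, 6_000, 5_900), ("pa42_b", 10, 8_000, 7_990)], anchors))
# (None, 0): the hit falls between the chromosome 5 and chromosome 7 segments
print(assign_scaffold([("pa42_c", 45_000, 55_000, 10_000)], anchors))
```

The second element of the result is what separates the "two agreeing hits" set from single-hit calls, and passing `min_match=2_500` corresponds to the relaxed match-length cutoff.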

1)    Overall, I assigned 449 of the 490 scaffolds (over 10kb and blasting to both TCO and PA42, i.e. TRUE TRUE) to chromosomes.
2)    When summing scaffold lengths for each chromosome, I have between 4,606,521 and 11,202,625 basepairs of scaffold assigned to each chromosome.
    PA42chr         D84A length (bp)
    0 (unassigned)         2,839,172
    1                      8,103,368
    2                     10,411,359
    3                      8,979,782
    4                      7,535,519
    5                      8,584,121
    6                      7,088,211
    7                      8,838,894
    8                      9,724,858
    9                      8,120,477
    10                    11,202,625
    11                     4,606,521
    12                     8,089,763

3)    When summing across all chromosomes, I have 101,285,498 basepairs of scaffolds assigned to chromosomes, and 2,839,172 basepairs of scaffolds that I was unable to assign.

4)    If we only use scaffolds that have two agreeing hits to anchor them to chromosomes, we still anchor a total of 70,559,409 bp to chromosomes, with per-chromosome totals ranging from 3,662,907 to 8,483,858 bp.
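Items 2 and 3 are per-chromosome sums, with PA42chr 0 standing for the scaffolds that could not be assigned. A minimal Python sketch of that bookkeeping follows; `per_chrom_totals` and `summarize` are hypothetical helper names, and the only real numbers are the per-chromosome totals copied from the table in item 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

UNASSIGNED = 0  # PA42chr 0 holds scaffolds that could not be placed


def per_chrom_totals(assignments: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum scaffold lengths per chromosome from (scaffold_length, chromosome) pairs."""
    totals: Dict[int, int] = defaultdict(int)
    for length, chrom in assignments:
        totals[chrom] += length
    return dict(totals)


def summarize(totals: Dict[int, int]) -> Dict[str, int]:
    """Assigned and unassigned bp, plus the smallest and largest chromosome total."""
    assigned = [bp for chrom, bp in totals.items() if chrom != UNASSIGNED]
    return {
        "assigned_bp": sum(assigned),
        "unassigned_bp": totals.get(UNASSIGNED, 0),
        "min_chrom_bp": min(assigned),
        "max_chrom_bp": max(assigned),
    }


# Per-chromosome totals for the over-10kb scaffolds, copied from the table in item 2.
over_10kb = {0: 2_839_172, 1: 8_103_368, 2: 10_411_359, 3: 8_979_782, 4: 7_535_519,
             5: 8_584_121, 6: 7_088_211, 7: 8_838_894, 8: 9_724_858, 9: 8_120_477,
             10: 11_202_625, 11: 4_606_521, 12: 8_089_763}
print(summarize(over_10kb))
# {'assigned_bp': 101285498, 'unassigned_bp': 2839172, 'min_chrom_bp': 4606521, 'max_chrom_bp': 11202625}
```

The printed values reproduce items 2 and 3. Feeding `per_chrom_totals` only the (length, chromosome) pairs of two-hit scaffolds and then calling `summarize` would give the kind of figures reported in item 4.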

Next I added in scaffolds between 5 and 10kb. I was able to assign 176 of these 209 scaffolds to chromosomes. Combining them with the over-10kb scaffolds and summing scaffold lengths for each chromosome, I have between 4,624,982 and 11,392,960 basepairs of scaffold assigned to each chromosome.
    PA42chr         D84A length (bp)
    0 (unassigned)         3,036,714
    1                      8,158,404
    2                     10,504,813
    3                      9,045,392
    4                      7,552,670
    5                      8,669,302
    6                      7,160,474
    7                      9,014,528
    8                      9,986,461
    9                      8,220,021
    10                    11,392,960
    11                     4,624,982
    12                     8,191,668

When summing across all chromosomes, I have 102,521,675 basepairs of scaffolds assigned to chromosomes, and 3,036,714 basepairs of scaffolds that I was unable to assign.

If we only use scaffolds that have two agreeing hits to anchor them to chromosomes, we anchor a total of 70,653,595 bp to chromosomes, with per-chromosome totals ranging from 3,669,680 to 8,483,858 bp.

What I take from this is that adding in the 5-10kb scaffolds made very little difference. I think this is because there weren't many of them (only ~200): even if I could assign all 200 and they were all 10kb, that would still only add 2 Mb.
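The size of the change can be checked directly from the totals reported for the two runs. This short Python calculation assumes the over-10kb assignments were unchanged between runs, so every difference comes from the 209 scaffolds between 5 and 10kb.

```python
# Totals reported above (bp): over-10kb run vs. run that also includes 5-10kb scaffolds.
assigned_before, unassigned_before = 101_285_498, 2_839_172
assigned_after, unassigned_after = 102_521_675, 3_036_714

# Assumption: the over-10kb calls did not change, so all differences come from
# the 209 scaffolds in the 5-10kb bin.
newly_assigned = assigned_after - assigned_before        # 5-10kb sequence placed
still_unassigned = unassigned_after - unassigned_before  # 5-10kb sequence not placed
bin_total = newly_assigned + still_unassigned            # all 5-10kb sequence considered

# Ceiling: every one of the 209 scaffolds placed and 10kb long.
ceiling = 209 * 10_000

print(newly_assigned, still_unassigned, bin_total, ceiling)
# 1236177 197542 1433719 2090000
```

Under that assumption, the 5-10kb scaffolds added about 1.2 Mb of placed sequence, a little over 1% of the roughly 101 Mb already assigned, and even the ceiling is only about 2 Mb.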

Looking at the distribution of D84A contig lengths:

[Figure: distribution of D84A contig lengths]
There are 690 contigs over 10kb and 362 between 5 and 10kb. There are 1,738 between 2.5 and 5kb, 1,786 between 1.5 and 2.5kb, and 1,550 between 1 and 1.5kb. (All our scaffolds are above 1kb because we only included those in the Supernova output.)
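The same kind of ceiling can be put on the smaller bins. This Python sketch assumes every contig in a bin lies within the bin's stated bounds and that every one of them could be placed, so it gives best-case ranges rather than an estimate of what would actually be assigned; the counts and bin boundaries come from the paragraph above.

```python
# (number of contigs, lower bound bp, upper bound bp) per length bin, from the counts above.
bins = {
    "2.5-5kb": (1738, 2_500, 5_000),
    "1.5-2.5kb": (1786, 1_500, 2_500),
    "1-1.5kb": (1550, 1_000, 1_500),
}


def bin_range(count: int, lo: int, hi: int) -> tuple:
    """Smallest and largest total sequence a bin of `count` contigs could hold."""
    return count * lo, count * hi


total_lo = total_hi = 0
for name, (count, lo, hi) in bins.items():
    bin_lo, bin_hi = bin_range(count, lo, hi)
    total_lo += bin_lo
    total_hi += bin_hi
    print(f"{name}: {bin_lo:,}-{bin_hi:,} bp")
print(f"all three bins: {total_lo:,}-{total_hi:,} bp")
# last line printed: all three bins: 8,574,000-15,480,000 bp
```

Even in the best case, the three small bins together hold roughly 8.6 to 15.5 Mb, in line with the estimate in the comments.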

12/05/17
I went back and added in more scaffolds of 10kb and over by dropping the match-length cutoff from 5kb to 2.5kb. Some of these had matches to more than one PA42 scaffold that mapped to the same chromosome, giving me a few more high-confidence assignments. I am a little less confident about the others than I was about the calls made with the 5kb cutoff, simply because of the shorter match length. Adding just the new high-confidence assignments to my previous chromosome assignments gives a total of 103 Mb assigned to chromosomes, a bit higher than before. Adding the lower-confidence ones as well brings us to 106.8 Mb assigned to chromosomes, with per-chromosome totals ranging from 4,666,601 to 11,794,172 bp. For the high-confidence "two-hit" scaffolds we now have 71.6 Mb assigned to chromosomes, a bit higher than the 70.7 Mb we had before, with per-chromosome totals ranging from 3,669,680 to 7,044,102 bp.
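The three totals in this update differ only in which calls are allowed to count. A minimal Python sketch of that bookkeeping follows, assuming hypothetical per-scaffold records of (length, chromosome or None, number of agreeing PA42 scaffolds) for the 5kb run and for the relaxed 2.5kb run, and assuming that earlier 5kb calls are never overridden. The function name `chrom_totals`, its parameters and the toy records are all illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical record per D84A scaffold: (length_bp, chromosome or None, agreeing PA42 scaffolds)
Call = Tuple[int, Optional[int], int]
NO_CALL: Call = (0, None, 0)


def chrom_totals(previous: Dict[str, Call], relaxed: Dict[str, Call],
                 relaxed_min_hits: int = 1, min_hits: int = 1) -> Dict[int, int]:
    """Per-chromosome bp after topping up the 5kb calls with relaxed-cutoff calls.

    relaxed_min_hits: agreeing PA42 scaffolds needed to accept a new relaxed call
    min_hits:         agreeing PA42 scaffolds needed for any call to count (2 = two-hit set)
    """
    totals: Dict[int, int] = {}
    for name in sorted(previous.keys() | relaxed.keys()):
        length, chrom, n_agree = previous.get(name, NO_CALL)
        # Assumption: only scaffolds left unassigned at the 5kb cutoff take a relaxed call.
        if chrom is None and name in relaxed:
            r_length, r_chrom, r_agree = relaxed[name]
            if r_chrom is not None and r_agree >= relaxed_min_hits:
                length, chrom, n_agree = r_length, r_chrom, r_agree
        if chrom is None or n_agree < min_hits:
            continue
        totals[chrom] = totals.get(chrom, 0) + length
    return totals


# Toy records (not real data).
previous = {"s1": (20_000, 1, 2), "s2": (15_000, 2, 1), "s3": (12_000, None, 0)}
relaxed = {"s3": (12_000, 1, 2), "s4": (11_000, 3, 1)}
print(chrom_totals(previous, relaxed, relaxed_min_hits=2))  # high-confidence additions only
print(chrom_totals(previous, relaxed))                      # plus lower-confidence additions
print(chrom_totals(previous, relaxed, min_hits=2))          # two-hit scaffolds only
```

Summing the values of each returned dict corresponds to the 103 Mb, 106.8 Mb and 71.6 Mb style of totals, and its smallest and largest values to the per-chromosome ranges.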


Comments

  1. Great. And, presumably, you could figure out how much more genome you would ultimately map if you could map those scaffolds with size < 5Kb. Presumably not that much more, even though there are lots of them?

  2. So, working on the 2.5-5Kb scaffolds would only give us, at most, 4-8Mb.

    The 1.5-2.5 would give 2.7-4.5Mb

    The 1-1.5 would give 1.5-2.2Mb

    So, taken together we could (ideally) place 8-15Mb more. Possibly not worth it at this point...