What problem, you say? Roughly speaking, given a set of bacterial genomes, there has been no way to detect all the SNPs between them, including accessory genes. We solve this.
First i'll formulate the problem, give a long bioinformatic discussion, and at the end show some amazing figures/results for E coli. Feel free to skip to the figures 3/n
The problem is this. If you line up two human genomes, the alignable proportion will be 99% or something huge. So, you can go a long way mapping reads to the human reference, and doing SNP calling that way. Not so for bacteria, where that 99% figure could be just 50%. 4/n
Bacterial genomes are gene-rich (>80%), and follow a universal U-shaped gene frequency distribution - genes tend to be rare or common. There's a core genome shared by most, and a pool or rare and transient genes moving in and out of the population. 5/n
So - how can we possibly call SNPs in a situation like this? Whichever genome we use as reference will be missing many many genes, and therefore SNPs from the population? See this figure, with 36 SNPs, but no reference can see them all. 6/n
attaching end of thread! https://twitter.com/ZaminIqbal/status/1327134512862679043?s=20
You can follow @ZaminIqbal.
Tip: mention @twtextapp on a Twitter thread with the keyword “unroll” to get a link to it.

Latest Threads Unrolled:

By continuing to use the site, you are consenting to the use of cookies as explained in our Cookie Policy to improve your experience.