NGS Bioinformatics Doesn't Have To Be Scary: Seq It Out 4
If you are new to next generation sequencing, you've heard about the large amount of data it generates and the challenges with finding meaning in all that data. But don't worry: just like working at the lab late at night, bioinformatics doesn't have to be scary. The power of next generation sequencing, or NGS, is the ability to interrogate hundreds or thousands of genes, or even whole genomes, in a single sequencing run. While the throughput and speed are ideal for accelerating genetic research, the amount of data may be overwhelming. Finding what you are looking for can sometimes feel like searching for a needle in a haystack.
Fortunately, many NGS bioinformatics tools take the pain out of data analysis and interpretation. Let's take a look at the general NGS data analysis workflow and see why it isn't so scary after all. The input into NGS systems is a collection of DNA fragments, known as a library. These library fragments can range in size from about 50 bp to 1000 bp, depending on the system used. Basically, NGS systems sequence these library fragments and automatically process the raw sequence data to make sure you get high-quality sequences, referred to as reads. The reads are presented in a manner we lab scientists all understand: as strings of A, T, G and C.
Did you know that if you sequenced the genome of everybody on Earth, you would need about 21 exabytes of space on your computer to keep their list of As, Ts, Gs and Cs? That's about 21 billion gigabytes! You definitely need another external drive for that one. Let's take a look at our lab book. NGS can produce a bunch of As, Ts, Cs and Gs, but how do we make sense of it all? The collection of sequencing reads can be aligned to a reference genome, generating a Binary Alignment Map file, or BAM file. This standard file is the input for many NGS software tools and can
be used for a variety of applications, including downstream variant detection. No reference genome? No problem. The collection of reads can also be used by specialized NGS software to build a reference genome, a process called de novo assembly. So now you have a BAM file with aligned reads. What's next? Let's use the example of variant detection and dive deeper into how to find that needle in a haystack. Now, with the help of our bioinformatics tools, we will determine if the sequence information contains a variant when compared to the reference genome.
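To make the idea of comparing reads against a reference concrete, here is a minimal Python sketch. It is a toy illustration only: the function name `find_mismatches`, the sequences, and the alignment position are all made up for this example, and it assumes a single read aligned without gaps. Real variant callers work from BAM files, combine evidence from many overlapping reads, and weigh base and mapping quality before reporting anything.

```python
def find_mismatches(reference, read, read_start):
    """Return (position, ref_base, read_base) for each mismatching base.

    read_start is the 0-based reference position where the read aligns.
    Assumes an ungapped alignment, so insertions and deletions are not handled.
    """
    mismatches = []
    for offset, read_base in enumerate(read):
        ref_base = reference[read_start + offset]
        if read_base != ref_base:
            mismatches.append((read_start + offset, ref_base, read_base))
    return mismatches


# Hypothetical data: a short reference and one read aligned at position 3.
reference = "ACGTACGTACGT"
read = "TACCTA"

for position, ref_base, read_base in find_mismatches(reference, read, 3):
    print(f"position {position}: reference {ref_base}, read {read_base}")
```

A single mismatching read like this could just as easily be a sequencing error as a true variant, which is why dedicated tools look at how many reads support each change.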
Variants can be single nucleotide polymorphisms, or SNPs; nucleotide insertions or deletions, also known as indels; or structural variants. The output of variant calling is a Variant Call Format, or VCF, file. These files contain a list of all variants identified, depending on the settings used by the variant detection software. But what is the biological meaning of the observed changes? This is where NGS bioinformatics analysis gets really interesting. There are several software tools that use the VCF file as input.
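For readers curious what such a tool sees when it opens a VCF file, here is a small Python sketch that reads the first five tab-separated columns of each record (CHROM, POS, ID, REF, ALT) and makes a rough guess at the variant type. The helper names and the two example records are hypothetical, and the classification is deliberately simplified: it only separates single-base changes, length-changing alleles, and symbolic alleles such as `<DEL>`.

```python
def classify(ref, alt):
    """Rough variant type from the REF and ALT alleles (simplified)."""
    if alt.startswith("<"):
        return "symbolic"  # e.g. <DEL>, often used for structural variants
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"


def read_variants(lines):
    """Yield (chrom, pos, type) for each ALT allele, skipping header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta lines and the '#CHROM' column header
        chrom, pos, _var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # ALT may list several alleles
            yield chrom, int(pos), classify(ref, allele)


# Hypothetical records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.",
    "chr1\t22345\t.\tAT\tA\t45\tPASS\t.",
]

for chrom, pos, kind in read_variants(example_vcf):
    print(chrom, pos, kind)
```

In practice an established VCF library or annotation tool would do this parsing; the sketch is only meant to show that a VCF file is, at heart, a readable table of positions and alleles.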
These tools compare the variant information against a large collection of annotation databases that associate a variant with some type of function, process, pathway or disease. Filtering your data based on these annotations helps narrow your focus to variants relevant to your research, getting you closer to that needle. High-throughput NGS is revolutionizing genomics, getting us data faster than ever before. And with all the available NGS data analysis tools, uncovering critical genetic associations and trends, or even putting together a new reference genome, isn't as daunting as it once was. NGS combined with advanced bioinformatics tools means we are not just getting data faster,