I often hear the phrase “can’t see the forest for the trees,” and I think many students analyzing their very first NGS dataset can relate. The amount of data can be staggering, and it is easy to get lost in the details and forget the bigger picture. I still find myself doing this, and it takes my PI coming in and saying that phrase to pull me back out. For example, I was looking for a particular sequence in my dataset and got completely sidetracked for days trying to figure out why I kept seeing another sequence that didn’t match the organism I sequenced! I finally figured out that during library prep, my DNA had been sheared into fragments shorter than 100 bases, so the reads ran past the end of each short fragment and into the adapter. That unknown sequence was the sequencing adapter. The interesting part is that none of this matters. I’m not going to publish it, and I’m not even going to mention it in our paper!
My advice would be to write down the questions you want to answer with your dataset. Focus on those questions and don’t get sidetracked by the X’s and O’s… or, in this case, the A’s, T’s, G’s, and C’s. I am still learning this myself, but I think it’s vital for new students who are starting out with their very own datasets.