I recently performed an Illumina sequencing run with 3 libraries. The amount of data generated was enormous: over 100 million sequence reads per library! Naturally, I was so excited when I first received my data. Then, as the data analysis began, I truly realized there was more data than time.

I am also a somewhat different kind of graduate student. I am juggling 4 different projects, and I am my lab's lab manager. I am also in charge of our 5 undergrads and their projects. So my time is split between performing my own experiments and helping undergrads, and whatever is left goes to data analysis.

If you are starting out with your very own first dataset, try to use a friendly program to help you analyze it. CLC-bio is a great way to break the ice and learn how to handle your dataset. The program is very user-friendly and allows you to trim, map, and assemble your data. But, BEWARE! I have started introducing other graduate students to this program, and they get so caught up in the easy-to-use interface that they don't try to understand what they are doing. I always caution other students: do not be a robot! Understand every step you take to analyze your data, and why you are taking it.
Take-home message: There are tons of free programs out there, but if this is your first time analyzing an NGS (next-generation sequencing) dataset, I recommend using CLC-bio (you can get a free two-week trial) as a "getting your feet wet" approach.