Analyzing your first NGS dataset?

I recently performed an Illumina sequencing run with 3 libraries. The amount of data generated was enormous. We are talking about over 100 million sequence reads per library! Naturally, when I first received my data I was so excited! Then, as the data analysis began, I realized there was more data than time. I am a somewhat unusual graduate student: I am juggling 4 different projects, I am my lab's lab manager, and I am in charge of our 5 undergrads and their projects. So my time is split between performing my own experiments and helping undergrads, and whatever is left goes to data analysis.

If you are starting out with your very own first dataset, try a friendly program to help you analyze it. CLC-bio is a great way to break the ice on how to handle your dataset. The program is very user-friendly and allows you to trim, map, and assemble your data. But, BEWARE! I have started introducing other graduate students to this program, and they get so caught up in the easy-to-use interface that they don't try to understand what it is they are doing. I always caution other students: do not be a robot! Understand everything you are doing to analyze your data, and why.

Take-home message: There are tons of free programs out there, but if this is your first time analyzing an NGS (next-generation sequencing) dataset, I recommend using CLC-bio (you can get a free two-week trial) as a sort of "getting your feet wet" approach.

Installing a new program and getting errors

Disclaimer! I feel like I must put one in here. I am by no means an expert in anything that has to do with computational biology. I have found, through my 2.5 years of teaching myself and getting advice, that more and more microbiologists are getting DNA sequencing datasets and have no idea how to analyze them. I am just going to be blogging, handing out free tips, and offering advice on the pitfalls I fell into when I first started dipping my feet into the computational world.

I have been helping a fellow graduate student (a microbial ecologist) who was trying to run a program called pplacer. She, like me, has a strong background in microbiology and zero background in running anything from a terminal. After she had spent about a week trying to get the program to run, she visited me. I sat there and listened to her problem and remembered something I learned in a computational phylogenetics course. The professor of that course had told me that before you can run an executable, you may have to change its permissions. I dug through my notes (yes, I keep notes on everything that I do with a computer) and voilà, found the command:

chmod ugo+x scriptyouwanttorun
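
For anyone who has never done this in a terminal, here is a rough sketch of the whole before-and-after. It is only an illustration: the file below is a tiny made-up script that I create in a temporary folder as a stand-in, not pplacer itself, and I am assuming a Mac or Linux terminal where temporary folders allow programs to run.

```shell
# Hypothetical stand-in for the program you downloaded (not pplacer itself)
script="$(mktemp -d)/scriptyouwanttorun"
printf '#!/bin/sh\necho "it runs"\n' > "$script"

# A freshly created file has no x (execute) in its permissions column
ls -l "$script"

# Trying to run it now fails, so the fallback message is printed
"$script" || echo "could not run it yet"

# u = user (owner), g = group, o = others; +x adds execute permission
chmod ugo+x "$script"

# The permissions column now shows x, and the script runs
ls -l "$script"
output="$("$script")"
echo "$output"
```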

This will literally save you a lot of time! Just be sure to check that