
SRA: Seriously Ridiculously “bentbackward” Archive

OK, so I wanted to rerun some published 454 data through some in-house shell scripts to test whether they give me the same results. First, I needed to get my hands on those fasta files, which I thought would be “like this”, but it turned out to be the opposite. All the data generated from 454 are stored in the SRA (Sequence Read Archive, formerly the Short Read Archive) hosted by NCBI. SRA is shutting down due to budget constraints, but it is still in service.

So, as per the paper, I went to the SRA website (I used Google Chrome) with the study# and the sample#. I entered the study# (it starts with SRP00**) in the search box, and the result page had all the samples from the study ready to download. However, there was one catch: Aspera Connect, a high-speed file transfer plugin from Asperasoft, must be installed to download the files. Additionally, the plugin only works in Firefox. So, I downloaded it, but it is not a “double-click” type of installation, where you double-click the file and it installs somewhere I don't want to know about. For installing Aspera Connect, there are instructions here (download the right version). All I needed to do was go to the terminal and run the downloaded file (which is apparently a shell script):

sh aspera-connect-2.4.7.37118-linux-64.sh

The name of the file will be different for Mac and Windows. Duh!

After the installation, I fired up Firefox, went to the SRA website, typed in the study#, and downloaded all the .sra files that I needed.

Well, problem solved. But wait a minute, I want fasta files, not SRA files. So how do I go about getting the fasta and qual files from the SRA files? Take a guess.

Yes, you are right. Another software.

NCBI provides a utility toolkit here. I downloaded and untarred it; it contains a number of tools that deal with SRA files, but all I need is one that gives me a fasta and a qual file. For that, I first converted the .sra files to fastq files, which contain both the nucleotides and their quality scores.

COMMAND: ./fastq-dump -A SRR0**** SRR0****.sra
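
A study usually comes with more than one run, so the same command can be wrapped in a loop. This is only a rough sketch: the function name `dump_all` and the `FASTQ_DUMP` variable are made up for illustration, and it assumes every file is named after its accession (SRR0****.sra) and that your version of fastq-dump accepts the same arguments as the command above.

```shell
# Hypothetical helper: run the fastq-dump call shown above on every .sra file
# in a directory. FASTQ_DUMP is an assumed variable pointing at the binary.
FASTQ_DUMP="${FASTQ_DUMP:-./fastq-dump}"

dump_all() {
    dir="${1:-.}"
    for sra in "$dir"/*.sra; do
        [ -e "$sra" ] || continue        # glob matched nothing, skip
        acc=$(basename "$sra" .sra)      # SRR0****.sra -> SRR0****
        "$FASTQ_DUMP" -A "$acc" "$sra" || echo "failed: $sra" >&2
    done
}
```

Calling `dump_all .` from the directory that holds the .sra files runs the conversion once per file and reports any file that fails on stderr.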

Remember to cd into the directory with the scripts and copy the .sra file into the directory.
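
Since the goal was a fasta and a qual file, the fastq output still has to be split in two. One way to sketch that step with awk is below. The function name `fastq_to_fasta_qual` is hypothetical, and the sketch assumes plain 4-line fastq records (no wrapped sequences) with Phred+33 quality encoding; check both against your own files before trusting the numbers.

```shell
# Hypothetical helper: split a 4-line-per-record FASTQ file into a FASTA file
# and a QUAL file of space-separated Phred scores.
# Assumptions: Phred+33 encoding, one sequence line per record.
fastq_to_fasta_qual() {
    fastq="$1"
    prefix="${2:-${fastq%.fastq}}"       # default: same name minus .fastq
    awk -v fa="$prefix.fasta" -v qu="$prefix.qual" '
        # Build a character -> ASCII code lookup for printable characters.
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        NR % 4 == 1 { hdr = ">" substr($0, 2) }        # @name -> >name
        NR % 4 == 2 { print hdr > fa; print $0 > fa }  # header + sequence
        NR % 4 == 0 {                                  # quality line
            line = ""
            for (i = 1; i <= length($0); i++)
                line = line (i > 1 ? " " : "") (ord[substr($0, i, 1)] - 33)
            print hdr > qu; print line > qu
        }
    ' "$fastq"
}
```

Running `fastq_to_fasta_qual SRR0****.fastq` would write SRR0****.fasta and SRR0****.qual next to the input; an optional second argument sets a different output prefix.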

  1. vijay
    September 21, 2012 at 12:39 pm

    Very true, this is the first time I have hated anything from NCBI.

  2. December 10, 2012 at 2:57 pm

    Thx, I’m not a fan of the SRA format either, thx for your documentation 😉
