AmpliconNoise: Silence please!

Microbial community analysis using next-generation sequencing of 16S rDNA is plagued by technical errors that are difficult to account for. Separating the noise from the actual data matters not only for a correct assessment of a microbial community but also for telling a genuinely novel organism apart from pyrosequencing noise.

Many algorithms can give an approximate assessment of the community despite the errors. However, only a few are capable of identifying and removing the erroneous sequences. AmpliconNoise is a popular choice for the job. Although the algorithm is computationally intensive, it has been recommended for removal of both PCR and sequencing errors.

Here, I will try to decipher the mechanistic details of AmpliconNoise. The algorithm is heavy on math, so bear with me and correct me if I am wrong.

AmpliconNoise is an extension of the earlier PyroNoise algorithm, which simultaneously accounted for both PCR and sequencing errors. In AmpliconNoise, the algorithm is divided into two parts: PyroNoise (accounts for pyrosequencing errors and clusters flowgrams without alignment) and SeqNoise (accounts for PCR errors and clusters sequences).

In addition to removing insertions/deletions caused by PCR and sequencing errors, it also includes an algorithm called Perseus that flags chimeras without the need for a reference database.

The algorithm starts by removing reads that do not pass strict filtering conditions. Any read with a signal intensity below 0.5 was truncated at that position. For Titanium data, only reads whose first noisy flow occurred on or after flow 360 were kept. This first step gets rid of approximately 15% of the reads.
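To make the filtering step concrete, here is a minimal Python sketch. It is not AmpliconNoise's own code. It assumes one reading of the rule above: a read "goes noisy" at the first cycle of four flows in which every signal is below 0.5, i.e. no nucleotide gave a signal. The function names, the cycle-of-four interpretation and the default parameters are all my assumptions.

```python
from typing import List, Optional


def first_noisy_flow(flowgram: List[float], threshold: float = 0.5, cycle: int = 4) -> int:
    """Index of the first flow cycle in which every signal is below `threshold`.

    Returns len(flowgram) when no such cycle exists.
    Assumption: a "noisy flow" is a whole cycle with no signal above threshold.
    """
    for start in range(0, len(flowgram) - cycle + 1, cycle):
        if all(signal < threshold for signal in flowgram[start:start + cycle]):
            return start
    return len(flowgram)


def filter_read(flowgram: List[float], min_clean_flows: int = 360,
                threshold: float = 0.5) -> Optional[List[float]]:
    """Truncate a read at its first noisy flow, or discard it (None) if the
    noise starts before `min_clean_flows` (360 for Titanium, per the text)."""
    cut = first_noisy_flow(flowgram, threshold)
    if cut < min_clean_flows:
        return None
    return flowgram[:cut]


if __name__ == "__main__":
    clean_cycle = [1.0, 0.1, 0.0, 0.2]   # one nucleotide incorporated
    noisy_cycle = [0.1, 0.2, 0.0, 0.3]   # nothing above 0.5
    long_read = clean_cycle * 100 + noisy_cycle + clean_cycle
    short_read = clean_cycle * 10 + noisy_cycle + clean_cycle * 100
    print(len(filter_read(long_read)))   # truncated at the noisy cycle
    print(filter_read(short_read))       # discarded
```

Passing the cutoff and threshold as parameters keeps the sketch usable for platforms other than Titanium, where a different minimum flow position would apply.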

In the second step, which removes pyrosequencing noise, the algorithm computes a distance from each flowgram that reflects the probability that the read was generated from a given true sequence under pyrosequencing error. The true sequences are then inferred by maximum likelihood using an expectation-maximization (EM) algorithm.
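The idea of a flowgram distance plus EM can be sketched in a few lines of Python. This is a toy under strong assumptions, not the real PyroNoise. The noise is taken to be Gaussian with a made-up `sigma`, whereas the real tool uses empirically derived signal distributions. The candidate true sequences are fixed in advance, and only their relative abundances are re-estimated. The names `flow_distance`, `em_weights` and `scale` are hypothetical.

```python
import math
from typing import List, Tuple


def flow_distance(flowgram: List[float], lengths: List[int], sigma: float = 0.15) -> float:
    """Mean negative log-likelihood (up to an additive constant) of the observed
    signals given ideal homopolymer lengths, assuming Gaussian noise."""
    if len(flowgram) != len(lengths):
        raise ValueError("flowgram and candidate must have the same number of flows")
    total = sum(0.5 * ((f - n) / sigma) ** 2 for f, n in zip(flowgram, lengths))
    return total / len(flowgram)


def em_weights(flowgrams: List[List[float]], candidates: List[List[int]],
               sigma: float = 0.15, scale: float = 0.05,
               iterations: int = 20) -> Tuple[List[float], List[List[float]]]:
    """Toy EM: the E-step assigns each read to candidates in proportion to
    weight * exp(-distance / scale); the M-step updates the weights."""
    k = len(candidates)
    weights = [1.0 / k] * k
    dists = [[flow_distance(f, c, sigma) for c in candidates] for f in flowgrams]
    resp: List[List[float]] = []
    for _ in range(iterations):
        resp = []
        for row in dists:
            best = min(row)  # subtract the minimum for numerical stability
            scores = [w * math.exp(-(d - best) / scale) for w, d in zip(weights, row)]
            total = sum(scores)
            resp.append([s / total for s in scores])
        weights = [sum(r[j] for r in resp) / len(resp) for j in range(k)]
    return weights, resp


if __name__ == "__main__":
    candidates = [[1, 0, 2, 0], [1, 0, 1, 1]]
    reads = [[1.05, 0.10, 1.90, 0.05],
             [0.95, 0.02, 2.10, 0.10],
             [1.00, 0.00, 2.05, 0.00],
             [1.10, 0.05, 0.90, 1.00]]
    weights, resp = em_weights(reads, candidates)
    print([round(w, 3) for w in weights])
```

In the real algorithm the M-step also re-estimates the true sequences themselves, which is the part this sketch leaves out.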

In the third step, PCR noise is removed. It uses a similar distance metric that reflects the probability that a read was generated from a true sequence given PCR error.
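A sequence-level distance of this kind can be illustrated with a small global alignment in Python. This is a rough sketch, not SeqNoise's implementation. The costs are invented stand-ins for negative log error probabilities. Transitions (A<->G, C<->T) are made cheaper than transversions on the assumption that they are the more common PCR error. The gap cost and the normalisation by length are likewise my choices.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions (transitions).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def sub_cost(a: str, b: str, transition: float = 1.0, transversion: float = 2.0) -> float:
    """Illustrative substitution cost: 0 for a match, lower for transitions."""
    if a == b:
        return 0.0
    return transition if (a, b) in TRANSITIONS else transversion


def pcr_distance(read: str, true_seq: str, gap: float = 3.0) -> float:
    """Needleman-Wunsch style global alignment cost between a read and a
    candidate true sequence, divided by the longer sequence length."""
    cols = len(true_seq) + 1
    prev = [j * gap for j in range(cols)]          # row for the empty read prefix
    for i in range(1, len(read) + 1):
        curr = [i * gap] + [0.0] * (cols - 1)
        for j in range(1, cols):
            curr[j] = min(
                prev[j - 1] + sub_cost(read[i - 1], true_seq[j - 1]),  # (mis)match
                prev[j] + gap,                                         # gap in true_seq
                curr[j - 1] + gap,                                     # gap in read
            )
        prev = curr
    return prev[-1] / max(len(read), len(true_seq), 1)


if __name__ == "__main__":
    print(pcr_distance("ACGT", "ACGT"))  # 0.0
    print(pcr_distance("ACGT", "ATGT"))  # 0.25 (one transition)
    print(pcr_distance("ACGT", "AAGT"))  # 0.5  (one transversion)
```

Distances like these could then feed the same kind of EM clustering used in the flowgram step, with reads grouped around the sequence they most plausibly derive from.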

However, in addition to the long running time, the learning curve is quite steep, with instructions that are not as intuitive and straightforward as in mothur. It also requires a cluster to run the full set, or even half, of the samples from a 454 run.

Categories: Microbiology