Archive for the ‘Microbiology’ Category

Sunday Morning Links

December 9, 2012

A link on how to reduce variables in multivariate analysis like CCA or RDA

April 23, 2012

A list of R packages for environmental and ecological data analysis

April 16, 2012

Diversity index: Simpson’s Index (D)

January 26, 2012

The Simpson diversity index is another diversity measure that takes into account both the number of species and their abundance in a community. It is calculated by taking the proportion (p_i) of individuals that belong to species i relative to the total number of individuals, squaring it, and summing over all the species. (Taking the reciprocal, 1/D, gives a related measure, the inverse Simpson index.) D has also been defined as the probability that two individuals taken from the same sample belong to the same species.


D = ∑ (p_i)^2

The value of D ranges between 0 and 1, with 0 representing infinite diversity and 1 representing no diversity. This is not intuitive, since a lower number indicates higher diversity, so Simpson’s Index of Diversity (1 – D) is often preferred: it makes more sense numerically, with a higher number denoting higher diversity. In this case, (1 – D) is the probability that two individuals taken from a sample belong to two different species.
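To make the arithmetic concrete, here is a minimal Python sketch of D and 1 – D. The function name simpson_index and the species counts are made up for illustration; the only assumption is that p_i is the count of species i divided by the total count.

```python
def simpson_index(counts):
    """Simpson's index D = sum of p_i^2, with p_i = n_i / N."""
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one individual")
    return sum((n / total) ** 2 for n in counts)


# Hypothetical community: three species with 50, 30 and 20 individuals.
counts = [50, 30, 20]
d = simpson_index(counts)

# p = 0.5, 0.3, 0.2, so D = 0.25 + 0.09 + 0.04 = 0.38
print(f"D     = {d:.2f}")
print(f"1 - D = {1 - d:.2f}")
```

A community with a single species gives D = 1 (no diversity), while four equally abundant species give D = 0.25, so 1 – D rises as the community becomes more even.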


Categories: Microbial Ecology

Common OTUs across sample using mothur

November 23, 2011

What do I have?
Three technical replicates of 16S amplification from a sample.
What do I intend to do?
Test how reproducible the replicates are.
How am I going to do it?
Based on the Zhou et al. 2011 ISMEJ paper, I will compare the common OTUs across all the samples. At the end, I want to make a Venn diagram that shows the number of OTUs that are common across the samples.
Let it begin……
I will use mothur v 1.22 to do this. Let's name my samples A1, A2, and A3. I have a fasta file for each sample: A1.fasta, A2.fasta, and A3.fasta. The sequences were already filtered to remove erroneous and chimeric reads. Additionally, they were trimmed to the same region, then aligned and filtered (to remove the gap-only columns from the alignment).
First, I created a group file. It's a two-column file that contains the name of the sequence in the first column and the sample name in the second column. For example:
G45667889 A1
G47879890 A2
G45454800 A3
G5803i808 A1
There is a command in mothur that creates this file, in case you did not create it during the demultiplexing step:
…,group=A1-A2-A3)

It will create a file called merge.groups, which will be used extensively for downstream analysis.
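If you prefer to see what the group file contains, or to build it outside mothur, the same two-column layout can be sketched in a few lines of Python. This is only an illustration and not mothur's own implementation. The function names group_lines and write_group_file are hypothetical. I am assuming that the sequence name is the first whitespace-delimited token of each FASTA header and that the sample name is the file name without its extension.

```python
from pathlib import Path


def group_lines(fasta_text, sample):
    """Yield 'sequence_name<TAB>sample' for every FASTA header line."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no name
                yield f"{fields[0]}\t{sample}"


def write_group_file(fasta_paths, out_path):
    """Write one group-file row per sequence found in the given FASTA files."""
    with open(out_path, "w") as out:
        for path in fasta_paths:
            sample = Path(path).stem  # assumption: A1.fasta -> sample A1
            for row in group_lines(Path(path).read_text(), sample):
                out.write(row + "\n")


# Example call (output file name is hypothetical):
# write_group_file(["A1.fasta", "A2.fasta", "A3.fasta"], "my.groups")
```

Each sequence name must appear exactly once in the group file, so it is worth checking that the names are unique across the three fasta files before merging them.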

Since we are going to find the common OTUs across samples, we will combine all the fasta files using cat.

cat A1.fasta A2.fasta A3.fasta > A123.fasta
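Looking ahead to the Venn diagram: once every OTU is known together with the samples it occurs in, the number in each region of the diagram is just a count of OTUs per combination of samples. Below is a minimal Python sketch of that counting step. The OTU names and memberships are invented for illustration, venn_counts is a hypothetical helper, and this is not how mothur computes it internally.

```python
from collections import Counter


def venn_counts(otu_samples):
    """Count OTUs for each exact combination of samples they occur in."""
    counts = Counter()
    for samples in otu_samples.values():
        counts[frozenset(samples)] += 1
    return counts


# Hypothetical OTU memberships across the three replicates.
otus = {
    "Otu1": {"A1", "A2", "A3"},
    "Otu2": {"A1", "A2"},
    "Otu3": {"A3"},
    "Otu4": {"A1", "A2", "A3"},
}

counts = venn_counts(otus)
shared_by_all = counts[frozenset({"A1", "A2", "A3"})]
print("OTUs common to all three replicates:", shared_by_all)
```

The count for the frozenset containing all three sample names is the centre of the Venn diagram, which is the number I care about for reproducibility.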

AmpliconNoise: Silence please!

Microbial community analysis using next-generation sequencing of 16S rDNA is plagued by technical errors that are difficult to account for. It is important to separate the noise from the actual data, not just to assess a microbial community correctly but also to distinguish novel organisms from pyrosequencing noise.

Many algorithms can give an approximate assessment of the community despite the errors. However, only a few algorithms can identify and remove the erroneous sequences. AmpliconNoise is a popular choice for the job. Although the algorithm is computationally intensive, it has been recommended for the removal of both PCR and sequencing errors.

Here, I will try to decipher the mechanistic details of AmpliconNoise. The algorithm is heavy on math, so bear with me and correct me if I am wrong.

AmpliconNoise is an extension of the earlier algorithm PyroNoise, which accounted for both PCR and sequencing errors simultaneously. In AmpliconNoise, the algorithm is divided into two parts: PyroNoise (accounts for pyrosequencing errors and clusters flowgrams without alignment) and SeqNoise (accounts for PCR errors and clusters sequences).

In addition to removing insertions/deletions due to PCR and sequencing errors, it also includes an algorithm called Perseus that flags chimeras without the need for a reference database.

The algorithm starts by removing reads that do not pass strict filtering conditions. Any read with a signal intensity below 0.5 was truncated at that position. In the case of Titanium, only those reads whose first noisy flow occurred on or after flow 360 were kept. This first step gets rid of approximately 15% of the reads.

In the second step, which removes pyrosequencing noise, the algorithm uses a distance computed from the flowgram signals that reflects the probability that a read was generated from a given true sequence, given pyrosequencing error. The true sequences are then inferred by maximum likelihood using an expectation-maximization (EM) algorithm.

In the third step, PCR noise is removed. It uses a similar distance metric that reflects the probability that a read was generated from a true sequence, given PCR error.

However, in addition to the long running time, the learning curve is quite steep, and the instructions are not as intuitive and straightforward as in mothur. It also requires a cluster to run the full set, or even half, of the samples from a 454 run.

Categories: Microbiology

Nitrosopumilus maritimus: not that extreme.

For several years after the discovery of Archaea, they were always considered extremophiles. Recently, more and more evidence has pointed out that they are ‘not what we thought they were’. Molecular microbial ecology techniques have discovered Crenarchaeota (a phylum within Archaea) in moderate environments like the ocean, soil, and the human gut. One of the prime examples of mesophilic Crenarchaeota is Nitrosopumilus maritimus, isolated from the rocky bottom of a marine aquarium in Seattle, WA.