First stop: I accidentally landed on this paper:
TEACHING MULTIVARIATE STATISTICS TO ECOLOGISTS AND THE DESIGN OF ECOLOGICAL EXPERIMENTS TO STATISTICIANS: LESSONS FROM BOTH SIDES
Snippets from: Link
“Ecologists generally become interested in multivariate analysis because they already have multivariate data”
“Incorrect inferences and conclusions can be drawn from ecological experiments that fail to take into account natural temporal and spatial variability.”
Which species are responsible for group differences?
This last line led me to another paper by the same author. At the time, I was interested in finding ways to identify the species that cause the difference between two communities: link
CANONICAL ANALYSIS OF PRINCIPAL COORDINATES: A USEFUL METHOD OF CONSTRAINED ORDINATION FOR ECOLOGY
An unconstrained ordination may be useful to visualize overall patterns of dispersion, but this simple example also demonstrates how real differences in location, which were masked in the PCA, were uncovered by the canonical approach.
In either case, correlations of species with canonical axes will provide a good indication of which species should be investigated in more detail with univariate analysis.
Clearly, this use of correlations with canonical axes is an indirect “post hoc” way of identifying possible contributions of individual species to differences among groups.
I failed to identify the right procedure. In any case, it is unclear whether such a procedure should be trusted, judging by the last snippet I am posting here:
“The inflation of R² statistics and the irregularities in the forward selection of eigenvectors indicate that the PCNM and MEM methods are unstable and vulnerable to statistical artefacts” link1
Some common terms that I frequently run into in ecology papers, each followed by links to relevant articles or papers about it. The links are usually top Google hits and highly relevant to understanding the jargon they follow. In some cases the links come from “midnight caffeine-driven search rashes”: they explain the jargon well but are not always among the top Google hits.
(link1): “A variant form of multiple regression can be used to fit a nonlinear model of an explanatory variable x (or several explanatory variables xj) to a response variable y.”
(link1): “The Hellinger transformation is relativization by row (sample unit) totals, followed by taking the square root of each element in the matrix.”
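The two steps in that definition (divide each row by its total, then take square roots) can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the small abundance table are made up for the example, and returning zeros for an empty sample unit is my own assumption, not something the quoted source specifies.

```python
import math


def hellinger_transform(counts):
    """Relativize each row by its total, then take the square root of each element."""
    transformed = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: an empty sample unit is kept as a row of zeros.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed


# Hypothetical abundance table: 2 sample units (rows) x 3 species (columns).
abundances = [[1, 3, 0], [4, 4, 8]]
for row in hellinger_transform(abundances):
    print([round(value, 3) for value in row])
```

A handy sanity check: after the transformation, the squared values in every non-empty row add up to 1.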
(link1): “a function f(x) is a unimodal function if for some value m, it is monotonically increasing for x ≤ m and monotonically decreasing for x ≥ m.”
(link1): “The technique represents the spatial configuration of sample points using principal coordinates of a truncated distance matrix amongst points. The resulting PCNM axes with positive eigenvalues are used as spatial components in variation partitioning, with each axis potentially modelling species clustering at different distances amongst sampling units.”
(link2): “We need statistical methods to model spatial or temporal structures at all scales.”
(link1): “In broad terms the spectral theorem provides conditions under which an operator or a matrix can be diagonalized (that is, represented as a diagonal matrix in some basis)”
(link1): “The result is that the axes of the final ordination, rather than simply reflecting the dimensions of the greatest variability in the species data, are a linear combination of the environmental variables and the species data.”
“The choice of environmental variables greatly influences the outcome of CCA and other constrained ordinations.”
“The length of the arrow is proportional to the rate of change, so a long pH arrow indicates a large change and indicates that change in pH is strongly correlated with the ordination axes and thus with the community variation shown by the diagram.”
“In any case, you can always remove superfluous variables if they are confusing or difficult to interpret”
(link1): “A contingency table is a tabular representation of categorical data.”
(link1): “it starts from assigning arbitrary numerical scores to one variable values”
(link1): “An unconstrained ordination procedure does not use a priori hypotheses in any way, but reduces dimensions on the basis of some general criterion, such as minimizing residual variance (as in PCA) or minimizing a stress function (NMDS)”
principal component analysis (PCA), correspondence analysis (CA), metric multidimensional scaling (also called principal coordinate analysis or PCO) and nonmetric multidimensional scaling.
(link1): “The first premise of this distance function is that it is calculated on relative counts, and not on the original ones, and the second is that it standardizes by the mean and not by the variance.”
(link1): “locations close to each other exhibit more similar values than those further apart”.
(link1): “new techniques were developed to constrain the ordination according to the table E of explanatory environmental variables (‘direct comparison,’ ‘direct gradient analysis’;”
“Technically, direct gradient analysis can be viewed as an extension of multiple regression, which has a single response variable, to the case of a multi-species response table:”
(link1): “Historically, ecologists have first used indirect approaches for interpreting the structures of species assemblages (structural information extracted by the eigenanalysis of Y) in relation to environmental variability: site scores along the ordination axes, which are composite indices of species abundances contained in Y, were compared a posteriori to environmental variables (‘indirect comparison,’ ‘indirect gradient analysis’)”
(link1): “concentrates on the eigenanalysis of the fitted community table, allowing the direct analysis of the variation in species abundances explained by the environmental variability.”
Anyway, PNAS has a LaTeX template, along with the class and style files, which can be downloaded from here: http://www.pnas.org/site/authors/LaTex.xhtml (or just google “LaTeX PNAS”).
However, when I tried to compile it, it DIDN’T (I use TeXShop 2.47 and TeX Live 2009. OK! I need to update.) But after a quick google, I found the solution here:
http://www.latex-community.org/viewtopic.php?f=23&t=1470
which involved changing fonts in the .sty file provided by PNAS.
$awk '/^>/{$0=">"++i}1' test.fna > test1.fna
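As I read it, the awk one-liner above replaces every FASTA header line (a line starting with “>”) with “>” followed by a running counter, and passes all other lines through unchanged. For anyone who prefers Python, here is a rough equivalent. The function name and the toy sequences are made up for illustration; the file names in the usage note are the same test.fna and test1.fna as above.

```python
def renumber_fasta_headers(lines):
    """Replace each header line with '>' plus a running counter; keep other lines."""
    counter = 0
    renamed = []
    for line in lines:
        if line.startswith(">"):
            counter += 1
            renamed.append(">" + str(counter))
        else:
            renamed.append(line)
    return renamed


# Toy input standing in for the contents of a FASTA file (hypothetical records).
example = [">contig_A some description", "ACGT", "GGCC", ">contig_B", "TTAA"]
print("\n".join(renumber_fasta_headers(example)))
```

To run it on a real file, read the lines of test.fna with `line.rstrip("\n")`, pass them through the function, and write the result joined by newlines to test1.fna.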
First,
Open Terminal
Second,
Connect to NCBI genome FTP
$ftp ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
Third,
Check out the list of genomes
ftp>ls
Fourth,
cd into the directory of your organism
ftp>cd <favorite_microbe>
ftp>mget *.gbk
mget *.gbk [anpqy?]?
Type y and press Enter, and the file will be downloaded to your computer (into the directory from which you connected to FTP).
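The same steps can also be scripted. Below is a rough sketch using Python's standard ftplib module, assuming the server accepts anonymous logins and that the genomes/Bacteria/ directory is laid out as in the session above. The function names and the organism directory in the usage note are placeholders of mine, not anything provided by NCBI.

```python
import fnmatch
import os
from ftplib import FTP

HOST = "ftp.ncbi.nih.gov"          # host taken from the ftp command above
BASE_DIR = "genomes/Bacteria"      # directory taken from the ftp command above


def select_gbk(names):
    """Keep only names matching *.gbk (case-sensitive, like the mget pattern)."""
    return [name for name in names if fnmatch.fnmatchcase(name, "*.gbk")]


def download_gbk(organism_dir, dest="."):
    """Download every *.gbk file from one organism directory into dest."""
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login (assumed to be allowed)
        ftp.cwd(BASE_DIR + "/" + organism_dir)
        wanted = select_gbk(ftp.nlst())
        for name in wanted:
            local_path = os.path.join(dest, name)
            with open(local_path, "wb") as handle:
                # RETR streams the remote file; each chunk is written to disk.
                ftp.retrbinary("RETR " + name, handle.write)
    return wanted
```

Usage would look like `download_gbk("<favorite_microbe>")`, with the placeholder replaced by a directory name taken from the ls listing. Unlike interactive mget, this does not ask for confirmation before each file.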