monologues with multivariate analysis
I wanted to rererefresh my understanding of multivariate analysis in ecology. So, here is my monologue of my googleventures.
First stop: I accidentally landed on this paper:
TEACHING MULTIVARIATE STATISTICS TO ECOLOGISTS AND THE DESIGN OF ECOLOGICAL EXPERIMENTS TO STATISTICIANS: LESSONS FROM BOTH SIDES
Snippets from: Link
“Ecologists generally become interested in multivariate analysis because they already have multivariate data”
“Incorrect inferences and conclusions can be drawn from ecological experiments that fail to take into account natural temporal and spatial variability.”
Which species are responsible for group differences?
This last line led me to another paper by the same author. At this point, I was generally interested in finding ways to identify the species that cause the difference between two communities: link
CANONICAL ANALYSIS OF PRINCIPAL COORDINATES: A USEFUL METHOD OF CONSTRAINED ORDINATION FOR ECOLOGY
An unconstrained ordination may be useful to visualize overall patterns of dispersion, but this simple example also demonstrates how real differences in location, which were masked in the PCA, were uncovered by the canonical approach.
In either case, correlations of species with canonical axes will provide a good indication of which species should be investigated in more detail with univariate analysis.
Clearly, this use of correlations with canonical axes is an indirect ‘‘post hoc’’ way of identifying possible contributions of individual species to differences among groups.
I failed to identify the right procedure. However, based on the last snippet I posted here, it is unclear whether such a procedure should be trusted anyway.
Dilemma in multivariate testing in ecology: My test is better than yours.
Snippets from papers that conclude one multivariate test is better than others for variance partitioning. Remember that these are just snippets and do not convey the overall message of each paper. If I list a pro here, you can be sure that there is a con somewhere else in the paper (follow the link), and vice versa. At the end of the day, none of the tests is perfect, but each works best when used and interpreted as per its authors’ manual.
“Regardless of the philosophical merits of distance-based or raw-data-based methods for testing beta diversity (Legendre, Borcard & Peres-Neto 2005; Tuomisto & Ruokolainen 2006), it is clear that correlations based on distance matrices are inferior to RDA for modelling spatial patterns.” from link1

“The inflation of R2 statistics and the irregularities in the forward selection of eigenvectors indicate that the PCNM and MEM methods are unstable and vulnerable to statistical artefacts” from link1
Jargon from Ecology
Some common terms that I frequently run into in ecology papers, each followed by links to relevant articles or papers about it. The links are usually top Google hits and highly relevant to understanding the jargon they follow. In some cases, the links are results of “midnight caffeine-driven search rashes” that explain the jargon well but are not always among the top Google hits.
 Polynomial Trend Surface Analysis
(link1) : “A variant form of multiple regression can be used to fit a nonlinear model of an explanatory variable x (or several explanatory variables xj) to a response variable y. ”
 Hellinger transformation
(link1):”The Hellinger transformation is relativization by row (sample unit) totals, followed by taking the square root of each element in the matrix.”
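To make the two steps in that quote concrete, here is a minimal sketch in plain Python. The function name `hellinger_transform` and the example counts are my own inventions for illustration, and the handling of empty sample units is an assumption, not part of the quoted definition.

```python
import math

def hellinger_transform(abundances):
    """Relativize each row (sample unit) by its total, then take square roots."""
    transformed = []
    for row in abundances:
        total = sum(row)
        if total == 0:
            # Assumption: an empty sample unit is kept as a row of zeros.
            transformed.append([0.0 for _ in row])
        else:
            transformed.append([math.sqrt(value / total) for value in row])
    return transformed

# Hypothetical table: 2 sample units (rows) by 3 species (columns).
sites = [[4, 0, 12], [1, 1, 2]]
hellinger = hellinger_transform(sites)
```

A handy sanity check: after the transformation, the squared values in each non-empty row sum to 1.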
 Unimodal relationships
(link1):”a function f(x) is a unimodal function if for some value m, it is monotonically increasing for x ≤ m and monotonically decreasing for x ≥ m.”
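For a response measured at a series of points along a gradient, the definition can be checked roughly as below. This is only a sketch on sampled values (the function name `is_unimodal` and the example numbers are hypothetical), not a test of the underlying continuous function.

```python
def is_unimodal(values):
    """True if values never decrease up to their maximum and never increase after it."""
    if not values:
        return False
    peak = values.index(max(values))  # position of the mode m (first maximum)
    rising = all(values[i] <= values[i + 1] for i in range(peak))
    falling = all(values[i] >= values[i + 1] for i in range(peak, len(values) - 1))
    return rising and falling

# Hypothetical species abundances sampled along a pH gradient.
single_peak = is_unimodal([1, 3, 5, 4, 2])
two_peaks = is_unimodal([1, 3, 2, 4, 1])
```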
 PCNM (principal coordinates of neighbor matrices)
(link1)”The technique represents the spatial configuration of sample points using principal coordinates of a truncated distance matrix amongst points. The resulting PCNM axes with positive eigenvalues are used as spatial components in variation partitioning, with each axis potentially modelling species clustering at different distances amongst sampling units. ”
(link2)”We need statistical methods to model spatial or temporal structures at all scales. ”
 Spectral decomposition
(link1): “In broad terms the spectral theorem provides conditions under which an operator or a matrix can be diagonalized (that is, represented as a diagonal matrix in some basis)”
 Canonical Correspondence Analysis
(link1): “The result is that the axes of the final ordination, rather than simply reflecting the dimensions of the greatest variability in the species data, are a linear combination of the environmental variables and the species data.”
“The choice of environmental variables greatly influences the outcome of CCA and other constrained ordinations.”
“The length of the arrow is proportional to the rate of change, so a long pH arrow indicates a large change and indicates that change in pH is strongly correlated with the ordination axes and thus with the community variation shown by the diagram.”
“In any case, you can always remove superfluous variables if they are confusing or difficult to interpret”
 Contingency Table
(link1):”A contingency table is a tabular representation of categorical data .”
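As a small illustration, a two-way contingency table can be built by counting how often each pair of categories occurs together. The helper `contingency_table` and the habitat/species observations are made up for this sketch.

```python
from collections import Counter

def contingency_table(row_labels, col_labels):
    """Cross-tabulate two equally long lists of categorical observations."""
    counts = Counter(zip(row_labels, col_labels))
    row_levels = sorted(set(row_labels))
    col_levels = sorted(set(col_labels))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical observations: one habitat and one species per record.
habitat = ["forest", "forest", "meadow", "meadow", "meadow"]
species = ["oak", "birch", "oak", "grass", "grass"]
row_levels, col_levels, table = contingency_table(habitat, species)
```

Here the rows are habitats and the columns are species, both in alphabetical order.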
 Reciprocal Averaging
(link1): “it starts from assigning arbitrary numerical scores to one variable values”
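My understanding of the iteration behind that quote, as a rough sketch: start with arbitrary species scores, compute site scores as abundance-weighted averages of the species scores, compute new species scores as weighted averages of the site scores, then re-center and re-scale, and repeat. The function name, the fixed iteration count, and the toy table are my assumptions. The sketch also assumes no empty rows or columns, and it only extracts the first axis.

```python
import math

def reciprocal_averaging(table, n_iter=200):
    """First reciprocal-averaging axis of a sites-by-species abundance table."""
    n_sites, n_species = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)

    def site_scores(species):
        # Each site score is the weighted average of the species scores.
        return [sum(table[i][j] * species[j] for j in range(n_species)) / row_tot[i]
                for i in range(n_sites)]

    species = [float(j) for j in range(n_species)]  # arbitrary starting scores
    for _ in range(n_iter):
        sites = site_scores(species)
        # Each species score is the weighted average of the site scores.
        species = [sum(table[i][j] * sites[i] for i in range(n_sites)) / col_tot[j]
                   for j in range(n_species)]
        # Weighted centering and scaling keeps the scores from collapsing
        # to the trivial solution where every score is equal.
        mean = sum(c * s for c, s in zip(col_tot, species)) / grand
        species = [s - mean for s in species]
        spread = math.sqrt(sum(c * s * s for c, s in zip(col_tot, species)) / grand)
        species = [s / spread for s in species]
    return site_scores(species), species

# Hypothetical table in which species replace each other along a gradient.
table = [[5, 2, 0, 0], [2, 5, 2, 0], [0, 2, 5, 2], [0, 0, 2, 5]]
site_axis, species_axis = reciprocal_averaging(table)
```

For this toy table, both sets of scores come out ordered along the gradient, which is the point of the method.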
 Unconstrained Methods (Ordination)
(link1): “An unconstrained ordination procedure does not use a priori hypotheses in any way, but reduces dimensions on the basis of some general criterion, such as minimizing residual variance (as in PCA) or minimizing a stress function (NMDS) ”
Examples include principal component analysis (PCA), correspondence analysis (CA), metric multidimensional scaling (also called principal coordinate analysis or PCO), and nonmetric multidimensional scaling (NMDS).
 Chi-square distance
(link1)” The first premise of this distance function is that it is calculated on relative counts, and not on the original ones, and the second is that it standardizes by the mean and not by the variance. ”
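The two premises in that quote can be sketched as follows: convert each row to relative counts (a row profile), then weight each squared difference by the inverse of the average profile (column total divided by grand total). The function name and example counts are hypothetical, and this is just my reading of the definition.

```python
import math

def chi_square_distance(table, i, k):
    """Chi-square distance between rows i and k of a table of counts."""
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Average profile: share of the grand total held by each column.
    col_mass = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    # Relative counts (row profiles) for the two rows being compared.
    profile_i = [v / sum(table[i]) for v in table[i]]
    profile_k = [v / sum(table[k]) for v in table[k]]
    return math.sqrt(sum((profile_i[j] - profile_k[j]) ** 2 / col_mass[j]
                         for j in range(n_cols) if col_mass[j] > 0))

# Hypothetical counts: rows 0 and 1 have the same proportions, row 2 differs.
counts = [[10, 20, 30], [1, 2, 3], [30, 20, 10]]
same_profile = chi_square_distance(counts, 0, 1)
different_profile = chi_square_distance(counts, 0, 2)
```

Because the distance works on relative counts, rows 0 and 1 end up (numerically) at distance zero even though their totals differ tenfold.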
 Spatial autocorrelation
(link1)”locations close to each other exhibit more similar values than those further apart”.
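One common way to put a number on this is Moran's I, which the snippet itself does not mention, so take it as my own addition. The sketch below assumes a simple transect where only adjacent locations count as neighbors. The function names and values are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for values at n locations with an n-by-n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    sum_sq = sum(d * d for d in dev)
    return (n / total_weight) * (cross / sum_sq)

def transect_weights(n):
    """Assumed neighborhood: weight 1 for adjacent points on a line, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = transect_weights(5)
smooth_gradient = morans_i([1, 2, 3, 4, 5], w)   # neighbors are similar
alternating = morans_i([1, 5, 1, 5, 1], w)       # neighbors are dissimilar
```

Positive values indicate that neighbors resemble each other, and negative values indicate that neighbors differ more than expected.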
 Direct gradient analysis
(link1)”new techniques were developed to constrain the ordination according to the table E of explanatory environmental variables (‘‘direct comparison,’’ ‘‘direct gradient analysis’’; ”
“Technically, direct gradient analysis can be viewed as an extension of multiple regression, which has a single response variable, to the case of a multispecies response table: ”
 Indirect gradient analysis
(link1)“Historically, ecologists have first used indirect approaches for interpreting the structures of species assemblages (structural information extracted by the eigenanalysis of Y) in relation to environmental variability: site scores along the ordination axes, which are composite indices of species abundances contained in Y, were compared a posteriori to environmental variables (‘‘indirect comparison,’’ ‘‘indirect gradient analysis’’)”
 Constrained ordination (or canonical analysis)
(link1)”concentrates on the eigenanalysis of the fitted community table, allowing the direct analysis of the variation in species abundances explained by the environmental variability. ”
Diversity Index: Shannon Index/Shannon-Weaver Index (H)
A diversity index is a measure of species diversity in a given community. Unlike species richness, which only counts the species present, it also reflects community composition by taking into account the relative abundance of each species.
The Shannon index is a commonly used diversity index that takes into account both the abundance and the evenness of the species present in the community. It is given by the formula:
$$H = -\sum_{i=1}^{S} P_{i} \ln P_{i}$$
where,
H = the Shannon diversity index
P_{i} = fraction of the entire population made up of species i (the number of individuals of species i divided by the total number of individuals of all species)
S = number of species encountered
Here, a high value of H represents a diverse and evenly distributed community, and lower values represent a less diverse community. A value of 0 represents a community with just one species.
Shannon’s equitability (E_{H}) measures the evenness of a community and is calculated by dividing H by H_max, which equals ln S (S = number of species encountered). Its value ranges between 0 and 1, with 1 being complete evenness.
$$E_{H} = \frac{H}{H_{\max}} = \frac{H}{\ln S}$$
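To check my understanding, here is a minimal sketch of both quantities computed from raw counts of individuals per species. The function names and example counts are my own. The index is computed as the negative sum of P_i ln P_i, and returning 0 for equitability when fewer than two species are present is an assumption, since ln S is then 0 and the ratio is undefined.

```python
import math

def shannon_index(counts):
    """H = -sum(P_i * ln P_i), with P_i the share of individuals in species i."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def shannon_equitability(counts):
    """E_H = H / ln S, where S is the number of species encountered."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        # Assumption: E_H is undefined when S < 2; return 0.0 by convention here.
        return 0.0
    return shannon_index(counts) / math.log(richness)

# Hypothetical communities (individuals counted per species).
even_h = shannon_index([10, 10, 10, 10])
even_e = shannon_equitability([10, 10, 10, 10])
skewed_e = shannon_equitability([37, 1, 1, 1])
single_h = shannon_index([7])
```

The perfectly even community reaches H = ln 4 and E_H = 1, the skewed one falls well below 1, and the single-species community gives H = 0.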
Source:
http://www.tiem.utk.edu/~gross/bioed/bealsmodules/shannonDI.html