Dealing with Multidimensional datasets- Non-metric Multidimensional Scaling (NMDS)

mahimagupta950
Jun 11, 2020
2 min read

This blog post deals with the basics of what NMDS is and under what circumstances it is applicable. For further queries or help regarding specific problems, reach out to us

Data analysis usually entails dealing with large, complex data sets, consisting of multiple variables, differing from each- other in more than one way/dimension, especially when we are working with large ecological/biological data sets.

Ordination techniques comprises of a group of methodologies which can be employed for projecting multidimensional data sets on to a low dimensional space (2D or 3D), in order to facilitate its visualization and further analysis. They are used for ordering large and complex data sets.

NMDS is one of the ordination techniques available for ordering large data sets; some of the other techniques that you might have come across includes:

Different samples have been color coded and then plotted using NMDS, such that similar samples are close to each- other. — Fig 1. Relationships among bacterial communities from different host species and habitats.

PCA (Principal Component Analysis)
PCoA (Principal Coordinates Analysis)
CCA (Canonical Correspondence Analysis)
PO (Polar Ordination)
RDA (Redundancy Analysis)

NMDS is a distance based technique, which means it relies on distance matrix or dissimilarity matrix as input (Eg. difference in species population between two different sites). For ordering the data, it uses a rank based approach, which means that the magnitude of distance is substituted with ranks (Eg. Instead of plot A being 0.45 units distant from plot B and 0.25 units distant from plot C, it ranks plot B as 1; farthest from plot A, and plot C as 2)

Basic Methodology

The number of dimensions that the data needs to be projected on to is determined.
An appropriate distance matrix is decided upon.
A symmetrical distance matrix of the samples is generated.
The samples are then plotted on a 2D graph, known as ordination space and the distance between each sample is depicted.
The distance in ordination space is compared with the calculated distance in the matrix.
Correlation between the two distances is calculated.
The sample points are then moved iteratively in the ordination space until an arrangement of data points is found that gives the best correlational fit.
A stress value is calculated in order to determine how well the 2D space represents the actual multidimensional space.

Lower the stress value, better is the representation of the data in the actual multidimensional space.

Further Reading: (Some research papers with interesting applications of NMDS)

Trammell, Tara L. E.; Carreiro, Margaret M. 2011. Vegetation composition and structure of woody plant communities along urban interstate corridors in Louisville, KY, U.S.A. Urban Ecosyst 14:501-524.

Bik, E., Costello, E., Switzer, A.et al.Marine mammals harbor unique microbiotas shaped by and yet distinct from the sea.Nat Commun7,10516 (2016). https://doi.org/10.1038/ncomms10516

Dealing with Multidimensional datasets- Non-metric Multidimensional Scaling (NMDS)

Recent Posts

Comments