Given a set of distances (dissimilarities) between objects, is it possible to recover a dimensional representation of those objects?
Model: the distance between objects x and y is the square root of the sum of squared differences on k dimensions: $d_{xy} = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$
Data: a matrix of distances
Find the coordinates of the objects in k = 1, 2, … dimensions that best reproduce the original distances.
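As a minimal sketch of the model, the following R code uses made-up coordinates (not the cities data) to compute one distance directly from the formula, compare it with base R's dist(), and then check that cmdscale() returns coordinates whose inter-point distances match the originals. The object names (X, d_ab, recovered, max_error) are illustrative only.

```r
# Hypothetical coordinates for four objects on k = 2 dimensions (illustrative values)
X <- matrix(c(0,  0,
              3,  4,
              6,  0,
              3, -2), ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"), NULL))

# Distance between objects a and b, straight from the model
d_ab <- sqrt(sum((X["a", ] - X["b", ])^2))

# dist() applies the same Euclidean formula to every pair of rows
D <- dist(X)

# Classical MDS works backwards: from the distances to coordinates
recovered <- cmdscale(D, k = 2)

# The recovered points may be rotated or reflected relative to X,
# but their inter-point distances should agree with D up to rounding error
max_error <- max(abs(dist(recovered) - D))
```

Because only the distances are used, the recovered configuration is determined only up to rotation, reflection, and translation.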
Example: Consider the distances between eleven American cities (labeled by airport code). Can we represent these cities in a two-dimensional space?
library(psych)
library(psychTools)
data(cities)
cities
##      ATL  BOS  ORD  DCA  DEN  LAX  MIA  JFK  SEA  SFO  MSY
## ATL    0  934  585  542 1209 1942  605  751 2181 2139  424
## BOS  934    0  853  392 1769 2601 1252  183 2492 2700 1356
## ORD  585  853    0  598  918 1748 1187  720 1736 1857  830
## DCA  542  392  598    0 1493 2305  922  209 2328 2442  964
## DEN 1209 1769  918 1493    0  836 1723 1636 1023  951 1079
## LAX 1942 2601 1748 2305  836    0 2345 2461  957  341 1679
## MIA  605 1252 1187  922 1723 2345    0 1092 2733 2594  669
## JFK  751  183  720  209 1636 2461 1092    0 2412 2577 1173
## SEA 2181 2492 1736 2328 1023  957 2733 2412    0  681 2101
## SFO 2139 2700 1857 2442  951  341 2594 2577  681    0 1925
## MSY  424 1356  830  964 1079 1679  669 1173 2101 1925    0
The output shows the original distance matrix (just to make sure we entered it correctly), then the x, y coordinates for each city, and finally a graph of the solution.
city.location <- cmdscale(cities, k=2) #ask for a 2 dimensional solution
round(city.location,0) #print the locations to the screen
##      [,1] [,2]
## ATL  -571  248
## BOS -1061 -548
## ORD  -264 -251
## DCA  -861 -211
## DEN   616   10
## LAX  1370  376
## MIA  -959  708
## JFK  -970 -389
## SEA  1438 -607
## SFO  1563   88
## MSY  -301  577
This solution can be represented graphically:
plot(city.location,type="n", xlab="Dimension 1", ylab="Dimension 2",main ="cmdscale(cities)") #put up a graphics window
text(city.location,labels=names(cities)) #put the cities into the map
Note that the solution is not quite what we expected: it gives us a mirrored, "Australian" orientation of the American cities, with east on the left and south at the top. However, because the direction of each dimension is arbitrary, reversing the signs in city.location gives the more conventional representation:
city.location <- -city.location
plot(city.location,type="n", xlab="Dimension 1", ylab="Dimension 2",main ="cmdscale(cities)") #put up a graphics window
text(city.location,labels=names(cities)) #put the cities into the map
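Reversing signs is legitimate because it leaves every inter-point distance unchanged. The sketch below checks this for four of the cities, using the rounded coordinates printed above; the object names (loc, flipped, mirrored) are illustrative only.

```r
# Rounded coordinates for four cities, copied from the cmdscale() output above
loc <- matrix(c( -571,  248,
                -1061, -548,
                 1370,  376,
                 1438, -607), ncol = 2, byrow = TRUE,
              dimnames = list(c("ATL", "BOS", "LAX", "SEA"), NULL))

flipped  <- -loc                      # reverse the signs on both dimensions
mirrored <- loc %*% diag(c(1, -1))    # reverse the sign on dimension 2 only

# Compare the distances as plain vectors (dist objects also store the call,
# which would differ between the two sides of the comparison)
same_flipped  <- isTRUE(all.equal(as.vector(dist(loc)), as.vector(dist(flipped))))
same_mirrored <- isTRUE(all.equal(as.vector(dist(loc)), as.vector(dist(mirrored))))
```

Both comparisons should come out TRUE, so any of these orientations fits the distance data equally well; choosing among them is purely a matter of convention.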
A useful feature of R is that most commands have an extensive help file. Asking for help(cmdscale) shows that R includes a matrix of road distances between 21 European cities (eurodist). The following commands (taken from the help file) produce a nice two-dimensional solution. (Note that since the direction of each dimension is arbitrary, the second dimension needs to be flipped to produce the conventional map of Europe.)
loc <- cmdscale(eurodist, k = 2)
x <- loc[,1]
y <- -loc[,2]
plot(x, y, type="n", xlab="", ylab="", main="cmdscale(eurodist)")
text(x, y, colnames(as.matrix(eurodist)), cex=0.8)
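To judge how well two dimensions reproduce the original distances, one option is to ask cmdscale() for its eigenvalues and goodness-of-fit values and to compare the fitted distances with the observed ones. This is a sketch of one possible check, not part of the help-file example; the object names (fit, fitted_d, r) are illustrative only.

```r
# eurodist ships with base R (datasets package), so no extra packages are needed
fit <- cmdscale(eurodist, k = 2, eig = TRUE)

# With eig = TRUE the result is a list; the coordinates are in fit$points
# and fit$GOF holds two goodness-of-fit values between 0 and 1
fit$GOF

# Distances implied by the two-dimensional solution
fitted_d <- dist(fit$points)

# Correlation between fitted and observed distances (higher means a better fit)
r <- cor(as.vector(fitted_d), as.vector(eurodist))
r
```

Road distances are not exactly Euclidean, so a perfect fit in two dimensions should not be expected here.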
For gene expression matrices, use the limma::plotMDS function: http://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/limma/html/plotMDS.html