Complete linkage clustering merges two clusters whose maximum pairwise distance between their members is smallest among all cluster pairs. It tends to produce compact, spherical clusters and is less affected by chaining than single linkage.
We use the same height and weight dataset as before.
weight.kg <- c(15,49,13,45,85,66,12,10)
height.cm <- c(95,156,95,160,178,176,90,78)
data <- matrix(cbind(weight.kg, height.cm), nrow = 8)
dimnames(data)[[2]] <- c("weight.kg", "height.cm")
data
## weight.kg height.cm
## [1,] 15 95
## [2,] 49 156
## [3,] 13 95
## [4,] 45 160
## [5,] 85 178
## [6,] 66 176
## [7,] 12 90
## [8,] 10 78
Compute Euclidean distances between all pairs of observations.
dist <- dist(data, diag = TRUE)
round(dist, 3)
## 1 2 3 4 5 6 7 8
## 1 0.000
## 2 69.836 0.000
## 3 2.000 70.831 0.000
## 4 71.589 5.657 72.450 0.000
## 5 108.577 42.190 109.877 43.863 0.000
## 6 95.718 26.249 96.799 26.401 19.105 0.000
## 7 5.831 75.664 5.099 77.389 114.337 101.548 0.000
## 8 17.720 87.207 17.263 89.157 125.000 112.872 12.166 0.000
Find the smallest nonzero distance to identify the first merge.
min(as.dist(dist)[as.dist(dist) != 0]) # 2
## [1] 2
The closest pair is (1,3) — these two will be merged first.
In complete linkage, when merging clusters, the distance between the new cluster and another item is the maximum distance between any members of the two clusters.
old.dist <- as.matrix(dist)
new.dist <- as.matrix(dist)
# Merge (1,3)
new.dist[1,3] <- new.dist[3,1] <- 0
# Update distances using max rule
for (i in 1:8) {
new.dist[1,i] <- new.dist[i,1] <- max(old.dist[1,i], old.dist[3,i])
new.dist[3,i] <- new.dist[i,3] <- max(old.dist[1,i], old.dist[3,i])
}
# Check matrix symmetry
all.equal(new.dist, t(new.dist))
## [1] TRUE
round(as.dist(new.dist), 3)
## 1 2 3 4 5 6 7
## 2 70.831
## 3 2.000 70.831
## 4 72.450 5.657 72.450
## 5 109.877 42.190 109.877 43.863
## 6 96.799 26.249 96.799 26.401 19.105
## 7 5.831 75.664 5.831 77.389 114.337 101.548
## 8 17.720 87.207 17.720 89.157 125.000 112.872 12.166
Next merge will occur between the two clusters with the smallest nonzero distance.
min(as.dist(new.dist)[as.dist(new.dist) != 0]) # 5.657
## [1] 2
Now, we merge observations 2 and 4. Their distance to other points is determined by the maximum of pairwise distances.
new.dist2 <- new.dist
new.dist2[2,4] <- new.dist2[4,2] <- 0
# Update distances for cluster (2,4)
for (i in 1:8) {
new.dist2[2,i] <- new.dist2[i,2] <-
max(old.dist[2,i], old.dist[4,i])
new.dist2[4,i] <- new.dist2[i,4] <-
max(old.dist[2,i], old.dist[4,i])
}
# Check symmetry and new distances
all.equal(new.dist2, t(new.dist2))
## [1] TRUE
round(as.dist(new.dist2), 3)
## 1 2 3 4 5 6 7
## 2 71.589
## 3 2.000 72.450
## 4 71.589 5.657 72.450
## 5 109.877 43.863 109.877 43.863
## 6 96.799 26.401 96.799 26.401 19.105
## 7 5.831 77.389 5.831 77.389 114.337 101.548
## 8 17.720 89.157 17.720 89.157 125.000 112.872 12.166
# Identify next merge
min(as.dist(new.dist2)[as.dist(new.dist2) != 0]) # 5.831
## [1] 2
Next, the cluster (1,3) merges with observation 7. Distance updates again use the maximum rule.
new.dist3 <- new.dist2
# Merge cluster (1,3) with 7
new.dist3[1,7] <- new.dist3[7,1] <- new.dist3[3,7] <- new.dist3[7,3] <- 0
# Update distances using max rule
for (i in 1:8) {
new.dist3[1,i] <- new.dist3[i,1] <-
max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
new.dist3[3,i] <- new.dist3[i,3] <-
max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
new.dist3[7,i] <- new.dist3[i,7] <-
max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
}
# Check matrix and display
all.equal(new.dist3, t(new.dist3))
## [1] TRUE
round(as.dist(new.dist3), 3)
## 1 2 3 4 5 6 7
## 2 75.664
## 3 5.099 75.664
## 4 77.389 5.657 77.389
## 5 114.337 43.863 114.337 43.863
## 6 101.548 26.401 101.548 26.401 19.105
## 7 5.831 75.664 5.831 77.389 114.337 101.548
## 8 17.720 89.157 17.720 89.157 125.000 112.872 17.720
# Find next merge candidate
min(as.dist(new.dist3)[as.dist(new.dist3) != 0]) # 17.72
## [1] 5.09902
hclust()Rather than continuing manually, we verify the steps using R’s
hclust() function.
hclust.complete <- hclust(dist(data), method = "complete")
hclust.complete$height
## [1] 2.000000 5.656854 5.830952 17.720045 19.104973 43.863424 125.000000
Output:
2.000000 5.656854 5.830952 17.720045 19.104973 43.863424 125.000000
These values correspond to the distances at which merges occurred, matching our manual calculations.
plot(hclust.complete, main = "Complete Linkage Dendrogram",
xlab = "Observation", ylab = "Distance")
Complete linkage produces tighter, more compact clusters.
hclust(method = "complete") confirms manual iterations
exactly.Would you like me to provide this version as a downloadable
.Rmd file (e.g.,
complete_linkage.Rmd)?