Introduction

Complete linkage clustering merges two clusters whose maximum pairwise distance between their members is smallest among all cluster pairs. It tends to produce compact, spherical clusters and is less affected by chaining than single linkage.

1. Setting up the data

We use the same height and weight dataset as before.

weight.kg <- c(15,49,13,45,85,66,12,10)
height.cm <- c(95,156,95,160,178,176,90,78)

data <- matrix(cbind(weight.kg, height.cm), nrow = 8)
dimnames(data)[[2]] <- c("weight.kg", "height.cm")
data

##      weight.kg height.cm
## [1,]        15        95
## [2,]        49       156
## [3,]        13        95
## [4,]        45       160
## [5,]        85       178
## [6,]        66       176
## [7,]        12        90
## [8,]        10        78

2. Pairwise distances

Compute Euclidean distances between all pairs of observations.

dist <- dist(data, diag = TRUE)
round(dist, 3)

##         1       2       3       4       5       6       7       8
## 1   0.000                                                        
## 2  69.836   0.000                                                
## 3   2.000  70.831   0.000                                        
## 4  71.589   5.657  72.450   0.000                                
## 5 108.577  42.190 109.877  43.863   0.000                        
## 6  95.718  26.249  96.799  26.401  19.105   0.000                
## 7   5.831  75.664   5.099  77.389 114.337 101.548   0.000        
## 8  17.720  87.207  17.263  89.157 125.000 112.872  12.166   0.000

Find the smallest nonzero distance to identify the first merge.

min(as.dist(dist)[as.dist(dist) != 0])  # 2

## [1] 2

The closest pair is (1,3) — these two will be merged first.

3. Merge (1,3): update distances using maximum rule

In complete linkage, when merging clusters, the distance between the new cluster and another item is the maximum distance between any members of the two clusters.

old.dist <- as.matrix(dist)
new.dist <- as.matrix(dist)

# Merge (1,3)
new.dist[1,3] <- new.dist[3,1] <- 0

# Update distances using max rule
for (i in 1:8) {
  new.dist[1,i] <- new.dist[i,1] <- max(old.dist[1,i], old.dist[3,i])
  new.dist[3,i] <- new.dist[i,3] <- max(old.dist[1,i], old.dist[3,i])
}

# Check matrix symmetry
all.equal(new.dist, t(new.dist))

## [1] TRUE

round(as.dist(new.dist), 3)

##         1       2       3       4       5       6       7
## 2  70.831                                                
## 3   2.000  70.831                                        
## 4  72.450   5.657  72.450                                
## 5 109.877  42.190 109.877  43.863                        
## 6  96.799  26.249  96.799  26.401  19.105                
## 7   5.831  75.664   5.831  77.389 114.337 101.548        
## 8  17.720  87.207  17.720  89.157 125.000 112.872  12.166

Next merge will occur between the two clusters with the smallest nonzero distance.

min(as.dist(new.dist)[as.dist(new.dist) != 0])  # 5.657

## [1] 2

4. Merge (2,4): update distances to new cluster (2,4)

Now, we merge observations 2 and 4. Their distance to other points is determined by the maximum of pairwise distances.

new.dist2 <- new.dist
new.dist2[2,4] <- new.dist2[4,2] <- 0

# Update distances for cluster (2,4)
for (i in 1:8) {
  new.dist2[2,i] <- new.dist2[i,2] <- 
    max(old.dist[2,i], old.dist[4,i])
  new.dist2[4,i] <- new.dist2[i,4] <- 
    max(old.dist[2,i], old.dist[4,i])
}

# Check symmetry and new distances
all.equal(new.dist2, t(new.dist2))

## [1] TRUE

round(as.dist(new.dist2), 3)

##         1       2       3       4       5       6       7
## 2  71.589                                                
## 3   2.000  72.450                                        
## 4  71.589   5.657  72.450                                
## 5 109.877  43.863 109.877  43.863                        
## 6  96.799  26.401  96.799  26.401  19.105                
## 7   5.831  77.389   5.831  77.389 114.337 101.548        
## 8  17.720  89.157  17.720  89.157 125.000 112.872  12.166

# Identify next merge
min(as.dist(new.dist2)[as.dist(new.dist2) != 0])  # 5.831

## [1] 2

5. Merge (1,3,7): update distances

Next, the cluster (1,3) merges with observation 7. Distance updates again use the maximum rule.

new.dist3 <- new.dist2

# Merge cluster (1,3) with 7
new.dist3[1,7] <- new.dist3[7,1] <- new.dist3[3,7] <- new.dist3[7,3] <- 0

# Update distances using max rule
for (i in 1:8) {
  new.dist3[1,i] <- new.dist3[i,1] <-
    max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
  new.dist3[3,i] <- new.dist3[i,3] <-
    max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
  new.dist3[7,i] <- new.dist3[i,7] <-
    max(old.dist[1,i], old.dist[3,i], old.dist[7,i])
}

# Check matrix and display
all.equal(new.dist3, t(new.dist3))

## [1] TRUE

round(as.dist(new.dist3), 3)

##         1       2       3       4       5       6       7
## 2  75.664                                                
## 3   5.099  75.664                                        
## 4  77.389   5.657  77.389                                
## 5 114.337  43.863 114.337  43.863                        
## 6 101.548  26.401 101.548  26.401  19.105                
## 7   5.831  75.664   5.831  77.389 114.337 101.548        
## 8  17.720  89.157  17.720  89.157 125.000 112.872  17.720

# Find next merge candidate
min(as.dist(new.dist3)[as.dist(new.dist3) != 0])  # 17.72

## [1] 5.09902

6. Automated verification with `hclust()`

Rather than continuing manually, we verify the steps using R’s hclust() function.

hclust.complete <- hclust(dist(data), method = "complete")
hclust.complete$height

## [1]   2.000000   5.656854   5.830952  17.720045  19.104973  43.863424 125.000000

Output:

2.000000  5.656854  5.830952  17.720045  19.104973  43.863424 125.000000

These values correspond to the distances at which merges occurred, matching our manual calculations.

7. Dendrogram visualization

plot(hclust.complete, main = "Complete Linkage Dendrogram",
     xlab = "Observation", ylab = "Distance")

Complete linkage produces tighter, more compact clusters.

Summary

Complete linkage merges clusters based on maximum pairwise distance.
Produces compact and well-separated clusters.
Less sensitive to chaining than single linkage.
hclust(method = "complete") confirms manual iterations exactly.

Would you like me to provide this version as a downloadable .Rmd file (e.g., complete_linkage.Rmd)?

Complete Linkage Hierarchical Clustering