PROC KCLUS. How do you estimate the number of clusters in a dataset? In SAS, it's as easy as ABC.
The classic way to choose the number of clusters is the elbow method: plot the within-cluster sum of squares (WCSS) against the number of clusters k, then find the "elbow" where the rate of decrease in WCSS slows down. With PROC KCLUS, you can use the ABC method instead to help identify the optimal number of clusters.
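For reference, here is a minimal Python sketch of the elbow computation. It is not SAS and not PROC KCLUS. It uses a from-scratch Lloyd's k-means and records the WCSS for each k. The helper names `kmeans`, `wcss`, `nearest`, and `elbow_curve` and the three-blob sample data are illustrative assumptions.

```python
import random

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its own centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )

def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm started from k randomly chosen data points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        for j in range(k):                                 # update step
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def elbow_curve(points, max_k):
    """WCSS for k = 1..max_k; plot it against k and look for the bend."""
    return [wcss(points, *kmeans(points, k)) for k in range(1, max_k + 1)]

# Hypothetical example data: three tight, well-separated 2-D blobs
rng = random.Random(1)
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
curve = elbow_curve(points, 6)
print([round(w, 1) for w in curve])
```

The weakness of this approach is that "the bend" is judged by eye, and for less separated data the curve may not have an obvious one.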
The ABC (aligned box criterion) method estimates the number of clusters when the clusters are well separated. Because it uses only the within-cluster dispersion of the clustering results as its error measure, the ABC method is independent of the algorithm that produced the clusters.
To estimate the number of clusters, the ABC method compares how the error measure changes as the number of clusters grows with the change that is expected under an appropriate reference null distribution.
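To make that comparison concrete, here is a simplified, gap-statistic-style sketch in Python of the idea. It is not the SAS implementation. It rests on these assumptions:
- the data are 1-D, so the reference "box" is just the data's range and no alignment step is needed;
- the error measure W_k is the within-cluster sum of squared deviations;
- the reference null is uniform over the observed range.

The clusterer `split_at_widest_gaps` is a hypothetical stand-in. It is passed in as a plain function, which reflects the point that the criterion only looks at the clustering results.

```python
import math
import random

def within_dispersion(values, labels):
    """Error measure W_k: sum of squared deviations from each cluster's mean."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    total = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total

def split_at_widest_gaps(values, k):
    """Hypothetical stand-in clusterer for 1-D data: cut the sorted values at the k-1 widest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Position pos means "cut between order[pos-1] and order[pos]"
    widest = sorted(range(1, len(order)),
                    key=lambda pos: values[order[pos]] - values[order[pos - 1]],
                    reverse=True)
    cuts = set(widest[:k - 1])
    labels, label = [0] * len(values), 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            label += 1
        labels[idx] = label
    return labels

def gap_values(values, k_range, cluster, n_ref=20, seed=0):
    """Gap(k) = mean log W_k over reference data - log W_k of the observed data."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    # Reference null distribution: uniform over the observed range (the 1-D "box")
    refs = [[rng.uniform(lo, hi) for _ in values] for _ in range(n_ref)]
    gaps = {}
    for k in k_range:
        observed = math.log(within_dispersion(values, cluster(values, k)))
        expected = sum(math.log(within_dispersion(r, cluster(r, k))) for r in refs) / n_ref
        gaps[k] = expected - observed
    return gaps

# Hypothetical example: three well-separated groups centered at 0, 5 and 10
rng = random.Random(42)
data = [c + rng.gauss(0, 0.2) for c in (0.0, 5.0, 10.0) for _ in range(40)]
gaps = gap_values(data, range(1, 7), split_at_widest_gaps)
best_k = max(gaps, key=gaps.get)
print(best_k, {k: round(g, 2) for k, g in gaps.items()})
```

A large gap at k means the observed data are far more compact with k clusters than structureless reference data would be. Because `cluster` is only a function argument, swapping in k-means or another method leaves the criterion itself unchanged.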
How do you use it? Just say ABC! You also have plenty of options to work with, including:
🔘 Min Clusters
🔘 Max Clusters
🔘 Selection Criteria
🔘 Reference data to use
🔘 Reference data alignment
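The Min Clusters, Max Clusters, and Selection Criteria options together decide how a curve of criterion values becomes a single k. As a hedged illustration, this Python sketch applies two common rules to a dict of gap-style values within [min_k, max_k]:
- **First peak:** the first local peak.
- **Global maximum:** the largest value in the range.

The function name `select_k` and the rule labels are hypothetical. See the PROC KCLUS documentation for the exact option names and definitions it uses.

```python
def select_k(gaps, min_k, max_k, rule="first_peak"):
    """Pick k from criterion values gaps[k] restricted to min_k..max_k.

    rule="global_max": the k with the largest value (ties go to the smaller k).
    rule="first_peak": the smallest k whose value is strictly above its left
    neighbour and at least its right neighbour (range ends count as open).
    """
    ks = list(range(min_k, max_k + 1))
    if not ks:
        raise ValueError("min_k must not exceed max_k")
    if rule == "global_max":
        return max(ks, key=lambda k: gaps[k])
    if rule == "first_peak":
        for k in ks:
            left_ok = k == min_k or gaps[k] > gaps[k - 1]
            right_ok = k == max_k or gaps[k] >= gaps[k + 1]
            if left_ok and right_ok:
                return k
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical criterion values with a local peak at k=2 and a higher one at k=4
gaps = {1: 0.1, 2: 0.8, 3: 0.5, 4: 1.2, 5: 0.9}
print(select_k(gaps, 1, 5, "first_peak"))   # 2
print(select_k(gaps, 1, 5, "global_max"))   # 4
print(select_k(gaps, 3, 5, "first_peak"))   # 4
```

The range matters. Raising the minimum from 1 to 3 moves the first peak from 2 to 4. The global rule does not change until the maximum excludes k = 4.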
Next time you need to do k-means clustering, give ABC a try!
🗒️ Code:
https://github.com/stu-code/sas-tips/blob/main/proc_kclus.sas
👉 Documentation:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_066/casstat/casstat_kclus_overview.htm
👉 How ABC works:
https://go.documentation.sas.com/doc/en/pgmsascdc/v_066/casstat/casstat_kclus_details05.htm