K-Means Pros & Cons


K-Means

Advantages


1- High Performance

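K-Means has a time complexity that is linear in the number of samples (roughly O(n * k * d) per iteration for n samples, k clusters and d features), so it can be used with large datasets conveniently. With unlabeled big data, K-Means offers many insights and benefits as an unsupervised clustering algorithm.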
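To see where the linear cost comes from, here is a minimal pure-Python sketch of a single K-Means (Lloyd) iteration. The function name `lloyd_iteration` and the toy data are assumptions made for illustration; this is not how Scikit-Learn implements it internally. The counter simply shows that each iteration performs one distance computation per (sample, centroid) pair.

```python
import random


def lloyd_iteration(points, centroids):
    """Run one assignment + update step and count distance computations."""
    distance_evals = 0
    clusters = [[] for _ in centroids]

    # Assignment step: every point is compared with every centroid (n * k).
    for p in points:
        best_idx, best_dist = 0, float("inf")
        for j, c in enumerate(centroids):
            dist = sum((a - b) ** 2 for a, b in zip(p, c))  # d operations
            distance_evals += 1
            if dist < best_dist:
                best_idx, best_dist = j, dist
        clusters[best_idx].append(p)

    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = []
    for j, members in enumerate(clusters):
        if members:
            dim = len(members[0])
            new_centroids.append(
                tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
            )
        else:
            new_centroids.append(centroids[j])  # keep an empty cluster's centroid
    return new_centroids, distance_evals


rng = random.Random(0)
k = 3
for n in (1_000, 2_000, 4_000):
    points = [(rng.random(), rng.random()) for _ in range(n)]
    _, evals = lloyd_iteration(points, points[:k])
    print(n, evals)  # evals equals n * k, so doubling n doubles the work
```

The total runtime also depends on how many iterations are needed to converge, which is why max_iter exists as a safety cap.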

2- Easy to Use

K-Means is also easy to use. It can be initialized using default parameters in the Scikit-Learn implementation. Parameters such as the number of clusters (n_clusters, 8 by default), the maximum number of iterations (max_iter, 300 by default) and the number of centroid initializations (n_init, 10 by default in older releases; newer releases default to "auto") can easily be adjusted later on to suit the task goals.
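As a quick illustration, the defaults can be inspected directly on a freshly created Scikit-Learn estimator. This is a small sketch that assumes Scikit-Learn is installed; the n_init value it prints depends on the installed version.

```python
from sklearn.cluster import KMeans

# Create the estimator without passing any arguments.
model = KMeans()
params = model.get_params()

print(params["n_clusters"])  # 8
print(params["max_iter"])    # 300
print(params["init"])        # k-means++
print(params["n_init"])      # version-dependent (10 or "auto")

# Any of these can be overridden later without rebuilding the object.
model.set_params(n_clusters=4, max_iter=500)
```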

3- Unlabeled Data

This advantage is shared by unsupervised machine learning algorithms in general, and it applies to K-Means as well.

If your data has no labels (class values or targets) or even column headers, K-Means will still cluster it. This is an example of machine learning extracting useful insights from data that may look meaningless to the human eye.

Customer segmentation, scientific categorization, logistics optimization (identifying inventories or optimizing routes), user suggestions, patient management, trial management and fraud detection are just a few example use cases.

4- Result Interpretation

K-Means returns clusters which can be easily interpreted and even visualized. This simplicity makes it highly useful in some cases when you need a quick overview of the data segments.

Additionally, the inertia value produced by K-Means can be meaningful to interpret as well. Inertia is the sum of squared distances from each point to its assigned cluster center (centroid). A high inertia value can be a prompt to question the number of clusters or the algorithm's settings, such as initialization or maximum iterations.
You can read more details about K-Means settings in the following link:
Optimization of K-Means parameters.
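To make the inertia definition concrete, the sketch below recomputes it by hand and compares it with the inertia_ attribute of a fitted Scikit-Learn model. The six toy points are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two small, well-separated groups of made-up 2D points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Look up the centroid assigned to each sample, then sum squared distances.
assigned_centroids = km.cluster_centers_[km.labels_]
manual_inertia = float(((X - assigned_centroids) ** 2).sum())

print(manual_inertia, km.inertia_)  # the two values should agree
```

Keep in mind that inertia is not normalized: it grows with the number of samples and features, so it is most useful for comparing runs on the same dataset.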


K-Means

Disadvantages

1- Result repeatability

One drawback of K-Means is that its results can differ between runs due to random centroid initialization.

Unless you pick the initial centroids at fixed positions, which is not a common practice, or fix the random seed, K-Means can come up with different clusters after its iterations.

Also, if you introduce new data or change the order of the existing dataset, K-Means will likely produce different results. This sensitivity makes K-Means a not-so-robust machine learning algorithm.

Luckily, in most cases the differences between clusterings won't be major or make the results unsuitable for your goals, especially if K-Means was a suitable choice for the task in the first place.
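In Scikit-Learn, one common way to get repeatable results is to fix the random_state parameter. The sketch below fits the same synthetic data twice with the same seed and compares the outcomes; the dataset from make_blobs and the seed values are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four blobs (arbitrary illustrative choice).
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Same data, same seed: the two runs start from the same initial centroids.
run_a = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
run_b = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

print(np.array_equal(run_a.labels_, run_b.labels_))
print(np.allclose(run_a.cluster_centers_, run_b.cluster_centers_))
```

Fixing the seed makes a run reproducible, but it does not make the clustering more stable; changing the data or its order can still change the result.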

2- Spherical Clustering Only

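K-Means assumes roughly spherical clusters of similar size, because every point is simply assigned to its nearest centroid. So, if your data contains elongated, nested or otherwise arbitrarily shaped clusters, K-Means won't be able to separate them properly, and where clusters overlap it will still split the points with a hard boundary.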
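A quick way to see this limitation is the two-moons toy dataset, where the true groups are crescent-shaped. The sketch below is only an illustration under assumed settings (sample size, noise level and seeds are arbitrary); it scores the K-Means labels against the known groups with the adjusted Rand index, where 1.0 would mean a perfect match.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half circles: clearly separable, but not spherical.
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Compare the clustering with the true moon membership.
ari = adjusted_rand_score(y_true, labels)
print(round(ari, 2))  # well below 1.0: K-Means cuts across the moons
```

For shapes like these, a density-based method such as DBSCAN is usually a better fit.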

3- Manual Work

It's not much, but K-Means might involve some manual work. Most importantly, K-Means requires the number of clusters to be given in advance, and this parameter is very significant. This means that in most cases n_clusters will need to be optimized, adjusted and reassessed at least a few times.

Additionally, K-Means has other parameters that can be manually adjusted, such as max_iter and init.
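One common way to do this tuning is the so-called elbow method: fit K-Means for a range of n_clusters values and look at where the inertia stops dropping sharply. The sketch below assumes a synthetic dataset with four blobs; the range of k values and the seeds are arbitrary illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data generated with four centers (assumed for illustration).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=1)

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, max_iter=300, random_state=0).fit(X)
    inertias[k] = km.inertia_

# Inertia keeps shrinking as k grows; look for the "elbow" where gains flatten.
for k, value in inertias.items():
    print(k, round(value, 1))
```

The elbow is a heuristic rather than a rule, so it is worth checking the chosen value against what makes sense for the task.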

4- Clusters Everything

Another point about how K-Means works is that it assigns every data sample to one of the clusters it generates. This means that if you would like to exclude outliers or certain sample groups, it won't be possible with the K-Means algorithm alone, since its clusters cover the whole dataset.

This can be a disadvantage in the situations mentioned above, but it can also be an advantage if you are looking for a clustering algorithm to cover everything and not leave any sample out.
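The sketch below shows this behavior on made-up data: a sample placed far from both groups still receives a regular cluster label. The second half shows one possible workaround, flagging samples that sit unusually far from their centroid. The mean plus three standard deviations threshold is an assumption for illustration, not part of K-Means or Scikit-Learn.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),  # group around (0, 0)
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),  # group around (5, 5)
    [[5.0, 10.0]],                                 # one sample far from both
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Every sample gets a cluster label, including the far-away one (last row).
print(len(km.labels_) == len(X), km.labels_[-1])

# Hypothetical post-processing: flag samples far from their assigned centroid.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
threshold = dist.mean() + 3 * dist.std()  # illustrative rule of thumb
is_far = dist > threshold
print(int(is_far.sum()), bool(is_far[-1]))
```

If excluding noise is a core requirement, an algorithm with a built-in noise label, such as DBSCAN, may be a more natural choice.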

wrap-up

K-Means Pros & Cons Summary

Why K-Means?

K-Means is one of the most popular unsupervised algorithms for a few reasons.

It's easy to use, easy to interpret, computationally efficient and offers meaningful insights.

When data is unlabeled, a quick clustering implementation can offer many insights and other benefits which then can be used to steer the project in a more strategic and smart direction.

Provided K-Means clustering is a suitable algorithm for your task, its cons are quite tolerable and mostly involve centroid initialization, which can be quickly mastered.

Easy Usage

For what it does, K-Means is very easy to use. Yes, you might have to decide on a sensible n_clusters value and then adjust it. Still, K-Means is the workhorse of unlabeled data analysis.

Fast

K-Means doesn't mind large datasets. It will work fast thanks to its efficient time complexity and it will also scale well compared to other clustering algorithms.

Many Use Cases

The emergence of the K-Means algorithm in the mid-20th century was a very welcome event, especially since the algorithm matured throughout the following decades while the accumulation of big data skyrocketed. K-Means can be used in pretty much every industry and profession to harvest data and achieve meaningful insights.
Suggested Read:
History of K-Means

Inertia

Inertia is a useful attribute of the K-Means implementation, as it gives hints about how tightly samples are grouped around their centroids and how well the model is tuned.

Some Parameter Work

K-Means might need a human to set the number of clusters or, in some cases, other parameters such as maximum iterations to function optimally.

Spherical

We also mentioned that K-Means only handles roughly spherical clusters well.

No Outliers

K-Means doesn't have an outlier concept. It will put every sample into one of its clusters. Sometimes that's good, sometimes not so good.