Stanislas Morbieu

Accuracy: from classification to clustering evaluation

Posted on Tue 04 June 2019 in machine learning • Tagged with evaluation measure, clustering, Python • 4 min read

Accuracy is often used to measure the quality of a classification, and it is also used to evaluate clustering. However, the scikit-learn accuracy_score function only provides a lower bound on clustering accuracy, because cluster labels are arbitrary and need not coincide with the class labels. This blog post explains how accuracy should be computed for clustering.
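As a rough illustration of why a direct label comparison underestimates clustering accuracy, here is a minimal sketch using only the standard library. The helper names plain_accuracy and clustering_accuracy are hypothetical and not taken from the post. The sketch tries every one-to-one mapping of cluster ids to class labels by brute force, which is only practical for a handful of clusters and assumes there are no more clusters than classes. For larger problems, the same matching is usually solved with the Hungarian algorithm, for example scipy.optimize.linear_sum_assignment.

```python
from itertools import permutations


def plain_accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def clustering_accuracy(y_true, y_pred):
    """Best plain accuracy over all one-to-one mappings of cluster ids to classes.

    Brute force over permutations: a sketch for small numbers of clusters,
    assuming there are no more clusters than classes.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        relabelled = [mapping[c] for c in y_pred]
        best = max(best, plain_accuracy(y_true, relabelled))
    return best


# The partition is perfect, but the cluster ids 0 and 1 are swapped.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [1, 1, 1, 0, 0, 2]

direct = plain_accuracy(y_true, y_pred)
matched = clustering_accuracy(y_true, y_pred)
print(direct, matched)
```

On this toy example the direct comparison only counts the last point as correct, while the best relabelling recovers the classes exactly.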

Animate intermediate results of your algorithm

Posted on Tue 19 February 2019 in machine learning • Tagged with clustering, R, machine learning • 5 min read

The R package gganimate makes it possible to animate plots. It is particularly useful for visualizing the intermediate results of an algorithm, to see how it converges towards the final result. The following illustrates this with K-means clustering.

Chaining effect in clustering

Posted on Mon 21 January 2019 in machine learning • Tagged with clustering, R, machine learning • 5 min read

How can Christmas tinsel be detected on a tree? Let's understand why hierarchical clustering with single linkage is a good candidate.

How many red Christmas baubles on the tree?

Posted on Sat 05 January 2019 in machine learning • Tagged with clustering, R, machine learning • 6 min read

Christmas time is over. It is time to take down the Christmas tree. But just before removing it, one can ask: how many red Christmas baubles are on the tree? Let's leverage the k-means criterion to answer this question.

Gaussian mixture models: k-means on steroids

Posted on Sat 22 December 2018 in machine learning • Tagged with clustering, R, machine learning • 5 min read

The k-means algorithm assumes the data is generated by a mixture of Gaussians, each having the same proportion and variance, and no covariance. These assumptions can be relaxed with a more generic algorithm: the CEM algorithm applied to a mixture of Gaussians.

K-means is not all about sunshines and rainbows

Posted on Sun 09 December 2018 in machine learning • Tagged with clustering, R, machine learning • 6 min read

K-means is the best-known and most widely used clustering algorithm. However, it makes strong assumptions about the data. These assumptions are illustrated through generated datasets. The criterion optimized by k-means is also explained to fully understand its behavior.
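For reference, the criterion usually associated with k-means is the within-cluster sum of squared Euclidean distances to the cluster means. The following is a minimal sketch under that assumption, in plain Python. The function name kmeans_criterion and the toy points are hypothetical and are not taken from the post.

```python
def kmeans_criterion(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Two tight pairs of points, far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

good = kmeans_criterion(points, [0, 0, 1, 1])  # groups the nearby points
bad = kmeans_criterion(points, [0, 1, 0, 1])   # groups the distant points
print(good, bad)
```

A lower value means more compact clusters, so the first partition is the one k-means would prefer on this toy data.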

Generate datasets to understand some clustering algorithms behavior

Posted on Sun 11 November 2018 in machine learning • Tagged with clustering, R, machine learning • 7 min read

To understand how a clustering algorithm works, good sample datasets are useful for highlighting its behavior under certain circumstances. This post shows how to generate nine datasets that are used in other posts of this series on clustering.