AI Paris 2019 in one picture

Posted on Mon 17 June 2019 in Meeting • Tagged with Python • 3 min read

This week, I was at the AI Paris 2019 event to represent Kernix. We had great talks with so many people, and I barely had time to go around to look at what other companies were working on. That is why I looked into it afterwards: can we get a big picture of the event without having been there?


Continue reading

Accuracy: from classification to clustering evaluation

Posted on Tue 04 June 2019 in machine learning • Tagged with evaluation measure, clustering, Python • 4 min read

Accuracy is often used to measure the quality of a classification. It is also used for clustering. However, the scikit-learn accuracy_score function only provides a lower bound of accuracy for clustering. This blog post explains how accuracy should be computed for clustering.
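A minimal sketch of the idea, assuming integer labels in 0..k-1: since cluster labels are arbitrary, accuracy should be maximized over all one-to-one label matchings, which the Hungarian algorithm (scipy's `linear_sum_assignment`) does. The example data here is illustrative, not from the post.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import accuracy_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy maximized over all one-to-one label matchings."""
    n = max(len(np.unique(y_true)), len(np.unique(y_pred)))
    # Contingency matrix: cost[i, j] = points with true label i in cluster j.
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    # Hungarian algorithm: find the label matching maximizing agreement.
    row_ind, col_ind = linear_sum_assignment(cost, maximize=True)
    return cost[row_ind, col_ind].sum() / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]  # same partition, labels permuted
print(accuracy_score(y_true, y_pred))       # 0.0: labels taken literally
print(clustering_accuracy(y_true, y_pred))  # 1.0: partitions are identical
```

Here `accuracy_score` reports 0.0 even though both labelings describe the exact same partition, which is why it only gives a lower bound for clustering.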


Continue reading

Animate intermediate results of your algorithm

Posted on Tue 19 February 2019 in machine learning • Tagged with clustering, R, machine learning • 5 min read

The R package gganimate makes it possible to animate plots. It is particularly useful for visualizing the intermediate results of an algorithm, to see how it converges toward the final result. The following illustrates this with k-means clustering.
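The post itself uses R and gganimate; as a rough Python analogue of the data-collection step, one can record the intermediate centroids by warm-starting k-means one Lloyd iteration at a time (the dataset and parameters below are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

rng = np.random.default_rng(0)
centers = X[rng.choice(len(X), size=3, replace=False)]  # random initial centroids
history = [centers]
for _ in range(10):
    # One Lloyd iteration at a time, warm-started from the previous centroids.
    km = KMeans(n_clusters=3, init=centers, n_init=1, max_iter=1).fit(X)
    centers = km.cluster_centers_
    history.append(centers)
# `history` now holds one set of centroids per iteration, ready to be
# drawn frame by frame (e.g. with matplotlib.animation).
```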


Continue reading

Dense matrices implementation in Python

Posted on Mon 04 February 2019 in code • Tagged with Python • 8 min read

Machine learning algorithms often use matrices to store data and compute operations such as multiplications or singular value decomposition. The purpose of this article is to see how matrices are implemented in Python: how the data is stored and how much memory it consumes.
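As a quick taste of the memory question, here is a sketch (sizes measured with CPython's `sys.getsizeof`, so the exact numbers are implementation-dependent) comparing a pure-Python list-of-lists matrix with a NumPy array:

```python
import sys
import numpy as np

n = 1_000
# Pure-Python dense matrix: a list of n lists of n float objects.
py_matrix = [[float(i) for i in range(n)] for _ in range(n)]
list_bytes = (sys.getsizeof(py_matrix)
              + sum(sys.getsizeof(row) for row in py_matrix)              # row lists
              + sum(sys.getsizeof(x) for row in py_matrix for x in row))  # float objects

# NumPy stores the same data in one contiguous buffer of C doubles.
np_matrix = np.array(py_matrix)
print(f"list of lists: {list_bytes / 1e6:.1f} MB")
print(f"numpy array:   {np_matrix.nbytes / 1e6:.1f} MB")  # 8 bytes per float64
```

Each Python float carries object overhead on top of its 8-byte value, so the list-of-lists version is several times larger than the NumPy buffer.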


Continue reading

Chaining effect in clustering

Posted on Mon 21 January 2019 in machine learning • Tagged with clustering, R, machine learning • 5 min read

How to detect Christmas tinsel on a tree? Let's see why hierarchical clustering with single linkage is a good candidate.
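A minimal sketch of the intuition, on a stand-in dataset (two interlocking half-moons, not the tree photo from the post):

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons

# Two thin, elongated shapes: the chaining effect works in our favor here.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
single = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
# Single linkage merges clusters through their closest points, so it can
# follow thin, elongated structures such as tinsel strands.
```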


Continue reading

How many red Christmas baubles on the tree?

Posted on Sat 05 January 2019 in machine learning • Tagged with clustering, R, machine learning • 6 min read

Christmas time is over. It is time to take down the Christmas tree. But just before removing it, one can ask: how many red Christmas baubles are on the tree? Let's leverage the k-means criterion to answer this question.
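A minimal sketch of the approach (the blobs below stand in for the red-pixel coordinates extracted from a tree photo; the post itself uses R):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for the bauble pixel coordinates: 5 well-separated blobs.
X, _ = make_blobs(n_samples=500, centers=5, random_state=1)

# k-means criterion (inertia) for an increasing number of clusters.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 11)]
# The "elbow" where inertia stops dropping sharply suggests the number
# of baubles, i.e. the number of clusters.
```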


Continue reading

Gaussian mixture models: k-means on steroids

Posted on Sat 22 December 2018 in machine learning • Tagged with clustering, R, machine learning • 5 min read

The k-means algorithm assumes the data is generated by a mixture of Gaussians, each having the same proportion and variance, and no covariance. These assumptions can be relaxed with a more general algorithm: the CEM algorithm applied to a mixture of Gaussians.
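As a rough Python illustration: scikit-learn's `GaussianMixture` fits a Gaussian mixture with the closely related EM algorithm (not CEM, which the post discusses), with free mixing proportions and full covariances, on data that violates the k-means assumptions. The dataset below is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, unequally sized clusters: the kind of data that breaks k-means.
a = rng.multivariate_normal([0, 0], [[10, 0], [0, 0.2]], size=400)
b = rng.multivariate_normal([0, 3], [[10, 0], [0, 0.2]], size=100)
X = np.vstack([a, b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Full covariances and free mixing proportions relax k-means' assumptions.
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)
labels = gmm.predict(X)
```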


Continue reading

K-means is not all about sunshine and rainbows

Posted on Sun 09 December 2018 in machine learning • Tagged with clustering, R, machine learning • 6 min read

K-means is the best-known and most widely used clustering algorithm. However, it makes strong assumptions about the data, which are illustrated here through generated datasets. The criterion optimized by k-means is also explained, to fully understand its behavior.
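The criterion in question can be written down directly: k-means minimizes the within-cluster sum of squares, which scikit-learn exposes as `inertia_`. A minimal sketch checking the two against each other, on an illustrative dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# The criterion k-means minimizes: total squared distance of each point
# to its assigned centroid (the within-cluster sum of squares).
wcss = sum(np.sum((X[km.labels_ == k] - c) ** 2)
           for k, c in enumerate(km.cluster_centers_))
print(wcss, km.inertia_)  # the two values match
```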


Continue reading

XebiCon 2018

Posted on Sun 25 November 2018 in Meeting • 4 min read

I attended XebiCon'18, a conference on data engineering organised by Xebia. Here are some highlights of the talks.


Continue reading

Generate datasets to understand some clustering algorithms behavior

Posted on Sun 11 November 2018 in machine learning • Tagged with clustering, R, machine learning • 7 min read

In order to understand how a clustering algorithm works, good sample datasets are useful to highlight its behavior under specific circumstances. This post shows how to generate 9 datasets that will be used in the other posts of this series on clustering.
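The post generates its datasets in R; as a rough Python analogue, scikit-learn ships generators for a few classic shapes that expose different algorithm behaviors (these three are illustrative, not the post's nine):

```python
from sklearn.datasets import make_blobs, make_circles, make_moons

# Three classic synthetic shapes that stress clustering algorithms differently:
datasets = {
    "blobs":   make_blobs(n_samples=300, centers=3, random_state=0),
    "moons":   make_moons(n_samples=300, noise=0.05, random_state=0),  # non-convex
    "circles": make_circles(n_samples=300, factor=0.5, noise=0.05,
                            random_state=0),                           # nested rings
}
for name, (X, y) in datasets.items():
    print(name, X.shape)
```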


Continue reading