Last week, I was at the XXVth Meeting of the Société Francophone de Classification, both as a participant and a member of the steering committee.
The event was beneficial in several ways:
- It introduced me to other application fields and other methods.
- It was an occasion to meet people working in the same field, talk about opportunities to work together, and go deeper into the motivations and advantages of different approaches.
- It improved my organisational skills and my ability to react to unexpected problems.
- It gave me additional experience in speaking in front of a large audience and presenting my work.
The focus of the conference was on the French term classification, which in English covers both clustering (unsupervised) and classification (supervised).
Work in the well-known fields of clustering and classification was presented, applied for instance to:
- users or consumers
- breast cancer detection
- viral sequences
- semantic web data
- association rules
- spatial data
- temporal data
This variety of data calls for different methods, each addressing specific difficulties. For instance, when clustering data that evolves over time, you should keep track of the cluster assigned to each data point at a given date so that the clusters stay coherent at a later date. When clustering data with a spatial component, you should consider a trade-off between the spatial coherence of the clusters and their coherence with the similarity defined by the other attributes. When dealing with medical data, you often need data augmentation, since too few images are available for effective clustering. Sparsity issues arise with user-item matrices, veracity issues and inconsistencies arise in semantic web data, etc.
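To make the temporal case concrete, here is a minimal sketch of one possible way to keep cluster labels coherent between two dates. It is not taken from any of the presented works: the function name `match_clusters`, the greedy matching on the Jaccard index, and the use of integer labels are all assumptions chosen for illustration.

```python
def members(assignment):
    """Group item ids by cluster label."""
    groups = {}
    for item, label in assignment.items():
        groups.setdefault(label, set()).add(item)
    return groups


def match_clusters(previous, current):
    """Relabel `current` so each cluster reuses the label of the
    previous cluster it overlaps most (greedy, by Jaccard index).

    previous, current: dicts mapping item id -> integer cluster label
    (integer labels are an assumption of this sketch).
    """
    prev_groups = members(previous)
    curr_groups = members(current)

    # Score every overlapping (current, previous) pair of clusters.
    pairs = []
    for c_label, c_items in curr_groups.items():
        for p_label, p_items in prev_groups.items():
            overlap = len(c_items & p_items)
            if overlap:
                jaccard = overlap / len(c_items | p_items)
                pairs.append((jaccard, c_label, p_label))

    # Best overlaps first; ties are broken by label for determinism.
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))

    mapping = {}
    used_prev = set()
    for _, c_label, p_label in pairs:
        if c_label not in mapping and p_label not in used_prev:
            mapping[c_label] = p_label
            used_prev.add(p_label)

    # Clusters with no counterpart at the previous date get fresh labels.
    next_label = max(prev_groups, default=-1) + 1
    for c_label in sorted(curr_groups):
        if c_label not in mapping:
            mapping[c_label] = next_label
            next_label += 1

    return {item: mapping[label] for item, label in current.items()}


# Hypothetical data: the second run swapped labels 0 and 1 and saw a new item.
previous = {"a": 0, "b": 0, "c": 1, "d": 1}
current = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 2}
relabelled = match_clusters(previous, current)
print(relabelled)
```

A greedy matching is the simplest choice; an optimal assignment would be a natural refinement when many clusters overlap partially.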
Classification and clustering were applied to broader tasks, such as:
- planning of itineraries for electric vehicles
- prediction of road traffic
- calculating a heart failure risk score
- generating smart shopping lists
- analysing how people classify things
- determining causality networks
For instance, clustering was used as a preliminary step to plan itineraries for electric vehicles: since the planning algorithm is resource-hungry, reducing the number of nodes makes it usable for large geographical areas.
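The idea of shrinking the graph before planning can be sketched as follows. This is only an illustration under stated assumptions, not the method that was presented: nodes are merged with a plain grid (a simple stand-in for a real clustering algorithm), the planner is an ordinary Dijkstra search, and the names `coarsen` and `shortest_cost` as well as the toy data are hypothetical.

```python
import heapq
from collections import defaultdict


def coarsen(coords, edges, cell_size):
    """Merge nodes that fall in the same grid cell.

    coords: {node: (x, y)}, edges: [(u, v, weight)].
    Returns the node -> cluster mapping and the coarse adjacency dict.
    """
    cluster_of = {
        node: (int(x // cell_size), int(y // cell_size))
        for node, (x, y) in coords.items()
    }
    coarse = defaultdict(dict)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            # Keep only the cheapest link between two clusters.
            best = min(w, coarse[cu].get(cv, float("inf")))
            coarse[cu][cv] = best
            coarse[cv][cu] = best
    return cluster_of, coarse


def shortest_cost(graph, source, target):
    """Plain Dijkstra on an adjacency dict {node: {neighbour: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")


# Hypothetical road network: five nodes collapse into three clusters.
coords = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (11, 1), "E": (20, 0)}
edges = [("A", "B", 1), ("B", "C", 9), ("C", "D", 1), ("D", "E", 9), ("A", "C", 10)]

cluster_of, coarse = coarsen(coords, edges, cell_size=5)
cost = shortest_cost(coarse, cluster_of["A"], cluster_of["E"])
print(len(coords), "nodes reduced to", len(coarse), "- coarse cost:", cost)
```

The coarse cost ignores travel inside a cluster, so it is an approximation of the true cost; the gain is that the expensive planner runs on far fewer nodes.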
Other work covered related fields such as:
- distances between communities
- ensemble learning
- networks of data tables
Some work covered extensions of clustering such as:
- biclustering and coclustering, i.e. clustering simultaneously the rows and the columns of a matrix
- clusterwise methods, i.e. combining clustering with a local regression
For instance, I presented my work on a Python package for coclustering and a web application providing an interactive interface to use it.
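To illustrate what a coclustering result looks like, here is a small sketch that is independent of the package mentioned above. It does not compute a coclustering: the row and column labels are assumed to be given, and the hypothetical helper `block_means` only summarises each (row cluster, column cluster) block by its mean, which is what makes the block structure visible.

```python
def block_means(matrix, row_labels, col_labels):
    """Mean value of each (row cluster, column cluster) block.

    matrix: list of rows; row_labels[i] and col_labels[j] give the
    cluster of row i and column j (assumed to come from a coclustering).
    """
    sums = {}
    counts = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            block = (row_labels[i], col_labels[j])
            sums[block] = sums.get(block, 0) + value
            counts[block] = counts.get(block, 0) + 1
    return {block: sums[block] / counts[block] for block in sums}


# Hypothetical user-item matrix whose rows and columns are interleaved.
matrix = [
    [5, 0, 4, 0],
    [0, 3, 0, 3],
    [5, 1, 5, 0],
    [0, 3, 1, 4],
]
row_labels = [0, 1, 0, 1]
col_labels = [0, 1, 0, 1]

summary = block_means(matrix, row_labels, col_labels)
for block in sorted(summary):
    print(block, summary[block])
```

High means on the diagonal blocks and low means elsewhere indicate that the two partitions capture a clear block structure in the matrix.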
My presentation led to some interesting questions and discussions about the applicability of the framework to other fields and about how other methods can benefit from visualization. Visualization is indeed a good way to explore how a method behaves on the data at hand. It simplifies experiments and the evaluation of a method's benefits. It also provides an entry point for students studying the framework, since they can see practical applications of the implemented algorithms.
I consider meeting people the most important part of attending a conference: you benefit from the experience, ideas and criticism (in a positive sense) of others. Discussions make explicit what is only implicit, or sometimes even omitted, in papers. They also offer the opportunity to talk about problems that the community often sidesteps.
Being a member of the steering committee gave me additional visibility and led me to talk to a wide range of people: some working in a related field and others in a very different one, researchers with several years of experience and PhD students, from France or abroad (Canada, Algeria...).
Organising an event
Organising an event is not as easy as it might look:
- First, you should prepare things in advance: book rooms, print and display posters with directions (taking into account the path you want people to follow, and also alternative routes, since people sometimes don't like to follow directions and are free not to), give the list of attendees to the security guards, make the badges, estimate how many people will come in order to reserve food, etc.
- Second, you are responsible for the schedule and should handle things if something goes wrong. In that case, you should make it transparent to the attendees, so that they don't notice it, or at least are not disturbed. Everything should go smoothly.
The sources of problems, unexpected or expected, are numerous. For instance, the server running the web site crashed early in the morning, just before the beginning of the conference, and many people relied on it to see precisely where the event was taking place. You should therefore think about what can go wrong and how to deal with it. You should always assume that anything can go wrong: even the most improbable can happen.
Here is another example of an annoyance for an organiser: some people talk longer than the time planned in the schedule, but you prefer to let interesting discussions continue. It is thus important to provide enough breaks to absorb delays.