The talk is devoted to the data mining and data analysis for which the representation of variables has limits. There are currently a significant number of clustering methods, but the majority are not always adapted to the particularities of data (categorical, binary, mixed, sequences, stream …). Two families of unsupervised machine learning models are distinguished: probabilistic models and « deterministic » models. In the talk I will describe recent work on scalable clustering using a new paradigm and distributed framework (MapReduce…etc). In addition to the difficulties raised by these new paradigms, the development of current approaches requires to address different challenges, in particular: incremental processing of data stream and visualization.