Thi Thanh Yen Nguyen (MAP5)

Optimal transport-based machine learning sheds light on Huntington’s disease

Friday, January 31, 2020, 13:30 - 14:30

Meeting room, Espace Turing

In this paper, we present a novel method for learning a pattern of correspondence between two data sets that are believed to have a many-to-many mirrored relationship. The method solves an optimization problem that minimizes a loss between the first data set and the transformed second one over the space of linear functions, yielding both the transform function and the coupling matrix. In particular, we use an optimal transport (OT)-based loss, the Sinkhorn loss, to compute the distance between the weighted empirical measures defined on the data after the transform mapping. The weight of each element is estimated by kernel density estimation.
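To make the role of the coupling matrix concrete, here is a minimal NumPy sketch of the Sinkhorn iterations for a fixed linear transform. It is an illustration under stated assumptions, not the speaker's implementation: the function name `sinkhorn_coupling`, the squared Euclidean cost, the uniform weights (standing in for the kernel density weights), the identity transform `A`, and the values of `eps` and `n_iters` are all hypothetical choices. It computes only the entropy-regularised transport plan and its cost, and does not optimise over the transform.

```python
import numpy as np

def sinkhorn_coupling(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularised transport plan between weight vectors a and b.

    Hypothetical helper: eps and n_iters are illustrative defaults.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel built from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale to match the row marginals
        v = b / (K.T @ u)              # rescale to match the column marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))            # first data set (synthetic)
Y = rng.normal(size=(7, 2))            # second data set (synthetic)
A = np.eye(2)                          # assumed linear transform, fixed here
Y_mapped = Y @ A.T

# Squared Euclidean cost between X and the transformed Y, scaled to [0, 1].
cost = ((X[:, None, :] - Y_mapped[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

# Uniform weights as a simplification of the density-based weights.
a = np.full(X.shape[0], 1.0 / X.shape[0])
b = np.full(Y.shape[0], 1.0 / Y.shape[0])

P = sinkhorn_coupling(a, b, cost)      # coupling matrix, shape (5, 7)
transport_cost = float((P * cost).sum())
```

Entry `P[i, j]` is the mass sent from point `i` of the first set to point `j` of the transformed second set, so a many-to-many correspondence appears as several large entries in the same row or column.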
The coupling matrix is then co-clustered to yield good row and column partitions. Many co-clustering algorithms exist; we rely on the spectral co-clustering method, which is better suited to our data. Experimental results on synthetic data sets show the efficiency of our model on high-dimensional, noisy data. Because the resulting coupling matrix also contains noise, we propose an adapted spectral co-clustering that detects and removes the noise, then co-clusters the remaining matrix to obtain the partitions. Based on the partitioning result, we can match the row and column clusters.
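The co-clustering step can be sketched as follows. This is a plain two-way spectral bipartition in the style of bipartite spectral graph partitioning, not the adapted variant described above: the noise detection and removal step is not implemented, and the function name `spectral_bipartition`, the sign-based thresholding, and the synthetic block matrix are assumptions made for illustration.

```python
import numpy as np

def spectral_bipartition(P):
    """Split the rows and columns of a nonnegative matrix into two co-clusters.

    Hypothetical helper: assumes every row and column of P has a positive sum.
    """
    d_row = P.sum(axis=1)
    d_col = P.sum(axis=0)
    # Normalise by the square roots of the row and column sums.
    Pn = P / np.sqrt(d_row)[:, None] / np.sqrt(d_col)[None, :]
    U, s, Vt = np.linalg.svd(Pn, full_matrices=False)
    # The second singular vectors carry the two-way partition.
    z_row = U[:, 1] / np.sqrt(d_row)
    z_col = Vt[1, :] / np.sqrt(d_col)
    # Thresholding at zero assigns rows and columns to the same pair of clusters.
    return (z_row > 0).astype(int), (z_col > 0).astype(int)

# Synthetic coupling-like matrix: two dense blocks plus weak background noise.
rng = np.random.default_rng(0)
P = 0.01 * rng.random((6, 8))
P[:3, :4] += 1.0
P[3:, 4:] += 1.0

row_labels, col_labels = spectral_bipartition(P)
```

Rows and columns that receive the same label form a matched pair of clusters, which is the sense in which the partitioning result lets row clusters be matched to column clusters. More than two clusters would need several singular vectors followed by a clustering step such as k-means.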

Keywords: Spectral co-clustering, optimal transport, Sinkhorn loss, matching problem, Sinkhorn algorithm.