Keanu Sisouk
Wasserstein Barycenters of Persistence Diagrams and Applications
Topological Data Analysis (TDA) is a family of techniques for efficiently and robustly highlighting implicit structural patterns in complex datasets. These techniques compute a topological descriptor for each element of a dataset, encoding its main topological features concisely. A prominent example is the persistence diagram. Although persistence diagrams are concise representations, they can still require significant storage space and may be too complex to analyze easily. In this thesis, our goal is to develop an encoding method for ensembles of persistence diagrams that preserves their descriptive power. First, we develop a non-linear dictionary encoding for persistence diagrams. Then, we make this approach more robust to outliers within an ensemble of persistence diagrams by using robust barycenters. The dictionary-based approach requires computing Wasserstein distances, which are known to be computationally expensive, with a cost that grows with the size of the input diagrams. One way to address this problem is through Sliced Optimal Transport, and more specifically the Sliced Wasserstein distance. We present applications of this work in data reduction, to further compress an ensemble of persistence diagrams; in dimensionality reduction, by creating a planar view that provides insight into the arrangement of the data; and in clustering, where we demonstrate robustness to outliers.
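To illustrate why slicing reduces the cost, the following is a minimal sketch of one common formulation of the Sliced Wasserstein distance between two persistence diagrams; it is not necessarily the variant used in the thesis. Each diagram is augmented with the diagonal projections of the other diagram's points so that both have the same cardinality, the points are projected onto a set of lines, and each one-dimensional transport problem is solved by sorting. The function names, the number of directions, and the averaging normalization are assumptions made for this example.

```python
import math


def diagonal_projection(point):
    """Orthogonal projection of a (birth, death) pair onto the diagonal."""
    birth, death = point
    mid = (birth + death) / 2.0
    return (mid, mid)


def sliced_wasserstein(diag_a, diag_b, n_directions=50):
    """Approximate sliced 1-Wasserstein distance between two diagrams.

    Diagrams are lists of (birth, death) pairs. The result is the average
    over n_directions evenly spaced directions (an assumed normalization).
    """
    # Augment each diagram with the diagonal projections of the other,
    # so both point sets have the same number of points.
    aug_a = list(diag_a) + [diagonal_projection(p) for p in diag_b]
    aug_b = list(diag_b) + [diagonal_projection(p) for p in diag_a]

    total = 0.0
    for k in range(n_directions):
        # Directions sample the half circle [-pi/2, pi/2).
        theta = -math.pi / 2.0 + k * math.pi / n_directions
        c, s = math.cos(theta), math.sin(theta)
        # Project onto the line, then sort: in one dimension the optimal
        # matching pairs the i-th smallest values of each set.
        proj_a = sorted(c * x + s * y for x, y in aug_a)
        proj_b = sorted(c * x + s * y for x, y in aug_b)
        total += sum(abs(u - v) for u, v in zip(proj_a, proj_b))
    return total / n_directions


if __name__ == "__main__":
    d1 = [(0.0, 1.0), (0.2, 0.9)]
    d2 = [(0.0, 2.0)]
    print(sliced_wasserstein(d1, d2))
```

Each direction costs only a sort of the projected points, so no assignment problem is solved between the two diagrams; this is the source of the speed-up over an exact Wasserstein computation.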
