Alberto Suárez (Escuela Politécnica Superior, Universidad Autónoma de Madrid)
Machine learning with functional data: near-perfect classification
Functional data consist of observations that depend on a continuous parameter, such as curves, surfaces, and volumes. Such data arise in many application areas, including medicine (e.g., diagnosis of heart conditions from electrocardiograms), environmental science (e.g., characterization of meteorological patterns), and economics (e.g., prediction of economic indicators). Because functional data are continuous in nature, their statistical analysis presents specific challenges. In general, they are infinite-dimensional. As a consequence, some quantities, such as the probability density or the likelihood function, are ill-defined. Therefore, standard methods of statistical inference and machine learning, most of which are based on multivariate statistics, must be adapted, extended, or reformulated to deal with this type of data. In this work, we present different methods to derive optimal classification rules for binary classification problems in which the trajectories of each class are sampled from a different Gaussian process (GP). These methods rely on discretization, spectral analysis of the covariance functions of the GPs, or the theory of reproducing kernel Hilbert spaces. Special attention is paid to classification problems in which the GPs are orthogonal (that is, the probability measures they induce are mutually singular). In such cases, the classification rules involve singular terms, and zero Bayes error is obtained asymptotically.
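
As a minimal illustration of the discretization route, the sketch below uses an example that is not taken from the abstract: two Brownian motions on [0, 1] with different volatilities (sigma0 < sigma1) and equal class priors. These two processes are a standard example of orthogonal GPs. On a grid of n_steps points the increments are independent Gaussians, so the Bayes rule for the discretized problem reduces to thresholding the sum of squared increments. All function and parameter names (simulate_bm, classify, error_rate) are hypothetical and chosen only for this example.

```python
import numpy as np


def simulate_bm(n_paths, n_steps, sigma, rng):
    """Sample scaled Brownian paths on (0, 1] at n_steps equally spaced times."""
    dt = 1.0 / n_steps
    increments = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_paths, n_steps))
    return np.cumsum(increments, axis=1)


def classify(paths, sigma0, sigma1):
    """Bayes rule for the discretized paths (assumes equal priors, sigma0 < sigma1)."""
    # Recover the increments; every path is assumed to start at X(0) = 0.
    increments = np.diff(paths, axis=1, prepend=0.0)
    # Sum of squared increments: sufficient statistic for the variance scale.
    q = np.sum(increments**2, axis=1)
    # Threshold obtained by equating the Gaussian log-likelihoods of the
    # increments under both classes (it does not depend on the grid size).
    threshold = np.log(sigma1**2 / sigma0**2) / (1.0 / sigma0**2 - 1.0 / sigma1**2)
    return (q > threshold).astype(int)


def error_rate(n_steps, sigma0=1.0, sigma1=np.sqrt(2.0), n_paths=500, seed=0):
    """Empirical misclassification rate on simulated paths from both classes."""
    rng = np.random.default_rng(seed)
    x0 = simulate_bm(n_paths, n_steps, sigma0, rng)
    x1 = simulate_bm(n_paths, n_steps, sigma1, rng)
    errors = np.sum(classify(x0, sigma0, sigma1) == 1)
    errors += np.sum(classify(x1, sigma0, sigma1) == 0)
    return errors / (2 * n_paths)


if __name__ == "__main__":
    # Refine the grid and watch the empirical error of the discretized rule.
    for n in (4, 32, 256, 2048):
        print(n, error_rate(n))
```

Under these assumptions, the empirical error should shrink toward zero as the grid is refined, which is the kind of near-perfect classification the abstract describes for orthogonal GPs. The spectral and reproducing kernel Hilbert space approaches mentioned above are not covered by this sketch.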