Maria Cherifa
Dynamics and learning in online allocation problems
The thesis analyzes how decision-making systems evolve, learn, and adapt in uncertain environments. It focuses on matching problems on bipartite graphs, which model real-time decision-making scenarios such as organ allocation, appointment scheduling, volunteer coordination, ride-sharing, and online advertising. Particular attention is given to online advertising, where matchings between users and advertisers are carried out through large-scale automated auctions.
The work lies at the intersection of online matching and the multi-armed bandit problem. The first concerns sequential, irrevocable decisions, while the second captures the trade-off between exploration and exploitation in uncertain environments. Combining these perspectives provides a framework for understanding situations in which digital systems must learn and adapt simultaneously.
The first contribution introduces an online matching model with budget refills, inspired by advertising settings where advertisers’ resources can be replenished over time. Two settings—adversarial and stochastic—are analyzed, and theoretical bounds are established on the efficiency of standard algorithms, highlighting the impact of refill frequency and structure on competitiveness.
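As an illustration of this first setting, the sketch below implements a Balance-style greedy rule on a toy instance: each arriving user is routed to the compatible advertiser with the largest remaining budget, and budgets are replenished according to a refill schedule. The function name, the unit-budget consumption, and the refill format are illustrative assumptions, not the thesis's actual model or algorithm.

```python
def balance_with_refills(arrivals, budgets, refills):
    """Greedy Balance-style heuristic (illustrative sketch).

    arrivals : list of sets; arrivals[t] is the set of advertisers
               compatible with the user arriving at step t.
    budgets  : dict advertiser -> initial budget (units of matches).
    refills  : dict step t -> {advertiser: budget added at step t}.
    Returns the number of matches made.
    """
    remaining = dict(budgets)
    matched = 0
    for t, compatible in enumerate(arrivals):
        # Budgets replenish over time, the key feature of the model.
        for adv, amount in refills.get(t, {}).items():
            remaining[adv] += amount
        candidates = [a for a in compatible if remaining[a] > 0]
        if candidates:
            # Balance rule: favor the advertiser with most budget left.
            chosen = max(candidates, key=lambda a: remaining[a])
            remaining[chosen] -= 1
            matched += 1
    return matched
```

On a three-arrival toy instance, a refill to an exhausted advertiser lets the greedy rule recover a match that would otherwise be lost, which is the effect the competitive-ratio analysis quantifies.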
The second contribution studies online matching on graphs generated by a Stochastic Block Model, representing heterogeneous communities of users or advertisers. When compatibility probabilities are unknown, the problem becomes a bandit setting. An algorithm combining Explore-Then-Commit and Balance is proposed to estimate these probabilities and maximize expected matching size, with theoretical guarantees on convergence and competitiveness.
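The Explore-Then-Commit ingredient can be sketched on a stylized two-class instance: each advertiser class is an arm whose unknown parameter is its compatibility probability, the learner samples each class round-robin during an exploration phase, then commits to the empirically best class. This is a generic ETC sketch under simplifying assumptions (Bernoulli feedback, a fixed commit rule), not the combined ETC-and-Balance algorithm of the thesis.

```python
import random

def explore_then_commit(prob_by_class, horizon, explore_rounds, seed=0):
    """Generic Explore-Then-Commit sketch.

    prob_by_class  : dict class -> true (unknown) compatibility probability.
    horizon        : total number of rounds.
    explore_rounds : exploration pulls per class before committing.
    Returns the total number of successful matches observed.
    """
    rng = random.Random(seed)
    classes = list(prob_by_class)
    counts = {c: 0 for c in classes}
    successes = {c: 0 for c in classes}
    total = 0
    for t in range(horizon):
        if t < explore_rounds * len(classes):
            c = classes[t % len(classes)]  # round-robin exploration
        else:
            # Commit to the class with the best empirical estimate.
            c = max(classes, key=lambda k: successes[k] / counts[k])
        hit = rng.random() < prob_by_class[c]  # edge present w.p. p_c
        counts[c] += 1
        successes[c] += hit
        total += hit
    return total
```

With a clearly better class, nearly all post-exploration rounds are spent on it, which is the mechanism behind the convergence and competitiveness guarantees.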
The final contribution addresses the optimization of collective gain in online advertising auctions through a structured bandit approach. Each decision selects a coalition of advertisers participating in a second-price auction. Leveraging the structure of the reward function, the thesis proposes greedy algorithms that balance exploration and exploitation. Regret analysis and numerical experiments demonstrate their efficiency and robustness.
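The reward structure exploited here can be made concrete with a small sketch: the revenue of a second-price auction restricted to a coalition is the second-highest bid among its members, and a greedy rule grows the coalition one bidder at a time by marginal revenue gain. Both function names and the known-bids simplification are illustrative; in the bandit setting of the thesis these values must be learned, not read off.

```python
def second_price_revenue(bids, coalition):
    """Second-price revenue of a coalition: the winner pays the
    second-highest bid among the participating advertisers."""
    participating = sorted((bids[a] for a in coalition), reverse=True)
    return participating[1] if len(participating) >= 2 else 0.0

def greedy_coalition(bids, size):
    """Illustrative greedy construction: repeatedly add the bidder
    that most increases the coalition's second-price revenue."""
    coalition = []
    rest = sorted(bids, key=bids.get, reverse=True)  # deterministic order
    while len(coalition) < size and rest:
        best = max(rest, key=lambda a: second_price_revenue(bids, coalition + [a]))
        coalition.append(best)
        rest.remove(best)
    return coalition
```

Because revenue depends on the two largest bids in the coalition, marginal gains are structured, which is what makes greedy selection with bandit feedback analyzable.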
Keywords:
online matching, bipartite graphs, random graphs, stochastic approximations, multi-armed bandit algorithms.
