Claire Boyer (IMO, Université Paris Saclay)
Single location regression and attention models
We will begin by defining a new regression task that is easily illustrated in a natural language processing (NLP) context, for example sentiment analysis of texts. We will then propose a predictor for this task that can be interpreted as a highly simplified transformer architecture (the "T" in ChatGPT). We will discuss some of its asymptotic statistical properties and show that the optimal parameters of the problem can be learned by projected gradient descent, despite the non-convexity of the problem.
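To give a concrete feel for the ingredients named above, here is a minimal sketch of projected gradient descent on a toy attention-style predictor. Everything in it is an illustrative assumption rather than the model presented in the talk: the synthetic task (the output depends on one token, located by a hidden direction `k_star`), the softmax-weighted predictor, the unit-sphere constraint, and all names and hyperparameters (`make_data`, `predict`, `projected_gd`, `lam`, `lr`).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    """Euclidean projection of a non-zero vector onto the unit sphere."""
    norm = math.sqrt(dot(w, w))
    return [c / norm for c in w]


def make_data(n, num_tokens, d, seed=0):
    """Hypothetical task: y is read off the single token best aligned with k_star."""
    rng = random.Random(seed)
    k_star = normalize([rng.gauss(0, 1) for _ in range(d)])
    v_star = normalize([rng.gauss(0, 1) for _ in range(d)])
    data = []
    for _ in range(n):
        tokens = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_tokens)]
        j = max(range(num_tokens), key=lambda l: dot(tokens[l], k_star))
        data.append((tokens, dot(tokens[j], v_star)))
    return data


def predict(tokens, k, v, lam=2.0):
    """Toy predictor: softmax(lam * <x_l, k>) weights applied to <x_l, v>."""
    scores = [lam * dot(x, k) for x in tokens]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    vals = [dot(x, v) for x in tokens]
    out = sum(a * u for a, u in zip(attn, vals))
    return out, attn, vals


def loss_and_grads(data, k, v, lam=2.0):
    """Mean of 0.5 * (prediction - y)^2 and its gradients with respect to k and v."""
    d = len(k)
    gk, gv, loss = [0.0] * d, [0.0] * d, 0.0
    for tokens, y in data:
        out, attn, vals = predict(tokens, k, v, lam)
        r = out - y
        loss += 0.5 * r * r
        for x, a, u in zip(tokens, attn, vals):
            ck = r * lam * a * (u - out)  # chain rule through the softmax
            cv = r * a
            for i in range(d):
                gk[i] += ck * x[i]
                gv[i] += cv * x[i]
    n = len(data)
    return loss / n, [g / n for g in gk], [g / n for g in gv]


def projected_gd(data, d, steps=200, lr=0.1, seed=1):
    """Gradient step followed by projection back onto the unit sphere."""
    rng = random.Random(seed)
    k = normalize([rng.gauss(0, 1) for _ in range(d)])
    v = normalize([rng.gauss(0, 1) for _ in range(d)])
    history = []
    for _ in range(steps):
        loss, gk, gv = loss_and_grads(data, k, v)
        history.append(loss)
        k = normalize([ki - lr * g for ki, g in zip(k, gk)])
        v = normalize([vi - lr * g for vi, g in zip(v, gv)])
    return k, v, history


if __name__ == "__main__":
    data = make_data(n=200, num_tokens=5, d=4)
    k, v, history = projected_gd(data, d=4)
    print("initial loss:", history[0], "final loss:", history[-1])
```

The objective is non-convex in `(k, v)` because of the softmax and the sphere constraint, so this sketch carries no convergence guarantee on its own; it only shows the mechanics of alternating a gradient step with a projection.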