This mini-symposium targets an audience that spans both theoretical researchers interested
in the open questions posed by modern machine learning developments and machine learning
practitioners keen on understanding the mathematical modeling behind the tools they use.
We have assembled four promising young researchers with a strong background in theoretical
mathematics who are actively involved in machine learning research. Specifically, the
presentations will explore the analysis of transformers, neural ordinary differential equations
(Neural ODEs), and diffusion models, offering deep insights into these cutting-edge areas.
The burgeoning intersection of machine learning with more mature mathematical domains
such as optimal control, differential equations, optimal transport, and mean-field theory has
paved promising pathways for the advancement of both computational and theoretical methods
within the field. Three significant examples underscore the impact of such analysis. First,
convergence guarantees for specific neural network architectures (featuring a single hidden
layer, in the asymptotic regime of infinite width) were recently proved concurrently by several
research groups, who employed diverse approaches: gradient flows in the infinite-dimensional
Wasserstein space of probability measures [3], partial differential equations with vanishing
viscosity [6], and mean-field optimization analysis [5]. Second, the introduction and analysis
of Neural ODEs [2], viewed as the continuous counterpart of Residual Networks, are grounded
in the theory of differential equations and classical numerical analysis; this has led to the
development of promising architectures [1] and offered new insights into the behavior of deep
neural networks [7] (see the sketches after this paragraph). Finally, diffusion models, which
fundamentally rely on stochastic differential equations, have achieved remarkable success in
generating high-fidelity samples [4], showcasing the potential of integrating advanced
mathematical concepts into machine learning frameworks. These approaches have not only
clarified the operational mechanisms of current models but also opened the door to new
theoretical problems, laying the groundwork for future breakthroughs in machine learning research.
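To make the second and third examples concrete, here are two minimal sketches in our own
notation; they are illustrative simplifications rather than the exact parameterizations of [2]
or [4]. Writing $x_k$ for the activation after layer $k$ of a Residual Network, $f_{\theta_k}$
for its residual block, and $h > 0$ for a step size, a ResNet update reads as one explicit Euler
step of an ordinary differential equation, whose formal limit as the depth grows and $h \to 0$
is the Neural ODE model:
\[
x_{k+1} = x_k + h\, f_{\theta_k}(x_k)
\qquad \longleftrightarrow \qquad
\dot{x}(t) = f_{\theta(t)}\big(x(t)\big), \quad t \in [0,1].
\]
This reading turns questions about very deep Residual Networks into questions about the
discretization and stability of numerical ODE schemes, which is precisely the perspective
studied in [7]. Similarly, in one common continuous-time formulation of diffusion models,
data are progressively corrupted by a forward stochastic differential equation, here a
variance-preserving choice with noise schedule $\beta(t)$,
\[
\mathrm{d}X_t = -\tfrac{1}{2}\,\beta(t)\, X_t\, \mathrm{d}t + \sqrt{\beta(t)}\, \mathrm{d}W_t,
\]
and samples are generated by running an associated reverse-time dynamics driven by a learned
approximation of the score $\nabla_x \log p_t(x)$, where $p_t$ denotes the marginal law of $X_t$.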
These research areas are prominently featured in the current literature thanks to their success,
yet they remain in their nascent stages and leave a plethora of open questions. Moreover, the
rapid pace of development of new machine learning methods means that the volume of theoretical
questions is expanding just as quickly. This situation calls for a considerable effort from the
theoretical mathematics community to achieve significant breakthroughs; at the same time, it
fosters the emergence of new and interesting theoretical questions, offering rich ground for
further exploration and discovery. This interplay between rapid technological advancement and
the foundational theoretical challenges it presents underscores a vibrant cycle of innovation
and inquiry within the field.
References
[1] Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham.
Reversible architectures for arbitrarily deep residual neural networks. In Proceedings of the
AAAI Conference on Artificial Intelligence, volume 32, 2018.
[2] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary
differential equations. Advances in Neural Information Processing Systems, 31, 2018.
[3] Lénaïc Chizat and Francis Bach. On the global convergence of gradient descent for
over-parameterized models using optimal transport. In S. Bengio, H. Wallach, H. Larochelle,
K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information
Processing Systems 31, pages 3036–3046. Curran Associates, Inc., 2018.
[4] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis.
Advances in Neural Information Processing Systems, 34, 2021.
[5] Kaitong Hu, Zhenjie Ren, David Siska, and Lukasz Szpruch. Mean-field Langevin dynamics
and energy landscape of neural networks, 2019.
[6] Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Mean-field theory of two-layers
neural networks: dimension-free bounds and kernel limit, 2019.
[7] Michael Sander, Pierre Ablin, and Gabriel Peyré. Do residual neural networks discretize
neural ordinary differential equations? Advances in Neural Information Processing Systems,
35:36520–36532, 2022.
Mini symposium organizers:
Riccardo Bonalli (Laboratoire des Signaux et Systèmes, Université Paris-Saclay)
Ziad Kobeissi (Laboratoire des Signaux et Systèmes, Université Paris-Saclay)
Mathieu Laurière (NYU Shanghai)
Session 1. Room A6, Tuesday 18:00-20:00.
Chair: Ziad Kobeissi (Laboratoire des Signaux et Systèmes, Université Paris-Saclay)
Speakers:
Benoît Bonnet-Weill (Laboratoire des Signaux et Systèmes, Université Paris-Saclay). A Mean-Field Control Perspective on the Training of NeurODEs
Michael E. Sander (École Normale Supérieure, CNRS). Residual Networks: from Infinitely Deep Neural ODEs to Transformers in Action
Borjan Geshkovski (Inria, Sorbonne Université). On the emergence of clusters in Transformers
Pierre Marion (Institute of Mathematics, EPFL, Lausanne). Implicit Diffusion: Efficient Optimization through Stochastic Sampling