{"id":1237,"date":"2024-06-11T14:03:00","date_gmt":"2024-06-11T12:03:00","guid":{"rendered":"https:\/\/www.unioviedo.es\/fgs2024\/?page_id=1237"},"modified":"2024-06-11T14:18:03","modified_gmt":"2024-06-11T12:18:03","slug":"ms12-theoretical-approaches-to-modern-machine-learning-methods","status":"publish","type":"page","link":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/scientific-program\/ms12-theoretical-approaches-to-modern-machine-learning-methods\/","title":{"rendered":"MS12: Theoretical Approaches to Modern Machine Learning Methods"},"content":{"rendered":"\nThis mini-symposium proposal targets an audience that spans both theoretical researchers interested\nin the open questions posed by modern machine learning developments and practitioners\nof machine learning keen on understanding the mathematical modeling behind the tools they\nuse. We have assembled four promising young researchers who possess a strong background in\ntheoretical mathematics and are actively involved in the field of machine learning. Specifically,\nthe presentations will explore the analysis of transformers, neural ordinary differential equations\n(Neural ODEs), and diffusion models, offering deep insights into these cutting-edge areas.\nThe burgeoning intersection of machine learning with more mature mathematical domains\nsuch as optimal control, differential calculus, optimal transport, and mean field theory has paved\npromising pathways for the advancement of both computational and theoretical methods within\nthe field. Three significant examples underscore the impact of such analysis: Firstly, the recent\nbreakthrough proofs of the convergence properties of specific neural network architectures\n(featuring a single hidden layer in the asymptotic regime of infinite width) have been demonstrated\nconcurrently by several research groups. These groups employed diverse approaches,\nincluding gradient flows in the infinite-dimensional Wasserstein space of probability measures\n[3], partial differential equations with vanishing viscosities [6], and mean-field optimization analysis\n[5]. Secondly, the introduction and analysis of Neural ODEs [2], viewed as the continuous\ncounterpart of Residual Networks, have been grounded in the theory of differential calculus and\nclassical numerical analysis. This has led to the development of promising architectures [1] and\noffered new insights into the behaviors of deep neural networks [7]. Lastly, diffusion models,\nwhich fundamentally rely on stochastic differential equations, have achieved remarkable success\nin generating high-fidelity samples [4], showcasing the potential of integrating advanced mathematical\nconcepts into machine learning frameworks. These approaches did not only clarify the\noperational mechanisms of current models but also opens the door to new theoretical problems,\nlaying the groundwork for future breakthroughs in machine learning research.\nThese research areas are prominently featured in current literature due to their success, yet\nthey remain in their nascent stages, leaving a plethora of open questions. On the other hand,\nthe rapid pace of development in new machine learning methods means that the volume of\ntheoretical questions is expanding just as quickly. This situation necessitates a considerable effort\nfrom the theoretical mathematics community to achieve significant breakthroughs. 
Conversely,\nthis dynamic also fosters the emergence of new and interesting theoretical questions, offering a\nrich ground for further exploration and discovery. This interplay between rapid technological\nadvancement and the foundational theoretical challenges it presents underscores a vibrant cycle\nof innovation and inquiry within the field.\n<br>\nReferences<br>\n[1] Bo Chang, Lili Meng, Eldad Haber, Lars Ruthotto, David Begert, and Elliot Holtham.\nReversible architectures for arbitrarily deep residual neural networks. In Proceedings of the\nAAAI Conference on Artificial Intelligence, volume 32, 2018.\n<br>\n[2] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary\ndifferential equations. Advances in Neural Information Processing Systems, 31, 2018.\n<br>\n[3] L\u00e9na\u00efc Chizat and Francis Bach. On the global convergence of gradient descent for overparameterized\nmodels using optimal transport. In S. Bengio, H. Wallach, H. Larochelle,\nK. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information\nProcessing Systems 31, pages 3036\u20133046. Curran Associates, Inc., 2018.\n<br>\n[4] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis.\nAdvances in Neural Information Processing Systems, 34, 2021.\n<br>\n[5] Kaitong Hu, Zhenjie Ren, David Siska, and Lukasz Szpruch. Mean-field Langevin dynamics\nand energy landscape of neural networks, 2019.\n<br>\n[6] Song Mei, Theodor Misiakiewicz, and Andrea Montanari. Mean-field theory of two-layers\nneural networks: dimension-free bounds and kernel limit, 2019.\n<br>\n[7] Michael Sander, Pierre Ablin, and Gabriel Peyr\u00e9. Do residual neural networks discretize\nneural ordinary differential equations? Advances in Neural Information Processing Systems,\n35:36520\u201336532, 2022.<br>\n\n<strong>Mini-symposium organizers:<\/strong><br>\nRiccardo Bonalli (Laboratoire des Signaux et Syst\u00e8mes, Universit\u00e9 Paris-Saclay)<br>\nZiad Kobeissi (Laboratoire des Signaux et Syst\u00e8mes, Universit\u00e9 Paris-Saclay)<br>\nMathieu Lauri\u00e8re (NYU Shanghai)<br>\n<\/p>\n<p>\n<strong>Session 1. Room A6, Tuesday 18:00-20:00.<\/strong><br>\n<strong>Chair:<\/strong> Ziad Kobeissi (Laboratoire des Signaux et Syst\u00e8mes, Universit\u00e9 Paris-Saclay)<br>\n<strong>Speakers:<\/strong><br>\nBeno\u00eet Bonnet-Weill (Laboratoire des Signaux et Syst\u00e8mes, Universit\u00e9 Paris-Saclay) <em>A Mean-Field Control Perspective on the Training of NeurODEs<\/em><br>\nMichael E. 
Sander (Ecole Normale Sup\u00e9rieure, CNRS) <em>Residual Networks: from Infinitely Deep Neural ODEs to Transformers in Action<\/em><br>\nBorjan Geshkovski (Inria, Sorbonne Universit\u00e9) <em>On the emergence of clusters in Transformers<\/em><br>\nPierre Marion (Institute of Mathematics, EPFL, Lausanne) <em>Implicit Diffusion: Efficient Optimization through Stochastic Sampling<\/em><br>\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This mini-symposium proposal targets an audience that spans both theoretical researchers interested in the open questions posed by modern machine&hellip; <\/p>\n","protected":false},"author":3,"featured_media":0,"parent":618,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1237","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/pages\/1237","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/comments?post=1237"}],"version-history":[{"count":3,"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/pages\/1237\/revisions"}],"predecessor-version":[{"id":1246,"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/pages\/1237\/revisions\/1246"}],"up":[{"embeddable":true,"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/pages\/618"}],"wp:attachment":[{"href":"https:\/\/www.unioviedo.es\/fgs2024\/index.php\/wp-json\/wp\/v2\/media?parent=1237"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}