Recently I joined an interdisciplinary study group called the 'TMI group', whose members come from backgrounds including AI, neuroscience, physics, education studies and philosophy. It is one of the most enthusiastic study groups I have ever attended, producing genuinely productive discussions and sharp critical questions.
Last week, I hosted a Zoom seminar for this group titled 'Statistical-physics approaches to deep learning: dynamical and structural perspectives (DL x SP)'. My aim was to convey that deep learning can be understood as a 'non-linear many-body system with non-deterministic dynamics', and is therefore a legitimate subject of SP.
Even within SP, there are many, often incommensurable, views on how to analyze DL. Some are useful but relatively phenomenological, while others tackle the structural aspects of DNNs more fundamentally. These days, I believe 'disorder' is the most central concept in the SP view of DL.
NNs are highly complex: they are non-linear combinations of functions with vastly different weights. Yet they are distinguishable from completely random fields, since they learn something and form good representations. They are complex but somehow 'structured', and (especially because they live in very high-dimensional spaces) the situation is not as hopeless as it might seem.
This subtle regime is effectively tackled by theories developed for disordered systems, including the replica method. These calculations have been successfully connected to concepts that genuinely matter in the DL community, such as lazy (kernel) learning versus feature learning, flat minima, and linear mode connectivity (for example, see B. L. Annesi et al., "Star-shaped space of solutions of the spherical negative perceptron," Phys. Rev. Lett. 131 (2023)).
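To make 'lazy versus feature learning' a bit more concrete, here is a minimal NumPy sketch (my own toy illustration, not something from the seminar): with the 1/sqrt(width) output scaling, the hidden weights of a two-layer network move less and less during training as the width grows, which is the 'lazy' (kernel) regime. The toy task, widths, step size and number of steps below are arbitrary choices for illustration.

```python
# Toy illustration of the lazy (kernel) regime: as width grows, the relative
# movement of the hidden-layer weights during training shrinks, so the network
# behaves more like its linearization around initialization.
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 10                      # small synthetic regression task
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def train_two_layer(width, steps=800, lr=0.2):
    """Train f(x) = a . relu(W x) / sqrt(width) with plain gradient descent."""
    W = rng.standard_normal((width, d))
    a = rng.standard_normal(width)
    W0 = W.copy()
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)               # hidden activations, shape (n, width)
        f = H @ a / np.sqrt(width)                 # network outputs
        r = (f - y) / n                            # scaled residuals
        grad_a = H.T @ r / np.sqrt(width)
        grad_W = ((np.outer(r, a) * (H > 0)).T @ X) / np.sqrt(width)
        a -= lr * grad_a
        W -= lr * grad_W
    move = np.linalg.norm(W - W0) / np.linalg.norm(W0)
    loss = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a / np.sqrt(width) - y) ** 2)
    return move, loss

for width in [50, 500, 5000]:
    move, loss = train_two_layer(width)
    print(f"width={width:5d}  relative weight movement={move:.4f}  train loss={loss:.4f}")
```

In this parametrization the relative movement shrinks roughly like 1/sqrt(width), so the printed ratio should drop as the width increases; feature learning corresponds to parametrizations or scales where the hidden weights keep moving by an order-one amount.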
In fact, this perspective traces back to the ancient era of deep learning. Already in the 1980s and 90s, statistical physicists were analyzing the phase behavior of learning in multi-layer NNs (including quite practical issues such as the teacher-student scenario). The Nobel Prize awarded last year to J. J. Hopfield was a controversial topic among physicists. A lesser-known fact, however, is that when G. Parisi won the prize in 2021, the citation already referred to AI theory, specifically in relation to his theories of glassy disordered systems and their relevance to DL.
Of course, as one of the TMI group members pointed out, to make these ideas more practically applicable, they must be further refined by scholars in statistical learning theory and integrated with computational theory and optimization. For example, one may look at the 'dynamical mean-field theory' approaches of T. Suzuki's group (U Tokyo) and how they rigorously explain feature learning and in-context learning.
Next month, I will host a focus review session introducing high-dimensional random geometry (which also employs SP methods for disordered systems) in the context of modern deep learning. It begins with a simple and general geometric problem—separating labeled points (or ellipsoids) with a hyperplane—but ultimately explains the surprising success of modern AI, including few-shot learning and in-context learning.
For the former, see B. Sorscher, S. Ganguli and H. Sompolinsky, "Neural representation geometry underlies few-shot concept learning," Proc. Natl. Acad. Sci. 119 (2022). For the latter, see A. J. Wakhloo, T. J. Sussman and S. Chung, "Linear classification of neural manifolds with correlated variability," Phys. Rev. Lett. 131 (2023) and A. Kirsanov, C. N. Chou, K. Cho and S. Chung, "The geometry of prompting: Unveiling distinct mechanisms of task adaptation in language models," arXiv:2502.08009 (2025).
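As a concrete warm-up for that session, here is a small sketch (my own illustration, assuming NumPy and SciPy, not taken from the papers above) of the basic separability question: for P random points in N dimensions with random +/-1 labels, how often does a separating hyperplane through the origin exist? Cover's classical result places the transition at the capacity P = 2N; the manifold-capacity theories cited above generalize the counting from points to ellipsoids and manifolds.

```python
# Estimate the fraction of random labeled point clouds that are linearly separable
# through the origin, as a function of the load alpha = P / N. Separability is
# checked as a feasibility linear program: find w with y_i * (w . x_i) >= 1 for all i.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def separable(X, y):
    """Return True if some w satisfies y_i * (w . x_i) >= 1 for every sample i."""
    P, N = X.shape
    A_ub = -(y[:, None] * X)          # -y_i (x_i . w) <= -1  <=>  y_i (x_i . w) >= 1
    b_ub = -np.ones(P)
    res = linprog(c=np.zeros(N), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * N, method="highs")
    return res.success

N, trials = 40, 50
for alpha in [1.0, 1.5, 2.0, 2.5, 3.0]:
    P = int(alpha * N)
    frac = np.mean([separable(rng.standard_normal((P, N)),
                              rng.choice([-1.0, 1.0], size=P))
                    for _ in range(trials)])
    print(f"P/N = {alpha:.1f}:  fraction separable = {frac:.2f}")
```

The fraction drops sharply around P/N = 2, which is Cover's capacity for points in general position; roughly speaking, the replica-based manifold-capacity calculations replace the raw point count with effective manifold dimensions and radii.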
On a lighter note, Prof. SueYeon Chung has been invited as a lecturer for a KIAS lecture series planned for early August this year. I am looking forward to it, since it will be my first time attending one of Prof. Chung's lectures in person.