Research



Spin Glass Model of In-Context Learning

Abstract: Large language models exhibit a surprising in-context learning ability: they can use a prompt to form a prediction for a query without any additional training, in stark contrast to conventional supervised learning. Providing a mechanistic interpretation and linking this empirical phenomenon to physics are thus challenging and remain unsolved. We study a simple yet expressive transformer with linear attention and map this structure to a spin glass model with real-valued spins, where the couplings and fields account for the intrinsic disorder in the data. The spin glass model explains how the weight parameters interact with each other during pre-training and, most importantly, why an unseen function can be predicted from a prompt alone, without further training. Our theory reveals that for single-instance learning, increasing the task diversity leads to the emergence of in-context learning by allowing the Boltzmann distribution to converge to a unique correct solution of the weight parameters. The pre-trained transformer therefore exhibits predictive power on novel prompts. The proposed spin glass model thus establishes a foundation for understanding the empirical success of large language models.

The preprint can be found on arXiv.
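The in-context setting the abstract describes can be illustrated numerically. Below is a minimal sketch, not the paper's code: each task is a linear function y = w·x with a fresh random w, the model sees L context pairs plus a query, and a single trainable matrix `Gamma` plays the role of the linear-attention weights (the parameterization, learning rate, and dimensions here are illustrative assumptions). After pre-training over many tasks, the model predicts well on tasks it has never seen, using only the prompt.

```python
import numpy as np

# Sketch of linear-attention in-context regression (assumed setup, not
# the paper's exact model): each task is y = w . x with a fresh random
# w; the context summary h = (1/L) sum_i y_i x_i feeds a trainable
# matrix Gamma that maps the query x_q to a prediction.
rng = np.random.default_rng(0)
d, L, n_train_tasks = 5, 20, 10000
Gamma = np.zeros((d, d))
lr = 0.005

for _ in range(n_train_tasks):
    w = rng.standard_normal(d)            # task vector, drawn fresh each time
    X = rng.standard_normal((L, d))       # context inputs
    y = X @ w                             # context labels
    x_q = rng.standard_normal(d)          # query input
    h = (y @ X) / L                       # attention-style context summary
    pred = h @ Gamma @ x_q                # linear-attention prediction
    err = pred - x_q @ w
    Gamma -= lr * err * np.outer(h, x_q)  # SGD on the squared loss

# Evaluate on tasks never seen during pre-training: prediction uses
# only the prompt, with no further weight updates.
test_errs = []
for _ in range(500):
    w = rng.standard_normal(d)
    X = rng.standard_normal((L, d))
    y = X @ w
    x_q = rng.standard_normal(d)
    h = (y @ X) / L
    test_errs.append((h @ Gamma @ x_q - x_q @ w) ** 2)
mse = float(np.mean(test_errs))
print(mse)  # well below the trivial baseline E[y_q^2] = d
```

The residual error comes from the finite context length L: the summary h is a noisy estimate of the task vector w, so even the optimal `Gamma` cannot drive the test loss to zero.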


Generalization Error of Perceptron With Continuous Weight in Binary Classification Task

Abstract: Within the teacher-student model and the Bayes-optimal framework, we analyze the generalization error of a perceptron with continuous weights in a binary classification task. First, we derive the generalized approximate message passing (GAMP) equations from the belief propagation (BP) equations and obtain the generalization error at different data densities by iterating them. We then derive the state evolution (SE) equations from the message-passing equations; iterating these yields the theoretical curve of generalization error versus data density, which agrees with the results from the GAMP equations. Finally, we carry out a replica calculation that verifies the state evolution equations, and we analyze the asymptotic behavior of the generalization error as the data density tends to infinity.
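The teacher-student setting above can be sketched in a few lines. This is not the GAMP or SE code from the report: as a stand-in estimator it uses a deliberately simple Hebbian rule (an assumption of this sketch), and it relies only on the standard geometric fact that, for Gaussian inputs, the generalization error of any student weight vector equals arccos(ρ)/π, where ρ is the cosine overlap between student and teacher. The point is just to show the error shrinking as the data density α = P/N grows.

```python
import numpy as np

# Teacher-student binary classification: labels y = sign(w* . x) with
# i.i.d. Gaussian inputs x. For a student weight vector w, the
# generalization error P[sign(w . x) != sign(w* . x)] over a fresh
# Gaussian x equals arccos(rho)/pi, with rho the cosine overlap of w
# and the teacher w*. The Hebbian estimate below is an illustrative
# stand-in for GAMP.
rng = np.random.default_rng(1)
N = 400                                   # input dimension

def gen_error(alpha):
    P = int(alpha * N)                    # number of training examples
    w_star = rng.standard_normal(N)       # teacher weights
    X = rng.standard_normal((P, N))
    y = np.sign(X @ w_star)
    w = y @ X / P                         # Hebbian student estimate
    rho = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
    return np.arccos(rho) / np.pi

err_alpha_half, err_alpha_five = gen_error(0.5), gen_error(5.0)
print(err_alpha_half, err_alpha_five)     # error decreases as alpha grows
```

The Hebbian estimator is far from Bayes-optimal, so its error decays more slowly with α than the GAMP/SE curves in the report; the sketch only illustrates the setup and the overlap-to-error formula.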

This project was rated Excellent in the 2023 final defense of the College Students' Innovative Entrepreneurial Training Plan Program at the School of Physics, Sun Yat-sen University.

A PDF version of the report can be downloaded here, the slides presented at the group meeting can be downloaded here, and the slides for the CSIETPP conclusion defense can be downloaded here.