
Local Theories of Diffusion Model Generalization in High Dimensions

Date and Time
-
Location
Zoom
Speaker
Mason Kamb (Stanford)

Modern generative diffusion models are distinguished by their ability to generalize, consistently and robustly, in very high-dimensional spaces. They produce a combinatorial explosion of novel images from a relatively small training set, subverting the usual concerns about the curse of dimensionality. Yet their generations also sometimes fall short, exhibiting distinctive flaws such as spatial inconsistency (e.g. excess limbs). I will discuss an analytical theory that, assuming only A) locality and B) (broken) equivariance, explains 1) how models generalize combinatorially, mixing and matching elements from different images in their training data, 2) why models generalize consistently and robustly in high-dimensional spaces, and 3) the mechanistic origins of spatial-consistency issues such as the “excess limbs” phenomenon. The theory is fully solvable in terms of the training dataset, and we show that it predicts, on a case-by-case basis, the behavior of certain classes of weak diffusion models: 1) small convolutional neural networks, and 2) diffusion models early in their training process. I will close with comments on what is still needed to further explain the mysteries of generalization in more powerful models.
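To give a flavor of how a locality + equivariance ansatz can be "solvable in terms of the training dataset", here is a minimal toy sketch in 1-D: each pixel of the denoised output is predicted only from its local neighborhood (locality), and the same patch dictionary is reused at every location (translation equivariance), with patches harvested directly from training data. This is an illustrative assumption-laden construction, not the speaker's actual model; the patch size, Gaussian kernel, and function names are my own choices.

```python
import numpy as np

def local_equivariant_denoiser(x_noisy, train_imgs, patch=3, sigma=0.5):
    """Toy patch-wise denoiser: for each position, match the local
    neighborhood of x_noisy against ALL training patches (pooled across
    locations, i.e. translation-equivariant) and output a softmax-weighted
    average of the patch-center values. Everything is computed directly
    from the training set -- no learned parameters."""
    half = patch // 2
    n = len(x_noisy)
    # Harvest every training patch, ignoring where it came from (equivariance).
    patches = []
    for img in train_imgs:
        padded = np.pad(img, half, mode="wrap")
        for i in range(len(img)):
            patches.append(padded[i:i + patch])
    patches = np.asarray(patches)          # shape (P, patch)
    centers = patches[:, half]             # center value of each patch
    xp = np.pad(x_noisy, half, mode="wrap")
    out = np.empty(n)
    for i in range(n):
        q = xp[i:i + patch]                # local neighborhood only (locality)
        d2 = np.sum((patches - q) ** 2, axis=1)
        w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))  # Gaussian weights
        out[i] = np.dot(w, centers) / w.sum()
    return out
```

Because each output pixel is free to match a patch from a *different* training image, the sketch mixes and matches local structure combinatorially; and because patches are stitched together only through overlapping neighborhoods, global constraints (like the correct number of limbs) can be violated, illustrating how spatial-consistency failures arise mechanistically.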