The paradox of over-parameterization in learning algorithm
DOI: https://doi.org/10.56947/amcs.v34.824
Keywords: Algorithmic Stability, Over-parameterization, Generalization Gap
Abstract
Traditional learning theory fails to explain why over-parameterized networks generalize. We propose a physical synthesis treating learning as a dissipative process. By integrating Differential Geometry, Dynamical Systems, and Algorithmic Stability, we show generalization depends on dynamical stability rather than architecture. Utilizing the Polyak-Lojasiewicz condition and Restricted Strong Convexity, we establish that gradient flow acts as a contractive operator, preventing trajectory divergence. Finally, McDiarmid’s inequalities convert this contraction into rigorous probabilistic bounds. Our framework proves that over-parameterization facilitates geometric properties that inherently regularize the learning process, ensuring robust generalization despite high model capacity.
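The role of the Polyak-Lojasiewicz (PL) condition can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a linear least-squares model with more parameters than samples (n = 20, d = 200), where the loss f(w) = ||Xw - y||^2 / (2n) is not strongly convex but satisfies a PL inequality with constant mu = lambda_min(X X^T / n). The function name `gd_least_squares`, the synthetic data, and the step size 1/L are illustrative choices. The code checks the standard linear rate f(w_k) <= (1 - mu/L)^k f(w_0), then runs the same procedure on a neighbouring dataset (one sample replaced) and tracks the distance between the two parameter trajectories.

```python
import numpy as np

def gd_least_squares(X, y, steps=100):
    """Gradient descent on f(w) = ||Xw - y||^2 / (2n) with step size 1/L."""
    n, d = X.shape
    K = X @ X.T / n                   # n x n Gram matrix
    eigs = np.linalg.eigvalsh(K)      # eigenvalues in ascending order
    mu, L = eigs[0], eigs[-1]         # PL constant and smoothness constant
    w = np.zeros(d)
    losses, iterates = [], []
    for _ in range(steps):
        r = X @ w - y                 # residual at the current iterate
        losses.append(0.5 * np.mean(r ** 2))
        iterates.append(w.copy())
        w = w - (1.0 / L) * (X.T @ r) / n
    return np.array(losses), np.array(iterates), mu, L

# Synthetic over-parameterized data (assumption: Gaussian features, small noise).
rng = np.random.default_rng(0)
n, d = 20, 200
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d) / np.sqrt(d)
y = X @ w_true + 0.1 * rng.normal(size=n)

losses, iterates, mu, L = gd_least_squares(X, y)

# Linear convergence implied by PL plus L-smoothness.
k = np.arange(len(losses))
pl_bound = (1.0 - mu / L) ** k * losses[0]
# Small absolute tolerance guards against floating-point round-off.
pl_bound_holds = bool(np.all(losses <= pl_bound + 1e-20))

# Neighbouring dataset: replace a single training sample, rerun from the same start.
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=d)
y2[0] = X2[0] @ w_true + 0.1 * rng.normal()
_, iterates2, _, _ = gd_least_squares(X2, y2)

# Distance between the two trajectories at every step.
gaps = np.linalg.norm(iterates - iterates2, axis=1)

print("mu / L:", mu / L)
print("PL bound holds:", pl_bound_holds)
print("final trajectory gap:", gaps[-1])
```

In this toy setting the gap between the two runs settles to a finite value instead of growing with the number of steps, which is the qualitative behaviour the abstract attributes to contraction. Turning such a gap into a generalization bound requires the concentration argument the abstract describes, which this sketch does not attempt.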
License
Copyright (c) 2026 Annals of Mathematics and Computer Science

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.