
Scale Determines Whether Language Models Organize Representation Geometry for Prediction
Researchers have identified a scale-dependent shift in how language models organize their internal geometry during training. Using a new metric called Subspace PGA, they found that smaller models (under 1B parameters) progressively abandon prediction-aligned representations in later layers even as training loss improves, while larger models maintain this alignment. This divergence suggests that model scale changes how neural networks structure learned representations, with implications for interpretability work and for our understanding of what drives scaling laws beyond raw performance metrics.





















