Probing emergent geometry in speech models via replica theory

The success of deep neural networks in visual tasks has motivated recent theoretical and empirical work to understand how these networks operate. Meanwhile, deep neural networks have also achieved impressive performance in audio processing applications, both as sub-components of larger systems and as complete end-to-end systems. In this work, we employ a recently developed statistical mechanical theory that connects geometric properties of network representations with class separability to probe how information is untangled within neural networks trained to recognize speech. We find that speech recognition models carry out significant layerwise and temporal untangling of words by efficiently extracting task-relevant features. This untangling results from a decrease in the per-class radius and dimension, and a reduction in the correlation between class centers.
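The full replica-based mean-field theory computes manifold capacity, but the geometric quantities the abstract names can be illustrated more simply. The sketch below is a rough, assumed implementation (not the paper's code) of three of them: per-class radius, a participation-ratio estimate of per-class dimension, and pairwise correlations between class centers, computed from a matrix of layer activations.

```python
import numpy as np

def class_geometry(features, labels):
    """Estimate per-class radius, participation-ratio dimension, and
    pairwise center correlations from network representations.

    features: (n_samples, n_dims) array of layer activations
    labels:   (n_samples,) integer class labels

    Note: these are illustrative geometric summaries, not the
    replica-theoretic manifold capacity computed in the paper.
    """
    classes = np.unique(labels)
    centers, radii, dims = [], [], []
    for c in classes:
        X = features[labels == c]
        mu = X.mean(axis=0)
        centers.append(mu)
        # Radius: RMS distance of class points from their center.
        radii.append(np.sqrt(((X - mu) ** 2).sum(axis=1).mean()))
        # Dimension: participation ratio of the class covariance
        # spectrum, (sum of eigenvalues)^2 / sum of squared eigenvalues.
        evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
        dims.append(evals.sum() ** 2 / (evals ** 2).sum())
    centers = np.array(centers)
    # Center correlations: cosine similarity between mean-subtracted
    # class centers; untangling shows up as smaller off-diagonal values.
    C = centers - centers.mean(axis=0)
    C /= np.linalg.norm(C, axis=1, keepdims=True)
    corr = C @ C.T
    return np.array(radii), np.array(dims), corr
```

Tracking these three quantities layer by layer (and, for speech, across time steps) is one way to observe the untangling trend the abstract describes: radii and dimensions shrink while center correlations drop.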
