Understanding how large neural networks avoid memorizing training data is key to explaining their high generalization performance.
In addition, we find that the emergence of linear separability in these manifolds is driven by a combined reduction in the manifolds' radius, dimensionality, and inter-manifold correlations.
Higher-level concepts such as parts of speech and context dependence also emerge in the later layers of the network.
In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker.
The success of deep neural networks in visual tasks has motivated recent theoretical and empirical work to understand how these networks operate.
Because it uses a single encoder, our method can generalize to converting the voices of speakers outside the training set into those of speakers in the training set.
We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set.
We propose an algorithm to denoise a speaker's voice captured by a single microphone in the presence of non-stationary, dynamic noise.
Although the matrix determined by the output weights depends on a set of known speakers, we use only the input vectors during inference.