On Feature Importance and Interpretability of Speaker Representations

19 Oct 2023 · Frederik Rautenberg, Michael Kuhlmann, Jana Wiechmann, Fritz Seebauer, Petra Wagner, Reinhold Haeb-Umbach ·

Unsupervised speech disentanglement aims at separating fast varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask, which properties of a speaker's voice are captured and investigate to which extent do individual embedding vector components sign responsible for them, using the concept of Shapley values. Our findings show that certain speaker-specific acoustic-phonetic properties can be fairly well predicted from the speaker embedding, while the investigated more abstract voice quality features cannot.

PDF Abstract