no code implementations • 10 Apr 2024 • Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng
CoVoMix is capable of first converting dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers.
no code implementations • 16 Nov 2021 • Midia Yousefi, John H. L. Hansen
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity.
no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen
The speaker conditioning process allows the acoustic model to perform computation in the context of target-speaker auxiliary information.
no code implementations • 30 Oct 2021 • Midia Yousefi, John H. L. Hansen
Most current speech technology systems are designed to operate well even in the presence of multiple active speakers.
no code implementations • 4 Aug 2019 • Midia Yousefi, Soheil Khorram, John H. L. Hansen
The recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment that minimizes the separation error.
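The assignment step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mse` and `pit_assignment` are hypothetical, mean squared error is assumed as the separation error, and signals are plain Python lists rather than spectrograms or waveforms from a real model. The sketch tries every output-to-label permutation and keeps the one with the lowest average error.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences (assumed error measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_assignment(estimates, references):
    """Return (best_perm, best_loss) over all output-label assignments.

    best_perm[i] is the index of the reference assigned to estimates[i].
    Hypothetical helper for illustration only.
    """
    n = len(references)
    best_perm, best_loss = None, float("inf")
    for perm in permutations(range(n)):
        # Average the per-source error under this candidate assignment.
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_perm, best_loss


# Toy example: the two outputs come out in swapped order relative to the labels.
estimates = [[0.0, 1.0], [1.0, 0.0]]
references = [[1.0, 0.0], [0.0, 1.0]]
perm, loss = pit_assignment(estimates, references)
print(perm, loss)
```

Exhaustive search evaluates every permutation, so its cost grows factorially with the number of speakers; that is acceptable for two or three talkers, which is the setting this sketch assumes.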