no code implementations • 17 Mar 2024 • Claudio Pinhanez, Raul Fernandez, Marcelo Grave, Julio Nogima, Ron Hoory
Representations of AI agents in user interfaces and robotics are predominantly White, not only in terms of facial and skin features, but also in the synthetic voices they use.
no code implementations • 20 Sep 2023 • Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory
Experimental results show that LLM2Speech maintains the teacher's quality while reducing the latency to enable natural conversations.
no code implementations • 1 Mar 2022 • Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory
Beyond that, a common engine should be capable of supporting distributed training with clients' in-house private data.
no code implementations • 21 Feb 2022 • Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury
The NNSI reduces the need for manual labeling by automatically selecting highly ambiguous samples and labeling them with high accuracy.
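The excerpt above describes selecting highly ambiguous samples for labeling. The paper's exact NNSI criterion is not given here; a minimal sketch of one common way to rank ambiguity, by the entropy of each sample's predicted class distribution, could look like this (all names and the toy posteriors are illustrative, not from the paper):

```python
import numpy as np

def select_ambiguous(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k samples whose predicted class
    distributions have the highest entropy (most ambiguous)."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # argsort is ascending; take the last k and reverse so the
    # most ambiguous sample comes first.
    return np.argsort(entropy)[-k:][::-1]

# Toy posteriors for 4 samples over 3 classes.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # nearly uniform: highly ambiguous
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],   # two competing classes
])
picked = select_ambiguous(probs, k=2)
print(picked)  # most ambiguous samples first
```

Under this criterion the near-uniform sample ranks first, followed by the two-way tie, matching the intuition that ambiguous samples are the ones worth sending for labeling.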
no code implementations • ICASSP 2022 • Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz
Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation.
no code implementations • 2 Feb 2022 • Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases.
Ranked #1 on Speech Emotion Recognition on IEMOCAP (AUC metric)
1 code implementation • 8 Apr 2021 • Samuel Thomas, Hong-Kwang J. Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU).
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Oct 2020 • Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny
Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system.
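The transfer-learning technique in (1) ties acoustic embeddings to fine-tuned BERT text embeddings. A minimal sketch of that tying objective, an MSE loss pulling each utterance's acoustic embedding toward its frozen text embedding, is below; the random embeddings, dimensions, and single hand-computed gradient step are placeholders for the paper's actual encoders and optimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: `text_emb` plays the role of frozen fine-tuned
# BERT embeddings, `acoustic_emb` the speech encoder's utterance embeddings.
text_emb = rng.normal(size=(8, 16))       # 8 utterances, dim 16
acoustic_emb = rng.normal(size=(8, 16))

def tying_loss(acoustic: np.ndarray, text: np.ndarray) -> float:
    """MSE that ties each acoustic embedding to its text counterpart."""
    return float(np.mean((acoustic - text) ** 2))

before = tying_loss(acoustic_emb, text_emb)
# Gradient of the mean-squared error w.r.t. the acoustic embeddings.
grad = 2.0 * (acoustic_emb - text_emb) / acoustic_emb.size
acoustic_emb = acoustic_emb - 10.0 * grad   # one toy gradient step
after = tying_loss(acoustic_emb, text_emb)
print(before, after)
```

After the step the loss shrinks, i.e. the acoustic embeddings have moved toward the shared text-embedding space, which is the effect the intent classifier then exploits.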
no code implementations • 30 Sep 2020 • Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras
For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts.
no code implementations • 28 Jul 2020 • Shai Rozenberg, Hagai Aronowitz, Ron Hoory
With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing.
no code implementations • 2 May 2019 • Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory
We first demonstrate the ability of the system to produce high quality speech when trained on large, high quality datasets.
Audio and Speech Processing • Sound