no code implementations • EMNLP (Louhi) 2020 • Tarek Sakakini, Jong Yoon Lee, Aditya Duri, Renato F.L. Azevedo, Victor Sadauskas, Kuangxiao Gu, Suma Bhat, Dan Morrow, James Graumlich, Saqib Walayat, Mark Hasegawa-Johnson, Thomas Huang, Ann Willemsen-Dunlap, Donald Halpin
We also show the enhanced accuracy of our system over directly-supervised neural methods in this low-resource setting.
1 code implementation • NAACL 2022 • John Harvill, Roxana Girju, Mark Hasegawa-Johnson
In this paper we focus on patterns of colexification (co-expressions of form-meaning mapping in the lexicon) as an aspect of lexical-semantic organization, and use them to build large scale synset graphs across BabelNet’s typologically diverse set of 499 world languages.
no code implementations • ACL 2022 • Liming Wang, Siyuan Feng, Mark Hasegawa-Johnson, Chang Yoo
Phonemes are defined by their relationship to words: changing a phoneme changes the word.
no code implementations • 25 Jan 2025 • Satwinder Singh, Qianli Wang, Zihan Zhong, Clarion Mendes, Mark Hasegawa-Johnson, Waleed Abdulla, Seyed Reza Shahamiri
In this paper, we present a speaker-independent dysarthric speech recognition system, with a focus on evaluating the recently released Speech Accessibility Project (SAP-1005) dataset, which includes speech data from individuals with Parkinson's disease (PD).
no code implementations • 21 Oct 2024 • Sandeep Nagar, Mark Hasegawa-Johnson, David G. Beiser, Narendra Ahuja
We then demonstrate the robustness of this ROI selection method when coupled with the Plane-Orthogonal-to-Skin (POS) rPPG method and applied to videos of patients presenting to an Emergency Department with respiratory complaints.
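The POS projection step itself is well documented in the rPPG literature; the following is a minimal NumPy sketch of that standard step, assuming the RGB traces have already been averaged over the chosen ROI (window length and constants are the commonly cited defaults, not necessarily those used in this paper).

```python
import numpy as np

def pos_rppg(rgb: np.ndarray, fs: float, win_s: float = 1.6) -> np.ndarray:
    """Plane-Orthogonal-to-Skin (POS) pulse extraction from ROI-averaged RGB traces.

    rgb: array of shape (T, 3) -- mean R, G, B value of the chosen ROI per frame.
    fs:  video frame rate in Hz.
    """
    T = rgb.shape[0]
    L = int(win_s * fs)                       # sliding-window length (~1.6 s)
    P = np.array([[0.0, 1.0, -1.0],
                  [-2.0, 1.0, 1.0]])          # projection plane orthogonal to the skin-tone axis
    h = np.zeros(T)
    for t in range(T - L + 1):
        C = rgb[t:t + L].T                                 # (3, L) window
        Cn = C / (C.mean(axis=1, keepdims=True) + 1e-8)    # temporal normalization
        S = P @ Cn                                         # two projected signals S1, S2
        alpha = S[0].std() / (S[1].std() + 1e-8)
        p = S[0] + alpha * S[1]                            # tuned combination
        h[t:t + L] += p - p.mean()                         # overlap-add into the pulse signal
    return h
```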
no code implementations • 29 Sep 2024 • Xiuwen Zheng, Bornali Phukon, Mark Hasegawa-Johnson
This paper enhances dysarthric and dysphonic speech recognition by fine-tuning pretrained automatic speech recognition (ASR) models on the 2023-10-05 data package of the Speech Accessibility Project (SAP), which contains the speech of 253 people with Parkinson's disease.
Automatic Speech Recognition (ASR) +2
1 code implementation • 7 Sep 2024 • Junkai Wu, Xulin Fan, Bo-Ru Lu, Xilin Jiang, Nima Mesgarani, Mark Hasegawa-Johnson, Mari Ostendorf
However, after carefully examining Gaokao's questions, we find that the correct answers to many questions can be inferred from the conversation transcript alone, i.e., without speaker segmentation and identification.
no code implementations • 11 Aug 2024 • Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment.
Automatic Speech Recognition (ASR) +4
no code implementations • 25 Jun 2024 • Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam
Further, many of these works ignore infants or young children in the environment, or rely on data collected from only a single family, where noise from the fixed sound source can be moderate at the infant's position or vice versa.
1 code implementation • 12 Jun 2024 • Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora.
Automatic Speech Recognition (ASR) +2
1 code implementation • 21 Mar 2024 • Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo
Through a series of observations, we find that the prompt choice significantly affects the calibration in CLIP, where the prompts leading to higher text feature dispersion result in better-calibrated predictions.
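As an illustration of what "text feature dispersion" can mean in practice, here is a small sketch that scores a prompt template by how spread out its class-text embeddings are; the dispersion definition (mean distance to the centroid) and the placeholder `embed_text` encoder are our own illustrative choices, not necessarily the paper's.

```python
import numpy as np

def text_feature_dispersion(text_embeddings: np.ndarray) -> float:
    """Mean distance of L2-normalized class-text embeddings from their centroid.

    text_embeddings: (num_classes, dim) embeddings of one prompt template filled
    with every class name.
    """
    z = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    centroid = z.mean(axis=0)
    return float(np.linalg.norm(z - centroid, axis=1).mean())

# Usage sketch: compare two prompt templates by the dispersion of their class embeddings.
# `embed_text` stands in for whatever CLIP text encoder is being used.
# disp_a = text_feature_dispersion(embed_text(["a photo of a cat", "a photo of a dog"]))
# disp_b = text_feature_dispersion(embed_text(["cat", "dog"]))
```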
no code implementations • 18 Mar 2024 • SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo
In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions.
Automatic Speech Recognition (ASR) +1
no code implementations • 10 Feb 2024 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
To understand why self-supervised learning (SSL) models have empirically achieved strong performances on several speech-processing downstream tasks, numerous studies have focused on analyzing the encoded information of the SSL layer representations in adult speech.
no code implementations • 30 Nov 2023 • Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou
We further extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.
1 code implementation • 3 Oct 2023 • Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo
Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands.
no code implementations • 13 Sep 2023 • Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios
The assessment of children at risk of autism typically involves a clinician observing, taking notes, and rating children's behaviors.
Automatic Speech Recognition (ASR) +4
no code implementations • 16 Aug 2023 • Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo
Text-to-Text Transfer Transformer (T5) has recently been considered for Grapheme-to-Phoneme (G2P) transduction.
1 code implementation • 9 Jun 2023 • Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo
Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora.
Automatic Speech Recognition (ASR) +2
no code implementations • 25 May 2023 • Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo
INTapt is trained simultaneously in the following two manners: (1) adversarial training to reduce accent feature dependence between the original input and the prompt-concatenated input, and (2) training to minimize the CTC loss on the prompt-concatenated input to improve ASR performance (see the sketch after this entry).
Automatic Speech Recognition (ASR) +3
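Below is a compact, hedged sketch of one way such a two-part objective can be wired up in PyTorch: a CTC loss on the prompt-concatenated input plus an adversarial term routed through a gradient-reversal layer so the accent classifier cannot recover accent information. The module names, pooling, and weighting factors are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def combined_loss(log_probs, targets, in_lens, tgt_lens, feats, accent_labels,
                  accent_head: nn.Module, lam: float = 0.1, beta: float = 1.0):
    # (1) CTC loss on the prompt-concatenated input; log_probs has shape (T, batch, vocab).
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)(log_probs, targets, in_lens, tgt_lens)
    # (2) Adversarial term: the accent classifier sees gradient-reversed, time-pooled
    #     encoder features, pushing the encoder toward accent-invariant representations.
    adv_feats = GradReverse.apply(feats.mean(dim=1), lam)
    adv = nn.CrossEntropyLoss()(accent_head(adv_feats), accent_labels)
    return ctc + beta * adv
```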
no code implementations • 21 May 2023 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models.
no code implementations • 14 Dec 2022 • Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo
To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word.
no code implementations • 9 Jul 2022 • Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson
Most current research upsamples the visual features along the time dimension so that audio and video features are able to align in time.
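A minimal PyTorch sketch of that common alignment step, stretching video-frame features along the time axis to the audio frame rate (the tensor shapes and the nearest-neighbour interpolation mode are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def upsample_video_to_audio(video_feats: torch.Tensor, n_audio_frames: int) -> torch.Tensor:
    """Stretch video features (batch, T_video, dim) along time to (batch, T_audio, dim)."""
    x = video_feats.transpose(1, 2)                        # (batch, dim, T_video) for interpolate
    x = F.interpolate(x, size=n_audio_frames, mode="nearest")
    return x.transpose(1, 2)                               # (batch, T_audio, dim)

# e.g. 25 fps video features aligned to 100 fps audio features:
# aligned = upsample_video_to_audio(video_feats, n_audio_frames=4 * video_feats.size(1))
```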
1 code implementation • International Conference on Machine Learning 2022 • Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo
Inspired by the Lottery Ticket Hypothesis, which posits that competitive subnetworks exist within a dense network, we propose a continual learning method referred to as Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task.
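The core selection step in such mask-based continual learners can be sketched very simply. This is a generic top-k score threshold, under the assumption that each weight carries a learnable importance score; it is not a faithful reproduction of WSN.

```python
import torch

def winning_mask(scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Binary mask selecting the top `keep_ratio` fraction of weights by importance score."""
    k = max(1, int(scores.numel() * keep_ratio))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return (scores >= threshold).to(scores.dtype)

# Usage sketch: during task t's forward pass, the dense weight tensor is used only
# through its selected subnetwork.
# effective_weight = weight * winning_mask(score_t, keep_ratio=0.05)
```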
1 code implementation • 19 May 2022 • Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy
Zero-shot voice conversion is becoming an increasingly popular research topic, as it promises the ability to transform speech to sound like any speaker.
1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.
1 code implementation • 7 Apr 2022 • Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing
Designing equivariance as an inductive bias into deep-nets has been a prominent approach to building effective models; e.g., a convolutional neural network incorporates translation equivariance.
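As a small, concrete check of what translation equivariance means for a convolution (using circular padding so that the shift is exact):

```python
import torch
import torch.nn as nn

# A 1-D convolution with circular padding commutes with circular shifts of its input.
conv = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3, padding=1, padding_mode="circular")
x = torch.randn(1, 1, 16)
shift = 5

out_then_shift = torch.roll(conv(x), shifts=shift, dims=-1)
shift_then_out = conv(torch.roll(x, shifts=shift, dims=-1))
print(torch.allclose(out_then_shift, shift_then_out, atol=1e-6))  # True: conv(shift(x)) == shift(conv(x))
```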
1 code implementation • 29 Mar 2022 • Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
We show that WavPrompt is a few-shot learner that can perform speech understanding tasks better than a naive text baseline.
1 code implementation • 29 Mar 2022 • Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain
We demonstrate that our high-quality visualizations capture major types of family vocalization interactions, in categories indicative of mental, behavioral, and developmental health, for both labeled and unlabeled LB audio.
1 code implementation • 29 Mar 2022 • Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.
Automatic Speech Recognition (ASR) +5
1 code implementation • 26 Mar 2022 • Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson
SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner.
1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.
Automatic Speech Recognition (ASR) +3
2 code implementations • 23 Sep 2021 • Junghyun Lee, Gwangsu Kim, Matt Olfat, Mark Hasegawa-Johnson, Chang D. Yoo
This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions of different protected classes.
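To make the objective concrete, here is a small NumPy sketch of the squared MMD between projected samples of two protected groups under an RBF kernel; the kernel choice and bandwidth are illustrative, and the paper's actual optimization over the projection is not shown.

```python
import numpy as np

def rbf_mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between samples X and Y under an RBF kernel."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

# Fair-PCA-flavored fairness term for a candidate projection W (d x k):
# project each protected group and measure how distinguishable the projections are.
def projection_disparity(X_a: np.ndarray, X_b: np.ndarray, W: np.ndarray) -> float:
    return rbf_mmd2(X_a @ W, X_b @ W)
```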
1 code implementation • 16 Jun 2021 • Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson
In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.
no code implementations • NAACL 2021 • Kiran Ramnath, Leda Sari, Mark Hasegawa-Johnson, Chang Yoo
Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded.
no code implementations • 31 Dec 2020 • Kiran Ramnath, Mark Hasegawa-Johnson
Therefore, being able to reason over incomplete KGs for QA is a critical requirement in real-world applications that has not been addressed extensively in the literature.
2 code implementations • 24 Nov 2020 • Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson
Beyond the model, we also propose a metric for evaluating source separation with a variable number of speakers (see the sketch after this entry).
Ranked #5 on Speech Separation on WSJ0-4mix
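For context, a common way to score separation when the estimate-to-reference pairing is unknown is a permutation-invariant signal metric. The sketch below uses SI-SNR with a best-permutation search for the equal-count case; it is a generic baseline, not the variable-speaker metric proposed in the paper.

```python
import itertools
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR (dB) between one estimated and one reference waveform."""
    est = est - est.mean()
    ref = ref - ref.mean()
    proj = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - proj
    return 10.0 * np.log10((proj @ proj) / (noise @ noise + eps))

def best_permutation_si_snr(ests: list, refs: list) -> float:
    """Mean SI-SNR under the estimate-to-reference assignment that scores best."""
    assert len(ests) == len(refs)
    return max(
        np.mean([si_snr(ests[i], refs[j]) for i, j in enumerate(perm)])
        for perm in itertools.permutations(range(len(refs)))
    )
```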
1 code implementation • ICCV 2021 • Zhonghao Wang, Kai Wang, Mo Yu, JinJun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi
Finally, we achieve a higher level of interpretability by imposing OCCAM on the objects represented in the induced symbolic concept space.
Ranked #4 on Visual Question Answering (VQA) on CLEVR
1 code implementation • 23 Oct 2020 • Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg
This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.
1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.
Automatic Speech Recognition (ASR) +3
no code implementations • 8 Aug 2020 • Leda Sari, Mark Hasegawa-Johnson
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
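One common way to build such a surrogate is to replace hard true/false-positive counts with their expected values under the network's posteriors. The sketch below shows that generic "soft F-measure" form; it may differ from the authors' exact approximation.

```python
import torch

def soft_f_measure_loss(probs: torch.Tensor, targets: torch.Tensor,
                        beta: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable F-beta surrogate: expected TP/FP/FN from posteriors instead of hard counts."""
    tp = (probs * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    fn = ((1.0 - probs) * targets).sum()
    f_beta = (1 + beta ** 2) * tp / ((1 + beta ** 2) * tp + (beta ** 2) * fn + fp + eps)
    return 1.0 - f_beta   # minimizing the loss maximizes the soft F-measure
```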
no code implementations • 31 Jul 2020 • Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg
To this end, an Image2Speech system was first implemented, which generates image captions consisting of phoneme sequences.
1 code implementation • 22 May 2020 • Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari
In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately.
no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.
Automatic Speech Recognition (ASR) +1
no code implementations • 12 May 2020 • Ali Abavisani, Mark Hasegawa-Johnson
In this article, we provide a model to estimate a real-valued measure of the intelligibility of individual speech segments.
Automatic Speech Recognition (ASR) +1
6 code implementations • ICML 2020 • Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson
Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.
1 code implementation • 15 Apr 2020 • Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore
Recently, AutoVC, a conditional autoencoder (CAE)-based method, achieved state-of-the-art results by disentangling speaker identity from speech content using information-constraining bottlenecks; it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.
no code implementations • 25 Sep 2019 • Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao
Convolutional neural network (CNN) for time series data implicitly assumes that the data are uniformly sampled, whereas many event-based and multi-modal data are nonuniform or have heterogeneous sampling rates.
1 code implementation • 16 Sep 2019 • Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow
We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language and a zero-resource grapheme-to-phoneme (G2P) table.
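To illustrate the role of the G2P table, here is a minimal sketch of turning words into pronunciations by greedy longest-match lookup; the table format and the matching rule are illustrative assumptions, not the toolkit's actual procedure.

```python
def word_to_phonemes(word: str, g2p: dict) -> list:
    """Greedy longest-match conversion of a word's graphemes to phonemes using a G2P table."""
    phones, i = [], 0
    max_len = max(len(g) for g in g2p)
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in g2p:
                phones.extend(g2p[chunk])
                i += n
                break
        else:
            i += 1        # skip graphemes missing from the table
    return phones

# Toy example:
# word_to_phonemes("chat", {"ch": ["tʃ"], "a": ["a"], "t": ["t"]}) -> ["tʃ", "a", "t"]
```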
11 code implementations • 14 May 2019 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.
no code implementations • 5 Nov 2018 • Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen
In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.
no code implementations • 15 May 2018 • Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Furui first demonstrated that the identities of both consonants and vowels can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Feb 2018 • Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur
Developing speech technologies for low-resource languages has become a very active research field over the last decade.
no code implementations • 15 Feb 2018 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson
On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.
no code implementations • 14 Feb 2018 • Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography.
no code implementations • 7 Feb 2018 • Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson
The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios.
Automatic Speech Recognition (ASR) +1
2 code implementations • NeurIPS 2017 • Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang
To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.
Ranked #24 on Sequential Image Classification on Sequential MNIST
no code implementations • 13 Dec 2016 • Xiang Kong, Preethi Jyothi, Mark Hasegawa-Johnson
Mismatched transcriptions have been proposed as a means of acquiring probabilistic transcriptions from non-native speakers of a language. Prior work has demonstrated the value of these transcriptions by successfully adapting cross-lingual ASR systems for different target languages.
no code implementations • WS 2016 • Wenda Chen, Mark Hasegawa-Johnson, Nancy Chen, Preethi Jyothi, Lav Varshney
We evaluate our techniques using mismatched transcriptions for Cantonese speech acquired from native English and Mandarin speakers.
no code implementations • 10 Nov 2016 • Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel
Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and (3) acoustic features computed using a convolutional neural network (CNN).
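As a rough illustration of the second feature set's two variants, here is a sketch (using librosa, with hypothetical timing arguments) that computes segment-averaged MFCCs versus MFCCs anchored at a single landmark frame; the frame rate and coefficient count are illustrative defaults.

```python
import librosa
import numpy as np

def segment_mfcc_features(wav_path: str, landmark_s: float, seg_start_s: float, seg_end_s: float):
    """Illustrative MFCC variants: averaged over the segment vs. taken at a phonetic landmark."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=160)   # 10 ms hop -> 100 frames/s
    frames_per_s = sr / 160
    start, end = int(seg_start_s * frames_per_s), int(seg_end_s * frames_per_s)
    landmark = int(landmark_s * frames_per_s)
    averaged = mfcc[:, start:end].mean(axis=1)   # segment-averaged MFCCs
    anchored = mfcc[:, landmark]                 # MFCCs at the landmark frame
    return averaged, anchored
```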
7 code implementations • CVPR 2017 • Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do
In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data.
2 code implementations • 13 Feb 2015 • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
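A minimal PyTorch sketch of the core idea of jointly learning a recurrent network and a masking function: the RNN predicts per-source time-frequency masks that are applied to the mixture magnitude spectrogram. The layer sizes and the softmax mask constraint are illustrative choices, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MaskingSeparator(nn.Module):
    """Mask-based separator: an RNN predicts per-source masks applied to the mixture spectrogram."""
    def __init__(self, n_freq: int, n_sources: int = 2, hidden: int = 256):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(2 * hidden, n_freq * n_sources)
        self.n_freq, self.n_sources = n_freq, n_sources

    def forward(self, mixture_mag: torch.Tensor) -> torch.Tensor:
        # mixture_mag: (batch, time, n_freq) magnitude spectrogram of the mixture
        h, _ = self.rnn(mixture_mag)
        masks = self.mask_head(h).view(h.size(0), h.size(1), self.n_sources, self.n_freq)
        masks = torch.softmax(masks, dim=2)          # masks across sources sum to 1 per TF bin
        return masks * mixture_mag.unsqueeze(2)      # estimated per-source magnitudes
```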
1 code implementation • ICASSP 2014 • Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis
In this paper, we study deep learning for monaural speech separation.
no code implementations • LREC 2014 • Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi
In a second pass, a more restricted LM is generated for each audio segment, and unsupervised acoustic model adaptation is applied.
no code implementations • LREC 2014 • Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi
A major problem with dialectal Arabic speech recognition is due to the sparsity of speech resources.