Search Results for author: Mark Hasegawa-Johnson

Found 59 papers, 26 papers with code

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

2 code implementations13 Feb 2015 Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis

In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.

Denoising Speech Denoising +1
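
Below is a minimal PyTorch sketch of the masking idea described in the snippet above: an RNN predicts a soft time-frequency mask per source, and the masks are applied to the mixture spectrogram inside the network, so the masks and the recurrent weights are optimized jointly. Layer sizes, the softmax masking, and the MSE loss are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of joint mask + RNN optimization for two-source separation.
# Shapes, layer sizes, and the loss are illustrative assumptions.
import torch
import torch.nn as nn

class MaskingRNN(nn.Module):
    def __init__(self, n_freq=513, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, 2 * n_freq)  # one mask per source

    def forward(self, mix_mag):                     # (batch, time, freq)
        h, _ = self.rnn(mix_mag)
        masks = self.proj(h).view(mix_mag.size(0), mix_mag.size(1), 2, -1)
        masks = torch.softmax(masks, dim=2)         # masks sum to 1 per T-F bin
        # Applying the masks inside the network keeps the masking layer in the
        # computation graph, so masks and RNN weights are optimized jointly.
        return masks * mix_mag.unsqueeze(2)         # (batch, time, 2, freq)

model = MaskingRNN()
mix = torch.rand(4, 100, 513)                       # fake mixture spectrograms
targets = torch.rand(4, 100, 2, 513)                # fake source spectrograms
loss = nn.functional.mse_loss(model(mix), targets)
loss.backward()
```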

Semantic Image Inpainting with Deep Generative Models

7 code implementations CVPR 2017 Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do

In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data.

Image Inpainting
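
A hedged sketch of the general GAN-based inpainting recipe the abstract alludes to: search the latent space of a pretrained generator for a code whose output matches the known pixels (a context term) while staying realistic (a prior term from the discriminator), then blend. `G`, `D`, `latent_dim`, and the loss weights are placeholders, not the authors' implementation.

```python
# Hedged sketch: inpainting by optimizing a latent code of a pretrained GAN.
# `G` and `D` stand for a pretrained generator/discriminator (assumptions).
import torch

def inpaint(G, D, corrupted, mask, steps=1000, lam=0.1, lr=0.05):
    # corrupted: (1, 3, H, W) image with missing pixels; mask: 1 = known pixel
    z = torch.randn(1, G.latent_dim, requires_grad=True)  # latent_dim: assumed attribute
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        gen = G(z)
        context = ((gen - corrupted).abs() * mask).mean()  # match known pixels
        prior = -D(gen).mean()                             # keep the sample realistic
        loss = context + lam * prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Blend: keep known pixels, fill the hole with generated content.
    return mask * corrupted + (1 - mask) * G(z).detach()
```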

Landmark-based consonant voicing detection on multilingual corpora

no code implementations10 Nov 2016 Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and (3) acoustic features computed using a convolutional neural network (CNN).

Clustering-based Phonetic Projection in Mismatched Crowdsourcing Channels for Low-resourced ASR

no code implementations WS 2016 Wenda Chen, Mark Hasegawa-Johnson, Nancy Chen, Preethi Jyothi, Lav Varshney

We evaluate our techniques using mismatched transcriptions for Cantonese speech acquired from native English and Mandarin speakers.

Clustering

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

no code implementations13 Dec 2016 Xiang Kong, Preethi Jyothi, Mark Hasegawa-Johnson

Mismatched transcriptions have been proposed as a means to acquire probabilistic transcriptions from non-native speakers of a language. Prior work has demonstrated the value of these transcriptions by successfully adapting cross-lingual ASR systems for different target languages.

Cross-Lingual ASR

Dilated Recurrent Neural Networks

2 code implementations NeurIPS 2017 Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.

Sequential Image Classification
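
For intuition, here is a toy single-layer version of a dilated recurrence: the hidden state at step t is updated from the hidden state d steps back, which is exactly the long recurrent skip connection that the mean-recurrent-length measure is designed to capture. The GRU cell and sizes are illustrative assumptions.

```python
# Toy sketch of one dilated-recurrent layer; cell choice and sizes are assumptions.
import torch
import torch.nn as nn

class DilatedRecurrentLayer(nn.Module):
    def __init__(self, input_size, hidden_size, dilation):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.dilation = dilation
        self.hidden_size = hidden_size

    def forward(self, x):                           # x: (batch, time, input_size)
        batch, time, _ = x.shape
        zeros = x.new_zeros(batch, self.hidden_size)
        states = []
        for t in range(time):
            # Recurrent skip connection of length `dilation`.
            prev = states[t - self.dilation] if t >= self.dilation else zeros
            states.append(self.cell(x[:, t], prev))
        return torch.stack(states, dim=1)           # (batch, time, hidden_size)

# Stacking layers with exponentially increasing dilations (1, 2, 4, ...) is
# what gives the architecture its long effective memory.
```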

Deep Learning Based Speech Beamforming

no code implementations15 Feb 2018 Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.

Speech Enhancement

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

no code implementations15 May 2018 Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

When CTC Training Meets Acoustic Landmarks

no code implementations5 Nov 2018 Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.

Automatic Speech Recognition (ASR)

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

11 code implementations14 May 2019 Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

Style Transfer Voice Conversion

Fast transcription of speech in low-resource languages

1 code implementation16 Sep 2019 Mark Hasegawa-Johnson, Camille Goudeseune, Gina-Anne Levow

We present software that, in only a few hours, transcribes forty hours of recorded speech in a surprise language, using only a few tens of megabytes of noisy text in that language, and a zero-resource grapheme to phoneme (G2P) table.

Language Modelling

Continuous Convolutional Neural Network for Nonuniform Time Series

no code implementations25 Sep 2019 Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao

Convolutional neural networks (CNNs) for time series data implicitly assume that the data are uniformly sampled, whereas many event-based and multi-modal data are nonuniform or have heterogeneous sampling rates.

Time Series Time Series Analysis
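
As a rough illustration of handling nonuniform sampling, the sketch below parameterizes the convolution kernel as a function of the real-valued time offset, so irregular timestamps can be convolved directly. This is a generic continuous-kernel formulation offered for intuition, not necessarily the paper's exact model.

```python
# Hedged sketch of a continuous convolution over irregularly sampled data.
import torch
import torch.nn as nn

class ContinuousConv(nn.Module):
    def __init__(self, in_ch, out_ch, hidden=32):
        super().__init__()
        self.kernel = nn.Sequential(               # maps a time offset to weights
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, in_ch * out_ch))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, values, times, query_times):
        # values: (N, in_ch), times: (N,), query_times: (M,)
        offsets = (query_times[:, None] - times[None, :]).unsqueeze(-1)  # (M, N, 1)
        w = self.kernel(offsets).view(-1, values.size(0), self.out_ch, self.in_ch)
        # Weighted sum of the N observations for each query time.
        return torch.einsum('mnoi,ni->mo', w, values) / values.size(0)

layer = ContinuousConv(in_ch=3, out_ch=8)
t = torch.sort(torch.rand(50)).values               # irregular timestamps
y = layer(torch.randn(50, 3), t, torch.linspace(0, 1, 20))  # -> (20, 8)
```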

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation15 Apr 2020 Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a conditional autoencoder (CAE) based method, achieved state-of-the-art results by disentangling the speaker identity and speech content using information-constraining bottlenecks, and it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.

Style Transfer Voice Conversion
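
The mechanism described above can be illustrated with a bare-bones conditional autoencoder: a narrow bottleneck encodes content, a speaker embedding is concatenated back in for decoding, and swapping that embedding at inference converts the voice. Dimensions, layers, and the GRU choice below are assumptions for illustration, not the AutoVC implementation.

```python
# Minimal sketch of a bottlenecked conditional autoencoder for conversion.
import torch
import torch.nn as nn

class ConditionalAutoencoder(nn.Module):
    def __init__(self, n_mel=80, spk_dim=256, bottleneck=32):
        super().__init__()
        self.encoder = nn.GRU(n_mel, bottleneck, batch_first=True)
        self.decoder = nn.GRU(bottleneck + spk_dim, 512, batch_first=True)
        self.out = nn.Linear(512, n_mel)

    def forward(self, mel, spk_emb):                # mel: (B, T, 80), spk_emb: (B, 256)
        content, _ = self.encoder(mel)              # narrow code: content, little identity
        cond = spk_emb.unsqueeze(1).expand(-1, mel.size(1), -1)
        dec, _ = self.decoder(torch.cat([content, cond], dim=-1))
        return self.out(dec)

# Training: reconstruct mel with the *source* speaker's embedding.
# Conversion: feed the same content code with a *target* speaker's embedding.
```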

Automatic Estimation of Intelligibility Measure for Consonants in Speech

no code implementations12 May 2020 Ali Abavisani, Mark Hasegawa-Johnson

In this article, we provide a model to estimate a real-valued measure of the intelligibility of individual speech segments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Identify Speakers in Cocktail Parties with End-to-End Attention

1 code implementation22 May 2020 Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari

In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately.

Speaker Identification Speech Separation

Evaluating Automatically Generated Phoneme Captions for Images

no code implementations31 Jul 2020 Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg

For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences.

Image Captioning

Deep F-measure Maximization for End-to-End Speech Understanding

no code implementations8 Aug 2020 Leda Sari, Mark Hasegawa-Johnson

We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.

Fairness Intent Detection +1
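
One standard way to make the F-measure differentiable, consistent with the description above, is to replace hard decisions with predicted probabilities so that true and false positives become soft counts; the snippet below shows such a soft-F1 loss. It is a generic surrogate, not necessarily the paper's exact approximation.

```python
# Generic differentiable (soft) F1 surrogate, trainable by backpropagation.
import torch

def soft_f1_loss(probs, labels, eps=1e-8):
    # probs: (N,) predicted probabilities in [0, 1]; labels: (N,) in {0, 1}
    tp = (probs * labels).sum()                    # soft true positives
    fp = (probs * (1 - labels)).sum()              # soft false positives
    fn = ((1 - probs) * labels).sum()              # soft false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - f1                                # minimize 1 - F1 == maximize F1

probs = torch.sigmoid(torch.randn(16, requires_grad=True))
labels = torch.randint(0, 2, (16,)).float()
soft_f1_loss(probs, labels).backward()
```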

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation22 Oct 2020 Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Show and Speak: Directly Synthesize Spoken Description of Images

1 code implementation23 Oct 2020 Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

This paper proposes a new model, referred to as the show and speak (SAS) model that, for the first time, is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.

Seeing is Knowing! Fact-based Visual Question Answering using Knowledge Graph Embeddings

no code implementations31 Dec 2020 Kiran Ramnath, Mark Hasegawa-Johnson

Therefore, being able to reason over incomplete KGs for QA is a critical requirement in real-world applications that has not been addressed extensively in the literature.

Common Sense Reasoning Knowledge Graph Embeddings +4

Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering

no code implementations NAACL 2021 Kiran Ramnath, Leda Sari, Mark Hasegawa-Johnson, Chang Yoo

Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded.

Knowledge Graphs Question Answering +2

Global Rhythm Style Transfer Without Text Transcriptions

1 code implementation16 Jun 2021 Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.

Representation Learning Style Transfer

Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold

2 code implementations23 Sep 2021 Junghyun Lee, Gwangsu Kim, Matt Olfat, Mark Hasegawa-Johnson, Chang D. Yoo

This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions of different protected classes.

Fairness
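
The fairness criterion can be sketched as follows: project the data of two protected groups onto k orthonormal directions (a point on the Stiefel manifold) and measure the RBF-kernel MMD between the projected distributions; fair PCA then trades this discrepancy off against explained variance while optimizing over orthonormal projections. The kernel width and synthetic data below are illustrative assumptions.

```python
# Sketch: RBF-kernel MMD between two groups after an orthonormal projection.
import torch

def rbf_mmd2(x, y, sigma=1.0):
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

d, k_dims = 10, 2
V, _ = torch.linalg.qr(torch.randn(d, k_dims))      # orthonormal columns (Stiefel point)
group_a, group_b = torch.randn(100, d), torch.randn(80, d) + 0.5

# Fair PCA balances explained variance against this group discrepancy,
# keeping V orthonormal throughout the optimization.
mmd2 = rbf_mmd2(group_a @ V, group_b @ V)
print(float(mmd2))
```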

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation26 Jan 2022 Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks

1 code implementation26 Mar 2022 Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson

SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner.

Disentanglement Voice Conversion

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

1 code implementation29 Mar 2022 Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

We demonstrate that our high-quality visualizations capture major types of family vocalization interactions, in categories indicative of mental, behavioral, and developmental health, for both labeled and unlabeled LB audio.

speaker-diarization Speaker Diarization

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

1 code implementation29 Mar 2022 Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Equivariance Discovery by Learned Parameter-Sharing

1 code implementation7 Apr 2022 Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing

Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance.

Inductive Bias Translation

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

1 code implementation20 Apr 2022 Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.

Disentanglement Self-Supervised Learning

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions

1 code implementation19 May 2022 Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy

Zero-shot voice conversion is becoming an increasingly popular research topic, as it promises the ability to transform speech to sound like any speaker.

Speech Synthesis Style Transfer +1

Forget-free Continual Learning with Winning Subnetworks

1 code implementation International Conference on Machine Learning 2022 Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo

Inspired by the Lottery Ticket Hypothesis, which posits that competitive subnetworks exist within a dense network, we propose a continual learning method referred to as Winning SubNetworks (WSN), which sequentially learns and selects an optimal subnetwork for each task.

Continual Learning
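
A toy rendering of the subnetwork-selection idea: shared weights are frozen, and each task learns a score tensor whose top-scoring entries define a binary mask, i.e., that task's subnetwork, so earlier tasks are not overwritten. The hard top-k mask shown here would need a straight-through estimator to train the scores; all details are assumptions, not the paper's implementation.

```python
# Toy sketch: frozen shared weights with one learned binary mask per task.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_f, out_f, sparsity=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f), requires_grad=False)
        self.scores = nn.ParameterDict()            # one learnable score tensor per task
        self.sparsity = sparsity

    def add_task(self, task):
        self.scores[task] = nn.Parameter(torch.randn_like(self.weight))

    def forward(self, x, task):
        s = self.scores[task].flatten()
        k = int(s.numel() * self.sparsity)
        thresh = s.topk(k).values.min()             # keep the top-scoring weights
        # NOTE: training through this hard mask needs a straight-through estimator.
        mask = (self.scores[task] >= thresh).float()
        return nn.functional.linear(x, self.weight * mask)

layer = MaskedLinear(16, 8)
layer.add_task("task1")
out = layer(torch.randn(4, 16), "task1")            # -> (4, 8)
```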

Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction

no code implementations9 Jul 2022 Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson

Most current research upsamples the visual features along the time dimension so that audio and video features are able to align in time.

Speech Extraction
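
The temporal-alignment step mentioned above is straightforward to sketch: upsample the video feature sequence along time (e.g., with linear interpolation) to the audio frame rate before fusing the two streams. The frame rates and feature sizes below are assumptions for illustration.

```python
# Sketch: upsample video features along time to match the audio frame rate.
import torch
import torch.nn.functional as F

audio = torch.randn(1, 400, 256)                    # (batch, audio_frames, dim)
video = torch.randn(1, 100, 512)                    # (batch, video_frames, dim)

# interpolate expects (batch, channels, length), so move time to the last axis.
video_up = F.interpolate(video.transpose(1, 2), size=audio.size(1),
                         mode='linear', align_corners=False).transpose(1, 2)
print(video_up.shape)                               # torch.Size([1, 400, 512])

fused = torch.cat([audio, video_up], dim=-1)        # now aligned frame-by-frame
```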

SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation

no code implementations14 Dec 2022 Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo

To the best of our knowledge, this is the first attempt to apply mixup in NLP while preserving the meaning of a specific word.

Data Augmentation Sentence +1

Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio

no code implementations21 May 2023 Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models.

speaker-diarization Speaker Diarization

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

no code implementations25 May 2023 Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo

INTapt is trained simultaneously in the following two manners: (1) adversarial training to reduce accent feature dependence between the original input and the prompt-concatenated input and (2) training to minimize CTC loss for improving ASR performance to a prompt-concatenated input.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Theory of Unsupervised Speech Recognition

1 code implementation9 Jun 2023 Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

Unsupervised speech recognition (ASR-U) is the problem of learning automatic speech recognition (ASR) systems from unpaired speech-only and text-only corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

no code implementations13 Sep 2023 Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios

In this study, we leverage the self-supervised learning model, Wav2Vec 2.0 (W2V2), pretrained on 4300h of home recordings of children under 5 years old, to build a unified system that performs both speaker diarization (SD) and vocalization classification (VC) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching

1 code implementation3 Oct 2023 Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo

Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands.

speech-recognition Speech Recognition +1

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

no code implementations30 Nov 2023 Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou

We further extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.

Denoising Image Generation

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations

no code implementations10 Feb 2024 Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

To understand why self-supervised learning (SSL) models have empirically achieved strong performances on several speech-processing downstream tasks, numerous studies have focused on analyzing the encoded information of the SSL layer representations in adult speech.

Self-Supervised Learning

C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion

no code implementations21 Mar 2024 Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo

Through a series of observations, we find that the prompt choice significantly affects the calibration in CLIP, where the prompts leading to higher text feature dispersion result in better-calibrated predictions.

Test-time Adaptation

Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity

1 code implementation NAACL 2022 John Harvill, Roxana Girju, Mark Hasegawa-Johnson

In this paper we focus on patterns of colexification (co-expressions of form-meaning mapping in the lexicon) as an aspect of lexical-semantic organization, and use them to build large scale synset graphs across BabelNet’s typologically diverse set of 499 world languages.

Semantic Similarity Semantic Textual Similarity
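
A small illustration of how a colexification graph can be assembled: senses (synsets) are nodes, and an edge links two senses whenever some language expresses both with the same word form. The toy two-language lexicon below is invented purely for illustration; it is not BabelNet data.

```python
# Hedged sketch of building a colexification graph from a toy lexicon.
import itertools
import networkx as nx

# language -> word form -> set of sense IDs it covers (invented example data)
lexicon = {
    "lang_a": {"hand": {"HAND", "ARM"}},             # colexifies HAND and ARM
    "lang_b": {"mano": {"HAND"}, "brazo": {"ARM"}},
}

G = nx.Graph()
for lang, forms in lexicon.items():
    for form, senses in forms.items():
        G.add_nodes_from(senses)
        for u, v in itertools.combinations(sorted(senses), 2):
            # weight counts how many (language, form) pairs colexify u and v
            w = G.get_edge_data(u, v, {}).get("weight", 0)
            G.add_edge(u, v, weight=w + 1)

print(G.edges(data=True))                            # [('ARM', 'HAND', {'weight': 1})]
```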
