Search Results for author: James Glass

Found 153 papers, 66 papers with code

Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings

no code implementations · LREC 2022 · Christopher Song, David Harwath, Tuka Alhanai, James Glass

We present Speak, a toolkit that allows researchers to crowdsource speech audio recordings using Amazon Mechanical Turk (MTurk).

Curiosity-driven Red-teaming for Large Language Models

1 code implementation · 29 Feb 2024 · Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal

To probe when an LLM generates unwanted content, the current paradigm is to recruit a red team of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs.

Reinforcement Learning (RL)

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

no code implementations · 16 Jan 2024 · Alexander H. Liu, Sung-Lin Yeh, James Glass

We use linear probes to estimate the mutual information between the target information and learned representations, offering further insight into how accessible the target information is from speech representations.

Representation Learning, Self-Supervised Learning, +2 more

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

no code implementations · 15 Nov 2023 · Heng-Jui Chang, James Glass

This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific self-supervision method for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin).

Clustering, Representation Learning

Self-Specialization: Uncovering Latent Expertise within Large Language Models

no code implementations · 29 Sep 2023 · Junmo Kang, Hongyin Luo, Yada Zhu, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky

Recent works have demonstrated the effectiveness of self-alignment in which a large language model is, by itself, aligned to follow general instructions through the automatic generation of instructional data using a handful of human-written seeds.

Hallucination, Instruction Following, +2 more

Joint Audio and Speech Understanding

1 code implementation · 25 Sep 2023 · Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Humans are surrounded by audio signals that include both speech and non-speech sounds.

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations · 7 Sep 2023 · Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.

Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS

no code implementations · 8 Jun 2023 · Cheng-Han Chiang, Yung-Sung Chuang, James Glass, Hung-Yi Lee

We also show that even if two SEs have similar performance on STS benchmarks, they can have very different behavior on HEROS.

Negation, Sentence, +1 more

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

1 code implementation · 26 May 2023 · Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering.

Open-Domain Question Answering, Passage Retrieval, +1 more

Entailment as Robust Self-Learner

1 code implementation · 26 May 2023 · Jiaxin Ge, Hongyin Luo, Yoon Kim, James Glass

Experiments on binary and multi-class classification tasks show that SimPLE leads to more robust self-training results, indicating that the self-trained entailment models are more efficient and trustworthy than large language models on language understanding tasks.

Multi-class Classification, Natural Language Understanding, +1 more

SAIL: Search-Augmented Instruction Learning

no code implementations · 24 May 2023 · Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising, Fact Checking, +3 more

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations · 21 May 2023 · Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

Listen, Think, and Understand

1 code implementation · 18 May 2023 · Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.

Ranked #3 on Music Question Answering on MusicQA (using extra training data)

Language Modelling, Large Language Model, +1 more

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

1 code implementation · 18 May 2023 · Heng-Jui Chang, Alexander H. Liu, James Glass

Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.

Acoustic Unit Discovery, Clustering, +3 more

Interpretable Unified Language Checking

1 code implementation · 7 Apr 2023 · Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge.

Fact Checking, Fairness, +2 more

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning

1 code implementation · 10 Mar 2023 · Hongyin Luo, James Glass

Due to their similarity-based learning objectives, pretrained sentence encoders often internalize stereotypical assumptions that reflect the social biases that exist within their training corpora.

Natural Language Inference, Sentence, +1 more

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation

1 code implementation · 20 Dec 2022 · Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov

In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data.

Text Generation

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration

no code implementations · 14 Nov 2022 · Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass

Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.

Pseudo Label, Pseudo Label Filtering, +1 more

PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation

1 code implementation · 14 Oct 2022 · Jingyu Zhang, James Glass, Tianxing He

Existing work on controlled text generation (CTG) assumes a control interface of categorical attributes.

Attribute, Text Generation

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation · 7 Oct 2022 · Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms retrieval in other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation, Retrieval, +2 more

Contrastive Audio-Visual Masked Autoencoder

1 code implementation · 2 Oct 2022 · Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification, Audio Tagging, +4 more

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

no code implementations · 17 May 2022 · Sameer Khurana, Antoine Laurent, James Glass

We combine the state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder, SAMU-XLSR.

Retrieval, Sentence, +5 more

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

1 code implementation · 6 May 2022 · Yuan Gong, Jin Yu, James Glass

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.

Audio Classification

Controlling the Focus of Pretrained Language Generation Models

1 code implementation · Findings (ACL) 2022 · Jiabao Ji, Yoon Kim, James Glass, Tianxing He

This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.

Abstractive Text Summarization, Response Generation, +1 more

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation · CVPR 2022 · Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joint multi-modal embedding space.

Action Localization, Retrieval, +2 more

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

1 code implementation · 8 Dec 2021 · Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization, Retrieval, +2 more

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations · 1 Dec 2021 · Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations · 19 Oct 2021 · Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Audio Classification, Emotion Recognition, +4 more

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

1 code implementation · 14 Oct 2021 · Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass

These results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned.

Image Retrieval, Language Modelling, +1 more

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

no code implementations · 7 Oct 2021 · Sameer Khurana, Antoine Laurent, James Glass

We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more

An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models

1 code implementation · 6 Sep 2021 · Tianxing He, Kyunghyun Cho, James Glass

Prompt-based knowledge probing for 1-hop relations has been used to measure how much world knowledge is stored in pretrained language models.

Knowledge Probing, Prompt Engineering, +1 more

Cross-Modal Discrete Representation Learning

no code implementations · ACL 2022 · Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval, Quantization, +4 more

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

no code implementations · CVPR 2021 · Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.

Contrastive Learning, Retrieval, +1 more

AST: Audio Spectrogram Transformer

3 code implementations · 5 Apr 2021 · Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Audio Classification, Audio Tagging, +4 more

Cooperative Self-training of Machine Reading Comprehension

1 code implementation · NAACL 2022 · Hongyin Luo, Shang-Wen Li, Mingye Gao, Seunghak Yu, James Glass

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings.

Extractive Question-Answering, Machine Reading Comprehension, +6 more

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks

1 code implementation · EMNLP (ClinicalNLP) 2020 · Hongyin Luo, Shang-Wen Li, James Glass

Given a set of explicit symptoms provided by the patient to initiate a dialog for diagnosing, the system is trained to collect implicit symptoms by asking questions, in order to collect more information for making an accurate diagnosis.

Goal-Oriented Dialog

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

no code implementations · ACL 2021 · Wei-Ning Hsu, David Harwath, Christopher Song, James Glass

In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision.

Image Captioning, Speech Synthesis, +1 more

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation · 1 Nov 2020 · Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation · 26 Oct 2020 · Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling, Spoken Language Understanding

Similarity Analysis of Self-Supervised Speech Representations

no code implementations · 22 Oct 2020 · Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

no code implementations · 3 Jun 2020 · Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.

Representation Learning, Self-Supervised Learning, +1 more

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

no code implementations · 19 May 2020 · Hongyin Luo, Shang-Wen Li, James Glass

Experiments showed that ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios and achieved state-of-the-art few-shot learning performance.

Few-Shot Learning

Vector-Quantized Autoregressive Predictive Coding

2 code implementations · 17 May 2020 · Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

Similarity Analysis of Contextual Word Representation Models

1 code implementation · ACL 2020 · John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

no code implementations · ACL 2020 · Yu-An Chung, James Glass

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.

speech-recognition, Speech Recognition, +1 more

SemEval-2015 Task 3: Answer Selection in Community Question Answering

no code implementations · SEMEVAL 2015 · Preslav Nakov, Lluís Màrquez, Walid Magdy, Alessandro Moschitti, James Glass, Bilal Randeree

Community Question Answering (cQA) provides new and interesting research directions for the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts.

Answer Selection, Community Question Answering

Neural Multi-Task Learning for Stance Prediction

no code implementations · WS 2019 · Wei Fang, Moin Nadeem, Mitra Mohtarami, James Glass

We present a multi-task learning model that leverages a large amount of textual information from existing datasets to improve stance prediction.

Multi-Task Learning

Generative Pre-Training for Speech with Autoregressive Predictive Coding

2 code implementations · 23 Oct 2019 · Yu-An Chung, James Glass

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Representation Learning, Speaker Identification, +4 more

Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

no code implementations · 16 Oct 2019 · Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Response Generation, Text Generation, +1 more

Contrastive Language Adaptation for Cross-Lingual Stance Detection

no code implementations · IJCNLP 2019 · Mitra Mohtarami, James Glass, Preslav Nakov

In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language.

Stance Detection

DARTS: Dialectal Arabic Transcription System

no code implementations · 26 Sep 2019 · Sameer Khurana, Ahmed Ali, James Glass

We analyze the following: transfer learning from the high-resource broadcast domain to the low-resource dialectal domain, and semi-supervised learning, where we use in-domain unlabeled audio data collected from YouTube.

Language Modelling, Transfer Learning

Automatic Fact-Checking Using Context and Discourse Information

1 code implementation · 4 Aug 2019 · Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass

We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information.

Fact Checking

Transfer Learning from Audio-Visual Grounding to Speech Recognition

no code implementations · 9 Jul 2019 · Wei-Ning Hsu, David Harwath, James Glass

Transfer learning aims to reduce the amount of data required to excel at a new task by re-using the knowledge acquired from learning other related tasks.

speech-recognition, Speech Recognition, +2 more

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

1 code implementation · 9 Jul 2019 · Yonatan Belinkov, Ahmed Ali, James Glass

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

FAKTA: An Automatic End-to-End Fact Checking System

no code implementations · NAACL 2019 · Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, James Glass

We present FAKTA, a unified framework that integrates various components of a fact-checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis.

Fact Checking, Retrieval, +2 more

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations · 11 May 2019 · Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

Language Modeling with Graph Temporal Convolutional Networks

no code implementations · ICLR 2019 · Hongyin Luo, Yichen Li, Jie Fu, James Glass

Recently, there have been some attempts to use non-recurrent neural models for language modeling.

Language Modelling

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations · 7 Apr 2019 · Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification, Speech Enhancement

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations · 5 Apr 2019 · Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification, Representation Learning, +1 more

Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

no code implementations · NAACL 2019 · Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles.

Negative Training for Neural Dialogue Response Generation

1 code implementation · ACL 2020 · Tianxing He, James Glass

Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses.

Response Generation

Towards Visually Grounded Sub-Word Speech Unit Discovery

no code implementations · 21 Feb 2019 · David Harwath, James Glass

In this paper, we investigate the manner in which interpretable sub-word speech units emerge within a convolutional neural network model trained to associate raw speech waveforms with semantically related natural image scenes.

Adversarial Domain Adaptation for Stance Detection

no code implementations · 6 Feb 2019 · Brian Xu, Mitra Mohtarami, James Glass

This paper studies the problem of stance detection which aims to predict the perspective (or stance) of a given document with respect to a given claim.

Domain Adaptation, Fact Checking, +1 more

Analysis Methods in Neural Language Processing: A Survey

no code implementations · TACL 2019 · Yonatan Belinkov, James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems.

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

1 code implementation · 21 Dec 2018 · Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass

We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models?

Language Modelling, Machine Translation, +1 more

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks

2 code implementations · 21 Dec 2018 · Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass

We present a toolkit to facilitate the interpretation and understanding of neural network models.

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain

no code implementations · 4 Dec 2018 · Suwon Shon, Ahmed Ali, James Glass

An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition.

Dialect Identification

Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion

no code implementations · 27 Nov 2018 · Suwon Shon, Tae-Hyun Oh, James Glass

In this paper, we present a multi-modal online person verification system using both speech and visual signals.

Towards Unsupervised Speech-to-Text Translation

no code implementations · 4 Nov 2018 · Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.

Denoising, Language Modelling, +3 more

On The Inductive Bias of Words in Acoustics-to-Word Models

no code implementations · 31 Oct 2018 · Hao Tang, James Glass

In addition, we study three types of inductive bias: leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.

Inductive Bias

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation · 12 Sep 2018 · Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Speaker Recognition, Text-Independent Speaker Recognition

Unsupervised Representation Learning of Speech for Dialect Identification

no code implementations · 12 Sep 2018 · Suwon Shon, Wei-Ning Hsu, James Glass

In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID).

Dialect Identification, Disentanglement

Detecting egregious responses in neural sequence-to-sequence models

no code implementations · ICLR 2019 · Tianxing He, James Glass

We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them.

Response Generation

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation · 17 Jul 2018 · Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing, Sound

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition

no code implementations · 9 Jul 2018 · Hao Tang, James Glass

In this paper, we study recurrent networks' ability to learn long-term dependencies in the context of speech recognition.

speech-recognition, Speech Recognition

A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

no code implementations · 13 Jun 2018 · Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass

Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.

Data Augmentation, Distant Speech Recognition, +3 more

Role-specific Language Models for Processing Recorded Neuropsychological Exams

no code implementations · NAACL 2018 · Tuka Al Hanai, Rhoda Au, James Glass

Neuropsychological examinations are an important screening tool for the presence of cognitive conditions (e.g., Alzheimer's, Parkinson's Disease), and require a trained tester to conduct the exam through spoken interactions with the subject.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3 more

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

no code implementations · 29 May 2018 · Wei-Ning Hsu, James Glass

In this paper, we present a partitioned variational autoencoder (PVAE) and several training objectives to learn disentangled representations, which encode not only the shared factors, but also modality-dependent ones, into separate latent variables.

Representation Learning, Variational Inference

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

no code implementations · NeurIPS 2018 · Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +5 more

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

1 code implementation · NAACL 2018 · Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.

Machine Translation, Natural Language Inference, +4 more

Integrating Stance Detection and Fact Checking in a Unified Corpus

no code implementations · NAACL 2018 · Ramy Baly, Mitra Mohtarami, James Glass, Lluis Marquez, Alessandro Moschitti, Preslav Nakov

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.).

Fact Checking Retrieval +1

Automatic Stance Detection Using End-to-End Memory Networks

no code implementations NAACL 2018 Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, Alessandro Moschitti

We present a novel end-to-end memory network for stance detection, which jointly (i) predicts whether a document agrees, disagrees, discusses or is unrelated with respect to a given target claim, and also (ii) extracts snippets of evidence for that prediction.

Stance Detection

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

no code implementations 9 Apr 2018 David Harwath, Galen Chuang, James Glass

In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images.

Retrieval speech-recognition +1

Scalable Factorized Hierarchical Variational Autoencoder Training

2 code implementations 9 Apr 2018 Wei-Ning Hsu, James Glass

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.

Disentanglement Hyperparameter Optimization +5

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

no code implementations ECCV 2018 David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.

Retrieval

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

3 code implementations 23 Mar 2018 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.

Learning Word Embeddings Word Similarity

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

2 code implementations 12 Mar 2018 Suwon Shon, Ahmed Ali, James Glass

Although the Siamese network with language embeddings did not achieve as good a result as the end-to-end DID system, the two approaches showed good synergy when combined in a fused system.

Sound Audio and Speech Processing

Fact Checking in Community Forums

3 code implementations 8 Mar 2018 Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass

Community Question Answering (cQA) forums are very popular nowadays, as they represent an effective means for communities around particular topics to share information.

Community Question Answering Fact Checking

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

no code implementations 7 Mar 2018 Wei-Ning Hsu, James Glass

The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Learning Modality-Invariant Representations for Speech and Images

no code implementations 11 Dec 2017 Kenneth Leidal, David Harwath, James Glass

In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs.

Information Retrieval Retrieval +3

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations NAACL 2018 Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering speech-recognition +2

Learning Word Embeddings from Speech

no code implementations 5 Nov 2017 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.

Learning Word Embeddings Word Similarity

Spoken Language Biomarkers for Detecting Cognitive Impairment

1 code implementation 20 Oct 2017 Tuka Alhanai, Rhoda Au, James Glass

In this study, we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study.

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

3 code implementations NeurIPS 2017 Wei-Ning Hsu, Yu Zhang, James Glass

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

1 code implementation NeurIPS 2017 Yonatan Belinkov, James Glass

In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

no code implementations 28 Aug 2017 Suwon Shon, Ahmed Ali, James Glass

To achieve a robust ADI system, we explored both Siamese neural network models that learn similarities and dissimilarities among Arabic dialects and i-vector post-processing to adapt to domain mismatches.

Arabic Speech Recognition Dialect Identification +2

Learning Latent Representations for Speech Generation and Transformation

no code implementations 13 Apr 2017 Wei-Ning Hsu, Yu Zhang, James Glass

In this paper, we apply a convolutional VAE to model the generative process of natural speech.

Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

2 code implementations 23 Feb 2017 Hongyin Luo, Jie Fu, James Glass

However, it has been argued that this is not biologically plausible, because back-propagating error signals through the exact incoming weights is not considered possible in biological neural systems.

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

no code implementations 25 Sep 2016 Yonatan Belinkov, James Glass

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.

Machine Translation Translation

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations 19 Sep 2016 Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering a 10-year period (2000-2011).

Acoustic Modelling Language Modelling +1

Deep Multimodal Semantic Embeddings for Speech and Images

no code implementations 11 Nov 2015 David Harwath, James Glass

In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities.

Image Retrieval

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.

Distant Speech Recognition speech-recognition

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.

speech-recognition Speech Recognition +1

Unsupervised Lexicon Discovery from Acoustic Input

no code implementations TACL 2015 Chia-Ying Lee, Timothy J. O'Donnell, James Glass

We present a model of unsupervised phonological lexicon discovery: the problem of simultaneously learning phoneme-like and word-like units from acoustic input.

Language Acquisition Speech Recognition
