Search Results for author: James Glass

Found 125 papers, 46 papers with code

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

no code implementations 17 May 2022 Sameer Khurana, Antoine Laurent, James Glass

We combine state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder SAMU-XLSR.

Frame Sentence Embedding +4

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

1 code implementation 6 May 2022 Yuan Gong, Jin Yu, James Glass

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.

Audio Classification

Controlling the Focus of Pretrained Language Generation Models

1 code implementation Findings (ACL) 2022 Jiabao Ji, Yoon Kim, James Glass, Tianxing He

This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.

Abstractive Text Summarization Response Generation +1

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

no code implementations 8 Dec 2021 Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization Video Retrieval

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations 1 Dec 2021 Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations 19 Oct 2021 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Audio Classification Classification +5

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

1 code implementation 14 Oct 2021 Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass

These results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned.

Image Retrieval Language Modelling

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

no code implementations 7 Oct 2021 Sameer Khurana, Antoine Laurent, James Glass

We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.

Automatic Speech Recognition Cross-Lingual Transfer +1

An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models

1 code implementation 6 Sep 2021 Tianxing He, Kyunghyun Cho, James Glass

Prompt-based knowledge probing for 1-hop relations has been used to measure how much world knowledge is stored in pretrained language models.

Pretrained Language Models

Cross-Modal Discrete Representation Learning

no code implementations ACL 2022 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval Frame +3

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

no code implementations CVPR 2021 Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.

Contrastive Learning Video Understanding

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation ICCV 2021 Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.

Contrastive Learning Self-Supervised Learning +3

AST: Audio Spectrogram Transformer

2 code implementations 5 Apr 2021 Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Audio Classification Audio Tagging +5

Cooperative Learning of Zero-Shot Machine Reading Comprehension

no code implementations 12 Mar 2021 Hongyin Luo, Shang-Wen Li, Seunghak Yu, James Glass

REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor.

Machine Reading Comprehension Pretrained Language Models +4

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks

no code implementations EMNLP (ClinicalNLP) 2020 Hongyin Luo, Shang-Wen Li, James Glass

Given a set of explicit symptoms provided by the patient to initiate a dialog for diagnosing, the system is trained to collect implicit symptoms by asking questions, in order to collect more information for making an accurate diagnosis.

Goal-Oriented Dialog

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

no code implementations ACL 2021 Wei-Ning Hsu, David Harwath, Christopher Song, James Glass

In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision.

Image Captioning Speech Synthesis +1

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation 1 Nov 2020 Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation 26 Oct 2020 Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling Spoken Language Understanding

Similarity Analysis of Self-Supervised Speech Representations

no code implementations 22 Oct 2020 Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

no code implementations 3 Jun 2020 Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.

Representation Learning Self-Supervised Learning +1

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

no code implementations 19 May 2020 Hongyin Luo, Shang-Wen Li, James Glass

Experiments showed that the ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieves state-of-the-art few-shot learning performances.

Few-Shot Learning

Vector-Quantized Autoregressive Predictive Coding

2 code implementations 17 May 2020 Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

Similarity Analysis of Contextual Word Representation Models

1 code implementation ACL 2020 John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

no code implementations ACL 2020 Yu-An Chung, James Glass

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.

Frame Speech Recognition +1

SemEval-2015 Task 3: Answer Selection in Community Question Answering

no code implementations SEMEVAL 2015 Preslav Nakov, Lluís Màrquez, Walid Magdy, Alessandro Moschitti, James Glass, Bilal Randeree

Community Question Answering (cQA) provides new interesting research directions to the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts.

Answer Selection Community Question Answering

Neural Multi-Task Learning for Stance Prediction

no code implementations WS 2019 Wei Fang, Moin Nadeem, Mitra Mohtarami, James Glass

We present a multi-task learning model that leverages a large amount of textual information from existing datasets to improve stance prediction.

Multi-Task Learning

Generative Pre-Training for Speech with Autoregressive Predictive Coding

2 code implementations 23 Oct 2019 Yu-An Chung, James Glass

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Representation Learning Speaker Identification +3

Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

no code implementations 16 Oct 2019 Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Response Generation Text Generation +1

Contrastive Language Adaptation for Cross-Lingual Stance Detection

no code implementations IJCNLP 2019 Mitra Mohtarami, James Glass, Preslav Nakov

In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language.

Stance Detection

DARTS: Dialectal Arabic Transcription System

no code implementations 26 Sep 2019 Sameer Khurana, Ahmed Ali, James Glass

We analyze the following: transfer learning from the high-resource broadcast domain to the low-resource dialectal domain, and semi-supervised learning where we use in-domain unlabeled audio data collected from YouTube.

Language Modelling Transfer Learning

Automatic Fact-Checking Using Context and Discourse Information

1 code implementation 4 Aug 2019 Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass

We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information.

Fact Checking

Transfer Learning from Audio-Visual Grounding to Speech Recognition

no code implementations 9 Jul 2019 Wei-Ning Hsu, David Harwath, James Glass

Transfer learning aims to reduce the amount of data required to excel at a new task by re-using the knowledge acquired from learning other related tasks.

Speech Recognition Transfer Learning +1

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

1 code implementation 9 Jul 2019 Yonatan Belinkov, Ahmed Ali, James Glass

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.

Automatic Speech Recognition

FAKTA: An Automatic End-to-End Fact Checking System

no code implementations NAACL 2019 Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, James Glass

We present FAKTA which is a unified framework that integrates various components of a fact checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis.

Fact Checking Stance Detection

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future

1 code implementation ACL 2019 Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass

In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase.

Language Modelling

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations 11 May 2019 Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

Automatic Speech Recognition Contrastive Learning +1

Language Modeling with Graph Temporal Convolutional Networks

no code implementations ICLR 2019 Hongyin Luo, Yichen Li, Jie Fu, James Glass

Recently, there have been some attempts to use non-recurrent neural models for language modeling.

Language Modelling

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations 7 Apr 2019 Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification Speech Enhancement

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations 5 Apr 2019 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

no code implementations NAACL 2019 Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles.

Negative Training for Neural Dialogue Response Generation

1 code implementation ACL 2020 Tianxing He, James Glass

Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses.

Response Generation

Towards Visually Grounded Sub-Word Speech Unit Discovery

no code implementations 21 Feb 2019 David Harwath, James Glass

In this paper, we investigate the manner in which interpretable sub-word speech units emerge within a convolutional neural network model trained to associate raw speech waveforms with semantically related natural image scenes.

Adversarial Domain Adaptation for Stance Detection

no code implementations 6 Feb 2019 Brian Xu, Mitra Mohtarami, James Glass

This paper studies the problem of stance detection which aims to predict the perspective (or stance) of a given document with respect to a given claim.

Domain Adaptation Fact Checking +1

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

1 code implementation 21 Dec 2018 Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass

We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models?

Language Modelling Machine Translation

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks

2 code implementations 21 Dec 2018 Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass

We present a toolkit to facilitate the interpretation and understanding of neural network models.

Analysis Methods in Neural Language Processing: A Survey

no code implementations TACL 2019 Yonatan Belinkov, James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems.

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain

no code implementations 4 Dec 2018 Suwon Shon, Ahmed Ali, James Glass

An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition.

Dialect Identification

Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion

no code implementations 27 Nov 2018 Suwon Shon, Tae-Hyun Oh, James Glass

In this paper, we present a multi-modal online person verification system using both speech and visual signals.

Towards Unsupervised Speech-to-Text Translation

no code implementations 4 Nov 2018 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.

Denoising Language Modelling +3

On The Inductive Bias of Words in Acoustics-to-Word Models

no code implementations 31 Oct 2018 Hao Tang, James Glass

In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation 12 Sep 2018 Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Frame Speaker Recognition +1

Unsupervised Representation Learning of Speech for Dialect Identification

no code implementations 12 Sep 2018 Suwon Shon, Wei-Ning Hsu, James Glass

In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID).

Dialect Identification Disentanglement

Detecting egregious responses in neural sequence-to-sequence models

no code implementations ICLR 2019 Tianxing He, James Glass

We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them.

Response Generation

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation 17 Jul 2018 Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing Sound

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition

no code implementations 9 Jul 2018 Hao Tang, James Glass

In this paper, we study recurrent networks' ability to learn long-term dependency in the context of speech recognition.

Speech Recognition

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

no code implementations 13 Jun 2018 Wei-Ning Hsu, Hao Tang, James Glass

However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to.

Automatic Speech Recognition

A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

no code implementations 13 Jun 2018 Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass

Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.

Data Augmentation Distant Speech Recognition +2

Role-specific Language Models for Processing Recorded Neuropsychological Exams

no code implementations NAACL 2018 Tuka Al Hanai, Rhoda Au, James Glass

Neuropsychological examinations are an important screening tool for the presence of cognitive conditions (e.g., Alzheimer's, Parkinson's Disease), and require a trained tester to conduct the exam through spoken interactions with the subject.

Automatic Speech Recognition Epidemiology +1

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

no code implementations 29 May 2018 Wei-Ning Hsu, James Glass

In this paper, we present a partitioned variational autoencoder (PVAE) and several training objectives to learn disentangled representations, which encode not only the shared factors, but also modality-dependent ones, into separate latent variables.

Representation Learning Variational Inference

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

no code implementations NeurIPS 2018 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.

Automatic Speech Recognition Cross-Lingual Word Embeddings +3

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

1 code implementation NAACL 2018 Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.

Machine Translation Natural Language Inference +1

Integrating Stance Detection and Fact Checking in a Unified Corpus

no code implementations NAACL 2018 Ramy Baly, Mitra Mohtarami, James Glass, Lluis Marquez, Alessandro Moschitti, Preslav Nakov

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.

Fact Checking Stance Detection

Automatic Stance Detection Using End-to-End Memory Networks

no code implementations NAACL 2018 Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, Alessandro Moschitti

We present a novel end-to-end memory network for stance detection, which jointly (i) predicts whether a document agrees, disagrees, discusses or is unrelated with respect to a given target claim, and also (ii) extracts snippets of evidence for that prediction.

Stance Detection

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

no code implementations 9 Apr 2018 David Harwath, Galen Chuang, James Glass

In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images.

Speech Recognition

Scalable Factorized Hierarchical Variational Autoencoder Training

2 code implementations 9 Apr 2018 Wei-Ning Hsu, James Glass

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.

Disentanglement Hyperparameter Optimization +4

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

no code implementations ECCV 2018 David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

1 code implementation 23 Mar 2018 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.

Learning Word Embeddings Word Similarity

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

2 code implementations 12 Mar 2018 Suwon Shon, Ahmed Ali, James Glass

Although the Siamese network with language embeddings did not achieve as good a result as the end-to-end DID system, the two approaches had good synergy when combined together in a fused system.

Sound Audio and Speech Processing

Fact Checking in Community Forums

3 code implementations 8 Mar 2018 Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass

Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information.

Community Question Answering Fact Checking

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

no code implementations 7 Mar 2018 Wei-Ning Hsu, James Glass

The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions.

Automatic Speech Recognition

Learning Modality-Invariant Representations for Speech and Images

no code implementations 11 Dec 2017 Kenneth Leidal, David Harwath, James Glass

In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs.

Information Retrieval Semantic Similarity +2

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations NAACL 2018 Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering Speech Recognition +1

Learning Word Embeddings from Speech

no code implementations 5 Nov 2017 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.

Learning Word Embeddings Word Similarity

Spoken Language Biomarkers for Detecting Cognitive Impairment

1 code implementation 20 Oct 2017 Tuka Alhanai, Rhoda Au, James Glass

In this study we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study.

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

3 code implementations NeurIPS 2017 Wei-Ning Hsu, Yu Zhang, James Glass

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.

Automatic Speech Recognition Speaker Verification

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

1 code implementation NeurIPS 2017 Yonatan Belinkov, James Glass

In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.

Automatic Speech Recognition Frame +1

MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

no code implementations 28 Aug 2017 Suwon Shon, Ahmed Ali, James Glass

In order to achieve a robust ADI system, we explored both Siamese neural network models to learn similarity and dissimilarities among Arabic dialects, as well as i-vector post-processing to adapt domain mismatches.

Arabic Speech Recognition Dialect Identification +1

Learning Latent Representations for Speech Generation and Transformation

no code implementations 13 Apr 2017 Wei-Ning Hsu, Yu Zhang, James Glass

In this paper, we apply a convolutional VAE to model the generative process of natural speech.

Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

2 code implementations 23 Feb 2017 Hongyin Luo, Jie Fu, James Glass

However, it has been argued that this is not biologically plausible because back-propagating error signals with the exact incoming weights are not considered possible in biological neural systems.

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

no code implementations 25 Sep 2016 Yonatan Belinkov, James Glass

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.

Machine Translation Translation

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations 19 Sep 2016 Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from Aljazeera Arabic website Aljazeera.net for a 10 year duration 2000-2011.

Acoustic Modelling Language Modelling +1

Deep Multimodal Semantic Embeddings for Speech and Images

no code implementations 11 Nov 2015 David Harwath, James Glass

In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities.

Image Retrieval

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.

Speech Recognition Transfer Learning

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.

Distant Speech Recognition Frame

Unsupervised Lexicon Discovery from Acoustic Input

no code implementations TACL 2015 Chia-Ying Lee, Timothy J. O'Donnell, James Glass

We present a model of unsupervised phonological lexicon discovery: the problem of simultaneously learning phoneme-like and word-like units from acoustic input.

Language Acquisition Speech Recognition
