Search Results for author: James Glass

Found 153 papers, 66 papers with code

Speak: A Toolkit Using Amazon Mechanical Turk to Collect and Validate Speech Audio Recordings

no code implementations • LREC 2022 • Christopher Song, David Harwath, Tuka Alhanai, James Glass

We present Speak, a toolkit that allows researchers to crowdsource speech audio recordings using Amazon Mechanical Turk (MTurk).

Curiosity-driven Red-teaming for Large Language Models

1 code implementation • 29 Feb 2024 • Zhang-Wei Hong, Idan Shenfeld, Tsun-Hsuan Wang, Yung-Sung Chuang, Aldo Pareja, James Glass, Akash Srivastava, Pulkit Agrawal

To probe when an LLM generates unwanted content, the current paradigm is to recruit a red team of human testers to design input prompts (i.e., test cases) that elicit undesirable responses from LLMs.

Reinforcement Learning (RL)

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

no code implementations • 16 Jan 2024 • Alexander H. Liu, Sung-Lin Yeh, James Glass

We use linear probes to estimate the mutual information between the target information and learned representations, showing another insight into the accessibility to the target information from speech representations.

Representation Learning Self-Supervised Learning +2

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

no code implementations • 15 Nov 2023 • Heng-Jui Chang, James Glass

This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific self-supervision method for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin).

Clustering Representation Learning

Audio-Visual Neural Syntax Acquisition

no code implementations • 11 Oct 2023 • Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

We study phrase structure induction from visually-grounded speech.

Language Acquisition

Self-Specialization: Uncovering Latent Expertise within Large Language Models

no code implementations • 29 Sep 2023 • Junmo Kang, Hongyin Luo, Yada Zhu, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky

Recent works have demonstrated the effectiveness of self-alignment in which a large language model is, by itself, aligned to follow general instructions through the automatic generation of instructional data using a handful of human-written seeds.

Hallucination Instruction Following +2

Joint Audio and Speech Understanding

1 code implementation • 25 Sep 2023 • Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Humans are surrounded by audio signals that include both speech and non-speech sounds.

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass

How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?

Instruction Following Language Modelling +5

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations • 7 Sep 2023 • Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.

Revealing the Blind Spot of Sentence Encoder Evaluation by HEROS

no code implementations • 8 Jun 2023 • Cheng-Han Chiang, Yung-Sung Chuang, James Glass, Hung-Yi Lee

We also show that even if two SEs have similar performance on STS benchmarks, they can have very different behavior on HEROS.

Negation Sentence +1

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

no code implementations • 1 Jun 2023 • Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass

Having a single model that supports multiple translation tasks is desirable.

Cross-Lingual Transfer Knowledge Distillation +4

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

1 code implementation • 26 May 2023 • Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering.

Open-Domain Question Answering Passage Retrieval +1

Entailment as Robust Self-Learner

1 code implementation • 26 May 2023 • Jiaxin Ge, Hongyin Luo, Yoon Kim, James Glass

Experiments on binary and multi-class classification tasks show that SimPLE leads to more robust self-training results, indicating that the self-trained entailment models are more efficient and trustworthy than large language models on language understanding tasks.

Multi-class Classification Natural Language Understanding +1

SAIL: Search-Augmented Instruction Learning

no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising Fact Checking +3

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass

Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.

Listen, Think, and Understand

1 code implementation • 18 May 2023 • Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.

Ranked #3 on Music Question Answering on MusicQA (using extra training data)

Language Modelling Large Language Model +1

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

1 code implementation • 18 May 2023 • Heng-Jui Chang, Alexander H. Liu, James Glass

Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.

Acoustic Unit Discovery Clustering +3

Interpretable Unified Language Checking

1 code implementation • 7 Apr 2023 • Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge.

Fact Checking Fairness +2

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

no code implementations • 29 Mar 2023 • Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only.

Representation Learning Spatio-Temporal Video Grounding

Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning

1 code implementation • 10 Mar 2023 • Hongyin Luo, James Glass

Due to their similarity-based learning objectives, pretrained sentence encoders often internalize stereotypical assumptions that reflect the social biases that exist within their training corpora.

Natural Language Inference Sentence +1

On the Blind Spots of Model-Based Evaluation Metrics for Text Generation

1 code implementation • 20 Dec 2022 • Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov

In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data.

Text Generation

On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration

no code implementations • 14 Nov 2022 • Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass

Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.

Pseudo Label Pseudo Label Filtering +1

PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation

1 code implementation • 14 Oct 2022 • Jingyu Zhang, James Glass, Tianxing He

Existing work on controlled text generation (CTG) assumes a control interface of categorical attributes.

Attribute Text Generation

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

1 code implementation • 7 Oct 2022 • Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.

Knowledge Distillation Retrieval +2

Contrastive Audio-Visual Masked Autoencoder

1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification Audio Tagging +4

UAVM: Towards Unifying Audio and Visual Models

1 code implementation • 29 Jul 2022 • Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass

Conventional audio-visual models have independent audio and video branches.

Ranked #2 on Multi-modal Classification on AudioSet (using extra training data)

Audio Classification audio-visual learning +1

Developing a Series of AI Challenges for the United States Department of the Air Force

1 code implementation • 14 Jul 2022 • Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, Jean Piou, Hrishikesh M. Rao, Hayley Reynolds, Kaira Samuel, Siddharth Samsi, Morgan Schmidt, Leslie Shing, Olga Simek, Brandon Swenson, Vivienne Sze, Jonathan Taylor, Paul Tylkin, Mark Veillette, Matthew L Weiss, Allan Wollaber, Sophia Yuditskaya, Jeremy Kepner

Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI.

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

no code implementations • 17 May 2022 • Sameer Khurana, Antoine Laurent, James Glass

We combine state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder SAMU-XLSR.

Retrieval Sentence +5

Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition

1 code implementation • 6 May 2022 • Yuan Gong, Jin Yu, James Glass

Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.

Ranked #1 on Audio Classification on VocalSound

Audio Classification

Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

1 code implementation • 6 May 2022 • Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass

Automatic pronunciation assessment is an important technology to help self-directed language learners.

Ranked #2 on Phone-level pronunciation scoring on speechocean762 (using extra training data)

Automatic Speech Recognition (ASR) +5

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

1 code implementation • NAACL 2022 • Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass

We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.

Ranked #13 on Semantic Textual Similarity on STS16

Contrastive Learning Language Modelling +3

Simple and Effective Unsupervised Speech Synthesis

no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.

Speech Recognition +2

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

2 code implementations • 13 Mar 2022 • Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass

Audio classification is an active research area with a wide range of applications.

Audio Classification Knowledge Distillation

Controlling the Focus of Pretrained Language Generation Models

1 code implementation • Findings (ACL) 2022 • Jiabao Ji, Yoon Kim, James Glass, Tianxing He

This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.

Abstractive Text Summarization Response Generation +1

Everything at Once - Multi-Modal Fusion Transformer for Video Retrieval

1 code implementation • CVPR 2022 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne

In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space.

Action Localization Retrieval +2

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

1 code implementation • 8 Dec 2021 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.

Action Localization Retrieval +2

Routing with Self-Attention for Multimodal Capsule Networks

no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.

Cascaded Multilingual Audio-Visual Learning from Videos

1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

In this paper, we explore self-supervised audio-visual models that learn from instructional videos.

audio-visual learning Retrieval

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Ranked #1 on Spoken Command Recognition on Speech Command v2

Audio Classification Emotion Recognition +4

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

1 code implementation • 14 Oct 2021 • Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass

These results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned.

Image Retrieval Language Modelling +1

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

no code implementations • 7 Oct 2021 • Sameer Khurana, Antoine Laurent, James Glass

We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.

Automatic Speech Recognition (ASR) +3

On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

Are end-to-end text-to-speech (TTS) models over-parametrized?

Knowledge Distillation Speech Synthesis

An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models

1 code implementation • 6 Sep 2021 • Tianxing He, Kyunghyun Cho, James Glass

Prompt-based knowledge probing for 1-hop relations has been used to measure how much world knowledge is stored in pretrained language models.

Knowledge Probing Prompt Engineering +1

Interpretable Propaganda Detection in News Articles

no code implementations • RANLP 2021 • Seunghak Yu, Giovanni Da San Martino, Mitra Mohtarami, James Glass, Preslav Nakov

Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis.

Descriptive Propaganda detection

Mitigating Biases in Toxic Language Detection through Invariant Rationalization

1 code implementation • ACL (WOAH) 2021 • Yung-Sung Chuang, Mingye Gao, Hongyin Luo, James Glass, Hung-Yi Lee, Yun-Nung Chen, Shang-Wen Li

Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse.

Natural Language Understanding

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.

Automatic Speech Recognition (ASR) +3

Cross-Modal Discrete Representation Learning

no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval Quantization +4

Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

no code implementations • CVPR 2021 • Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.

Contrastive Learning Retrieval +1

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is attracting growing attention, as it allows not only training large networks without human supervision but also searching and retrieving data across various modalities.

Ranked #4 on Long Video Retrieval (Background Removed) on YouCook2

Clustering Contrastive Learning +6

AST: Audio Spectrogram Transformer

3 code implementations • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Ranked #1 on Audio Classification on Speech Commands

Audio Classification Audio Tagging +4

Analyzing the Forgetting Problem in Pretrain-Finetuning of Open-domain Dialogue Response Models

no code implementations • EACL 2021 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Response Generation Text Generation +1

Cooperative Self-training of Machine Reading Comprehension

1 code implementation • NAACL 2022 • Hongyin Luo, Shang-Wen Li, Mingye Gao, Seunghak Yu, James Glass

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings.

Ranked #1 on Question Answering on MRQA out-of-domain

Extractive Question-Answering Machine Reading Comprehension +6

PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass

Audio tagging is an active research area and has a wide range of applications.

Ranked #6 on Audio Classification on FSD50K (using extra training data)

Audio Classification Audio Tagging +2

Knowledge Grounded Conversational Symptom Detection with Graph Memory Networks

1 code implementation • EMNLP (ClinicalNLP) 2020 • Hongyin Luo, Shang-Wen Li, James Glass

Given a set of explicit symptoms provided by the patient to initiate a dialog for diagnosing, the system is trained to collect implicit symptoms by asking questions, in order to collect more information for making an accurate diagnosis.

Goal-Oriented Dialog

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

no code implementations • ACL 2021 • Wei-Ning Hsu, David Harwath, Christopher Song, James Glass

In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision.

Image Captioning Speech Synthesis +1

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation • 26 Oct 2020 • Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling Spoken Language Understanding

Similarity Analysis of Self-Supervised Speech Representations

no code implementations • 22 Oct 2020 • Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

We Can Detect Your Bias: Predicting the Political Ideology of News Articles

1 code implementation • EMNLP 2020 • Ramy Baly, Giovanni Da San Martino, James Glass, Preslav Nakov

We explore the task of predicting the leading political ideology or bias of news articles.

A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation

1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Moin Nadeem, Tianxing He, Kyunghyun Cho, James Glass

On the other hand, we find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.

Text Generation

AutoKG: Constructing Virtual Knowledge Graphs from Unstructured Documents for Question Answering

no code implementations • 20 Aug 2020 • Seunghak Yu, Tianxing He, James Glass

Knowledge graphs (KGs) have the advantage of providing fine-grained detail for question-answering systems.

Knowledge Graphs Language Modelling +2

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.

Automatic Speech Recognition (ASR) +5

CSTNet: Contrastive Speech Translation Network for Self-Supervised Speech Representation Learning

no code implementations • 4 Jun 2020 • Sameer Khurana, Antoine Laurent, James Glass

The audio encoder is trained to perform a speech-translation retrieval task in a contrastive learning framework.

BIG-bench Machine Learning Contrastive Learning +3

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.

Representation Learning Self-Supervised Learning +1

Prototypical Q Networks for Automatic Conversational Diagnosis and Few-Shot New Disease Adaption

no code implementations • 19 May 2020 • Hongyin Luo, Shang-Wen Li, James Glass

Experiments showed that the ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieved state-of-the-art few-shot learning performance.

Few-Shot Learning

Vector-Quantized Autoregressive Predictive Coding

2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

What Was Written vs. Who Read It: News Media Profiling Using Text Analysis and Social Media Context

1 code implementation • ACL 2020 • Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov

Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content.

Similarity Analysis of Contextual Word Representation Models

1 code implementation • ACL 2020 • John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

no code implementations • ACL 2020 • Yu-An Chung, James Glass

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.

Speech Recognition +1

SemEval-2016 Task 3: Community Question Answering

no code implementations • SEMEVAL 2016 • Preslav Nakov, Lluís Màrquez, Alessandro Moschitti, Walid Magdy, Hamdy Mubarak, Abed Alhakim Freihat, James Glass, Bilal Randeree

This paper describes the SemEval-2016 Task 3 on Community Question Answering, which we offered in English and Arabic.

Community Question Answering Question Similarity

SemEval-2015 Task 3: Answer Selection in Community Question Answering

no code implementations • SEMEVAL 2015 • Preslav Nakov, Lluís Màrquez, Walid Magdy, Alessandro Moschitti, James Glass, Bilal Randeree

Community Question Answering (cQA) provides new interesting research directions to the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts.

Answer Selection Community Question Answering

Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech

1 code implementation • ICLR 2020 • David Harwath, Wei-Ning Hsu, James Glass

What differentiates this paper from prior work on speech unit learning is the choice of training objective.

Image Retrieval Quantization +1

On the Linguistic Representational Power of Neural Machine Translation Models

no code implementations • CL 2020 • Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

(iii) Do the representations capture lexical semantics?

Machine Translation NMT +1

Neural Multi-Task Learning for Stance Prediction

no code implementations • WS 2019 • Wei Fang, Moin Nadeem, Mitra Mohtarami, James Glass

We present a multi-task learning model that leverages a large amount of textual information from existing datasets to improve stance prediction.

Multi-Task Learning

Generative Pre-Training for Speech with Autoregressive Predictive Coding

2 code implementations • 23 Oct 2019 • Yu-An Chung, James Glass

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Representation Learning Speaker Identification +4

Paper
Code

Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models

no code implementations • 16 Oct 2019 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng

We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.

Response Generation Text Generation +1

Paper
Add Code

Tanbih: Get To Know What You Are Reading

no code implementations • IJCNLP 2019 • Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov

We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story.

Paper
Add Code

Contrastive Language Adaptation for Cross-Lingual Stance Detection

no code implementations • IJCNLP 2019 • Mitra Mohtarami, James Glass, Preslav Nakov

In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language.

Stance Detection

Paper
Add Code

DARTS: Dialectal Arabic Transcription System

no code implementations • 26 Sep 2019 • Sameer Khurana, Ahmed Ali, James Glass

We analyze the following: transfer learning from a high-resource broadcast domain to a low-resource dialectal domain, and semi-supervised learning where we use in-domain unlabeled audio data collected from YouTube.

Language Modelling Transfer Learning

Paper
Add Code

Automatic Fact-Checking Using Context and Discourse Information

1 code implementation • 4 Aug 2019 • Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass

We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information.

Fact Checking

Paper
Code

Transfer Learning from Audio-Visual Grounding to Speech Recognition

no code implementations • 9 Jul 2019 • Wei-Ning Hsu, David Harwath, James Glass

Transfer learning aims to reduce the amount of data required to excel at a new task by re-using the knowledge acquired from learning other related tasks.

speech-recognition Speech Recognition +2

Paper
Add Code

Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition

1 code implementation • 9 Jul 2019 • Yonatan Belinkov, Ahmed Ali, James Glass

End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

no code implementations • 17 Jun 2019 • Wei Fang, Yu-An Chung, James Glass

An input text is simultaneously passed into BERT and the Tacotron-2 encoder.

Speech Synthesis Transfer Learning

Paper
Add Code

FAKTA: An Automatic End-to-End Fact Checking System

no code implementations • NAACL 2019 • Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, James Glass

We present FAKTA which is a unified framework that integrates various components of a fact checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis.

Fact Checking Retrieval +2

Paper
Add Code

Improving Neural Language Models by Segmenting, Attending, and Predicting the Future

1 code implementation • ACL 2019 • Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass

In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase.

Ranked #24 on Language Modelling on WikiText-103

Language Modelling Segmentation

Paper
Code

Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?

no code implementations • EMNLP 2021 • Tianxing He, Jingzhao Zhang, Zhiming Zhou, James Glass

Exposure bias has been regarded as a central problem for auto-regressive language models (LM).

Machine Translation Text Generation

Paper
Add Code

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations • 11 May 2019 • Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Language Modeling with Graph Temporal Convolutional Networks

no code implementations • ICLR 2019 • Hongyin Luo, Yichen Li, Jie Fu, James Glass

Recently, there have been some attempts to use non-recurrent neural models for language modeling.

Language Modelling

Paper
Add Code

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations • 7 Apr 2019 • Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification Speech Enhancement

Paper
Add Code

Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

no code implementations • SEMEVAL 2019 • Abdelrhman Saleh, Ramy Baly, Alberto Barrón-Cedeño, Giovanni Da San Martino, Mitra Mohtarami, Preslav Nakov, James Glass

In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection.

regression

Paper
Add Code

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

Paper
Code

Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

no code implementations • NAACL 2019 • Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov

In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles.

Paper
Add Code

Negative Training for Neural Dialogue Response Generation

1 code implementation • ACL 2020 • Tianxing He, James Glass

Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses.

Response Generation

Paper
Code

Towards Visually Grounded Sub-Word Speech Unit Discovery

no code implementations • 21 Feb 2019 • David Harwath, James Glass

In this paper, we investigate the manner in which interpretable sub-word speech units emerge within a convolutional neural network model trained to associate raw speech waveforms with semantically related natural image scenes.

Paper
Add Code

Adversarial Domain Adaptation for Stance Detection

no code implementations • 6 Feb 2019 • Brian Xu, Mitra Mohtarami, James Glass

This paper studies the problem of stance detection which aims to predict the perspective (or stance) of a given document with respect to a given claim.

Domain Adaptation Fact Checking +1

Paper
Add Code

Analysis Methods in Neural Language Processing: A Survey

no code implementations • TACL 2019 • Yonatan Belinkov, James Glass

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems.

Paper
Add Code

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

1 code implementation • 21 Dec 2018 • Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass

We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models?

Language Modelling Machine Translation +1

Paper
Code

NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks

2 code implementations • 21 Dec 2018 • Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass

We present a toolkit to facilitate the interpretation and understanding of neural network models.

Paper
Code

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain

no code implementations • 4 Dec 2018 • Suwon Shon, Ahmed Ali, James Glass

An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition.

Dialect Identification

Paper
Add Code

Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion

no code implementations • 27 Nov 2018 • Suwon Shon, Tae-Hyun Oh, James Glass

In this paper, we present a multi-modal online person verification system using both speech and visual signals.

Paper
Add Code

Towards Unsupervised Speech-to-Text Translation

no code implementations • 4 Nov 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.

Denoising Language Modelling +3

Paper
Add Code

Identifying and Controlling Important Neurons in Neural Machine Translation

no code implementations • ICLR 2019 • Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

Neural machine translation (NMT) models learn representations containing substantial linguistic information.

Machine Translation NMT +1

Paper
Add Code

On The Inductive Bias of Words in Acoustics-to-Word Models

no code implementations • 31 Oct 2018 • Hao Tang, James Glass

In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.

Inductive Bias

Paper
Add Code

Predicting Factuality of Reporting and Bias of News Media Sources

2 code implementations • EMNLP 2018 • Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov

We present a study on predicting the factuality of reporting and bias of news media.

Fact Checking

Paper
Code

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation • 12 Sep 2018 • Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Speaker Recognition Text-Independent Speaker Recognition

Paper
Code

Unsupervised Representation Learning of Speech for Dialect Identification

no code implementations • 12 Sep 2018 • Suwon Shon, Wei-Ning Hsu, James Glass

In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID).

Dialect Identification Disentanglement

Paper
Add Code

Detecting egregious responses in neural sequence-to-sequence models

no code implementations • ICLR 2019 • Tianxing He, James Glass

We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them.

Response Generation

Paper
Add Code

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.

Dependency Parsing Dialect Identification

Paper
Add Code

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation • 17 Jul 2018 • Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing Sound

Paper
Code

On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition

no code implementations • 9 Jul 2018 • Hao Tang, James Glass

In this paper, we study recurrent networks' ability to learn long-term dependency in the context of speech recognition.

speech-recognition Speech Recognition

Paper
Add Code

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition

no code implementations • 13 Jun 2018 • Wei-Ning Hsu, Hao Tang, James Glass

However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

no code implementations • 13 Jun 2018 • Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass

Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.

Data Augmentation Distant Speech Recognition +3

Paper
Add Code

Role-specific Language Models for Processing Recorded Neuropsychological Exams

no code implementations • NAACL 2018 • Tuka Al Hanai, Rhoda Au, James Glass

Neuropsychological examinations are an important screening tool for the presence of cognitive conditions (e.g., Alzheimer's, Parkinson's Disease), and require a trained tester to conduct the exam through spoken interactions with the subject.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Disentangling by Partitioning: A Representation Learning Framework for Multimodal Sensory Data

no code implementations • 29 May 2018 • Wei-Ning Hsu, James Glass

In this paper, we present a partitioned variational autoencoder (PVAE) and several training objectives to learn disentangled representations, which encode not only the shared factors, but also modality-dependent ones, into separate latent variables.

Representation Learning Variational Inference

Paper
Add Code

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

no code implementations • NeurIPS 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Add Code

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

1 code implementation • NAACL 2018 • Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.

Machine Translation Natural Language Inference +4

Paper
Code

Integrating Stance Detection and Fact Checking in a Unified Corpus

no code implementations • NAACL 2018 • Ramy Baly, Mitra Mohtarami, James Glass, Lluis Marquez, Alessandro Moschitti, Preslav Nakov

A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.

Fact Checking Retrieval +1

Paper
Add Code

Automatic Stance Detection Using End-to-End Memory Networks

no code implementations • NAACL 2018 • Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, Alessandro Moschitti

We present a novel end-to-end memory network for stance detection, which jointly (i) predicts whether a document agrees, disagrees, discusses or is unrelated with respect to a given target claim, and also (ii) extracts snippets of evidence for that prediction.

Ranked #6 on Fake News Detection on FNC-1

Stance Detection

Paper
Add Code

Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech

no code implementations • 9 Apr 2018 • David Harwath, Galen Chuang, James Glass

In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images.

Retrieval speech-recognition +1

Paper
Add Code

Scalable Factorized Hierarchical Variational Autoencoder Training

2 code implementations • 9 Apr 2018 • Wei-Ning Hsu, James Glass

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.

Disentanglement Hyperparameter Optimization +5

Paper
Code

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

no code implementations • ECCV 2018 • David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.

Retrieval

Paper
Add Code

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

3 code implementations • 23 Mar 2018 • Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.

Learning Word Embeddings Word Similarity

Paper
Code

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

2 code implementations • 12 Mar 2018 • Suwon Shon, Ahmed Ali, James Glass

Although the Siamese network with language embeddings did not achieve as good a result as the end-to-end DID system, the two approaches had good synergy when combined together in a fused system.

Sound Audio and Speech Processing

Paper
Code

Fact Checking in Community Forums

3 code implementations • 8 Mar 2018 • Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass

Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information.

Community Question Answering Fact Checking

Paper
Code

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

no code implementations • 7 Mar 2018 • Wei-Ning Hsu, James Glass

The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks

1 code implementation • IJCNLP 2017 • Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass

In this paper, we investigate the representations learned at different layers of NMT encoders.

Machine Translation NMT +3

Paper
Code

Learning Modality-Invariant Representations for Speech and Images

no code implementations • 11 Dec 2017 • Kenneth Leidal, David Harwath, James Glass

In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs.

Information Retrieval Retrieval +3

Paper
Add Code

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering speech-recognition +2

Paper
Add Code

Learning Word Embeddings from Speech

no code implementations • 5 Nov 2017 • Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.

Learning Word Embeddings Word Similarity

Paper
Add Code

Spoken Language Biomarkers for Detecting Cognitive Impairment

1 code implementation • 20 Oct 2017 • Tuka Alhanai, Rhoda Au, James Glass

In this study we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study.

Paper
Code

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

3 code implementations • NeurIPS 2017 • Wei-Ning Hsu, Yu Zhang, James Glass

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

1 code implementation • NeurIPS 2017 • Yonatan Belinkov, James Glass

In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

no code implementations • 28 Aug 2017 • Suwon Shon, Ahmed Ali, James Glass

In order to achieve a robust ADI system, we explored both Siamese neural network models to learn similarity and dissimilarities among Arabic dialects, as well as i-vector post-processing to adapt domain mismatches.

Arabic Speech Recognition Dialect Identification +2

Paper
Add Code

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

no code implementations • 19 Jul 2017 • Wei-Ning Hsu, Yu Zhang, James Glass

Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Learning Latent Representations for Speech Generation and Transformation

no code implementations • 13 Apr 2017 • Wei-Ning Hsu, Yu Zhang, James Glass

In this paper, we apply a convolutional VAE to model the generative process of natural speech.

Paper
Add Code

What do Neural Machine Translation Models Learn about Morphology?

1 code implementation • ACL 2017 • Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass

Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture.

Machine Translation Morphological Tagging +1

Paper
Code

Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks

2 code implementations • 23 Feb 2017 • Hongyin Luo, Jie Fu, James Glass

However, it has been argued that this is not biologically plausible because back-propagating error signals with the exact incoming weights are not considered possible in biological neural systems.

Paper
Code

Neural Attention for Learning to Rank Questions in Community Question Answering

no code implementations • COLING 2016 • Salvatore Romeo, Giovanni Da San Martino, Alberto Barrón-Cedeño, Alessandro Moschitti, Yonatan Belinkov, Wei-Ning Hsu, Yu Zhang, Mitra Mohtarami, James Glass

In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which leads to introducing noise in machine learning algorithms.

Community Question Answering Learning-To-Rank +3

Paper
Add Code

Unsupervised Learning of Spoken Language with Visual Context

no code implementations • NeurIPS 2016 • David Harwath, Antonio Torralba, James Glass

Humans learn to speak before they can read or write, so why can’t computers do the same?

Image Retrieval Language Acquisition

Paper
Add Code

Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results

no code implementations • 25 Sep 2016 • Yonatan Belinkov, James Glass

Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.

Machine Translation Translation

Paper
Add Code

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

1 code implementation • WS 2016 • Yonatan Belinkov, James Glass

Discriminating between closely-related language varieties is considered a challenging and important task.

Dialect Identification

Paper
Code

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang

For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering a 10-year duration, 2000-2011.

Acoustic Modelling Language Modelling +1

Paper
Add Code

Learning Semantic Relatedness in Community Question Answering Using Neural Models

no code implementations • WS 2016 • Henry Nassif, Mitra Mohtarami, James Glass

Answer Selection Community Question Answering +3

Paper
Add Code

Recurrent Neural Network Encoder with Attention for Community Question Answering

no code implementations • 23 Mar 2016 • Wei-Ning Hsu, Yu Zhang, James Glass

We apply a general recurrent neural network (RNN) encoder framework to community question answering (cQA) tasks.

Community Question Answering Information Retrieval +2

Paper
Add Code

Deep Multimodal Semantic Embeddings for Speech and Images

no code implementations • 11 Nov 2015 • David Harwath, James Glass

In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities.

Image Retrieval

Paper
Add Code

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

no code implementations • 30 Oct 2015 • Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.

Distant Speech Recognition speech-recognition

Paper
Add Code

Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition

no code implementations • 30 Oct 2015 • Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu

In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.

speech-recognition Speech Recognition +1

Paper
Add Code

Automatic Dialect Detection in Arabic Broadcast Speech

1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

Ranked #1 on Spoken language identification on Untranscribed mixed-speech dataset

Dialect Identification speech-recognition +2

Paper
Code

Arabic Diacritization with Recurrent Neural Networks

no code implementations • EMNLP 2015 • Yonatan Belinkov, James Glass

Language Modelling Morphological Analysis +2

Paper
Add Code

VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems

no code implementations • SEMEVAL 2015 • Yonatan Belinkov, Mitra Mohtarami, Scott Cyphers, James Glass

Answer Selection Community Question Answering

Paper
Add Code

Unsupervised Lexicon Discovery from Acoustic Input

no code implementations • TACL 2015 • Chia-Ying Lee, Timothy J. O'Donnell, James Glass

We present a model of unsupervised phonological lexicon discovery: the problem of simultaneously learning phoneme-like and word-like units from acoustic input.

Language Acquisition Speech Recognition

Paper
Add Code

Joint Learning of Phonetic Units and Word Pronunciations for ASR

no code implementations • EMNLP 2013 • Chia-Ying Lee, Yu Zhang, James Glass

Language Modelling Speech Recognition

Paper
Add Code

A Nonparametric Bayesian Approach to Acoustic Model Discovery

no code implementations • ACL 2012 • Chia-Ying Lee, James Glass

Model Discovery

Paper
Add Code
