no code implementations • LREC 2022 • Christopher Song, David Harwath, Tuka Alhanai, James Glass
We present Speak, a toolkit that allows researchers to crowdsource speech audio recordings using Amazon Mechanical Turk (MTurk).
1 code implementation • 25 Sep 2023 • Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
Humans are surrounded by audio signals that include both speech and non-speech sounds.
1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?
1 code implementation • 7 Sep 2023 • Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He
Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.
no code implementations • 8 Jun 2023 • Cheng-Han Chiang, Yung-Sung Chuang, James Glass, Hung-Yi Lee
We also show that even if two SEs have similar performance on STS benchmarks, they can have very different behavior on HEROS.
no code implementations • 1 Jun 2023 • Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
Having a single model that supports multiple translation tasks is desirable.
1 code implementation • 26 May 2023 • Jiaxin Ge, Hongyin Luo, Yoon Kim, James Glass
Experiments on binary and multi-class classification tasks show that SimPLE leads to more robust self-training results, indicating that the self-trained entailment models are more efficient and trustworthy than large language models on language understanding tasks.
Multi-class Classification
Natural Language Understanding
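The uncertainty-aware pseudo-labeling idea behind robust self-training can be sketched in a few lines (an illustrative toy, not the paper's SimPLE implementation; the helper name and agreement threshold are assumptions): sample several stochastic predictions per example, as dropout would produce, and keep the majority label only when the votes agree strongly.

```python
import numpy as np

def filter_pseudo_labels(vote_matrix, agreement=0.8):
    """vote_matrix: (n_examples, n_samples) of sampled class predictions.
    Returns (indices_kept, labels) for examples whose majority label
    reaches the agreement threshold; uncertain examples are dropped."""
    kept, labels = [], []
    for i, votes in enumerate(vote_matrix):
        vals, counts = np.unique(votes, return_counts=True)
        j = counts.argmax()                       # majority label
        if counts[j] / len(votes) >= agreement:   # keep only confident ones
            kept.append(i)
            labels.append(int(vals[j]))
    return kept, labels

votes = np.array([
    [1, 1, 1, 1, 1],   # unanimous -> kept as pseudo-label 1
    [0, 1, 0, 1, 2],   # no strong majority -> dropped
    [2, 2, 2, 2, 1],   # 80% agreement -> kept as pseudo-label 2
])
kept, labels = filter_pseudo_labels(votes)
```

Only the confidently labeled examples would then be added to the training set for the next self-training round.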
1 code implementation • 26 May 2023 • Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass
We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering.
no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass
Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.
no code implementations • 21 May 2023 • Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogerio Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James Glass
Recent models such as XLS-R and Whisper have made multilingual speech technologies more accessible by pre-training on audio from around 100 spoken languages each.
1 code implementation • 18 May 2023 • Heng-Jui Chang, Alexander H. Liu, James Glass
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.
1 code implementation • 18 May 2023 • Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
In this paper, we propose a novel audio foundation model, called LTU (Listen, Think, and Understand).
Ranked #3 on Music Question Answering on MusicQA Dataset (using extra training data)
1 code implementation • 7 Apr 2023 • Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass
Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherently multi-task language checkers based on their latent representations of natural and social knowledge.
no code implementations • 29 Mar 2023 • Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne
Spatio-temporal grounding describes the task of localizing events in space and time, e.g., in video data, based on verbal descriptions only.
1 code implementation • 10 Mar 2023 • Hongyin Luo, James Glass
Due to their similarity-based learning objectives, pretrained sentence encoders often internalize stereotypical assumptions that reflect the social biases that exist within their training corpora.
1 code implementation • 20 Dec 2022 • Tianxing He, Jingyu Zhang, Tianle Wang, Sachin Kumar, Kyunghyun Cho, James Glass, Yulia Tsvetkov
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data.
no code implementations • 14 Nov 2022 • Nauman Dawalatabad, Sameer Khurana, Antoine Laurent, James Glass
Dropout-based Uncertainty-driven Self-Training (DUST) proceeds by first training a teacher model on source domain labeled data.
1 code implementation • 14 Oct 2022 • Jingyu Zhang, James Glass, Tianxing He
Existing work on controlled text generation (CTG) assumes a control interface of categorical attributes.
1 code implementation • 7 Oct 2022 • Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass
Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English.
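The cross-lingual distillation objective described above can be sketched as follows (a minimal illustration, not the paper's code; the function names and the KL direction are assumptions): the student's text-video similarity distribution for non-English input is pulled toward the teacher's distribution computed from the parallel English input.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilize exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_distillation_loss(student_logits, teacher_logits):
    """Mean KL(teacher || student) over text-to-video similarity scores."""
    p = softmax(teacher_logits)   # teacher's retrieval distribution (English text)
    q = softmax(student_logits)   # student's distribution (non-English text)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean())

# Toy similarities of 2 captions against 4 candidate videos.
teacher = np.array([[4.0, 1.0, 0.5, 0.2],
                    [0.1, 3.5, 0.3, 0.4]])
student_far = np.zeros((2, 4))      # uninformed student: uniform distribution
student_near = teacher + 0.1        # student that mimics the teacher (softmax is shift-invariant)
```

Minimizing this loss over non-English inputs transfers the English teacher's retrieval behavior to the multilingual student.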
1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
Ranked #1 on Audio Tagging on AudioSet (using extra training data)
1 code implementation • 29 Jul 2022 • Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass
Conventional audio-visual models have independent audio and video branches.
Ranked #2 on Multi-modal Classification on AudioSet (using extra training data)
1 code implementation • 14 Jul 2022 • Vijay Gadepally, Gregory Angelides, Andrei Barbu, Andrew Bowne, Laura J. Brattain, Tamara Broderick, Armando Cabrera, Glenn Carl, Ronisha Carter, Miriam Cha, Emilie Cowen, Jesse Cummings, Bill Freeman, James Glass, Sam Goldberg, Mark Hamilton, Thomas Heldt, Kuan Wei Huang, Phillip Isola, Boris Katz, Jamie Koerner, Yen-Chen Lin, David Mayo, Kyle McAlpin, Taylor Perron, Jean Piou, Hrishikesh M. Rao, Hayley Reynolds, Kaira Samuel, Siddharth Samsi, Morgan Schmidt, Leslie Shing, Olga Simek, Brandon Swenson, Vivienne Sze, Jonathan Taylor, Paul Tylkin, Mark Veillette, Matthew L Weiss, Allan Wollaber, Sophia Yuditskaya, Jeremy Kepner
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI.
no code implementations • 17 May 2022 • Sameer Khurana, Antoine Laurent, James Glass
We combine state-of-the-art multilingual acoustic frame-level speech representation learning model XLS-R with the Language Agnostic BERT Sentence Embedding (LaBSE) model to create an utterance-level multimodal multilingual speech encoder SAMU-XLSR.
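The utterance-level speech encoder described above can be sketched as follows (an illustrative toy, not the released SAMU-XLSR code; the paper uses learned attention pooling, and plain mean pooling stands in here): pool frame-level speech features into one utterance vector, project it into the text-embedding space, and train it toward the LaBSE embedding of the transcript with a cosine distance.

```python
import numpy as np

def utterance_embedding(frame_feats, proj):
    pooled = frame_feats.mean(axis=0)        # (D,) pooled frame features
    v = proj @ pooled                        # project to text-embedding space
    return v / np.linalg.norm(v)             # unit-normalize for cosine scoring

def cosine_loss(speech_vec, text_vec):
    text_vec = text_vec / np.linalg.norm(text_vec)
    return 1.0 - float(speech_vec @ text_vec)  # 0 when perfectly aligned

# Toy check: if every frame already equals the target text embedding,
# an identity projection yields zero cosine loss.
text_vec = np.array([1.0, 2.0, 3.0, 4.0])
frames = np.tile(text_vec, (10, 1))          # 10 identical "frames"
loss = cosine_loss(utterance_embedding(frames, np.eye(4)), text_vec)
```

Because the target space is shared across languages by LaBSE, utterances in different languages with the same meaning are pulled toward the same point.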
1 code implementation • 6 May 2022 • Yuan Gong, Jin Yu, James Glass
Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.
Ranked #1 on Audio Classification on VocalSound
1 code implementation • 6 May 2022 • Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass
Automatic pronunciation assessment is an important technology to help self-directed language learners.
Ranked #2 on Phone-level pronunciation scoring on speechocean762 (using extra training data)
Automatic Speech Recognition (ASR)
1 code implementation • NAACL 2022 • Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass
We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings.
Ranked #9 on Semantic Textual Similarity on STS16
no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.
2 code implementations • 13 Mar 2022 • Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass
Audio classification is an active research area with a wide range of applications.
1 code implementation • Findings (ACL) 2022 • Jiabao Ji, Yoon Kim, James Glass, Tianxing He
This work aims to develop a control mechanism by which a user can select spans of context as "highlights" for the model to focus on, and generate relevant output.
1 code implementation • CVPR 2022 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio S. Feris, David Harwath, James Glass, Hilde Kuehne
In this work, we present a multi-modal, modality agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and integrate them into a fused representation in a joined multi-modal embedding space.
1 code implementation • 8 Dec 2021 • Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne
Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification.
no code implementations • 1 Dec 2021 • Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data.
1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass
In this paper, we explore self-supervised audio-visual models that learn from instructional videos.
2 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.
Ranked #1 on Spoken Command Recognition on Speech Command v2
1 code implementation • 14 Oct 2021 • Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass
These results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned.
no code implementations • 7 Oct 2021 • Sameer Khurana, Antoine Laurent, James Glass
We propose a simple and effective cross-lingual transfer learning method to adapt monolingual wav2vec 2.0 models for Automatic Speech Recognition (ASR) in resource-scarce languages.
Automatic Speech Recognition (ASR)
no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
Are end-to-end text-to-speech (TTS) models over-parametrized?
1 code implementation • 6 Sep 2021 • Tianxing He, Kyunghyun Cho, James Glass
Prompt-based knowledge probing for 1-hop relations has been used to measure how much world knowledge is stored in pretrained language models.
no code implementations • RANLP 2021 • Seunghak Yu, Giovanni Da San Martino, Mitra Mohtarami, James Glass, Preslav Nakov
Online users today are exposed to misleading and propagandistic news articles and media posts on a daily basis.
1 code implementation • ACL (WOAH) 2021 • Yung-Sung Chuang, Mingye Gao, Hongyin Luo, James Glass, Hung-Yi Lee, Yun-Nung Chen, Shang-Wen Li
Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse.
no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass
We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.
Automatic Speech Recognition (ASR)
no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.
no code implementations • CVPR 2021 • Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.
1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang
Multimodal self-supervised learning is attracting increasing attention, as it allows not only training large networks without human supervision but also searching and retrieving data across various modalities.
3 code implementations • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
Ranked #1 on Audio Classification on Speech Commands
no code implementations • EACL 2021 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng
We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.
1 code implementation • NAACL 2022 • Hongyin Luo, Shang-Wen Li, Mingye Gao, Seunghak Yu, James Glass
Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings.
Ranked #1 on Question Answering on MRQA out-of-domain
Extractive Question-Answering
Machine Reading Comprehension
1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass
Audio tagging is an active research area and has a wide range of applications.
Ranked #4 on Audio Classification on FSD50K (using extra training data)
1 code implementation • EMNLP (ClinicalNLP) 2020 • Hongyin Luo, Shang-Wen Li, James Glass
Given a set of explicit symptoms provided by the patient to initiate a diagnostic dialog, the system is trained to collect implicit symptoms by asking questions, gathering more information to make an accurate diagnosis.
no code implementations • ACL 2021 • Wei-Ning Hsu, David Harwath, Christopher Song, James Glass
In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision.
1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass
Self-supervised speech representations have been shown to be effective in a variety of speech applications.
1 code implementation • 26 Oct 2020 • Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass
Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.
no code implementations • 22 Oct 2020 • Yu-An Chung, Yonatan Belinkov, James Glass
We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.
1 code implementation • EMNLP 2020 • Ramy Baly, Giovanni Da San Martino, James Glass, Preslav Nakov
We explore the task of predicting the leading political ideology or bias of news articles.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Moin Nadeem, Tianxing He, Kyunghyun Cho, James Glass
On the other hand, we find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
no code implementations • 20 Aug 2020 • Seunghak Yu, Tianxing He, James Glass
Knowledge graphs (KGs) have the advantage of providing fine-grained detail for question-answering systems.
1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass
Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.
Automatic Speech Recognition (ASR)
no code implementations • 4 Jun 2020 • Sameer Khurana, Antoine Laurent, James Glass
The audio encoder is trained to perform a speech-translation retrieval task in a contrastive learning framework.
no code implementations • 3 Jun 2020 • Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech.
no code implementations • 19 May 2020 • Hongyin Luo, Shang-Wen Li, James Glass
Experiments showed that the ProtoQN significantly outperformed the baseline DQN model in both supervised and few-shot learning scenarios, and achieved state-of-the-art few-shot learning performance.
2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass
Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.
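The APC objective can be sketched as follows (a toy illustration, not the paper's model: APC trains an RNN or Transformer on log-mel spectrograms, while a single linear map on synthetic correlated frames stands in here): predict the frame n steps ahead of the current one and minimize an L1 loss.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, n = 400, 8, 3                      # frames, feature dim, look-ahead

# Synthetic temporally correlated "frames" (AR(1) process), a stand-in
# for real spectrogram frames, which are also correlated over time.
frames = np.zeros((T, D))
x = rng.standard_normal(D)
for t in range(T):
    x = 0.9 * x + 0.1 * rng.standard_normal(D)
    frames[t] = x

W = np.zeros((D, D))                     # toy linear predictor
lr = 0.02
for _ in range(500):
    err = frames[:-n] @ W - frames[n:]   # prediction error for x_{t+n}
    # Subgradient step on the L1 loss used by APC.
    W -= lr * frames[:-n].T @ np.sign(err) / (T - n)

baseline = np.abs(frames[n:]).mean()                  # loss of predicting zeros
trained = np.abs(frames[:-n] @ W - frames[n:]).mean()  # loss after training
```

Because the frames are temporally correlated, the predictor beats the trivial zero baseline; in APC the representations learned by the predicting network, not the predictions themselves, are the useful output.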
1 code implementation • ACL 2020 • Ramy Baly, Georgi Karadzhov, Jisun An, Haewoon Kwak, Yoan Dinkov, Ahmed Ali, James Glass, Preslav Nakov
Alternatively, we can profile entire news outlets and look for those that are likely to publish fake or biased content.
1 code implementation • ACL 2020 • John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass
We use existing and novel similarity measures that aim to gauge the level of localization of information in the deep models, and facilitate the investigation of which design factors affect model similarity, without requiring any external linguistic annotation.
no code implementations • ACL 2020 • Yu-An Chung, James Glass
Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.
no code implementations • SEMEVAL 2016 • Preslav Nakov, Lluís Màrquez, Alessandro Moschitti, Walid Magdy, Hamdy Mubarak, Abed Alhakim Freihat, James Glass, Bilal Randeree
This paper describes the SemEval-2016 Task 3 on Community Question Answering, which we offered in English and Arabic.
no code implementations • SEMEVAL 2015 • Preslav Nakov, Lluís Màrquez, Walid Magdy, Alessandro Moschitti, James Glass, Bilal Randeree
Community Question Answering (cQA) provides interesting new research directions to the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts.
1 code implementation • ICLR 2020 • David Harwath, Wei-Ning Hsu, James Glass
What differentiates this paper from prior work on speech unit learning is the choice of training objective.
no code implementations • WS 2019 • Wei Fang, Moin Nadeem, Mitra Mohtarami, James Glass
We present a multi-task learning model that leverages large amount of textual information from existing datasets to improve stance prediction.
no code implementations • CL 2020 • Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass
(iii) Do the representations capture lexical semantics?
2 code implementations • 23 Oct 2019 • Yu-An Chung, James Glass
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
no code implementations • 16 Oct 2019 • Tianxing He, Jun Liu, Kyunghyun Cho, Myle Ott, Bing Liu, James Glass, Fuchun Peng
We find that mix-review effectively regularizes the finetuning process, and the forgetting problem is alleviated to some extent.
no code implementations • IJCNLP 2019 • Mitra Mohtarami, James Glass, Preslav Nakov
In particular, we introduce a novel contrastive language adaptation approach applied to memory networks, which ensures accurate alignment of stances in the source and target languages, and can effectively deal with the challenge of limited labeled data in the target language.
no code implementations • IJCNLP 2019 • Yifan Zhang, Giovanni Da San Martino, Alberto Barrón-Cedeño, Salvatore Romeo, Jisun An, Haewoon Kwak, Todor Staykovski, Israa Jaradat, Georgi Karadzhov, Ramy Baly, Kareem Darwish, James Glass, Preslav Nakov
We introduce Tanbih, a news aggregator with intelligent analysis tools to help readers understand what's behind a news story.
no code implementations • 26 Sep 2019 • Sameer Khurana, Ahmed Ali, James Glass
We analyze the following; transfer learning from high resource broadcast domain to low-resource dialectal domain and semi-supervised learning where we use in-domain unlabeled audio data collected from YouTube.
1 code implementation • 4 Aug 2019 • Pepa Atanasova, Preslav Nakov, Lluís Màrquez, Alberto Barrón-Cedeño, Georgi Karadzhov, Tsvetomila Mihaylova, Mitra Mohtarami, James Glass
We study the problem of automatic fact-checking, paying special attention to the impact of contextual and discourse information.
1 code implementation • 9 Jul 2019 • Yonatan Belinkov, Ahmed Ali, James Glass
End-to-end neural network systems for automatic speech recognition (ASR) are trained from acoustic features to text transcriptions.
Automatic Speech Recognition (ASR)
no code implementations • 9 Jul 2019 • Wei-Ning Hsu, David Harwath, James Glass
Transfer learning aims to reduce the amount of data required to excel at a new task by re-using the knowledge acquired from learning other related tasks.
no code implementations • 17 Jun 2019 • Wei Fang, Yu-An Chung, James Glass
For an input text, it is simultaneously passed into BERT and the Tacotron-2 encoder.
no code implementations • NAACL 2019 • Moin Nadeem, Wei Fang, Brian Xu, Mitra Mohtarami, James Glass
We present FAKTA, a unified framework that integrates various components of a fact-checking process: document retrieval from media sources with various types of reliability, stance detection of documents with respect to given claims, evidence extraction, and linguistic analysis.
1 code implementation • ACL 2019 • Hongyin Luo, Lan Jiang, Yonatan Belinkov, James Glass
In this work, we propose a method that improves language modeling by learning to align the given context and the following phrase.
Ranked #12 on Language Modelling on WikiText-103
no code implementations • EMNLP 2021 • Tianxing He, Jingzhao Zhang, Zhiming Zhou, James Glass
Exposure bias has been regarded as a central problem for auto-regressive language models (LM).
no code implementations • 11 May 2019 • Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass
There are a number of studies on extracting bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states to improve the performance of text-dependent speaker verification (TD-SV).
Automatic Speech Recognition (ASR)
no code implementations • ICLR 2019 • Hongyin Luo, Yichen Li, Jie Fu, James Glass
Recently, there have been some attempts to use non-recurrent neural models for language modeling.
no code implementations • 7 Apr 2019 • Suwon Shon, Hao Tang, James Glass
In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.
no code implementations • SEMEVAL 2019 • Abdelrhman Saleh, Ramy Baly, Alberto Barrón-Cedeño, Giovanni Da San Martino, Mitra Mohtarami, Preslav Nakov, James Glass
In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection.
5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.
no code implementations • NAACL 2019 • Ramy Baly, Georgi Karadzhov, Abdelrhman Saleh, James Glass, Preslav Nakov
In the context of fake news, bias, and propaganda, we study two important but relatively under-explored problems: (i) trustworthiness estimation (on a 3-point scale) and (ii) political ideology detection (left/right bias on a 7-point scale) of entire news outlets, as opposed to evaluating individual articles.
1 code implementation • ACL 2020 • Tianxing He, James Glass
Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses.
no code implementations • 21 Feb 2019 • David Harwath, James Glass
In this paper, we investigate the manner in which interpretable sub-word speech units emerge within a convolutional neural network model trained to associate raw speech waveforms with semantically related natural image scenes.
no code implementations • 6 Feb 2019 • Brian Xu, Mitra Mohtarami, James Glass
This paper studies the problem of stance detection which aims to predict the perspective (or stance) of a given document with respect to a given claim.
1 code implementation • 21 Dec 2018 • Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass
We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models?
2 code implementations • 21 Dec 2018 • Fahim Dalvi, Avery Nortonsmith, D. Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, James Glass
We present a toolkit to facilitate the interpretation and understanding of neural network models.
no code implementations • TACL 2019 • Yonatan Belinkov, James Glass
The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems.
no code implementations • 4 Dec 2018 • Suwon Shon, Ahmed Ali, James Glass
An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition.
no code implementations • 27 Nov 2018 • Suwon Shon, Tae-Hyun Oh, James Glass
In this paper, we present a multi-modal online person verification system using both speech and visual signals.
no code implementations • 4 Nov 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.
no code implementations • ICLR 2019 • Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass
Neural machine translation (NMT) models learn representations containing substantial linguistic information.
no code implementations • 31 Oct 2018 • Hao Tang, James Glass
In addition, we study three types of inductive bias, leveraging a pronunciation dictionary, word boundary annotations, and constraints on word durations.
2 code implementations • EMNLP 2018 • Ramy Baly, Georgi Karadzhov, Dimitar Alexandrov, James Glass, Preslav Nakov
We present a study on predicting the factuality of reporting and bias of news media.
1 code implementation • 12 Sep 2018 • Suwon Shon, Hao Tang, James Glass
In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.
no code implementations • 12 Sep 2018 • Suwon Shon, Wei-Ning Hsu, James Glass
In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID).
no code implementations • ICLR 2019 • Tianxing He, James Glass
We adopt an empirical methodology, in which we first create lists of egregious output sequences, and then design a discrete optimization algorithm to find input sequences that will cause the model to generate them.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
1 code implementation • 17 Jul 2018 • Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass
The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.
Audio and Speech Processing
Sound
no code implementations • 9 Jul 2018 • Hao Tang, James Glass
In this paper, we study recurrent networks' ability to learn long-term dependency in the context of speech recognition.
no code implementations • 13 Jun 2018 • Wei-Ning Hsu, Hao Tang, James Glass
However, it is relatively inexpensive to collect large amounts of unlabeled data from domains that we want the models to generalize to.
Automatic Speech Recognition (ASR)
no code implementations • 13 Jun 2018 • Hao Tang, Wei-Ning Hsu, Francois Grondin, James Glass
Speech recognizers trained on close-talking speech do not generalize to distant speech and the word error rate degradation can be as large as 40% absolute.
no code implementations • NAACL 2018 • Tuka Al Hanai, Rhoda Au, James Glass
Neuropsychological examinations are an important screening tool for the presence of cognitive conditions (e.g., Alzheimer's, Parkinson's Disease), and require a trained tester to conduct the exam through spoken interactions with the subject.
Automatic Speech Recognition (ASR)
no code implementations • 29 May 2018 • Wei-Ning Hsu, James Glass
In this paper, we present a partitioned variational autoencoder (PVAE) and several training objectives to learn disentangled representations, which encode not only the shared factors, but also modality-dependent ones, into separate latent variables.
no code implementations • NeurIPS 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.
Automatic Speech Recognition (ASR)
+5
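A standard building block for this kind of embedding-space alignment is the orthogonal Procrustes solution, sketched below. This is the generic closed form for paired embeddings, not the paper's full unsupervised pipeline, which must additionally induce the pairs without parallel supervision.

```python
import numpy as np

def procrustes_align(X, Y):
    """Closed-form orthogonal map W minimizing ||X @ W - Y||_F.

    X, Y: (n, d) arrays of paired embeddings from the two spaces
    (e.g. a seed dictionary, or pairs induced without supervision).
    """
    # The SVD of the cross-covariance gives the optimal rotation.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

When Y is exactly a rotation of X, `procrustes_align` recovers that rotation exactly; with noisy pairs it returns the best orthogonal fit in the least-squares sense.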
1 code implementation • NAACL 2018 • Adam Poliak, Yonatan Belinkov, James Glass, Benjamin Van Durme
We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena.
no code implementations • NAACL 2018 • Ramy Baly, Mitra Mohtarami, James Glass, Lluis Marquez, Alessandro Moschitti, Preslav Nakov
A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.).
no code implementations • NAACL 2018 • Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, Alessandro Moschitti
We present a novel end-to-end memory network for stance detection, which jointly (i) predicts whether a document agrees, disagrees, discusses or is unrelated with respect to a given target claim, and also (ii) extracts snippets of evidence for that prediction.
Ranked #6 on Fake News Detection on FNC-1
2 code implementations • 9 Apr 2018 • Wei-Ning Hsu, James Glass
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.
no code implementations • 9 Apr 2018 • David Harwath, Galen Chuang, James Glass
In this paper, we explore the learning of neural network embeddings for natural images and speech waveforms describing the content of those images.
no code implementations • ECCV 2018 • David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba, James Glass
In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to.
2 code implementations • 23 Mar 2018 • Yu-An Chung, James Glass
In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.
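The "close in embedding space" property described above is usually checked with cosine similarity over the learned fixed-length vectors. The toy 3-d vectors below are invented stand-ins for Speech2Vec outputs, purely for illustration.

```python
import math

def cosine(u, v):
    # Cosine similarity: the usual metric for comparing embeddings.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented "word" vectors standing in for learned Speech2Vec embeddings.
vec = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

# Semantically related words should score higher than unrelated ones.
assert cosine(vec["cat"], vec["dog"]) > cosine(vec["cat"], vec["car"])
```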
2 code implementations • 12 Mar 2018 • Suwon Shon, Ahmed Ali, James Glass
Although the Siamese network with language embeddings did not achieve as good a result as the end-to-end DID system, the two approaches had good synergy when combined together in a fused system.
Sound
Audio and Speech Processing
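The score-level fusion described above can be sketched as a weighted sum of per-dialect scores from the two systems. The dialect labels, scores, and weight below are illustrative, not taken from the paper.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Late (score-level) fusion of two dialect-ID systems.

    scores_a, scores_b: dicts mapping dialect label -> score,
    assumed already calibrated to a comparable range.
    """
    return {d: weight * scores_a[d] + (1.0 - weight) * scores_b[d]
            for d in scores_a}

# Illustrative scores from an end-to-end system and a Siamese/embedding system.
e2e = {"EGY": 0.6, "GLF": 0.2, "LEV": 0.2}
siamese = {"EGY": 0.3, "GLF": 0.5, "LEV": 0.2}
fused = fuse_scores(e2e, siamese, weight=0.7)
best = max(fused, key=fused.get)
```

The fusion weight is typically tuned on held-out data; the "synergy" noted in the abstract shows up when the two systems make complementary errors.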
3 code implementations • 8 Mar 2018 • Tsvetomila Mihaylova, Preslav Nakov, Lluis Marquez, Alberto Barron-Cedeno, Mitra Mohtarami, Georgi Karadzhov, James Glass
Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information.
no code implementations • 7 Mar 2018 • Wei-Ning Hsu, James Glass
The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions.
Automatic Speech Recognition (ASR)
+1
1 code implementation • IJCNLP 2017 • Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass
In this paper, we investigate the representations learned at different layers of NMT encoders.
no code implementations • 11 Dec 2017 • Kenneth Leidal, David Harwath, James Glass
In this paper, we explore the unsupervised learning of a semantic embedding space for co-occurring sensory inputs.
no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass
Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.
no code implementations • 5 Nov 2017 • Yu-An Chung, James Glass
In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.
1 code implementation • 20 Oct 2017 • Tuka Alhanai, Rhoda Au, James Glass
In this study we developed an automated system that evaluates speech and language features from audio recordings of neuropsychological examinations of 92 subjects in the Framingham Heart Study.
3 code implementations • NeurIPS 2017 • Wei-Ning Hsu, Yu Zhang, James Glass
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
Automatic Speech Recognition (ASR)
+2
1 code implementation • NeurIPS 2017 • Yonatan Belinkov, James Glass
In this work, we analyze the speech representations learned by a deep end-to-end model that is based on convolutional and recurrent layers, and trained with a connectionist temporal classification (CTC) loss.
Automatic Speech Recognition (ASR)
+2
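CTC's output convention, which collapses repeated labels and then deletes blanks, is easy to state in code. The greedy-decoding sketch below illustrates that rule; it is not the paper's analysis code.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse consecutive repeats, then drop blanks, per the CTC rule."""
    out, prev = [], None
    for lab in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

For example, the frame sequence `hh-e-ll-ll-oo` decodes to `hello`; the blank between the two `ll` runs is what lets CTC emit a doubled letter.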
no code implementations • 28 Aug 2017 • Suwon Shon, Ahmed Ali, James Glass
In order to achieve a robust ADI system, we explored both Siamese neural network models to learn similarity and dissimilarities among Arabic dialects, as well as i-vector post-processing to adapt domain mismatches.
no code implementations • 19 Jul 2017 • Wei-Ning Hsu, Yu Zhang, James Glass
Research on robust speech recognition can be regarded as trying to overcome this domain mismatch issue.
Automatic Speech Recognition (ASR)
+4
no code implementations • 13 Apr 2017 • Wei-Ning Hsu, Yu Zhang, James Glass
In this paper, we apply a convolutional VAE to model the generative process of natural speech.
1 code implementation • ACL 2017 • Yonatan Belinkov, Nadir Durrani, Fahim Dalvi, Hassan Sajjad, James Glass
Neural machine translation (MT) models obtain state-of-the-art performance while maintaining a simple, end-to-end architecture.
2 code implementations • 23 Feb 2017 • Hongyin Luo, Jie Fu, James Glass
However, it has been argued that this is not biologically plausible because back-propagating error signals with the exact incoming weights are not considered possible in biological neural systems.
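The biologically motivated alternative alluded to here is often sketched as feedback alignment: the output error is sent back through a fixed random matrix rather than the transpose of the forward weights. The toy two-layer linear network below (sizes, learning rate, and data all invented) shows the mechanics, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer linear net: x -> h = W1 @ x -> y = W2 @ h.
W1 = 0.1 * rng.standard_normal((4, 3))
W2 = 0.1 * rng.standard_normal((2, 4))
# Feedback alignment: a fixed random matrix stands in for W2.T
# when the output error is propagated back to the hidden layer.
B = 0.1 * rng.standard_normal((4, 2))

x = rng.standard_normal(3)
target = np.array([1.0, -1.0])

losses = []
for _ in range(300):
    h = W1 @ x
    y = W2 @ h
    e = y - target                 # output error
    losses.append(float(e @ e))
    W2 -= 0.1 * np.outer(e, h)     # exact gradient for the top layer
    delta_h = B @ e                # random feedback instead of W2.T @ e
    W1 -= 0.1 * np.outer(delta_h, x)
```

Despite the mismatched feedback path, the forward weights tend to align with the random feedback over training, which is why this scheme can still reduce the loss.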
no code implementations • NeurIPS 2016 • David Harwath, Antonio Torralba, James Glass
Humans learn to speak before they can read or write, so why can’t computers do the same?
no code implementations • COLING 2016 • Salvatore Romeo, Giovanni Da San Martino, Alberto Barrón-Cedeño, Alessandro Moschitti, Yonatan Belinkov, Wei-Ning Hsu, Yu Zhang, Mitra Mohtarami, James Glass
In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which introduces noise into machine learning algorithms.
no code implementations • 25 Sep 2016 • Yonatan Belinkov, James Glass
Machine translation between Arabic and Hebrew has so far been limited by a lack of parallel corpora, despite the political and cultural importance of this language pair.
1 code implementation • WS 2016 • Yonatan Belinkov, James Glass
Discriminating between closely-related language varieties is considered a challenging and important task.
no code implementations • 19 Sep 2016 • Ahmed Ali, Peter Bell, James Glass, Yacine Messaoui, Hamdy Mubarak, Steve Renals, Yifan Zhang
For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net, spanning the ten-year period 2000-2011.
no code implementations • 23 Mar 2016 • Wei-Ning Hsu, Yu Zhang, James Glass
We apply a general recurrent neural network (RNN) encoder framework to community question answering (cQA) tasks.
no code implementations • 11 Nov 2015 • David Harwath, James Glass
In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities.
no code implementations • 30 Oct 2015 • Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass
In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.
no code implementations • 30 Oct 2015 • Yu Zhang, Ekapol Chuangsuwanich, James Glass, Dong Yu
In this paper, we investigate the use of prediction-adaptation-correction recurrent neural networks (PAC-RNNs) for low-resource speech recognition.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
no code implementations • TACL 2015 • Chia-Ying Lee, Timothy J. O{'}Donnell, James Glass
We present a model of unsupervised phonological lexicon discovery{---}the problem of simultaneously learning phoneme-like and word-like units from acoustic input.