Search Results for author: Aren Jansen

Found 24 papers, 7 papers with code

Towards Learning a Universal Non-Semantic Representation of Speech

1 code implementation · 25 Feb 2020 · Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv

The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained on different datasets or tasks.

Transfer Learning
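The recipe the abstract points to is simple in outline: compute embeddings with a frozen pretrained model, then fit a small classifier on the labeled downstream data. Below is a minimal Python sketch of that pattern; `pretrained_embed` and the toy data are placeholders, not the paper's actual model or benchmark.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def pretrained_embed(waveforms):
    """Placeholder for a frozen, pretrained speech embedding model.

    Stands in for a TRILL-style network; a fixed random projection
    keeps the sketch runnable end to end.
    """
    projection = np.random.default_rng(42).normal(size=(waveforms.shape[1], 128))
    return waveforms @ projection

# Toy labeled downstream task: 200 one-second clips, 2 classes.
waveforms = rng.normal(size=(200, 16000))
labels = rng.integers(0, 2, size=200)

# Transfer learning: embeddings come from the frozen model; only a small
# classifier is trained on the (limited) labeled data.
embeddings = pretrained_embed(waveforms)
clf = LogisticRegression(max_iter=1000).fit(embeddings[:150], labels[:150])
print("held-out accuracy:", clf.score(embeddings[150:], labels[150:]))
```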

MuLan: A Joint Embedding of Music Audio and Natural Language

1 code implementation · 26 Aug 2022 · Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P. W. Ellis

Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries.

Cross-Modal Retrieval · Music Tagging +2
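MuLan trains its two encoders with a CLIP-style contrastive objective that aligns paired music and text embeddings. A hedged numpy sketch of such a symmetric InfoNCE loss is below; the encoders, batch construction, and temperature value are assumptions, not the paper's exact setup.

```python
import numpy as np

def infonce_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired
    audio/text embeddings, as in CLIP-style joint-embedding training."""
    # L2-normalize so the dot product is cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (batch, batch) similarity matrix
    labels = np.arange(len(a))              # i-th audio clip matches i-th text

    def cross_entropy(logits, labels):
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()

    # Average the audio->text and text->audio directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

batch = np.random.default_rng(0).normal(size=(8, 128))
print(infonce_loss(batch + 0.1, batch))  # near-aligned pairs -> low loss
```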

Attention Bottlenecks for Multimodal Fusion

1 code implementation · NeurIPS 2021 · Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun

Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.

Action Classification · Action Recognition +2
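The paper's mechanism restricts cross-modal attention to a small set of shared "bottleneck" tokens. The sketch below shows that routing in plain numpy, omitting the projections, MLPs, residual connections, and multi-head structure of a real transformer layer.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def bottleneck_fusion_layer(video_tokens, audio_tokens, bottleneck):
    """One fusion layer: each modality attends only over its own tokens
    plus the shared bottleneck tokens, so all cross-modal information
    must flow through the small bottleneck."""
    new_streams, new_bottlenecks = [], []
    for tokens in (video_tokens, audio_tokens):
        ctx = np.concatenate([tokens, bottleneck], axis=0)
        out = attention(ctx, ctx, ctx)
        new_streams.append(out[: len(tokens)])
        new_bottlenecks.append(out[len(tokens):])
    # Each modality writes its own bottleneck update; average the two.
    return new_streams[0], new_streams[1], sum(new_bottlenecks) / 2

rng = np.random.default_rng(0)
video, audio = rng.normal(size=(16, 64)), rng.normal(size=(8, 64))
bottleneck = rng.normal(size=(4, 64))   # only 4 shared fusion tokens
video, audio, bottleneck = bottleneck_fusion_layer(video, audio, bottleneck)
print(video.shape, audio.shape, bottleneck.shape)
```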

Self-Supervised Learning from Automatically Separated Sound Scenes

1 code implementation · 5 May 2021 · Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings.

Contrastive Learning · Self-Supervised Learning

A segmental framework for fully-unsupervised large-vocabulary speech recognition

5 code implementations · 22 Jun 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

We also show that the discovered clusters can be made less speaker- and gender-specific by using an unsupervised autoencoder-like feature extractor to learn better frame-level features (prior to embedding).

Language Modelling · Speech Recognition +1
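The "autoencoder-like feature extractor" mentioned above can be sketched as follows, assuming PyTorch and random stand-in frames; note that the paper's variant is a correspondence autoencoder that reconstructs aligned frames from discovered word pairs rather than the input frame itself.

```python
import torch
from torch import nn

# Sketch of an autoencoder-like frame feature extractor: encode each
# acoustic frame to a low-dimensional code and reconstruct it; the
# bottleneck codes then replace the raw frames prior to embedding.
frame_dim, code_dim = 39, 13   # e.g. MFCCs+deltas in, bottleneck out
encoder = nn.Sequential(nn.Linear(frame_dim, 100), nn.Tanh(), nn.Linear(100, code_dim))
decoder = nn.Sequential(nn.Linear(code_dim, 100), nn.Tanh(), nn.Linear(100, frame_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

frames = torch.randn(1024, frame_dim)   # stand-in for real acoustic frames
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(frames)), frames)
    loss.backward()
    opt.step()

features = encoder(frames)  # bottleneck codes used as frame-level features
print(features.shape)
```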

Unsupervised Learning of Semantic Audio Representations

no code implementations · 6 Nov 2017 · Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous

Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds.

Audio Classification · General Classification +1

Scalable Out-of-Sample Extension of Graph Embeddings Using Deep Neural Networks

no code implementations · 18 Aug 2015 · Aren Jansen, Gregory Sell, Vince Lyzinski

Several popular graph embedding techniques for representation learning and dimensionality reduction rely on performing computationally expensive eigendecompositions to derive a nonlinear transformation of the input data space.

Dimensionality Reduction · Graph Embedding +2
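The proposed fix is to learn the embedding map with a regressor: run the expensive eigendecomposition once on the training data, then train a network to predict those coordinates from raw inputs so new points can be embedded with a single forward pass. A small scikit-learn sketch (toy data, illustrative layer sizes):

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))
X_new = rng.normal(size=(10, 20))      # out-of-sample points

# Expensive step: eigendecomposition-based embedding of the training set.
Y_train = SpectralEmbedding(n_components=2).fit_transform(X_train)

# Cheap out-of-sample extension: regress embedding coordinates with a DNN,
# then apply the learned map to new points with no eigendecomposition.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X_train, Y_train)
print(net.predict(X_new))   # embedding coordinates for unseen points
```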

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

no code implementations · 9 Mar 2016 · Herman Kamper, Aren Jansen, Sharon Goldwater

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text.

Language Acquisition · Language Modelling +1

Evaluating Low-Level Speech Features Against Human Perceptual Data

no code implementations · TACL 2017 · Caitlin Richter, Naomi H. Feldman, Harini Salgado, Aren Jansen

We introduce a method for measuring the correspondence between low-level speech features and human perception, using a cognitive model of speech perception implemented directly on speech recordings.

Automatic Speech Recognition (ASR) · Representation Learning +1
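Work in this area typically scores features with minimal-pair discrimination tests: a representation matches perception if a test token lies closer to a same-category token than to a different-category one. The sketch below shows that ABX logic with a DTW distance on toy features; it is illustrative, not the paper's specific cognitive model.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)   # length-normalized alignment cost

def abx_correct(A, B, X):
    """ABX trial: X belongs to A's category; the features 'pass' the
    trial if X is closer to A than to the distractor B."""
    return dtw_distance(X, A) < dtw_distance(X, B)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 13))            # token of one category
X = A + 0.1 * rng.normal(size=A.shape)   # another token of the same category
B = rng.normal(size=(22, 13))            # token of a different category
print(abx_correct(A, B, X))              # True if features separate the categories
```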

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

no code implementations · LREC 2014 · Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson, Emmanuel Dupoux

The unsupervised discovery of linguistic terms, from either continuous phoneme transcriptions or raw speech, has seen increasing interest in recent years from both a theoretical and a practical standpoint.

Language Acquisition

Improving Universal Sound Separation Using Sound Classification

no code implementations · 18 Nov 2019 · Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification.

Audio Source Separation · Classification +2

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

no code implementations · ICLR 2021 · Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

For evaluation and semi-supervised experiments, we collected human labels for the presence of on-screen and off-screen sounds on a small subset of clips.

Scene Understanding

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation

no code implementations · 1 Jun 2021 · Scott Wisdom, Aren Jansen, Ron J. Weiss, Hakan Erdogan, John R. Hershey

The best performance is achieved using larger numbers of output sources, enabled by our efficient MixIT loss, combined with sparsity losses to prevent over-separation.
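For reference, the basic MixIT objective remixes the model's estimated sources and scores the best assignment of sources to the two input mixtures. The brute-force numpy sketch below enumerates all 2^M assignments with an MSE loss; the paper's "efficient" variant avoids this enumeration and uses SNR-based losses, so treat this only as an illustration of the objective.

```python
import itertools
import numpy as np

def mixit_loss(est_sources, mix1, mix2):
    """Mixture invariant training (MixIT) loss: partition the M estimated
    sources between the two reference mixtures and keep the best
    (minimum-error) partition."""
    m = len(est_sources)
    best = np.inf
    # Enumerate all 2^M binary assignments of sources to mixture 1 / mixture 2.
    for mask in itertools.product([0, 1], repeat=m):
        mask = np.array(mask, dtype=bool)
        remix1 = est_sources[mask].sum(axis=0)
        remix2 = est_sources[~mask].sum(axis=0)
        err = np.mean((remix1 - mix1) ** 2) + np.mean((remix2 - mix2) ** 2)
        best = min(best, err)
    return best

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 16000))          # 4 estimated sources
mix1, mix2 = s[0] + s[2], s[1] + s[3]    # references; no isolated sources needed
print(mixit_loss(s, mix1, mix2))         # a perfect partition exists -> ~0 loss
```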

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

no code implementations · 9 Oct 2021 · Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.

Text-Driven Separation of Arbitrary Sounds

no code implementations · 12 Apr 2022 · Kevin Kilgour, Beat Gfeller, Qingqing Huang, Aren Jansen, Scott Wisdom, Marco Tagliasacchi

The second model, SoundFilter, takes a mixed source audio clip as an input and separates it based on a conditioning vector from the shared text-audio representation defined by SoundWords, making the model agnostic to the conditioning modality.
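One plausible way to realize such conditioning, shown below as a hedged PyTorch sketch, is feature-wise modulation (FiLM) of a separator's hidden activations by the conditioning vector; the layer sizes and the FiLM mechanism are illustrative assumptions, not SoundFilter's published architecture.

```python
import torch
from torch import nn

class ConditionalSeparator(nn.Module):
    """Sketch of conditioned separation: an embedding from a shared
    text-audio space (as SoundWords would provide) modulates the
    separator via feature-wise scale and shift (FiLM), so text or
    example audio can select what to extract from the mixture."""

    def __init__(self, cond_dim=128, hidden=256):
        super().__init__()
        self.encode = nn.Conv1d(1, hidden, kernel_size=16, stride=8)
        self.film = nn.Linear(cond_dim, 2 * hidden)   # -> scale and shift
        self.decode = nn.ConvTranspose1d(hidden, 1, kernel_size=16, stride=8)

    def forward(self, mixture, cond):
        h = torch.relu(self.encode(mixture))
        scale, shift = self.film(cond).chunk(2, dim=-1)
        h = h * scale.unsqueeze(-1) + shift.unsqueeze(-1)
        return self.decode(h)

model = ConditionalSeparator()
mixture = torch.randn(2, 1, 16000)   # batch of mixed audio clips
cond = torch.randn(2, 128)           # embedding of a text query or example clip
print(model(mixture, cond).shape)    # separated audio, same shape as input
```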

MAQA: A Multimodal QA Benchmark for Negation

no code implementations · 9 Jan 2023 · Judith Yue Li, Aren Jansen, Qingqing Huang, Joonseok Lee, Ravi Ganti, Dima Kuzmin

Multimodal learning can benefit from the representation power of pretrained Large Language Models (LLMs).

Negation · Question Answering

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

no code implementations · 11 May 2023 · Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk

Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures.

Music Generation

Dataset balancing can hurt model performance

no code implementations · 30 Jun 2023 · R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

We find, however, that while balancing improves performance on the public AudioSet evaluation data, it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions.
