Search Results for author: Benjamin Elizalde

Found 15 papers, 5 papers with code

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

We exploit this capability of audio-language models and introduce PAM, a no-reference metric for assessing audio quality across different audio processing tasks.

Music Generation • Text-to-Music Generation
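
As a rough illustration of PAM's prompting idea, here is a minimal sketch under assumptions, not the paper's exact implementation: the audio is compared against two antonymous quality prompts, and the softmax mass on the "clean" prompt serves as the score. The prompt wording and the embed_audio/embed_text callables are hypothetical stand-ins for a real audio-language model such as CLAP.

    import numpy as np

    def pam_score(embed_audio, embed_text, wav_path):
        # No-reference quality score: compare the audio embedding against
        # two antonymous quality prompts and return the softmax mass on
        # the "clean" prompt. Prompt wording is illustrative.
        prompts = ["the sound is clear and clean",
                   "the sound is noisy and distorted"]
        a = embed_audio(wav_path)                        # (d,) unit-norm audio embedding
        t = np.stack([embed_text(p) for p in prompts])   # (2, d) unit-norm text embeddings
        sims = t @ a                                     # cosine similarities
        probs = np.exp(sims) / np.exp(sims).sum()
        return float(probs[0])                           # higher = cleaner audio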

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning • Retrieval • +1
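
A minimal sketch of the prompt-generation step, assuming librosa for feature extraction; the thresholds and template wording are illustrative guesses, not the paper's exact recipe.

    import numpy as np
    import librosa

    def acoustic_prompt(wav_path):
        # Template a text prompt from coarse acoustic properties of the
        # speech signal (pitch and energy). Thresholds are illustrative.
        y, sr = librosa.load(wav_path, sr=16000)
        f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
        pitch = np.nanmean(f0) if voiced.any() else 0.0
        energy = float(librosa.feature.rms(y=y).mean())
        pitch_word = "high" if pitch > 180 else "low"
        energy_word = "loud" if energy > 0.05 else "soft"
        return f"this person is speaking with a {pitch_word} pitch and a {energy_word} voice"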

Pengi: An Audio Language Model for Audio Tasks

1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.

Audio captioning • Audio Question Answering • +6
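
A minimal sketch of the "audio as a prefix to a language model" framing, in PyTorch; all module sizes are illustrative, and the encoder and LM below are stand-ins rather than Pengi's actual components.

    import torch
    import torch.nn as nn

    class AudioAsPrefix(nn.Module):
        # The audio encoder output becomes a prefix of pseudo-tokens that
        # conditions a language model; the LM then produces the answer as
        # text, so every audio task reduces to text generation.
        def __init__(self, audio_dim=512, lm_dim=768, prefix_len=8, vocab=50257):
            super().__init__()
            self.prefix_len, self.lm_dim = prefix_len, lm_dim
            self.audio_to_prefix = nn.Linear(audio_dim, prefix_len * lm_dim)
            self.token_emb = nn.Embedding(vocab, lm_dim)
            layer = nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True)
            self.lm = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a causal LM
            self.head = nn.Linear(lm_dim, vocab)

        def forward(self, audio_feats, prompt_ids):
            # audio_feats: (B, audio_dim); prompt_ids: (B, T) task prompt tokens
            prefix = self.audio_to_prefix(audio_feats).view(-1, self.prefix_len, self.lm_dim)
            text = self.token_emb(prompt_ids)
            hidden = self.lm(torch.cat([prefix, text], dim=1))
            return self.head(hidden[:, self.prefix_len:])  # next-token logits for generation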

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

no code implementations • 20 Feb 2023 • Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable and in fact performed by humans.

Scene Recognition

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvements on Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval • Speech Emotion Recognition

Audio Retrieval with WavText5K and CLAP Training

1 code implementation • 28 Sep 2022 • Soham Deshmukh, Benjamin Elizalde, Huaming Wang

In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.

AudioCaps • Audio captioning • +3
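
A minimal sketch of the symmetric contrastive objective used in CLAP-style audio-text training; the temperature value is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
        # CLIP-style symmetric contrastive loss: matched audio/text pairs
        # sit on the diagonal of the batch similarity matrix.
        a = F.normalize(audio_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        logits = a @ t.T / temperature                    # (B, B) similarities
        targets = torch.arange(a.size(0), device=a.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2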

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

no code implementations • 20 Feb 2020 • Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier

State-of-the-art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording).

Retrieval
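
A minimal sketch of a pairwise presence matrix over multi-label recordings; using Jaccard overlap as the pairwise measure is an assumption about the paper's exact definition.

    import numpy as np

    def pairwise_presence(labels):
        # Entry (i, j) is the Jaccard overlap between the label sets of
        # recordings i and j, giving a graded similarity target for a
        # Siamese network instead of a hard same/different label.
        L = np.asarray(labels, dtype=float)       # (N, num_events) binary matrix
        inter = L @ L.T                            # shared events per pair
        union = L.sum(1)[:, None] + L.sum(1)[None, :] - inter
        return np.where(union > 0, inter / np.maximum(union, 1), 0.0)

    # Example: recording 0 = {dog, car}, recording 1 = {dog}, recording 2 = {rain}
    print(pairwise_presence([[1, 1, 0], [1, 0, 0], [0, 0, 1]]))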

Framework for evaluation of sound event detection in web videos

no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection • Sound Event Detection

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection
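
A generic self-training loop in the spirit of the paper, using a scikit-learn classifier as a stand-in; the confidence threshold and number of rounds are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9):
        # Train on labeled audio features, pseudo-label the unlabeled
        # (web) audio, and absorb only confident predictions each round.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(rounds):
            probs = clf.predict_proba(X_unlab)
            confident = probs.max(axis=1) >= threshold
            if not confident.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[confident]])
            y_lab = np.concatenate([y_lab, probs[confident].argmax(axis=1)])
            X_unlab = X_unlab[~confident]
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        return clf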

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

no code implementations • 13 Jul 2016 • Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels.

TAG

City-Identification of Flickr Videos Using Semantic Acoustic Features

no code implementations • 12 Jul 2016 • Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane

In particular, we present a method to compute and use semantic acoustic features for city identification; the features themselves provide semantic evidence for the identification.
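
A minimal sketch of semantic acoustic features as clip-level sound event posteriors; the classifier interface is a hypothetical stand-in, not the paper's exact model.

    import numpy as np

    def semantic_acoustic_features(segment_feats, event_classifier):
        # Average per-segment sound event posteriors into one clip-level
        # vector; each dimension is interpretable as an event (e.g.
        # "traffic", "crowd"), which is what makes the features semantic.
        posteriors = event_classifier.predict_proba(segment_feats)  # (T, num_events)
        return posteriors.mean(axis=0)  # clip-level descriptor fed to a city classifier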

The YLI-MED Corpus: Characteristics, Procedures, and Plans

no code implementations • 13 Mar 2015 • Julia Bernd, Damian Borth, Benjamin Elizalde, Gerald Friedland, Heather Gallagher, Luke Gottlieb, Adam Janin, Sara Karabashlieva, Jocelyn Takahashi, Jennifer Won

The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content.

Descriptive • Event Detection

YFCC100M: The New Data in Multimedia Research

2 code implementations • 5 Mar 2015 • Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, Li-Jia Li

We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released.

Multimedia • Computers and Society • H.3.7
