Search Results for author: Benjamin Elizalde

Found 15 papers, 5 papers with code

PAM: Prompting Audio-Language Models for Audio Quality Assessment

1 code implementation • 1 Feb 2024 • Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

We exploit this capability of audio-language models and introduce PAM, a no-reference metric for assessing audio quality across different audio processing tasks.

Music Generation • Text-to-Music Generation
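
As a rough illustration of PAM's prompting idea, here is a minimal sketch under assumptions, not the paper's exact implementation: the audio is compared against two antonymous quality prompts, and the softmax mass on the "clean" prompt serves as the score. The prompt wording and the embed_audio/embed_text callables are hypothetical stand-ins for a real audio-language model such as CLAP.

    import numpy as np

    def pam_score(embed_audio, embed_text, wav_path):
        # No-reference quality score: compare the audio embedding against
        # two antonymous quality prompts and return the softmax mass on
        # the "clean" prompt. Prompt wording is illustrative.
        prompts = ["the sound is clear and clean",
                   "the sound is noisy and distorted"]
        a = embed_audio(wav_path)                        # (d,) unit-norm audio embedding
        t = np.stack([embed_text(p) for p in prompts])   # (2, d) unit-norm text embeddings
        sims = t @ a                                     # cosine similarities
        probs = np.exp(sims) / np.exp(sims).sum()
        return float(probs[0])                           # higher = cleaner audio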

Prompting Audios Using Acoustic Properties For Emotion Representation

no code implementations • 3 Oct 2023 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs.

Contrastive Learning • Retrieval • +1
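
A minimal sketch of the prompt-generation step, assuming librosa for feature extraction; the thresholds and template wording are illustrative guesses, not the paper's exact recipe.

    import numpy as np
    import librosa

    def acoustic_prompt(wav_path):
        # Template a text prompt from coarse acoustic properties of the
        # speech signal (pitch and energy). Thresholds are illustrative.
        y, sr = librosa.load(wav_path, sr=16000)
        f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
        pitch = np.nanmean(f0) if voiced.any() else 0.0
        energy = float(librosa.feature.rms(y=y).mean())
        pitch_word = "high" if pitch > 180 else "low"
        energy_word = "loud" if energy > 0.05 else "soft"
        return f"this person is speaking with a {pitch_word} pitch and a {energy_word} voice"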

Pengi: An Audio Language Model for Audio Tasks

1 code implementation • NeurIPS 2023 • Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang

We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks.

Audio captioning • Audio Question Answering • +6
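
A minimal sketch of the "audio as a prefix to a language model" framing, in PyTorch; all module sizes are illustrative, and the encoder and LM below are stand-ins rather than Pengi's actual components.

    import torch
    import torch.nn as nn

    class AudioAsPrefix(nn.Module):
        # The audio encoder output becomes a prefix of pseudo-tokens that
        # conditions a language model; the LM then produces the answer as
        # text, so every audio task reduces to text generation.
        def __init__(self, audio_dim=512, lm_dim=768, prefix_len=8, vocab=50257):
            super().__init__()
            self.prefix_len, self.lm_dim = prefix_len, lm_dim
            self.audio_to_prefix = nn.Linear(audio_dim, prefix_len * lm_dim)
            self.token_emb = nn.Embedding(vocab, lm_dim)
            layer = nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True)
            self.lm = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a causal LM
            self.head = nn.Linear(lm_dim, vocab)

        def forward(self, audio_feats, prompt_ids):
            # audio_feats: (B, audio_dim); prompt_ids: (B, T) task prompt tokens
            prefix = self.audio_to_prefix(audio_feats).view(-1, self.prefix_len, self.lm_dim)
            text = self.token_emb(prompt_ids)
            hidden = self.lm(torch.cat([prefix, text], dim=1))
            return self.head(hidden[:, self.prefix_len:])  # next-token logits for generation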

Synergy between human and machine approaches to sound/scene recognition and processing: An overview of ICASSP special session

no code implementations • 20 Feb 2023 • Laurie M. Heller, Benjamin Elizalde, Bhiksha Raj, Soham Deshmukh

Machine Listening, as usually formalized, attempts to perform a task that is, from our perspective, fundamentally human-performable and in fact performed by humans.

Scene Recognition

Describing emotions with acoustic property prompts for speech emotion recognition

no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvements on Speech Emotion Recognition and Speech Audio Retrieval.

Retrieval • Speech Emotion Recognition

Audio Retrieval with WavText5K and CLAP Training

1 code implementation • 28 Sep 2022 • Soham Deshmukh, Benjamin Elizalde, Huaming Wang

In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.

AudioCaps • Audio captioning • +3
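
A minimal sketch of the symmetric contrastive objective used in CLAP-style audio-text training; the temperature value is an illustrative assumption.

    import torch
    import torch.nn.functional as F

    def clap_contrastive_loss(audio_emb, text_emb, temperature=0.07):
        # CLIP-style symmetric contrastive loss: matched audio/text pairs
        # sit on the diagonal of the batch similarity matrix.
        a = F.normalize(audio_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        logits = a @ t.T / temperature                    # (B, B) similarities
        targets = torch.arange(a.size(0), device=a.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2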

Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

no code implementations • 20 Feb 2020 • Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier

State-of-the-art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording).

Retrieval
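
A minimal sketch of a pairwise presence matrix over multi-label recordings; using Jaccard overlap as the pairwise measure is an assumption about the paper's exact definition.

    import numpy as np

    def pairwise_presence(labels):
        # Entry (i, j) is the Jaccard overlap between the label sets of
        # recordings i and j, giving a graded similarity target for a
        # Siamese network instead of a hard same/different label.
        L = np.asarray(labels, dtype=float)       # (N, num_events) binary matrix
        inter = L @ L.T                            # shared events per pair
        union = L.sum(1)[:, None] + L.sum(1)[None, :] - inter
        return np.where(union > 0, inter / np.maximum(union, 1), 0.0)

    # Example: recording 0 = {dog, car}, recording 1 = {dog}, recording 2 = {rain}
    print(pairwise_presence([[1, 1, 0], [1, 0, 0], [0, 0, 1]]))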

Framework for evaluation of sound event detection in web videos

no code implementations • 2 Nov 2017 • Rohan Badlani, Ankit Shah, Benjamin Elizalde, Anurag Kumar, Bhiksha Raj

The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets.

Event Detection • Sound Event Detection

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection
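
A generic self-training loop in the spirit of the paper, using a scikit-learn classifier as a stand-in; the confidence threshold and number of rounds are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9):
        # Train on labeled audio features, pseudo-label the unlabeled
        # (web) audio, and absorb only confident predictions each round.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(rounds):
            probs = clf.predict_proba(X_unlab)
            confident = probs.max(axis=1) >= threshold
            if not confident.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[confident]])
            y_lab = np.concatenate([y_lab, probs[confident].argmax(axis=1)])
            X_unlab = X_unlab[~confident]
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        return clf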

AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

no code implementations • 13 Jul 2016 • Sebastian Sager, Benjamin Elizalde, Damian Borth, Christian Schulze, Bhiksha Raj, Ian Lane

One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels.

TAG

City-Identification of Flickr Videos Using Semantic Acoustic Features

no code implementations • 12 Jul 2016 • Benjamin Elizalde, Guan-Lin Chao, Ming Zeng, Ian Lane

In particular, we present a method to compute and use semantic acoustic features for city identification; the features themselves provide semantic evidence for the identification.
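
A minimal sketch of semantic acoustic features as clip-level sound event posteriors; the classifier interface is a hypothetical stand-in, not the paper's exact model.

    import numpy as np

    def semantic_acoustic_features(segment_feats, event_classifier):
        # Average per-segment sound event posteriors into one clip-level
        # vector; each dimension is interpretable as an event (e.g.
        # "traffic", "crowd"), which is what makes the features semantic.
        posteriors = event_classifier.predict_proba(segment_feats)  # (T, num_events)
        return posteriors.mean(axis=0)  # clip-level descriptor fed to a city classifier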

The YLI-MED Corpus: Characteristics, Procedures, and Plans

no code implementations • 13 Mar 2015 • Julia Bernd, Damian Borth, Benjamin Elizalde, Gerald Friedland, Heather Gallagher, Luke Gottlieb, Adam Janin, Sara Karabashlieva, Jocelyn Takahashi, Jennifer Won

The YLI Multimedia Event Detection corpus is a public-domain index of videos with annotations and computed features, specialized for research in multimedia event detection (MED), i.e., automatically identifying what's happening in a video by analyzing the audio and visual content.

Descriptive • Event Detection

YFCC100M: The New Data in Multimedia Research

2 code implementations • 5 Mar 2015 • Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, Li-Jia Li

We present the Yahoo Flickr Creative Commons 100 Million Dataset (YFCC100M), the largest public multimedia collection that has ever been released.

Multimedia • Computers and Society • H.3.7
