Search Results for author: Huang Xie

Found 9 papers, 3 papers with code

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen

Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.

Audio captioning Contrastive Learning +1

On Negative Sampling for Contrastive Audio-Text Retrieval

no code implementations • 8 Nov 2022 • Huang Xie, Okko Räsänen, Tuomas Virtanen

With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling.

Audio to Text Retrieval Contrastive Learning +2

Language-based Audio Retrieval Task in DCASE 2022 Challenge

no code implementations • 20 Sep 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen

Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.

Audio captioning Retrieval

Language-based Audio Retrieval Task in DCASE 2022 Challenge

1 code implementation • 13 Jun 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen

Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.

Audio captioning Retrieval

Zero-Shot Audio Classification using Image Embeddings

no code implementations • 10 Jun 2022 • Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen

The results show that classification performance is highly sensitive to the semantic relation between test and training classes, and that textual and image embeddings can reach the performance of the semantic acoustic embeddings when the seen and unseen classes are semantically similar.

Audio Classification Zero-shot Audio Classification +1

Unsupervised Audio-Caption Aligning Learns Correspondences between Individual Sound Events and Textual Phrases

1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen

We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.

Event Detection Retrieval +1

Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

no code implementations • 25 Nov 2020 • Huang Xie, Okko Räsänen, Tuomas Virtanen

In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes.

Audio Classification General Classification +2

Zero-Shot Audio Classification via Semantic Embeddings

no code implementations • 24 Nov 2020 • Huang Xie, Tuomas Virtanen

The experimental results show that classification performance is significantly improved by involving sound classes that are semantically close to the test classes in training.

Audio Classification General Classification +4

Zero-Shot Audio Classification Based on Class Label Embeddings

no code implementations • 6 May 2019 • Huang Xie, Tuomas Virtanen

We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings.

Audio Classification General Classification +2
