1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen
Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.
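As an illustration of contrastive learning with binary relevances, the sketch below computes an InfoNCE-style loss over a batch of audio-caption similarities, treating only the matched pair on the diagonal as relevant. The excerpt does not name the loss, so InfoNCE, the function name `info_nce`, and the temperature value are assumptions made for illustration.

```python
import math

def info_nce(sim, temperature=0.1):
    """InfoNCE-style loss for a square similarity matrix (assumed loss, not from the source).

    sim[i][j] is the similarity between audio clip i and caption j.
    Binary relevance: only the matched pair (i, i) counts as a positive.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n

# Toy similarity matrices (made-up values).
aligned = [[0.9, 0.1, 0.0],
           [0.2, 0.8, 0.1],
           [0.0, 0.1, 0.7]]
uniform = [[0.5, 0.5, 0.5] for _ in range(3)]

print(info_nce(aligned) < info_nce(uniform))
```

This version scores only the audio-to-caption direction; a symmetric variant would average it with the same loss computed on the transposed matrix.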
no code implementations • 8 Nov 2022 • Huang Xie, Okko Räsänen, Tuomas Virtanen
With a constant training setting on the retrieval system from [1], we study eight sampling strategies, including hard and semi-hard negative sampling.
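To make the distinction between hard and semi-hard negatives concrete, here is a minimal sketch using the definitions common in metric learning. The excerpt does not give the paper's exact definitions, so the criteria, the helper names, and the margin are assumptions.

```python
def hardest_negative(sims, pos_idx):
    """Index of the non-matching item most similar to the query."""
    candidates = [i for i in range(len(sims)) if i != pos_idx]
    return max(candidates, key=lambda i: sims[i])

def semi_hard_negatives(sims, pos_idx, margin):
    """Indices of negatives that score below the positive but within the margin."""
    pos = sims[pos_idx]
    return [i for i, s in enumerate(sims)
            if i != pos_idx and pos - margin < s < pos]

# Toy similarities between one caption and five audio clips (made-up values);
# clip 0 is the matching clip.
sims = [0.9, 0.2, 0.95, 0.8, 0.75]

print(hardest_negative(sims, pos_idx=0))
print(semi_hard_negatives(sims, pos_idx=0, margin=0.2))
```

Under these assumed definitions, clip 2 is the hard negative because it outscores the true match, while clips 3 and 4 are semi-hard because they fall just below it.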
no code implementations • 20 Sep 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.
1 code implementation • 13 Jun 2022 • Huang Xie, Samuel Lipping, Tuomas Virtanen
Language-based audio retrieval is a task in which natural language textual captions are used as queries to retrieve audio signals from a dataset.
no code implementations • 10 Jun 2022 • Duygu Dogan, Huang Xie, Toni Heittola, Tuomas Virtanen
The results show that classification performance is highly sensitive to the semantic relation between test and training classes, and that textual and image embeddings can match the performance of semantic acoustic embeddings when the seen and unseen classes are semantically similar.
1 code implementation • 6 Oct 2021 • Huang Xie, Okko Räsänen, Konstantinos Drossos, Tuomas Virtanen
We investigate unsupervised learning of correspondences between sound events and textual phrases through aligning audio clips with textual captions describing the content of a whole audio clip.
no code implementations • 25 Nov 2020 • Huang Xie, Okko Räsänen, Tuomas Virtanen
In this paper, we study zero-shot learning in audio classification through factored linear and nonlinear acoustic-semantic projections between audio instances and sound classes.
no code implementations • 24 Nov 2020 • Huang Xie, Tuomas Virtanen
The experimental results show that classification performance is significantly improved by involving sound classes that are semantically close to the test classes in training.
no code implementations • 6 May 2019 • Huang Xie, Tuomas Virtanen
We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings.
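A minimal sketch of how class label embeddings could be used for zero-shot prediction: an audio embedding, assumed to be already projected into the label-embedding space, is assigned to the class whose label vector is closest by cosine similarity. The three-dimensional vectors stand in for real Word2Vec embeddings, and the names `cosine` and `predict_class` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_class(projected_audio, label_embeddings):
    """Return the class whose label embedding is most similar to the audio."""
    return max(label_embeddings,
               key=lambda name: cosine(projected_audio, label_embeddings[name]))

# Toy stand-ins for Word2Vec label vectors (made-up values, not real embeddings).
label_embeddings = {
    "dog":   [1.0, 0.0, 0.2],
    "siren": [0.0, 1.0, 0.1],
    "rain":  [0.1, 0.2, 1.0],
}

# Hypothetical audio embedding after projection into the semantic space.
projected_audio = [0.9, 0.1, 0.3]

print(predict_class(projected_audio, label_embeddings))
```

Because prediction depends only on label vectors, a class unseen in training can be added by supplying its label embedding.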