Search Results for author: Khazar Khorrami

Found 5 papers, 3 papers with code

Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

1 code implementation • 16 Jun 2023 • Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen

Conversely, the results suggest that using only binary relevances defined by captioning-based audio-caption pairs is sufficient for contrastive learning.

Audio captioning • Contrastive Learning • +1

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System

no code implementations • 5 Jun 2023 • Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

As a result, we find that sequential training with wav2vec 2.0 first and VGS next provides higher performance on audio-visual retrieval compared to simultaneous optimization of both learning mechanisms.

Multi-Task Learning • Representation Learning • +3

Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models

1 code implementation • 5 Jul 2021 • Khazar Khorrami, Okko Räsänen

We compare the alignment performance using our proposed evaluation metrics to the semantic retrieval task commonly used to evaluate VGS models.

Cross-Modal Retrieval • Object Localization • +2

A computational model of early language acquisition from audiovisual experiences of young infants

no code implementations • 24 Jun 2019 • Okko Räsänen, Khazar Khorrami

Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech.

Language Acquisition
