Search Results for author: Arda Senocak

Found 11 papers, 2 papers with code

From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers

no code implementations16 Jan 2024 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

We introduce multi-phase training of audio spectrogram transformers by connecting the seminal idea of coarse-to-fine with transformer models.

Audio Classification

Can CLIP Help Sound Source Localization?

1 code implementation7 Nov 2023 Sooyoung Park, Arda Senocak, Joon Son Chung

Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment.

Audio-Visual Learning Contrastive Learning

Sound Source Localization is All about Cross-Modal Alignment

no code implementations ICCV 2023 Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for genuine sound source localization.

Cross-Modal Retrieval Retrieval

FlexiAST: Flexibility is What AST Needs

no code implementations18 Jul 2023 Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak

To overcome this limitation, this paper proposes a training procedure to provide flexibility to standard AST models without architectural changes, allowing them to work with various patch sizes at the inference stage - FlexiAST.
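One plausible way to let a transformer accept multiple patch sizes is FlexiViT-style resizing of the patch-embedding weights; whether FlexiAST uses exactly this mechanism is an assumption — the abstract only states that flexibility is achieved through the training procedure, without architectural changes. A minimal bilinear weight-resizing sketch:

```python
import numpy as np

def resize_patch_kernel(kernel, new_size):
    """Bilinearly resize a square patch-embedding kernel (p, p) -> (new, new).

    Hypothetical illustration: resizing the embedding weights lets the same
    model tokenize spectrograms with a different patch size at inference.
    """
    p = kernel.shape[0]
    # sample positions in the original kernel's coordinate frame
    xs = np.linspace(0, p - 1, new_size)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, p - 1)
    w = xs - x0
    # interpolate along rows, then along columns
    rows = kernel[x0] * (1 - w)[:, None] + kernel[x1] * w[:, None]
    out = rows[:, x0] * (1 - w)[None, :] + rows[:, x1] * w[None, :]
    return out

# resize a 4x4 patch kernel so the model can run with 8x8 patches
k = np.arange(16, dtype=float).reshape(4, 4)
k8 = resize_patch_kernel(k, 8)
```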

Audio Classification

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples

no code implementations30 Mar 2023 Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung

The objective of this work is to explore the learning of visually grounded speech (VGS) models from a multilingual perspective.

Cross-Modal Retrieval Retrieval

MarginNCE: Robust Sound Localization with a Negative Margin

no code implementations3 Nov 2022 Sooyoung Park, Arda Senocak, Joon Son Chung

Furthermore, we demonstrate that the introduction of a negative margin to existing methods results in a consistent improvement in performance.
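The mechanism can be sketched as an InfoNCE loss whose positive logit is shifted by a margin; with a negative margin the positive constraint is relaxed, which is how the abstract frames robustness to noisy audio-visual pairs. The exact formulation below is an assumption based on this summary, not the paper's verified loss:

```python
import numpy as np

def margin_nce_loss(sim, pos_idx, margin=-0.2, temperature=0.07):
    """InfoNCE-style loss with a margin added to the positive similarity.

    sim: (N, N) similarity matrix (e.g. audio vs. visual embeddings).
    pos_idx: column index of the positive pair for each row.
    A negative `margin` lowers the positive logit, relaxing the constraint.
    """
    sim = sim.copy()
    rows = np.arange(sim.shape[0])
    sim[rows, pos_idx] += margin                 # shift positive logits
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, pos_idx].mean()

# toy example: 3 paired samples, positives on the diagonal
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.0, 0.1, 0.7]])
loss = margin_nce_loss(sim, np.arange(3))
```

Note that a negative margin makes the loss larger for the same similarities — it tolerates imperfect positives during training rather than making the objective easier.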

Contrastive Learning

Audio-Visual Fusion Layers for Event Type Aware Video Recognition

no code implementations12 Feb 2022 Arda Senocak, Junsik Kim, Tae-Hyun Oh, Hyeonggon Ryu, Dingzeyu Li, In So Kweon

The human brain is continuously inundated with multisensory information, and its complex interactions, coming from the outside world at any given moment.

Multi-Task Learning Video Recognition +1

Learning Sound Localization Better From Semantically Similar Samples

no code implementations7 Feb 2022 Arda Senocak, Hyeonggon Ryu, Junsik Kim, In So Kweon

Thus, these semantically correlated pairs, "hard positives", are mistakenly grouped as negatives.

Contrastive Learning

Learning to Localize Sound Source in Visual Scenes

no code implementations CVPR 2018 Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon

We show that even with a small amount of supervision, false conclusions can be corrected and the source of sound in a visual scene can be localized effectively.
