no code implementations • 16 Jan 2024 • Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak
We introduce multi-phase training of audio spectrogram transformers by connecting the seminal idea of coarse-to-fine with transformer models.
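The coarse-to-fine idea can be sketched as a phase-dependent patch-size schedule for the spectrogram transformer, where training starts with large (coarse) patches and switches to smaller (fine) ones. The phase boundaries and sizes below are illustrative assumptions, not the paper's actual configuration:

```python
def patch_size_for_phase(epoch, schedule=((0, 32), (10, 16), (20, 8))):
    """Return the spectrogram patch size for the current training epoch.

    Training starts coarse (large patches) and moves to fine (small patches).
    `schedule` maps a starting epoch to a patch size; these values are a
    hypothetical example, not the paper's settings.
    """
    size = schedule[0][1]
    for start_epoch, patch in schedule:
        if epoch >= start_epoch:
            size = patch
    return size
```

A training loop would call this once per epoch and rebuild the patch embedding accordingly.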
1 code implementation • 7 Nov 2023 • Sooyoung Park, Arda Senocak, Joon Son Chung
Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment.
no code implementations • ICCV 2023 • Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung
However, prior work and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for genuine sound source localization.
no code implementations • 18 Jul 2023 • Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak
To overcome this limitation, this paper proposes a training procedure that gives standard AST models flexibility without architectural changes, allowing them to work with various patch sizes at the inference stage: FlexiAST.
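One common way to make a fixed patch-embedding layer work at multiple patch sizes is to interpolate its projection kernel to the target size at inference time. The sketch below shows that idea with plain bilinear interpolation on a square 2-D kernel; it is a generic illustration of patch-size flexibility, not the official FlexiAST code:

```python
import numpy as np

def resize_patch_kernel(kernel, new_size):
    """Resize a square patch-embedding kernel (old, old) -> (new, new)
    by separable linear interpolation, so a single learned projection
    can be applied at several patch sizes.

    A sketch of the patch-size-flexibility idea; not the paper's method.
    """
    old = kernel.shape[0]
    xs = np.linspace(0, old - 1, new_size)
    # interpolate along rows first, then along columns
    rows = np.array([np.interp(xs, np.arange(old), kernel[:, j])
                     for j in range(old)]).T          # (new_size, old)
    out = np.array([np.interp(xs, np.arange(old), rows[i, :])
                    for i in range(new_size)])        # (new_size, new_size)
    return out
```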
no code implementations • 30 Mar 2023 • Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
The objective of this work is to explore the learning of visually grounded speech (VGS) models from a multilingual perspective.
no code implementations • CVPR 2023 • Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
The key idea is to enrich the audio features with visual information by learning to align audio to visual latent space.
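Aligning audio features to a visual latent space is often driven by a similarity objective on paired embeddings. The following is a minimal, generic sketch of such an alignment loss (mean cosine distance between pairs); it stands in for, but is not, the paper's exact objective:

```python
import numpy as np

def alignment_loss(audio_emb, visual_emb):
    """Mean (1 - cosine similarity) over paired audio/visual embeddings.

    Both inputs have shape (batch, dim); row i of each is a matched pair.
    A generic sketch of audio-to-visual latent alignment, not the paper's
    exact training objective.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * v, axis=1)))
```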
no code implementations • 3 Nov 2022 • Sooyoung Park, Arda Senocak, Joon Son Chung
Furthermore, we demonstrate that the introduction of a negative margin to existing methods results in a consistent improvement in performance.
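A negative margin can be dropped into a standard hinge-style contrastive loss: instead of requiring positives to beat negatives by a positive gap, the loss tolerates semantically close "negatives". The margin value and loss form below are illustrative, not the paper's exact formulation:

```python
import numpy as np

def margin_contrastive_loss(pos_sim, neg_sim, margin=-0.2):
    """Hinge loss max(0, neg_sim - pos_sim + margin), averaged over a batch.

    With a conventional positive margin the loss pushes negatives far below
    positives; a *negative* margin relaxes that pressure for negatives that
    are semantically close to the positive. The value -0.2 is illustrative.
    """
    return float(np.maximum(0.0, neg_sim - pos_sim + margin).mean())
```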
no code implementations • 12 Feb 2022 • Arda Senocak, Junsik Kim, Tae-Hyun Oh, Hyeonggon Ryu, Dingzeyu Li, In So Kweon
The human brain is continuously inundated with multisensory information, and its complex interactions, coming from the outside world at any given moment.
no code implementations • 7 Feb 2022 • Arda Senocak, Hyeonggon Ryu, Junsik Kim, In So Kweon
Thus, these semantically correlated pairs, "hard positives", are mistakenly grouped as negatives.
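One simple way to keep such pairs from being treated as negatives is to flag high-similarity off-diagonal pairs in a batch similarity matrix as "hard positives". The threshold below is a hypothetical choice, and this is only a sketch of the idea, not the paper's implementation:

```python
import numpy as np

def find_hard_positives(sim_matrix, threshold=0.8):
    """Flag off-diagonal pairs whose similarity exceeds `threshold`
    as 'hard positives' instead of negatives.

    `sim_matrix[i, j]` is the similarity between samples i and j of a batch;
    the diagonal holds the known positive pairs. Threshold is hypothetical.
    """
    mask = sim_matrix > threshold
    np.fill_diagonal(mask, False)  # diagonal pairs are already positives
    return mask
```

A contrastive loss would then exclude (or re-weight) the flagged pairs from its negative set.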
1 code implementation • 20 Nov 2019 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
Visual events are usually accompanied by sounds in our daily lives.
no code implementations • CVPR 2018 • Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, In So Kweon
We show that even with a small amount of supervision, false conclusions can be corrected and the source of sound in a visual scene can be localized effectively.