Search Results for author: Pedro Morgado

Found 17 papers, 11 papers with code

Audio-Synchronized Visual Animation

no code implementations 8 Mar 2024 Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado

We hope the benchmark we establish can open new avenues for controllable visual generation.

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling

1 code implementation CVPR 2024 Shentong Mo, Pedro Morgado

Thus, to address the computational complexity, we propose an alternative procedure that factorizes the local representations before representing audio-visual interactions.
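The complexity saving can be sketched with a toy calculation (our illustration, not the paper's exact operator): computing dense interactions over all audio and visual token pairs costs O(Na·Nv·d), while factorizing each modality's local tokens down to k « N components first reduces the interaction term to O(k·k·d). The shapes and random "factor" projections below are stand-ins for learned components.

```python
# Toy complexity sketch of factorized audio-visual interactions (assumed
# shapes; the projections stand in for learned factorization, not the
# paper's actual method).
import numpy as np

rng = np.random.default_rng(0)
Na, Nv, d, k = 256, 196, 32, 8   # audio tokens, visual tokens, dim, factors

audio = rng.normal(size=(Na, d))
video = rng.normal(size=(Nv, d))

# Dense interactions: every audio token against every visual token.
dense = audio @ video.T                      # (Na, Nv) similarity table

# Factorize first: project each stream onto k components, then interact.
Fa = rng.normal(size=(k, Na)) / np.sqrt(Na)  # stand-in for learned pooling
Fv = rng.normal(size=(k, Nv)) / np.sqrt(Nv)
audio_f = Fa @ audio                         # (k, d) audio factors
video_f = Fv @ video                         # (k, d) visual factors
factored = audio_f @ video_f.T               # (k, k) interaction table

print(dense.shape, factored.shape)           # 256x196 pairs vs 8x8 pairs
```

With k = 8 the interaction table shrinks from 50,176 pairwise terms to 64, which is the kind of reduction that makes dense early fusion tractable.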

A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition

1 code implementation 30 May 2023 Shentong Mo, Pedro Morgado

The ability to accurately recognize, localize and separate sound sources is fundamental to any audio-visual perception task.

audio-visual learning

A Closer Look at Weakly-Supervised Audio-Visual Source Localization

1 code implementation 30 Aug 2022 Shentong Mo, Pedro Morgado

We also propose a new approach for visual sound source localization that addresses both these problems.

The Challenges of Continuous Self-Supervised Learning

no code implementations23 Mar 2022 Senthil Purushwalkam, Pedro Morgado, Abhinav Gupta

As a result, SSL holds the promise of learning representations from data in the wild, i.e., without the need for finite and static datasets.

Representation Learning, Self-Supervised Learning

Localizing Visual Sounds the Easy Way

1 code implementation 17 Mar 2022 Shentong Mo, Pedro Morgado

Unsupervised audio-visual source localization aims at localizing visible sound sources in a video without relying on ground-truth localization for training.
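A common recipe for this task (a generic sketch, not necessarily this paper's method) scores each spatial position of a visual feature map against a clip-level audio embedding; the resulting similarity heatmap localizes the source without any localization labels. The feature shapes and the "source" cell below are illustrative assumptions.

```python
# Generic audio-visual localization sketch: cosine similarity between an
# audio embedding and every cell of a visual feature map gives a heatmap
# whose argmax is the predicted source location. Shapes are assumptions.
import numpy as np

def localization_map(audio_emb, visual_feats):
    """audio_emb: (d,); visual_feats: (H, W, d) -> (H, W) cosine map."""
    a = audio_emb / np.linalg.norm(audio_emb)
    v = visual_feats / np.linalg.norm(visual_feats, axis=-1, keepdims=True)
    return v @ a

rng = np.random.default_rng(0)
H, W, d = 7, 7, 32
feats = rng.normal(size=(H, W, d))
# Pretend the sounding object's visual features live at cell (3, 4).
audio = feats[3, 4] + 0.1 * rng.normal(size=d)

heat = localization_map(audio, feats)
print(np.unravel_index(heat.argmax(), heat.shape))
```

Because only the similarity between paired audio and video drives the heatmap, no ground-truth bounding boxes are ever needed.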

Robust Audio-Visual Instance Discrimination

no code implementations CVPR 2021 Pedro Morgado, Ishan Misra, Nuno Vasconcelos

Second, since self-supervised contrastive learning relies on random sampling of negative instances, instances that are semantically similar to the base instance can be used as faulty negatives.

Action Recognition, Contrastive Learning +2
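The faulty-negative problem can be made concrete with a minimal InfoNCE-style loss (our sketch, not the paper's implementation): when a randomly sampled "negative" is semantically close to the anchor, the loss pushes the model away from content it should treat as related. All embeddings below are synthetic.

```python
# Minimal InfoNCE sketch illustrating faulty negatives: a semantically
# similar clip drawn as a random negative inflates the contrastive loss.
# (Synthetic 16-dim embeddings; not the paper's actual training code.)
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, using cosine similarities as logits."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                 # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.1 * rng.normal(size=16)   # same instance, other view
faulty = anchor + 0.2 * rng.normal(size=16)     # semantically similar clip
random_negs = [rng.normal(size=16) for _ in range(7)]

loss_clean = info_nce(anchor, positive, random_negs)
loss_faulty = info_nce(anchor, positive, [faulty] + random_negs)
print(loss_faulty > loss_clean)   # the similar "negative" inflates the loss
```

Down-weighting or removing such faulty negatives is what makes the instance-discrimination objective robust.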

Learning Representations from Audio-Visual Spatial Alignment

no code implementations NeurIPS 2020 Pedro Morgado, Yi Li, Nuno Vasconcelos

To learn from these spatial cues, we tasked a network to perform contrastive audio-visual spatial alignment of 360° video and spatial audio.

Action Recognition, Representation Learning +2
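The spatial manipulation underlying such an alignment task can be sketched with standard first-order ambisonics (this is textbook B-format math, not the paper's code): rotating the audio about the vertical axis changes which direction sounds come from, so rotated audio can be aligned, or deliberately misaligned, with a 360° video viewpoint.

```python
# Yaw rotation of first-order ambisonic (B-format) audio, the standard
# operation for re-pointing spatial audio at a chosen 360° viewpoint.
import numpy as np

def rotate_yaw(bformat, theta):
    """Rotate B-format audio (channels W, X, Y, Z) by theta radians of yaw.

    W (omnidirectional) and Z (vertical) are unaffected; X and Y mix by a
    2-D rotation, exactly like rotating the listener's heading.
    """
    w, x, y, z = bformat
    x_rot = np.cos(theta) * x - np.sin(theta) * y
    y_rot = np.sin(theta) * x + np.cos(theta) * y
    return np.stack([w, x_rot, y_rot, z])

# A source straight ahead encodes with all directional energy in X; after a
# 90-degree yaw it should appear entirely on the Y (left) axis.
t = np.linspace(0, 1, 100)
src = np.sin(2 * np.pi * 5 * t)
zeros = np.zeros_like(src)
bfmt = np.stack([src / np.sqrt(2), src, zeros, zeros])
rot = rotate_yaw(bfmt, np.pi / 2)
print(np.allclose(rot[2], src))   # energy moved from X to Y
```

A contrastive alignment objective can then treat matching (video rotation, audio rotation) pairs as positives and mismatched rotations as negatives.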

Deep Hashing with Hash-Consistent Large Margin Proxy Embeddings

no code implementations27 Jul 2020 Pedro Morgado, Yunsheng Li, Jose Costa Pereira, Mohammad Saberian, Nuno Vasconcelos

The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced.

Binarization, Classification +2
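The role of fixed proxies can be illustrated with a toy sketch (an assumption-laden stand-in: the paper designs near-optimal proxy sets, while we just sample random sign vectors): fixing the classifier weights to binary-valued proxies pulls each class's embeddings toward a point that survives sign-binarization, which is what makes the resulting codes hash-friendly.

```python
# Toy sketch of fixed binary proxies as classification-layer weights.
# Random +/-1 proxies stand in for the paper's designed proxy sets.
import numpy as np

rng = np.random.default_rng(1)

def make_binary_proxies(num_classes, dim, rng):
    """Fixed +/-1 proxy vectors, one per class (random stand-ins here)."""
    return rng.choice([-1.0, 1.0], size=(num_classes, dim))

proxies = make_binary_proxies(num_classes=4, dim=16, rng=rng)

# An embedding trained toward its class proxy binarizes to that proxy's
# code: simulate a trained embedding as the proxy plus small noise.
embedding = proxies[2] + 0.25 * rng.normal(size=16)
hash_code = np.sign(embedding)
print((hash_code == proxies[2]).mean())   # fraction of matching hash bits
```

Because the proxies are fixed and already binary, the classification objective and the hashing objective no longer pull the embedding in different directions.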

Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier

1 code implementation ECCV 2020 Tz-Ying Wu, Pedro Morgado, Pei Wang, Chih-Hui Ho, Nuno Vasconcelos

Motivated by this, a deep realistic taxonomic classifier (Deep-RTC) is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.

NetTailor: Tuning the Architecture, Not Just the Weights

1 code implementation CVPR 2019 Pedro Morgado, Nuno Vasconcelos

Under the standard paradigm of network fine-tuning, an entirely new CNN is learned per task, and the final network size is independent of task complexity.

Continual Learning, Object Recognition +2

Self-Supervised Generation of Spatial Audio for 360° Video

no code implementations NeurIPS 2018 Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

We introduce an approach to convert mono audio recorded by a 360° video camera into spatial audio, a representation of the distribution of sound over the full viewing sphere.

Self-Supervised Generation of Spatial Audio for 360 Video

1 code implementation7 Sep 2018 Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

Using our approach, we show that it is possible to infer the spatial location of sound sources based only on 360 video and a mono audio track.
