Search Results for author: Haosen Yang

Found 12 papers, 4 papers with code

FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation

no code implementations • 5 Sep 2024 • Xi Chen, Haosen Yang, Sheng Jin, Xiatian Zhu, Hongxun Yao

To fully exploit pre-trained knowledge while minimizing training overhead, we freeze both foundation models, focusing optimization efforts solely on a lightweight transformer decoder for mask proposal generation, the performance bottleneck.

Decoder Segmentation

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

no code implementations • 13 Jun 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the spatial relation between the listener and the sound source.

Audio Synthesis

Gaussian Splatting with Localized Points Management

no code implementations • 6 Jun 2024 • Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu

To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying the error-contributing zones in the highest demand for both point addition and geometry calibration.

Management

Unsupervised Audio-Visual Segmentation with Modality Alignment

no code implementations • 21 Mar 2024 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiangkang Deng, Xiatian Zhu

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound.

Contrastive Learning

Uncertainty-Aware Pseudo-Label Filtering for Source-Free Unsupervised Domain Adaptation

1 code implementation • 17 Mar 2024 • Xi Chen, Haosen Yang, Huicong Zhang, Hongxun Yao, Xiatian Zhu

Source-free unsupervised domain adaptation (SFUDA) aims to enable the utilization of a pre-trained source model in an unlabeled target domain without access to source data.

Contrastive Learning Memorization +3

WSI-SAM: Multi-resolution Segment Anything Model (SAM) for histopathology whole-slide images

no code implementations • 14 Mar 2024 • Hong Liu, Haosen Yang, Paul J. van Diest, Josien P. W. Pluim, Mitko Veta

In particular, our model outperforms SAM by 4.1 and 2.5 percentage points on a ductal carcinoma in situ (DCIS) segmentation task and a breast cancer metastasis segmentation task (CAMELYON16 dataset).

Decoder Segmentation +2

Optimization Efficient Open-World Visual Region Recognition

1 code implementation • 2 Nov 2023 • Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

Building on the success of powerful image-level vision-language (ViL) foundation models like CLIP, recent efforts have sought to harness their capabilities by either training a contrastive model from scratch with an extensive collection of region-label pairs or aligning the outputs of a detection model with image-level representations of region proposals.

object-detection Object Recognition +1

Leveraging Foundation models for Unsupervised Audio-Visual Segmentation

no code implementations • 13 Sep 2023 • Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Xiatian Zhu

In particular, in situations where existing supervised AVS methods struggle with overlapping foreground objects, our models still excel at accurately segmenting overlapped auditory objects.

Segmentation

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

1 code implementation • 9 Oct 2022 • Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan

As a result, our model can effectively extract both static appearance and dynamic motion spontaneously, leading to superior spatiotemporal representation learning capability.

Representation Learning Semantic Segmentation +2

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations • 21 Jul 2022 • Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

At the video level, a temporal attention module is learned under dual video-level supervision on both the salient and the non-salient representations.

Action Recognition Video Classification +1

Temporal Action Proposal Generation with Background Constraint

1 code implementation • 15 Dec 2021 • Haosen Yang, Wenhao Wu, Lining Wang, Sheng Jin, Boyang Xia, Hongxun Yao, Hujie Huang

To evaluate the confidence of proposals, existing works typically predict an action score for each proposal, supervised by the temporal Intersection-over-Union (tIoU) between the proposal and the ground truth.

Temporal Action Proposal Generation
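
For readers unfamiliar with the tIoU measure named in this entry, here is a minimal sketch of the standard definition: the overlap between two temporal intervals divided by the span of their union. The function name `temporal_iou` and the `(start, end)` tuple format are illustrative assumptions and are not taken from the paper or its code.

```python
def temporal_iou(proposal, ground_truth):
    """Temporal IoU between two (start, end) intervals.

    Both intervals are assumed to use the same time unit (e.g. seconds),
    with start <= end. This is a hypothetical helper for illustration only.
    """
    p_start, p_end = proposal
    g_start, g_end = ground_truth

    # Length of the overlapping segment; zero if the intervals are disjoint.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))

    # Union length = sum of both lengths minus the shared part.
    union = (p_end - p_start) + (g_end - g_start) - intersection

    # Guard against two zero-length intervals.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A proposal covering 0-10 s against a ground-truth action at 5-15 s.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

A proposal that matches the ground truth exactly scores 1.0, and a proposal that does not overlap it at all scores 0.0.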

Temporal Action Proposal Generation with Transformers

no code implementations • 25 May 2021 • Lining Wang, Haosen Yang, Wenhao Wu, Hongxun Yao, Hujie Huang

Conventionally, the temporal action proposal generation (TAPG) task is divided into two main sub-tasks: boundary prediction and proposal confidence prediction, which rely on frame-level dependencies and proposal-level relationships, respectively.

Temporal Action Proposal Generation
