Search Results for author: Jinxiang Liu

Found 9 papers, 0 papers with code

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

no code implementations • 23 Apr 2024 • Haozhe Cheng, Cheng Ju, Haicheng Wang, Jinxiang Liu, Mengting Chen, Qiang Hu, Xiaoyun Zhang, Yanfeng Wang

The denoised text classes help OVAR models classify visual samples more accurately; in return, classified visual samples help better denoising.

Audio-Visual Segmentation via Unlabeled Frame Exploitation

no code implementations • 17 Mar 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

Neighboring frames (NFs), temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

no code implementations • 25 Jul 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.

Segmentation

Annotation-free Audio-Visual Segmentation

no code implementations • 18 May 2023 • Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.

Image Segmentation, Segmentation

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

no code implementations • 17 Mar 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

However, the challenges exist as there is one structural difference between generative and discriminative models, which limits the direct use.

Object, Object Discovery

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

no code implementations • 20 Feb 2023 • Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian

Temporal sentence grounding aims to detect the event timestamps described by the natural language query from given untrimmed videos.

Sentence, Temporal Sentence Grounding

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

no code implementations • 26 Jun 2022 • Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos.

Cross-Modal Retrieval, Representation Learning

A 3D Mesh-based Lifting-and-Projection Network for Human Pose Transfer

no code implementations • 24 Sep 2021 • Jinxiang Liu, Yangheng Zhao, Siheng Chen, Ya Zhang

To leverage the human body shape prior, LPNet exploits the topological information of the body mesh to learn an expressive visual representation for the target person in the 3D mesh space.

Image-to-Image Translation, Pose Transfer