no code implementations • 22 Mar 2024 • Zhonghua Zhai, Chen Ju, Jinsong Lan, Shuai Xiao
In this work, we propose the Cell Variational Information Bottleneck Network (cellVIB), a convolutional neural network with an information-bottleneck mechanism that can be combined with the latest feedforward architectures and trained end-to-end.
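The variational information-bottleneck objective underlying approaches like cellVIB can be sketched generically: an encoder predicts a diagonal Gaussian over a latent code, a KL penalty toward a standard normal compresses the representation, and the reparameterization trick keeps sampling differentiable. This is a minimal NumPy sketch of the standard VIB recipe, not the paper's implementation; the function names and the `beta` weight are illustrative.

```python
import numpy as np

def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Add the information-bottleneck penalty to a task loss.

    The encoder predicts a diagonal Gaussian q(z|x) = N(mu, diag(exp(log_var)));
    the penalty is KL(q(z|x) || N(0, I)), averaged over the batch.
    """
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)
    return task_loss + beta * np.mean(kl)

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps keeps the sampling step differentiable
    # when ported to an autodiff framework
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

With `mu = 0` and `log_var = 0`, the KL term vanishes and only the task loss remains; any deviation from the standard normal adds a positive penalty.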
no code implementations • 19 Mar 2024 • Mengting Chen, Xi Chen, Zhonghua Zhai, Chen Ju, Xuewen Hong, Jinsong Lan, Shuai Xiao
This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way.
no code implementations • 17 Mar 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang
Neighboring frames (NFs), temporally adjacent to the labeled frame, often contain rich motion information that assists in accurately localizing sounding objects.
no code implementations • 12 Dec 2023 • Chen Ju, Haicheng Wang, Zeqian Li, Xu Chen, Zhonghua Zhai, Weilin Huang, Shuai Xiao
Vision-Language Large Models (VLMs) have become a primary backbone of AI due to their impressive performance.
no code implementations • 30 Nov 2023 • Xu Chen, Zida Cheng, Jiangchao Yao, Chen Ju, Weilin Huang, Jinsong Lan, Xiaoyi Zeng, Shuai Xiao
The augmentation network then employs the explicit cross-domain knowledge as augmented information to boost CTR prediction in the target domain.
no code implementations • NeurIPS 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Ya Zhang, Yanfeng Wang
The results show the superior performance of attribute decomposition-aggregation.
no code implementations • 25 Jul 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang
The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.
no code implementations • 5 Jul 2023 • Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang, Yanfeng Wang
In this paper, we define a unified setting termed open-set semantic segmentation (O3S), which aims to learn seen and unseen semantics from both visual examples and textual names.
no code implementations • 18 May 2023 • Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.
no code implementations • 6 May 2023 • Zida Cheng, Chen Ju, Xu Chen, Zhonghua Zhai, Shuai Xiao, Xiaoyi Zeng, Weilin Huang
We formally define a novel and valuable information-retrieval task, image-to-multi-modal retrieval (IMMR), where the query is an image and each document is an entity with both an image and a textual description.
no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie
In this paper, we consider temporal action localization under low-shot (zero-shot and few-shot) scenarios, with the goal of detecting and classifying action instances from arbitrary categories in untrimmed videos, including categories unseen at training time.
no code implementations • 17 Mar 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang
However, challenges arise from a structural difference between generative and discriminative models, which limits their direct use.
no code implementations • 20 Feb 2023 • Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian
Temporal sentence grounding aims to detect, in untrimmed videos, the timestamps of events described by a natural-language query.
no code implementations • CVPR 2023 • Chen Ju, Kunhao Zheng, Jinxiang Liu, Peisen Zhao, Ya Zhang, Jianlong Chang, Yanfeng Wang, Qi Tian
As a result, the complementary strengths of the two branches are effectively fused into a strong alliance.
Weakly-Supervised Temporal Action Localization
no code implementations • 26 Jun 2022 • Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang
We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos.
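Self-supervised sound-source localizers of this kind typically score each spatial location of a visual feature map against a clip-level audio embedding; high-similarity locations are predicted as the sound source. The sketch below shows that generic cosine-similarity map in NumPy; it illustrates the common recipe, not this paper's code, and the shapes are assumptions.

```python
import numpy as np

def localization_map(audio_emb, visual_feats):
    """Cosine similarity between an audio embedding (D,) and a
    visual feature map (H, W, D); high values mark likely sound sources."""
    a = audio_emb / (np.linalg.norm(audio_emb) + 1e-8)
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    return v @ a  # (H, W) similarity map
```

In training, such a map is usually sharpened with a temperature and supervised by a contrastive loss pairing each video's audio with its own frames against other clips in the batch.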
1 code implementation • 8 Dec 2021 • Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie
Image-based visual-language (I-VL) pre-training has shown great success in learning joint visual-textual representations from large-scale web data, demonstrating a remarkable ability for zero-shot generalisation.
Ranked #5 on Zero-Shot Action Detection on ActivityNet-1.3
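Adapting an image-text model to zero-shot action recognition usually reduces to comparing frame embeddings against embeddings of class-name prompts. Below is a minimal NumPy sketch of that CLIP-style scoring; the temperature value is an assumption and this is not the paper's pipeline, which additionally learns prompt vectors and temporal modelling.

```python
import numpy as np

def zero_shot_scores(frame_embs, class_text_embs, temperature=0.07):
    """Score frames against class-name prompt embeddings:
    cosine similarity scaled by a temperature, softmax over classes."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=-1, keepdims=True)
    t = class_text_embs / np.linalg.norm(class_text_embs, axis=-1, keepdims=True)
    logits = (f @ t.T) / temperature            # (num_frames, num_classes)
    logits -= logits.max(axis=-1, keepdims=True)  # stabilise the softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```

Because no classifier weights are trained, unseen action categories can be scored simply by embedding their names as prompts.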
no code implementations • 6 Apr 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian
To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches, where the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler.
Ranked #7 on Weakly Supervised Action Localization on THUMOS14
Weakly Supervised Action Localization • Weakly-Supervised Temporal Action Localization +1
no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian
Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian
Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Ranked #3 on Weakly Supervised Action Localization on BEOID
1 code implementation • ECCV 2020 • Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yan-Feng Wang, Qi Tian
To alleviate this problem, we introduce two regularization terms that mutually regularize the learning procedure: Intra-phase Consistency (IntraC) regularization enforces consistent predictions within each phase, and Inter-phase Consistency (InterC) regularization keeps predictions consistent across phases.
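Intra-phase consistency can be illustrated by penalising the variance of per-frame predictions inside each region where a phase is active. The NumPy sketch below is a generic stand-in, assuming a per-frame probability sequence and `(start, end)` regions; it is not the paper's exact formulation.

```python
import numpy as np

def intra_consistency(phase_probs, regions):
    """Penalise disagreement inside each region of one phase.

    phase_probs: (T,) per-frame probabilities for a phase (e.g. "starting").
    regions: list of (start, end) index pairs where that phase is active.
    Returns the mean within-region variance of the predictions.
    """
    loss = 0.0
    for s, e in regions:
        seg = phase_probs[s:e]
        loss += np.mean((seg - seg.mean()) ** 2)  # variance within the region
    return loss / max(len(regions), 1)
```

A region with uniform predictions incurs (numerically) zero penalty, while disagreement inside a region is pushed down during training.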