Search Results for author: Chen Ju

Found 20 papers, 2 papers with code

Cell Variational Information Bottleneck Network

no code implementations • 22 Mar 2024 • Zhonghua Zhai, Chen Ju, Jinsong Lan, Shuai Xiao

In this work, we propose the Cell Variational Information Bottleneck Network (cellVIB), a convolutional neural network built on the information bottleneck mechanism, which can be combined with the latest feedforward network architectures and trained end-to-end.

Face Recognition, Representation Learning
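
For reference, a minimal PyTorch sketch of a standard variational information bottleneck (VIB) layer, assuming the usual Gaussian-posterior formulation; the cell-level design that cellVIB builds on top of this is not shown here.

```python
import torch
import torch.nn as nn

class VIBLayer(nn.Module):
    """Compress features through a stochastic bottleneck z ~ N(mu, sigma^2)."""
    def __init__(self, in_dim: int, bottleneck_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, bottleneck_dim)       # posterior mean
        self.log_var = nn.Linear(in_dim, bottleneck_dim)  # posterior log-variance

    def forward(self, x):
        mu, log_var = self.mu(x), self.log_var(x)
        # Reparameterization trick keeps the sampling step differentiable.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        # KL(N(mu, sigma^2) || N(0, I)): the compression term of the VIB loss,
        # added to the task loss with some weight during end-to-end training.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(dim=-1).mean()
        return z, kl

layer = VIBLayer(in_dim=128, bottleneck_dim=32)
z, kl = layer(torch.randn(4, 128))
print(z.shape, kl.item())  # torch.Size([4, 32]) and a scalar KL penalty
```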

Audio-Visual Segmentation via Unlabeled Frame Exploitation

no code implementations • 17 Mar 2024 • Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

Neighboring frames (NFs), temporally adjacent to the labeled frame, often contain rich motion information that assists in the accurate localization of sounding objects.

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models

no code implementations • 12 Dec 2023 • Chen Ju, Haicheng Wang, Zeqian Li, Xu Chen, Zhonghua Zhai, Weilin Huang, Shuai Xiao

Vision-Language Large Models (VLMs) have become a primary backbone of AI due to their impressive performance.
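
The abstract suggests tokens are reduced by informativity. Below is a hedged PyTorch sketch of informativity-driven token pruning for a ViT-style backbone, using the attention each patch token receives from the [CLS] token as an assumed informativity proxy; Turbo's actual scoring and merging rules may differ.

```python
import torch

def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float):
    """tokens: (B, N, D) patch tokens; cls_attn: (B, N) attention each patch
    receives from the [CLS] token (the assumed informativity proxy)."""
    num_keep = max(1, int(tokens.shape[1] * keep_ratio))
    idx = cls_attn.topk(num_keep, dim=1).indices              # most informative
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])  # (B, num_keep, D)
    return tokens.gather(1, idx)

tokens, cls_attn = torch.randn(2, 196, 768), torch.rand(2, 196)
print(prune_tokens(tokens, cls_attn, keep_ratio=0.5).shape)  # (2, 98, 768)
```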

Enhancing Cross-domain Click-Through Rate Prediction via Explicit Feature Augmentation

no code implementations • 30 Nov 2023 • Xu Chen, Zida Cheng, Jiangchao Yao, Chen Ju, Weilin Huang, Jinsong Lan, Xiaoyi Zeng, Shuai Xiao

The augmentation network then employs this explicit cross-domain knowledge as augmented information to boost target-domain CTR prediction.

Click-Through Rate Prediction, Transfer Learning
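
As a rough illustration only (not the paper's architecture), a PyTorch sketch of a target-domain CTR tower consuming explicit cross-domain knowledge as augmented input features; the dimensions and the fuse-by-concatenation choice are assumptions.

```python
import torch
import torch.nn as nn

class AugmentedCTRTower(nn.Module):
    """Target-domain CTR tower fed with cross-domain knowledge as extra features."""
    def __init__(self, target_dim: int, source_dim: int, hidden: int = 64):
        super().__init__()
        self.augment = nn.Linear(source_dim, hidden)  # maps source knowledge in
        self.tower = nn.Sequential(
            nn.Linear(target_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, target_feats, source_feats):
        aug = torch.relu(self.augment(source_feats))
        logits = self.tower(torch.cat([target_feats, aug], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # predicted click probability

model = AugmentedCTRTower(target_dim=32, source_dim=16)
print(model(torch.randn(8, 32), torch.randn(8, 16)).shape)  # torch.Size([8])
```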

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

no code implementations • 25 Jul 2023 • Jinxiang Liu, Chen Ju, Chaofan Ma, Yanfeng Wang, Yu Wang, Ya Zhang

The goal of the audio-visual segmentation (AVS) task is to segment the sounding objects in the video frames using audio cues.

Segmentation
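
A hedged sketch of one plausible reading of "audio-aware queries": learnable object queries biased by a projected audio embedding before cross-attending to visual features. The module name, dimensions, and conditioning-by-addition scheme are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class AudioQueryDecoder(nn.Module):
    def __init__(self, dim: int = 256, audio_dim: int = 128, num_queries: int = 16):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)  # learnable object queries
        self.audio_proj = nn.Linear(audio_dim, dim)    # project audio features
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, visual_feats, audio_feat):
        """visual_feats: (B, HW, D) flattened frame features; audio_feat: (B, audio_dim)."""
        B = visual_feats.shape[0]
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        # Make the queries audio-aware: bias every query with the audio
        # embedding so attention focuses on regions matching the sound.
        q = q + self.audio_proj(audio_feat).unsqueeze(1)
        out, _ = self.cross_attn(q, visual_feats, visual_feats)
        return out  # (B, num_queries, D); a mask head would decode these

dec = AudioQueryDecoder()
print(dec(torch.randn(2, 64, 256), torch.randn(2, 128)).shape)  # (2, 16, 256)
```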

Multi-Modal Prototypes for Open-Set Semantic Segmentation

no code implementations • 5 Jul 2023 • Yuhuan Yang, Chaofan Ma, Chen Ju, Ya Zhang, Yanfeng Wang

In this paper, we define a unified setting termed open-set semantic segmentation (O3S), which aims to learn seen and unseen semantics from both visual examples and textual names.

Segmentation, Semantic Segmentation
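
A minimal sketch of one natural realization of the multi-modal-prototype idea: classify every pixel against both visual prototypes (from examples) and textual prototypes (from names). The hypothetical function name, cosine scoring, and fixed mixing weight alpha are assumptions.

```python
import torch
import torch.nn.functional as F

def multimodal_prototype_logits(pixel_feats, visual_protos, text_protos, alpha=0.5):
    """pixel_feats: (HW, D); visual_protos, text_protos: (C, D), one row per class."""
    p = F.normalize(pixel_feats, dim=-1)
    v = F.normalize(visual_protos, dim=-1)
    t = F.normalize(text_protos, dim=-1)
    # Cosine similarity of every pixel to each class's visual and textual prototype.
    return alpha * (p @ v.T) + (1 - alpha) * (p @ t.T)  # (HW, C)

logits = multimodal_prototype_logits(
    torch.randn(4096, 512), torch.randn(10, 512), torch.randn(10, 512))
print(logits.argmax(dim=-1).shape)  # per-pixel class assignment, torch.Size([4096])
```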

Annotation-free Audio-Visual Segmentation

no code implementations • 18 May 2023 • Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya Zhang, Weidi Xie

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.

Image Segmentation, Segmentation, +1

Image to Multi-Modal Retrieval for Industrial Scenarios

no code implementations • 6 May 2023 • Zida Cheng, Chen Ju, Xu Chen, Zhonghua Zhai, Shuai Xiao, Xiaoyi Zeng, Weilin Huang

We formally define a novel and valuable information retrieval task: image-to-multi-modal retrieval (IMMR), where the query is an image and the document is an entity with both an image and a textual description.

Cross-Modal Retrieval, Information Retrieval, +2
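
For intuition, a sketch of scoring an image query against multi-modal documents by fusing per-modality cosine similarities; the fixed fusion weight w_img is an assumption, and the paper's actual model is likely learned rather than a hand-set weighted sum.

```python
import torch
import torch.nn.functional as F

def immr_scores(query_img_emb, doc_img_embs, doc_text_embs, w_img=0.5):
    """query_img_emb: (D,); doc_img_embs, doc_text_embs: (N, D).
    Each doc is an entity carrying both an image and a textual description."""
    q = F.normalize(query_img_emb, dim=-1)
    di = F.normalize(doc_img_embs, dim=-1)
    dt = F.normalize(doc_text_embs, dim=-1)
    # Score the image query against both modalities of every doc.
    return w_img * (di @ q) + (1 - w_img) * (dt @ q)  # (N,)

scores = immr_scores(torch.randn(512), torch.randn(100, 512), torch.randn(100, 512))
print(scores.topk(5).indices)  # indices of the top-5 retrieved entities
```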

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot and few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even categories not seen at training time.

Action Classification, Temporal Action Localization

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery

no code implementations • 17 Mar 2023 • Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

However, challenges exist, as there is a structural difference between generative and discriminative models that limits direct use.

Object, Object Discovery, +1

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

no code implementations • 20 Feb 2023 • Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian

Temporal sentence grounding aims to detect the timestamps of the event described by a natural language query within given untrimmed videos.

Sentence, Temporal Sentence Grounding

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

no code implementations • 26 Jun 2022 • Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

We present a simple yet effective self-supervised framework for audio-visual representation learning, which localizes the sound source in videos.

Cross-Modal Retrieval, Representation Learning, +1
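
A sketch of the two ingredients named in the title, under formulations common in the sound-localisation literature: an audio-visual cosine-similarity localization map, plus an equivariance loss requiring that localizing a flipped input equals flipping the localization of the original. Shapes and the flip-only transform are assumptions.

```python
import torch
import torch.nn.functional as F

def localization_map(img_feats, audio_emb):
    """img_feats: (B, D, H, W); audio_emb: (B, D). Cosine similarity between
    the audio embedding and every spatial position yields a sound-source map."""
    v = F.normalize(img_feats, dim=1)
    a = F.normalize(audio_emb, dim=1)[:, :, None, None]
    return (v * a).sum(dim=1)  # (B, H, W)

def equivariance_loss(map_of_flipped, map_of_original):
    # Equivariance: localizing a horizontally flipped input should equal
    # horizontally flipping the localization of the original input.
    return (map_of_flipped - torch.flip(map_of_original, dims=[-1])).pow(2).mean()

feats, audio = torch.randn(2, 512, 14, 14), torch.randn(2, 512)
m_orig = localization_map(feats, audio)
m_flip = localization_map(torch.flip(feats, dims=[-1]), audio)
print(equivariance_loss(m_flip, m_orig).item())  # ~0: flipping commutes here
```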

Prompting Visual-Language Models for Efficient Video Understanding

1 code implementation • 8 Dec 2021 • Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie

Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.

Action Recognition, Language Modelling, +4
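
Since this paper prompts a frozen image-based visual-language (I-VL) model, here is a minimal sketch of continuous prompt learning in PyTorch: learnable prompt vectors prepended to class-name token embeddings before a frozen text encoder. The dimensions and the prepend-only scheme are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class TextPrompter(nn.Module):
    """Learnable continuous prompt vectors prepended to class-name token
    embeddings; everything downstream (the I-VL text encoder) stays frozen."""
    def __init__(self, num_prompts: int = 8, dim: int = 512):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, class_emb):
        """class_emb: (C, L, D) token embeddings of C class names."""
        C = class_emb.shape[0]
        p = self.prompts.unsqueeze(0).expand(C, -1, -1)
        return torch.cat([p, class_emb], dim=1)  # (C, num_prompts + L, D)

prompter = TextPrompter()
print(prompter(torch.randn(10, 4, 512)).shape)  # torch.Size([10, 12, 512])
```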

Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

no code implementations • 6 Apr 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian

To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches, where the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler.

Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization, +1
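
A hedged sketch of two-branch mutual supervision: each branch's class activation sequence (CAS) is thresholded into pseudo labels that supervise the other branch. The thresholding rule is an assumption, and the adaptive sampler that AMS actually contributes is not modeled here.

```python
import torch
import torch.nn.functional as F

def mutual_supervision_loss(base_cas, supp_cas, thresh: float = 0.5):
    """base_cas, supp_cas: (B, T) per-frame action probabilities from the
    two branches. Each branch's thresholded CAS pseudo-labels the other."""
    pseudo_from_base = (base_cas.detach() > thresh).float()
    pseudo_from_supp = (supp_cas.detach() > thresh).float()
    # The base branch learns regions found by the supplementary branch, and
    # vice versa, so the two localizations reinforce each other.
    return (F.binary_cross_entropy(base_cas, pseudo_from_supp)
            + F.binary_cross_entropy(supp_cas, pseudo_from_base))

base, supp = torch.rand(2, 100), torch.rand(2, 100)
print(mutual_supervision_loss(base, supp).item())
```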

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Weakly Supervised Action Localization

Bottom-Up Temporal Action Localization with Mutual Regularization

1 code implementation • ECCV 2020 • Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yanfeng Wang, Qi Tian

To alleviate this problem, we introduce two regularization terms that mutually regularize the learning procedure: Intra-phase Consistency (IntraC) regularization verifies the predictions inside each phase, while Inter-phase Consistency (InterC) regularization keeps the phases consistent with one another.

Temporal Action Localization
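
As a rough proxy only: a symmetric stop-gradient MSE term that makes two phase predictions regularize each other; the paper's actual IntraC/InterC definitions are richer than this sketch.

```python
import torch

def mutual_regularization(phase_a, phase_b):
    """phase_a, phase_b: (B, T) per-frame probabilities of two phases that
    should verify each other. Stop-gradient on each target keeps the two
    predictions from collapsing onto an arbitrary compromise."""
    return ((phase_a - phase_b.detach()).pow(2).mean()
            + (phase_b - phase_a.detach()).pow(2).mean())

a, b = torch.rand(2, 100), torch.rand(2, 100)
print(mutual_regularization(a, b).item())
```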
