Search Results for author: Yu-Gang Jiang

Found 144 papers, 73 papers with code

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language

1 code implementation • ECCV 2020 • Shaoxiang Chen, Yu-Gang Jiang

Temporal Activity Localization via Language (TALL) in video is a recently proposed, challenging vision task. Tackling it requires fine-grained understanding of the video content; however, most existing works overlook this.

Sentence

PoseAnimate: Zero-shot high fidelity pose controllable character animation

no code implementations • 21 Apr 2024 • Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity with the source image. However, existing approaches suffer from character appearance inconsistency and poor preservation of fine details.

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

no code implementations • 19 Apr 2024 • Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes.

Benchmarking counterfactual +3

The Dog Walking Theory: Rethinking Convergence in Federated Learning

no code implementations • 18 Apr 2024 • Kun Zhai, Yifeng Gao, Xingjun Ma, Difan Zou, Guangnan Ye, Yu-Gang Jiang

In this paper, we study the convergence of FL on non-IID data and propose a novel Dog Walking Theory to formulate and identify the missing element in existing research.

Federated Learning

Learning to Rank Patches for Unbiased Image Redundancy Reduction

1 code implementation • 31 Mar 2024 • Yang Luo, Zhineng Chen, Peng Zhou, Zuxuan Wu, Xieping Gao, Yu-Gang Jiang

The results demonstrate that LTRP outperforms both supervised and other self-supervised methods due to the fair assessment of image content.

Image Reconstruction Inductive Bias +1

OmniVid: A Generative Framework for Universal Video Understanding

1 code implementation • 26 Mar 2024 • Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Action Recognition Decoder +5

FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model

no code implementations • 15 Mar 2024 • Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available.

3D Reconstruction

Whose Side Are You On? Investigating the Political Stance of Large Language Models

1 code implementation • 15 Mar 2024 • Pagnarasmey Pit, Xingjun Ma, Mike Conway, Qingyu Chen, James Bailey, Henry Pit, Putrasmey Keo, Watey Diep, Yu-Gang Jiang

Large Language Models (LLMs) have gained significant popularity for their application in various everyday tasks such as text generation, summarization, and information retrieval.

Fairness Information Retrieval +1

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

1 code implementation • 12 Mar 2024 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

This adaptation leads to convenient development of such LMMs with minimal modifications; however, it overlooks the intrinsic characteristics of diverse visual tasks and hinders the learning of perception capabilities.

Concept Alignment Language Modelling

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

no code implementations • 12 Mar 2024 • Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.

Food Recognition

Doubly Abductive Counterfactual Inference for Text-based Image Editing

1 code implementation • 5 Mar 2024 • Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang

Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.

counterfactual Counterfactual Inference +2

Instruction-Guided Scene Text Recognition

no code implementations • 31 Jan 2024 • Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, Yu-Gang Jiang

Multi-modal models have shown appealing performance in visual tasks recently, as instruction-guided training has evoked the ability to understand fine-grained visual content.

Scene Text Recognition

MouSi: Poly-Visual-Expert Vision-Language Models

1 code implementation • 30 Jan 2024 • Xiaoran Fan, Tao Ji, Changhao Jiang, Shuo Li, Senjie Jin, Sirui Song, Junke Wang, Boyang Hong, Lu Chen, Guodong Zheng, Ming Zhang, Caishuang Huang, Rui Zheng, Zhiheng Xi, Yuhao Zhou, Shihan Dou, Junjie Ye, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs.

Ranked #42 on Visual Question Answering on MM-Vet

Image Segmentation Image-text matching +4

Multi-Trigger Backdoor Attacks: More Triggers, More Threats

no code implementations • 27 Jan 2024 • Yige Li, Xingjun Ma, Jiabo He, Hanxun Huang, Yu-Gang Jiang

Arguably, real-world backdoor attacks can be much more complex, e.g., the existence of multiple adversaries for the same dataset if it is of high value.

Secrets of RLHF in Large Language Models Part II: Reward Modeling

1 code implementation • 11 Jan 2024 • Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset and fully leverage high-quality preference data.

Contrastive Learning Meta-Learning +1

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

no code implementations • 22 Dec 2023 • Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain.

Food Recognition Multi-Task Learning +3

Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

no code implementations • 13 Dec 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, Yu-Gang Jiang

The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.

3D Object Detection Autonomous Driving +3

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation • 30 Nov 2023 • Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

no code implementations • 30 Nov 2023 • Zhen Xing, Qi Dai, Zihao Zhang, HUI ZHANG, Han Hu, Zuxuan Wu, Yu-Gang Jiang

Our model can edit and translate the desired results within seconds based on user instructions.

Semantic Segmentation Video Editing +3

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model

1 code implementation • 29 Nov 2023 • Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

Identity-consistent video generation seeks to synthesize videos that are guided by both textual prompts and reference images of entities.

Ranked #1 on Video Generation on MSR-VTT

Denoising Image to Video Generation +1

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

1 code implementation • 24 Nov 2023 • Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target.

Meta-Learning One-Shot Segmentation +3

AdaDiff: Adaptive Step Selection for Fast Diffusion

no code implementations • 24 Nov 2023 • HUI ZHANG, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang

Diffusion models, a type of generative model, have achieved impressive results in generating images and videos conditioned on text.

Denoising Image Generation +1

Adversarial Prompt Tuning for Vision-Language Models

1 code implementation • 19 Nov 2023 • Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang

With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities.

Adversarial Robustness

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

2 code implementations • 13 Nov 2023 • Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang

Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data.

Ranked #35 on Visual Question Answering on MM-Vet

Instruction Following Visual Question Answering

Fake Alignment: Are LLMs Really Aligned Well?

1 code implementation • 10 Nov 2023 • Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang

The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety.

Multiple-choice

A Survey on Video Diffusion Models

1 code implementation • 16 Oct 2023 • Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.

Image Generation Video Editing +2

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

1 code implementation • 8 Oct 2023 • Zuxuan Wu, Zejia Weng, Wujian Peng, Xitong Yang, Ang Li, Larry S. Davis, Yu-Gang Jiang

Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition.

Action Recognition Continual Learning +5

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

no code implementations • 7 Sep 2023 • Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei zhang, Yu-Gang Jiang, Hang Xu

Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process.

Action Recognition Decoder +4

SimDA: Simple Diffusion Adapter for Efficient Video Generation

no code implementations • 18 Aug 2023 • Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang

In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.

Transfer Learning Video Editing +2

On the Importance of Spatial Relations for Few-shot Action Recognition

no code implementations • 14 Aug 2023 • Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.

Few-Shot action recognition Few Shot Action Recognition +1

Context Perception Parallel Decoder for Scene Text Recognition

1 code implementation • 23 Jul 2023 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, Yu-Gang Jiang

We first present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.

Ranked #1 on Scene Text Recognition on CUTE80 (using extra training data)

Decoder Language Modelling +1

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network

1 code implementation • 27 Jun 2023 • Yuchen Su, Zhineng Chen, Zhiwen Shao, Yuning Du, Zhilong Ji, Jinfeng Bai, Yong Zhou, Yu-Gang Jiang

Next, we propose a dual assignment scheme for speed acceleration.

Scene Text Detection Text Detection

Prompting Large Language Models to Reformulate Queries for Moment Localization

no code implementations • 6 Jun 2023 • Wenfeng Yan, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

The task of moment localization is to localize a temporal moment in an untrimmed video for a given natural language query.

Moment Queries Natural Language Queries

Reconstructive Neuron Pruning for Backdoor Defense

1 code implementation • 24 May 2023 • Yige Li, Xixiang Lyu, Xingjun Ma, Nodens Koren, Lingjuan Lyu, Bo Li, Yu-Gang Jiang

Specifically, RNP first unlearns the neurons by maximizing the model's error on a small subset of clean samples and then recovers the neurons by minimizing the model's error on the same data.

backdoor defense

MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition

1 code implementation • ICCV 2023 • Tianlun Zheng, Zhineng Chen, Bingchen Huang, Wei zhang, Yu-Gang Jiang

In this paper, we propose the Incremental MLTR (IMLTR) task in the context of incremental learning (IL), where different languages are introduced in batches.

Ranked #1 on Incremental Learning on MLT17

Continual Learning Incremental Learning +2

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

2 code implementations • 24 May 2023 • Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues.

Autonomous Driving Question Answering +1

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

1 code implementation • 9 May 2023 • Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang

In this work, we introduce TPS++, an attention-enhanced TPS transformation that incorporates the attention mechanism to text rectification for the first time.

Ranked #1 on Scene Text Recognition on SVT-P

Optical Character Recognition (OCR) Scene Text Recognition

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

no code implementations • 27 Apr 2023 • Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios.

Video Understanding

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

1 code implementation • ICCV 2023 • Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

Ranked #16 on Action Classification on Kinetics-400

Action Classification Action Recognition +1

OmniTracker: Unifying Object Tracking by Tracking-with-Detection

no code implementations • 21 Mar 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang

Object tracking (OT) aims to estimate the positions of target objects in a video sequence.

Object Object Tracking

DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection

1 code implementation • 15 Mar 2023 • HUI ZHANG, Zheng Wang, Zuxuan Wu, Yu-Gang Jiang

Anomaly detection has garnered extensive applications in real industrial manufacturing due to its remarkable effectiveness and efficiency.

Ranked #1 on Anomaly Detection on VisA

Denoising Unsupervised Anomaly Detection

PromptFusion: Decoupling Stability and Plasticity for Continual Learning

no code implementations • 13 Mar 2023 • Haoran Chen, Zuxuan Wu, Xintong Han, Menglin Jia, Yu-Gang Jiang

Such a trade-off is referred to as the stability-plasticity dilemma and is a more general and challenging problem for continual learning.

Class Incremental Learning Incremental Learning

StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning

2 code implementations • CVPR 2023 • Yuqian Fu, Yu Xie, Yanwei Fu, Yu-Gang Jiang

Thus, inspired by vanilla adversarial learning, a novel model-agnostic meta Style Adversarial training (StyleAdv) method together with a novel style adversarial attack method is proposed for CD-FSL.

Ranked #1 on Cross-Domain Few-Shot on Plantae

Adversarial Attack cross-domain few-shot learning

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

1 code implementation • 1 Feb 2023 • Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang

Our framework extends CLIP with minimal modifications to model spatial-temporal relationships in videos, making it a specialized video classifier, while striving for generalization.

Action Recognition Continual Learning +2

Vocabulary-informed Zero-shot and Open-set Learning

1 code implementation • 3 Jan 2023 • Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, xiangyang xue, Leonid Sigal

Despite significant progress in object categorization in recent years, a number of important challenges remain, mainly the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels.

Object Categorization Open Set Learning +1

Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining

no code implementations • CVPR 2023 • Kexin Sun, Zhineng Chen, Gongwei Wang, Jun Liu, Xiongjun Ye, Yu-Gang Jiang

In order to eliminate the square effect, we design a bi-directional feature fusion generative adversarial network (BFF-GAN) with a global branch and a local branch.

Generative Adversarial Network

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

1 code implementation • CVPR 2023 • Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, YaoWei Wang, Changsheng Xu

Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.

Data Poisoning

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)

Instance Segmentation Segmentation +3

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection

no code implementations • 12 Dec 2022 • Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang

Online media data, in the forms of images and videos, are becoming mainstream communication channels.

DeepFake Detection Face Swapping

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

4 code implementations • CVPR 2023 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Ranked #1 on Self-Supervised Action Recognition on HMDB51

Action Classification Representation Learning +1

Prototypical Residual Networks for Anomaly Detection and Localization

no code implementations • CVPR 2023 • HUI ZHANG, Zuxuan Wu, Zheng Wang, Zhineng Chen, Yu-Gang Jiang

Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness.

Ranked #2 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Supervised Anomaly Detection

ResFormer: Scaling ViTs with Multi-Resolution Training

1 code implementation • CVPR 2023 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

We introduce ResFormer, a framework that is built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of, mostly unseen, testing resolutions.

Action Recognition Image Classification +4

Transferability Estimation Based On Principal Gradient Expectation

no code implementations • 29 Nov 2022 • Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yu-Gang Jiang

Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks.

Transfer Learning

SVFormer: Semi-supervised Video Transformer for Action Recognition

1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

In this paper, we investigate the use of transformer models under the SSL setting for action recognition.

Action Recognition Semi-Supervised Image Classification +1

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning

1 code implementation • 11 Oct 2022 • Linhai Zhuo, Yuqian Fu, Jingjing Chen, Yixin Cao, Yu-Gang Jiang

The proposed TGDM framework contains a Mixup-3T network for learning classifiers and a dynamic ratio generation network (DRGN) for learning the optimal mix ratio.

cross-domain few-shot learning Transfer Learning

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning

1 code implementation • 11 Oct 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang

Concretely, to solve the data imbalance problem between the source data with sufficient examples and the auxiliary target data with limited examples, we build our model under the umbrella of multi-expert learning.

cross-domain few-shot learning Knowledge Distillation

Text-driven Video Prediction

no code implementations • 6 Oct 2022 • Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

Specifically, appearance and motion components are provided by the image and caption separately.

Causal Inference Video Generation +1

Locate before Answering: Answer Guided Question Localization for Video Question Answering

no code implementations • 5 Oct 2022 • Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang

Considering the fact that the question often remains concentrated in a short temporal range, we propose to first locate the question to a segment in the video and then infer the answer using the located segment only.

Question Answering Video Question Answering

Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors

1 code implementation • 30 Sep 2022 • Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang

In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.

3D Reconstruction Object Reconstruction +2

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

Enhancing the Self-Universality for Transferable Targeted Attacks

1 code implementation • CVPR 2023 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks.

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

1 code implementation • CVPR 2023 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques.

3D Object Detection Autonomous Driving +1

Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling

no code implementations • 25 Aug 2022 • Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang

To avoid significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e.g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism.

Video Recognition

Balanced Contrastive Learning for Long-Tailed Visual Recognition

1 code implementation • CVPR 2022 • Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang

In this paper, we focus on representation learning for imbalanced data.

Ranked #1 on Long-tail Learning on CIFAR-10-LT (ρ=100)

Contrastive Learning Image Classification +3

PolarFormer: Multi-camera 3D Object Detection with Polar Transformer

1 code implementation • 30 Jun 2022 • Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang

3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.

Ranked #2 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving +5

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on the UODB benchmark with a wide variety of datasets.

Object object-detection +1

SVTR: Scene Text Recognition with a Single Visual Model

2 code implementations • 30 Apr 2022 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang

Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription.

Ranked #16 on Scene Text Recognition on ICDAR2013

Scene Text Recognition

Adaptive Split-Fusion Transformer

1 code implementation • 26 Apr 2022 • Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.

Ranked #1 on Image Classification on CIFAR-10 Image Classification

Image Classification

Deeper Insights into the Robustness of ViTs towards Common Corruptions

no code implementations • 26 Apr 2022 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang

With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.

Benchmarking Data Augmentation

Video Moment Retrieval from Text Queries via Single Frame Annotation

1 code implementation • 20 Apr 2022 • Ran Cui, Tianwen Qian, Pai Peng, Elena Daskalaki, Jingjing Chen, Xiaowei Guo, Huyang Sun, Yu-Gang Jiang

Weakly supervised methods only rely on the paired video and query, but the performance is relatively poor.

Contrastive Learning Moment Retrieval +1

ObjectFormer for Image Manipulation Detection and Localization

no code implementations • CVPR 2022 • Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.

Image Manipulation Image Manipulation Detection

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

1 code implementation • 15 Mar 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang

The key challenge of CD-FSL lies in the huge data shift between source and target domains, which is typically in the form of totally different visual styles.

Ranked #2 on Cross-Domain Few-Shot on CUB

cross-domain few-shot learning Self-Supervised Learning

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

3D dense captioning is a recently-proposed novel task, where point clouds contain more geometric information than the 2D counterpart.

3D dense captioning Dense Captioning +3

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

no code implementations • 10 Mar 2022 • Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recently, one-stage visual grounders have attracted considerable attention due to their comparable accuracy but significantly higher efficiency than two-stage grounders.

Object Visual Grounding

Cross-Modal Transferable Adversarial Attacks from Images to Videos

no code implementations • CVPR 2022 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

This paper investigates the transferability of adversarial perturbation across different modalities, i.e., leveraging adversarial perturbation generated on white-box image models to attack black-box video models.

Video Recognition

Paper
Add Code

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation

no code implementations • 10 Dec 2021 • Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang

Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model, and a feasible way to improve both tasks is to use more data.

Image-text matching Language Modelling +8

Paper
Add Code

BEVT: BERT Pretraining of Video Transformers

1 code implementation • CVPR 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are often computationally intensive if trained from scratch; 2) discriminative clues, i.e., spatial and temporal information, needed to make correct predictions vary among different videos due to large intra-class and inter-class variations.

Ranked #8 on Action Recognition on Diving-48

Action Recognition Representation Learning

152

Paper
Code

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

no code implementations • CVPR 2022 • Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.

Paper
Add Code

Efficient Video Transformers with Spatial-Temporal Token Selection

1 code implementation • 23 Nov 2021 • Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang

Video transformers have achieved impressive results on major video recognition benchmarks, but they suffer from high computational cost.

Video Recognition

Paper
Code

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang

In this paper, we propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.

Ranked #12 on Scene Text Recognition on ICDAR2015

Position Scene Text Recognition

106

Paper
Code

Semi-Supervised Vision Transformers

1 code implementation • 22 Nov 2021 • Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang

Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available.

Ranked #17 on Semi-Supervised Image Classification on ImageNet - 10% labeled data

Inductive Bias Semi-Supervised Image Classification

Paper
Code

Attacking Video Recognition Models with Bullet-Screen Comments

1 code implementation • 29 Oct 2021 • Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

On both the UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about a 90% fooling rate when attacking three mainstream video recognition models, while occluding less than 8% of the area in the video.

Adversarial Attack Adversarial Attack on Video Classification +2

Paper
Code

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

1 code implementation • 18 Oct 2021 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.

Adversarial Attack Translation +1

Paper
Code

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred by one given natural language expression.

Image Segmentation Retrieval +2

Paper
Code

Self-supervised Learning for Semi-supervised Temporal Language Grounding

no code implementations • 23 Sep 2021 • Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.

Contrastive Learning Pseudo Label +2

Paper
Add Code

Towards Transferable Adversarial Attacks on Vision Transformers

2 code implementations • 9 Sep 2021 • Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang

We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs.

141

Paper
Code

A Multimodal Framework for Video Ads Understanding

no code implementations • 29 Aug 2021 • Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang

There is a growing trend in placing video advertisements on social platforms for online marketing, which demands automatic approaches to understand the contents of advertisements effectively.

Marketing Optical Character Recognition +5

Paper
Add Code

Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better

1 code implementation • ICCV 2021 • Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang

We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods in improving the robustness of small models against state-of-the-art attacks including the AutoAttack.

Adversarial Robustness Knowledge Distillation

Paper
Code

FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting

no code implementations • 10 Aug 2021 • Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.

Facial Inpainting

Paper
Add Code

Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data

1 code implementation • 26 Jul 2021 • Yuqian Fu, Yanwei Fu, Yu-Gang Jiang

Secondly, a novel disentangle module together with a domain classifier is proposed to extract the disentangled domain-irrelevant and domain-specific features.

cross-domain few-shot learning

Paper
Code

Can Action be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos

no code implementations • 25 Jul 2021 • Yuqian Fu, Yanwei Fu, Yu-Gang Jiang

To achieve this, we propose a novel Mesh-based Video Action Imitation (M-VAI) method.

Human Dynamics

Paper
Add Code

Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning

no code implementations • CVPR 2021 • Shaoxiang Chen, Yu-Gang Jiang

Dense Event Captioning (DEC) aims to jointly localize and describe multiple events of interest in untrimmed videos, which is an advancement of the conventional video captioning task (generating a single sentence description for a trimmed video).

Sentence Video Captioning

Paper
Add Code

Cross-domain Contrastive Learning for Unsupervised Domain Adaptation

1 code implementation • 10 Jun 2021 • Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.

Clustering Contrastive Learning +3

Paper
Code

VideoLT: Large-scale Long-tailed Video Recognition

1 code implementation • ICCV 2021 • Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis

In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.

Image Classification Video Recognition

Paper
Code

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition

no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang

Conventional video recognition pipelines typically fuse multimodal features for improved performance.

Video Recognition

Paper
Add Code

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

1 code implementation • 20 Apr 2021 • Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.

DeepFake Detection Face Swapping +1

Paper
Code

What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space

no code implementations • 18 Jan 2021 • Shihao Zhao, Xingjun Ma, Yisen Wang, James Bailey, Bo Li, Yu-Gang Jiang

In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three different settings including natural, backdoor and adversarial.

Image Classification

Paper
Add Code

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

1 code implementation • 5 Jan 2021 • Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang

WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.

DeepFake Detection Face Swapping

132

Paper
Code

Motion Guided Region Message Passing for Video Captioning

no code implementations • ICCV 2021 • Shaoxiang Chen, Yu-Gang Jiang

In this paper, we aim at designing a spatial information extraction and aggregation method for video captioning without the need of external object detectors.

Decoder Video Captioning

Paper
Add Code

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos

no code implementations • 31 Dec 2020 • Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang

However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.

Domain Adaptation

Paper
Add Code

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

1 code implementation • 20 Oct 2020 • Yuqian Fu, Li Zhang, Junke Wang, Yanwei Fu, Yu-Gang Jiang

Humans can easily recognize actions with only a few examples given, while the existing video recognition models still heavily rely on the large-scale labeled data inputs.

Ranked #1 on Few Shot Action Recognition on Kinetics-100

Few Shot Action Recognition Meta-Learning +2

Paper
Code

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness

no code implementations • 28 Sep 2020 • Linxi Jiang, Xingjun Ma, Zejia Weng, James Bailey, Yu-Gang Jiang

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research.

Adversarial Robustness

Paper
Add Code

Multi-modal Cooking Workflow Construction for Food Recipes

no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing its temporal workflow.

Common Sense Reasoning Decoder

Paper
Add Code

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

no code implementations • ECCV 2020 • Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang

Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.

Sentence

Paper
Add Code

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

1 code implementation • 24 Jun 2020 • Xingjun Ma, Linxi Jiang, Hanxun Huang, Zejia Weng, James Bailey, Yu-Gang Jiang

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research.

Adversarial Robustness

Paper
Code

Long-Term Cloth-Changing Person Re-identification

no code implementations • 26 May 2020 • Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue

Specifically, we consider that under cloth-changes, soft-biometrics such as body shape would be more reliable.

Cloth-Changing Person Re-Identification

Paper
Add Code

Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt

1 code implementation • CVPR 2020 • Hangyu Lin, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue

Unfortunately, the representation learned by SketchRNN is primarily for the generation tasks, rather than the other tasks of recognition and retrieval of sketches.

Retrieval Self-Supervised Learning +1

Paper
Code

Clean-Label Backdoor Attacks on Video Recognition Models

1 code implementation • CVPR 2020 • Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang

We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.

Backdoor Attack backdoor defense +2

Paper
Code

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

no code implementations • 17 Jan 2020 • Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue

Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we further evaluate several few-shot expression learning tasks using our F2ED, which require recognizing facial expressions given only a few training instances.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Paper
Add Code

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

no code implementations • NeurIPS 2019 • Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.

Video Recognition

Paper
Add Code

Heuristic Black-box Adversarial Attacks on Video Recognition Models

1 code implementation • 21 Nov 2019 • Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang

To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.

Adversarial Attack Video Recognition

Paper
Code

DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine

no code implementations • 25 Sep 2019 • Qiang Sun, Zhinan Cheng, Yanwei Fu, Wenxuan Wang, Yu-Gang Jiang, Xiangyang Xue

Instead of learning the cross features directly, DeepEnFM adopts the Transformer encoder as a backbone to align the feature embeddings with the clues of other fields.

Click-Through Rate Prediction

Paper
Add Code

Black-box Adversarial Attacks on Video Recognition Models

no code implementations • 10 Apr 2019 • Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, Yu-Gang Jiang

Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models.

Video Recognition

Paper
Add Code

A Multi-task Neural Approach for Emotion Attribution, Classification and Summarization

1 code implementation • 21 Dec 2018 • Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, Xiangyang Xue

However, the sparsity of emotional expressions in the videos poses an obstacle to visual emotion analysis.

Classification Emotion Recognition +1

Paper
Code

Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network

no code implementations • 28 Nov 2018 • Peng Lu, Hangyu Lin, Yanwei Fu, Shaogang Gong, Yu-Gang Jiang, Xiangyang Xue

Additionally, to study the tasks of sketch-based hairstyle retrieval, this paper contributes a new instance-level photo-sketch dataset - Hairstyle Photo-Sketch dataset, which is composed of 3600 sketches and photos, and 2400 sketch-photo pairs.

General Classification Retrieval +2

Paper
Add Code

Composite Binary Decomposition Networks

no code implementations • 16 Nov 2018 • You Qiaoben, Zheng Wang, Jianguo Li, Yinpeng Dong, Yu-Gang Jiang, Jun Zhu

Binary neural networks offer great resource and computing efficiency, but suffer from long training procedures and non-negligible accuracy drops compared to their full-precision counterparts.

General Classification Image Classification +3

Paper
Add Code

Non-local NetVLAD Encoding for Video Classification

no code implementations • 29 Sep 2018 • Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

This paper describes our solution for the 2nd YouTube-8M video understanding challenge organized by Google AI.

Classification General Classification +3

Paper
Add Code

Object Detection from Scratch with Deep Supervision

1 code implementation • 25 Sep 2018 • Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue

Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method.

General Classification Object +2

701

Paper
Code

NAIS: Neural Attentive Item Similarity Model for Recommendation

3 code implementations • 19 Sep 2018 • Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, Tat-Seng Chua

As such, the key to an item-based CF method is in the estimation of item similarities.

Collaborative Filtering Recommendation Systems

147

Paper
Code

Recurrent Fusion Network for Image Captioning

no code implementations • ECCV 2018 • Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang

In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.

Decoder Image Captioning

Paper
Add Code

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

no code implementations • ECCV 2018 • Minjun Li, Hao-Zhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yu-Gang Jiang

Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss.

Translation Unsupervised Image-To-Image Translation

Paper
Add Code

Cross-Domain Sentiment Classification with Target Domain Specific Information

no code implementations • ACL 2018 • Minlong Peng, Qi Zhang, Yu-Gang Jiang, Xuanjing Huang

We also introduce a small amount of labeled target-domain data for learning domain-specific information.

Classification General Classification +2

Paper
Add Code

Multi-level Semantic Feature Augmentation for One-shot Learning

1 code implementation • 15 Apr 2018 • Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal

In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.

Decoder Novel Concepts +1

Paper
Code

Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging

no code implementations • 12 Apr 2018 • Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian

Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.

Graph Learning TAG

Paper
Add Code

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images

6 code implementations • ECCV 2018 • Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang

We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image.

Ranked #3 on 3D Object Reconstruction on Data3D−R2N2 (Avg F1 metric)

3D Object Reconstruction

1,611

Paper
Code

Learning to score the figure skating sports videos

1 code implementation • 8 Feb 2018 • Chengming Xu, Yanwei Fu, Bing Zhang, Zitian Chen, Yu-Gang Jiang, Xiangyang Xue

This paper targets learning to score figure skating sports videos.

Paper
Code

Pose-Normalized Image Generation for Person Re-identification

2 code implementations • ECCV 2018 • Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue

Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations.

Ranked #2 on Person Re-Identification on Market-1501->DukeMTMC-reID

Generative Adversarial Network Image Generation +2

1,270

Paper
Code

Dual Skipping Networks

no code implementations • CVPR 2018 • Changmao Cheng, Yanwei Fu, Yu-Gang Jiang, Wei Liu, Wenlian Lu, Jianfeng Feng, Xiangyang Xue

Inspired by the recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, this paper introduces a dual skipping network which carries out coarse-to-fine object categorization.

General Classification Object +1

Paper
Add Code

Recent Advances in Zero-shot Recognition

no code implementations • 13 Oct 2017 • Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal, Shaogang Gong

With the recent renaissance of deep convolution neural networks, encouraging breakthroughs have been achieved on the supervised recognition tasks, where each class has sufficient training data and fully annotated training data.

Open Set Learning Zero-Shot Learning

Paper
Add Code

Multi-scale Deep Learning Architectures for Person Re-identification

no code implementations • ICCV 2017 • Xuelin Qian, Yanwei Fu, Yu-Gang Jiang, Tao Xiang, Xiangyang Xue

Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching.

Person Re-Identification

Paper
Add Code

DSOD: Learning Deeply Supervised Object Detectors from Scratch

4 code implementations • ICCV 2017 • Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue

State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks.

General Classification Object +2

701

Paper
Code

Learning Fashion Compatibility with Bidirectional LSTMs

2 code implementations • 18 Jul 2017 • Xintong Han, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis

To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion.

Attribute

158

Paper
Code

Aggregating Frame-level Features for Large-Scale Video Classification

no code implementations • 4 Jul 2017 • Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.

Classification General Classification +3

Paper
Add Code

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

no code implementations • 14 Jun 2017 • Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang

More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.

General Classification Video Classification

Paper
Add Code

Weakly Supervised Dense Video Captioning

no code implementations • CVPR 2017 • Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue

This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.

Dense Video Captioning Language Modelling +2

Paper
Add Code

Iterative Object and Part Transfer for Fine-Grained Recognition

no code implementations • 29 Mar 2017 • Zhiqiang Shen, Yu-Gang Jiang, Dequan Wang, Xiangyang Xue

On both datasets, we achieve better results than many state-of-the-art approaches, including a few using oracle (manually annotated) bounding boxes in the test images.

Object

Paper
Add Code

Deep Learning for Video Classification and Captioning

1 code implementation • 22 Sep 2016 • Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.

Classification General Classification +3

Paper
Code

Harnessing Object and Scene Semantics for Large-Scale Video Understanding

no code implementations • CVPR 2016 • Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, Leonid Sigal

Large-scale action recognition and video categorization are important problems in computer vision.

Action Recognition Clustering +4

Paper
Add Code

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations • 21 Apr 2016 • Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +3

Paper
Add Code

Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization

no code implementations • 16 Nov 2015 • Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal

Emotion is a key element in user-generated videos.

Ranked #5 on Video Emotion Recognition on Ekman6

Transfer Learning Video Emotion Recognition +1

Paper
Add Code

Fusing Multi-Stream Deep Networks for Video Classification

no code implementations • 21 Sep 2015 • Zuxuan Wu, Yu-Gang Jiang, Xi Wang, Hao Ye, Xiangyang Xue, Jun Wang

A multi-stream framework is proposed to fully utilize the rich multimodal information in videos.

Classification General Classification +1

Paper
Add Code

Evaluating Two-Stream CNN for Video Classification

no code implementations • 8 Apr 2015 • Hao Ye, Zuxuan Wu, Rui-Wei Zhao, Xi Wang, Yu-Gang Jiang, Xiangyang Xue

In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification.

Classification General Classification +2

Paper
Add Code

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

1 code implementation • 7 Apr 2015 • Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue

In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos.

Classification General Classification +1

Paper
Code

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

no code implementations • 25 Feb 2015 • Yu-Gang Jiang, Zuxuan Wu, Jun Wang, Xiangyang Xue, Shih-Fu Chang

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.

Paper
Add Code
