Search Results for author: Yu-Gang Jiang

Found 61 papers, 21 papers with code

Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language

1 code implementation ECCV 2020 Shaoxiang Chen, Yu-Gang Jiang

Temporal Activity Localization via Language (TALL) in video is a recently proposed, challenging vision task. Tackling it requires fine-grained understanding of the video content, which most existing works overlook.

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

no code implementations 18 Oct 2021 Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.

Adversarial Attack Translation +1

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation 9 Oct 2021 Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims to segment the target object in an image referred to by a given natural language expression.

Semantic Segmentation

Self-supervised Learning for Semi-supervised Temporal Language Grounding

no code implementations 23 Sep 2021 Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To achieve good performance with limited annotations, we tackle this task in a semi-supervised way and propose a unified Semi-supervised Temporal Language Grounding (STLG) framework.

Contrastive Learning Self-Supervised Learning

Towards Transferable Adversarial Attacks on Vision Transformers

no code implementations 9 Sep 2021 Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang

The results of these experiments demonstrate that the proposed dual attack can greatly boost transferability between ViTs and from ViTs to CNNs.

A Multimodal Framework for Video Ads Understanding

no code implementations 29 Aug 2021 Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang

There is a growing trend in placing video advertisements on social platforms for online marketing, which demands automatic approaches to understand the contents of advertisements effectively.

Optical Character Recognition Scene Segmentation +1

Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better

1 code implementation ICCV 2021 Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang

We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods in improving the robustness of small models against state-of-the-art attacks, including AutoAttack.

Knowledge Distillation

FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting

no code implementations 10 Aug 2021 Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang

Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.

Facial Inpainting

Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data

1 code implementation 26 Jul 2021 Yuqian Fu, Yanwei Fu, Yu-Gang Jiang

Secondly, a novel disentangle module together with a domain classifier is proposed to extract the disentangled domain-irrelevant and domain-specific features.

cross-domain few-shot learning

Can Action be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos

no code implementations 25 Jul 2021 Yuqian Fu, Yanwei Fu, Yu-Gang Jiang

To achieve this, we propose a novel Mesh-based Video Action Imitation (M-VAI) method.

Human Dynamics

Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning

no code implementations CVPR 2021 Shaoxiang Chen, Yu-Gang Jiang

Dense Event Captioning (DEC) aims to jointly localize and describe multiple events of interest in untrimmed videos, which is an advancement of the conventional video captioning task (generating a single sentence description for a trimmed video).

Video Captioning

Cross-domain Contrastive Learning for Unsupervised Domain Adaptation

no code implementations 10 Jun 2021 Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang

In addition, we demonstrate that CDCL is a general framework and can be adapted to the data-free setting, where the source data are unavailable during training, with minimal modification.

Contrastive Learning Self-Supervised Learning +1

VideoLT: Large-scale Long-tailed Video Recognition

1 code implementation ICCV 2021 Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis

In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.

Image Classification Video Recognition

HMS: Hierarchical Modality Selection for Efficient Video Recognition

no code implementations 20 Apr 2021 Zejia Weng, Zuxuan Wu, Hengduo Li, Yu-Gang Jiang

Conventional video recognition pipelines typically fuse multimodal features for improved performance.

Video Recognition

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

no code implementations 20 Apr 2021 Junke Wang, Zuxuan Wu, Jingjing Chen, Yu-Gang Jiang

This demands effective approaches that can detect perceptually convincing Deepfakes generated by advanced manipulation techniques.

DeepFake Detection Face Swapping +1

What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space

no code implementations 18 Jan 2021 Shihao Zhao, Xingjun Ma, Yisen Wang, James Bailey, Bo Li, Yu-Gang Jiang

In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three different settings including natural, backdoor and adversarial.

Image Classification

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

1 code implementation 5 Jan 2021 Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang

WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.

DeepFake Detection Face Swapping

Motion Guided Region Message Passing for Video Captioning

no code implementations ICCV 2021 Shaoxiang Chen, Yu-Gang Jiang

In this paper, we aim to design a spatial information extraction and aggregation method for video captioning that does not need external object detectors.

Video Captioning

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos

no code implementations 31 Dec 2020 Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang

However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.

Domain Adaptation

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

1 code implementation 20 Oct 2020 Yuqian Fu, Li Zhang, Junke Wang, Yanwei Fu, Yu-Gang Jiang

Humans can easily recognize actions given only a few examples, while existing video recognition models still rely heavily on large-scale labeled data.

Action Recognition Meta-Learning +1

Multi-modal Cooking Workflow Construction for Food Recipes

no code implementations 20 Aug 2020 Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow.

Common Sense Reasoning

Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos

no code implementations ECCV 2020 Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang

Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.

Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness

no code implementations 24 Jun 2020 Linxi Jiang, Xingjun Ma, Zejia Weng, James Bailey, Yu-Gang Jiang

Evaluating the robustness of a defense model is a challenging task in adversarial robustness research.

Long-Term Cloth-Changing Person Re-identification

no code implementations 26 May 2020 Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue

Specifically, we consider that under cloth-changes, soft-biometrics such as body shape would be more reliable.

Person Re-Identification

Sketch-BERT: Learning Sketch Bidirectional Encoder Representation from Transformers by Self-supervised Learning of Sketch Gestalt

1 code implementation CVPR 2020 Hangyu Lin, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue

Unfortunately, the representation learned by SketchRNN is primarily for the generation tasks, rather than the other tasks of recognition and retrieval of sketches.

Self-Supervised Learning Sketch Recognition

Clean-Label Backdoor Attacks on Video Recognition Models

1 code implementation CVPR 2020 Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang

We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.

Image Classification Video Recognition

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

no code implementations 17 Jan 2020 Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue

Because uneven data distribution and a lack of samples are common in real-world scenarios, we further use our F2ED to evaluate several few-shot expression learning tasks, which require recognizing facial expressions given only a few training instances.

Facial Expression Recognition

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

no code implementations NeurIPS 2019 Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.

Video Recognition

Heuristic Black-box Adversarial Attacks on Video Recognition Models

1 code implementation 21 Nov 2019 Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang

To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.

Adversarial Attack Video Recognition

Black-box Adversarial Attacks on Video Recognition Models

no code implementations 10 Apr 2019 Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, Yu-Gang Jiang

Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models.

Video Recognition

Instance-level Sketch-based Retrieval by Deep Triplet Classification Siamese Network

no code implementations 28 Nov 2018 Peng Lu, Hangyu Lin, Yanwei Fu, Shaogang Gong, Yu-Gang Jiang, Xiangyang Xue

Additionally, to study the task of sketch-based hairstyle retrieval, this paper contributes a new instance-level photo-sketch dataset, the Hairstyle Photo-Sketch dataset, which is composed of 3600 sketches and photos, and 2400 sketch-photo pairs.

Classification General Classification +2

Composite Binary Decomposition Networks

no code implementations 16 Nov 2018 You Qiaoben, Zheng Wang, Jianguo Li, Yinpeng Dong, Yu-Gang Jiang, Jun Zhu

Binary neural networks offer great resource and computing efficiency, but suffer from long training procedures and non-negligible accuracy drops compared to their full-precision counterparts.

General Classification Image Classification +2

Non-local NetVLAD Encoding for Video Classification

no code implementations 29 Sep 2018 Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

This paper describes our solution for the 2$^\text{nd}$ YouTube-8M video understanding challenge organized by Google AI.

Classification General Classification +3

Object Detection from Scratch with Deep Supervision

1 code implementation 25 Sep 2018 Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue

Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method.

General Classification Object Detection

Recurrent Fusion Network for Image Captioning

no code implementations ECCV 2018 Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang

In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.

Image Captioning

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

no code implementations ECCV 2018 Minjun Li, Hao-Zhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yu-Gang Jiang

Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss.

Translation Unsupervised Image-To-Image Translation

Multi-level Semantic Feature Augmentation for One-shot Learning

1 code implementation 15 Apr 2018 Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal

In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.

One-Shot Learning

Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging

no code implementations 12 Apr 2018 Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian

Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.

Graph Learning

Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images

3 code implementations ECCV 2018 Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang

We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image.

3D Object Reconstruction

Learning to score the figure skating sports videos

1 code implementation 8 Feb 2018 Chengming Xu, Yanwei Fu, Bing Zhang, Zitian Chen, Yu-Gang Jiang, Xiangyang Xue

This paper aims to learn to score figure skating sports videos.

Pose-Normalized Image Generation for Person Re-identification

2 code implementations ECCV 2018 Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue

Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and the difficulty of learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations.

Image Generation Person Re-Identification +1

Dual Skipping Networks

no code implementations CVPR 2018 Changmao Cheng, Yanwei Fu, Yu-Gang Jiang, Wei Liu, Wenlian Lu, Jianfeng Feng, Xiangyang Xue

Inspired by the recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, this paper introduces a dual skipping network which carries out coarse-to-fine object categorization.

General Classification

Recent Advances in Zero-shot Recognition

no code implementations 13 Oct 2017 Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal, Shaogang Gong

With the recent renaissance of deep convolutional neural networks, encouraging breakthroughs have been achieved on supervised recognition tasks, where each class has sufficient, fully annotated training data.

Open Set Learning Zero-Shot Learning

Multi-scale Deep Learning Architectures for Person Re-identification

no code implementations ICCV 2017 Xuelin Qian, Yanwei Fu, Yu-Gang Jiang, Tao Xiang, Xiangyang Xue

Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching.

Person Re-Identification

DSOD: Learning Deeply Supervised Object Detectors from Scratch

4 code implementations ICCV 2017 Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue

State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks.

General Classification Object Detection

Learning Fashion Compatibility with Bidirectional LSTMs

1 code implementation 18 Jul 2017 Xintong Han, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis

To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion.

Aggregating Frame-level Features for Large-Scale Video Classification

no code implementations 4 Jul 2017 Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang

This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.

Classification General Classification +3

Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification

no code implementations 14 Jun 2017 Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang

More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.

General Classification Video Classification

Weakly Supervised Dense Video Captioning

no code implementations CVPR 2017 Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue

This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.

Dense Video Captioning Language Modelling +1

Iterative Object and Part Transfer for Fine-Grained Recognition

no code implementations 29 Mar 2017 Zhiqiang Shen, Yu-Gang Jiang, Dequan Wang, Xiangyang Xue

On both datasets, we achieve better results than many state-of-the-art approaches, including a few using oracle (manually annotated) bounding boxes in the test images.

Deep Learning for Video Classification and Captioning

1 code implementation 22 Sep 2016 Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang

Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.

Classification General Classification +2

The THUMOS Challenge on Action Recognition for Videos "in the Wild"

no code implementations 21 Apr 2016 Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah

Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.

Action Classification Action Recognition +3

Evaluating Two-Stream CNN for Video Classification

no code implementations 8 Apr 2015 Hao Ye, Zuxuan Wu, Rui-Wei Zhao, Xi Wang, Yu-Gang Jiang, Xiangyang Xue

In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification.

Classification General Classification +1

Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification

1 code implementation 7 Apr 2015 Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue

In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos.

Classification General Classification +1

Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

no code implementations 25 Feb 2015 Yu-Gang Jiang, Zuxuan Wu, Jun Wang, Xiangyang Xue, Shih-Fu Chang

In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.
