1 code implementation • ECCV 2020 • Shaoxiang Chen, Yu-Gang Jiang
Temporal Activity Localization via Language (TALL) in video is a recently proposed, challenging vision task; tackling it requires a fine-grained understanding of the video content, which is overlooked by most existing works.
no code implementations • 21 Mar 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang
Object tracking (OT) aims to estimate the positions of target objects in a video sequence.
1 code implementation • 15 Mar 2023 • Hui Zhang, Zheng Wang, Zuxuan Wu, Yu-Gang Jiang
We introduce a new pipeline, DiffusionAD, for anomaly detection.
Ranked #1 on Unsupervised Anomaly Detection on DAGM2007
no code implementations • 13 Mar 2023 • Haoran Chen, Zuxuan Wu, Xintong Han, Menglin Jia, Yu-Gang Jiang
Such a trade-off is referred to as the stability-plasticity dilemma and is a more general and challenging problem for continual learning.
no code implementations • 18 Feb 2023 • Yuqian Fu, Yu Xie, Yanwei Fu, Yu-Gang Jiang
Thus, inspired by vanilla adversarial learning, we propose a novel model-agnostic meta Style Adversarial training (StyleAdv) method, together with a novel style adversarial attack method, for CD-FSL.
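The core attack idea, perturbing feature statistics rather than pixels, can be illustrated with a minimal sketch. Everything below (tensor shapes, the `classifier` callable, the single FGSM-style step) is an illustrative assumption, not the authors' exact meta-training procedure:

```python
import torch
import torch.nn.functional as F

def style_adversarial_perturb(feat, labels, classifier, eps=0.08):
    """Attack the 'style' (channel-wise mean/std) of a feature map with one
    FGSM-style step. feat: (B, C, H, W); classifier maps features to logits."""
    mu = feat.mean(dim=(2, 3), keepdim=True).detach()
    sigma = feat.std(dim=(2, 3), keepdim=True).detach()
    normalized = (feat.detach() - mu) / (sigma + 1e-6)

    mu = mu.clone().requires_grad_(True)
    sigma = sigma.clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(normalized * sigma + mu), labels)
    loss.backward()

    # Shift the style statistics in the direction that increases the loss.
    adv_mu = mu + eps * mu.grad.sign()
    adv_sigma = sigma + eps * sigma.grad.sign()
    return (normalized * adv_sigma + adv_mu).detach()
```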
no code implementations • 1 Feb 2023 • Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang
Contrastive Language-Image Pretraining (CLIP) has demonstrated impressive zero-shot learning abilities for image understanding, yet limited effort has been made to investigate CLIP for zero-shot video recognition.
1 code implementation • 3 Jan 2023 • Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, Leonid Sigal
Despite significant progress in object categorization in recent years, a number of important challenges remain; chief among them are the ability to learn from limited labeled data and to recognize object classes within large, potentially open, sets of labels.
1 code implementation • 31 Dec 2022 • Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, Yaowei Wang, Changsheng Xu
Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.
no code implementations • 13 Dec 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang
Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank (a minimal sketch of this matching step follows below).
Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)
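A rough sketch of the matching step in the VOS branch mentioned above: the current frame's keys attend over a memory bank of past-frame keys and values. Shapes are assumptions, and the query-based IS branch is omitted:

```python
import torch
import torch.nn.functional as F

def memory_read(query_key, mem_keys, mem_values):
    """Space-time memory matching: the current frame's keys attend over keys
    stored from past frames and retrieve their mask-aware values.

    query_key:  (B, Ck, H*W)    keys of the current (query) frame
    mem_keys:   (B, Ck, T*H*W)  keys of the memorized frames
    mem_values: (B, Cv, T*H*W)  values of the memorized frames
    returns:    (B, Cv, H*W)    retrieved value at every query location
    """
    affinity = torch.einsum('bck,bcm->bkm', query_key, mem_keys)
    affinity = F.softmax(affinity / query_key.shape[1] ** 0.5, dim=-1)
    return torch.einsum('bkm,bcm->bck', affinity, mem_values)
```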
no code implementations • 12 Dec 2022 • Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang
Online media data, in the form of images and videos, are becoming mainstream communication channels.
1 code implementation • 8 Dec 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Lu Yuan, Yu-Gang Jiang
For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.
Ranked #1 on Action Recognition on AVA v2.2 (using extra training data)
1 code implementation • 5 Dec 2022 • Hui Zhang, Zuxuan Wu, Zheng Wang, Zhineng Chen, Yu-Gang Jiang
Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness.
Ranked #1 on Supervised Anomaly Detection on MVTec AD (using extra training data)
no code implementations • 1 Dec 2022 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang
Vision Transformers (ViTs) have achieved overwhelming success, yet they suffer from fragile resolution scalability, i.e., the performance drops drastically when presented with input resolutions unseen during training.
no code implementations • 29 Nov 2022 • Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yu-Gang Jiang
Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks.
1 code implementation • 23 Nov 2022 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
In this paper, we investigate the use of transformer models under the SSL setting for action recognition.
no code implementations • 11 Oct 2022 • Linhai Zhuo, Yuqian Fu, Jingjing Chen, Yixin Cao, Yu-Gang Jiang
The proposed TGDM framework contains a Mixup-3T network for learning classifiers and a dynamic ratio generation network (DRGN) for learning the optimal mix ratio.
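As a sketch of the idea, the mix ratio is produced by a small network instead of being sampled from a fixed Beta distribution. The `DynamicRatioGenerator` below is a hypothetical stand-in for the DRGN, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DynamicRatioGenerator(nn.Module):
    """Hypothetical stand-in for the DRGN: predicts a mix ratio in (0, 1)
    from pooled features of the two domains."""
    def __init__(self, feat_dim):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, src_feat, tgt_feat):
        # src_feat, tgt_feat: (B, feat_dim) pooled per-image features.
        return self.head(torch.cat([src_feat, tgt_feat], dim=-1))  # (B, 1)

def dynamic_mixup(src_imgs, tgt_imgs, src_feat, tgt_feat, drgn):
    """Mix source and target images with a learned, input-dependent ratio
    instead of one sampled from a fixed Beta distribution."""
    lam = drgn(src_feat, tgt_feat).view(-1, 1, 1, 1)  # (B, 1, 1, 1)
    return lam * src_imgs + (1 - lam) * tgt_imgs, lam
```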
1 code implementation • 11 Oct 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
Concretely, to solve the data imbalance problem between the source data with sufficient examples and the auxiliary target data with limited examples, we build our model under the umbrella of multi-expert learning.
no code implementations • 6 Oct 2022 • Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
Specifically, appearance and motion components are provided by the image and caption separately.
no code implementations • 5 Oct 2022 • Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang
Considering that the content relevant to a question is often concentrated in a short temporal range, we propose to first localize the question to a segment in the video and then infer the answer using only that segment.
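A minimal locate-then-answer sketch over precomputed features; the cosine-similarity localization and the `answer_head` callable are assumptions, whereas the paper learns localization end to end:

```python
import torch
import torch.nn.functional as F

def locate_then_answer(clip_feats, question_feat, answer_head, window=4):
    """First localize, then answer from the located segment only.

    clip_feats:    (T, D) per-clip video features
    question_feat: (D,)   pooled question embedding
    answer_head:   maps a pooled (D,) segment feature to answer logits
    """
    sim = F.cosine_similarity(clip_feats, question_feat.unsqueeze(0), dim=-1)
    # Score every window of consecutive clips and keep the best one.
    scores = sim.unfold(0, window, 1).mean(dim=-1)          # (T - window + 1,)
    start = scores.argmax().item()
    segment = clip_feats[start:start + window].mean(dim=0)  # pooled segment
    return answer_head(segment), (start, start + window)
```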
no code implementations • 30 Sep 2022 • Haoran Chen, Zuxuan Wu, Yu-Gang Jiang
Most existing methods for multi-source unsupervised domain adaptation (UDA) rely on a common encoder to extract domain-invariant features.
1 code implementation • 30 Sep 2022 • Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang
In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.
Ranked #2 on Zero-Shot Video Retrieval on MSR-VTT (using extra training data)
1 code implementation • 8 Sep 2022 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks.
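A generic sketch of crafting such a universal targeted perturbation: one perturbation optimized over many inputs toward a target class, with its universality being what aids transfer. The input size, optimizer, and loss below are assumptions rather than the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def universal_targeted_perturbation(model, loader, target_class,
                                    eps=16 / 255, epochs=10, lr=0.01):
    """Optimize a single perturbation, shared across all inputs, toward one
    target class (more universal => more transferable)."""
    delta = torch.zeros(3, 224, 224, requires_grad=True)  # assumed input size
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            logits = model(torch.clamp(x + delta, 0.0, 1.0))
            target = torch.full((x.size(0),), target_class, dtype=torch.long)
            loss = F.cross_entropy(logits, target)  # pull every input to target
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # stay inside the L_inf budget
    return delta.detach()
```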
1 code implementation • 7 Sep 2022 • Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang
Recent approaches aim at exploring the semantic densities of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporating 2D semantics via cross-modal interaction or fusion techniques.
no code implementations • 25 Aug 2022 • Rui Wang, Zuxuan Wu, Dongdong Chen, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Luowei Zhou, Lu Yuan, Yu-Gang Jiang
To avoid significant computational cost incurred by computing self-attention between the large number of local patches in videos, we propose to use very few global tokens (e.g., 6) for a whole video in Transformers to exchange information with 3D-CNNs with a cross-attention mechanism.
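A minimal sketch of this bridging mechanism: a handful of learnable global tokens cross-attend into the flattened 3D-CNN feature map, so attention is never computed between all local patches. Shapes and module layout are assumptions, not the paper's exact fusion block:

```python
import torch
import torch.nn as nn

class GlobalTokenBridge(nn.Module):
    """A few global tokens (e.g. 6) summarize a 3D-CNN feature map via
    cross-attention, avoiding attention over every local patch."""
    def __init__(self, dim, num_tokens=6, num_heads=8):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cnn_feats):
        # cnn_feats: (B, N, dim), a flattened T*H*W CNN feature map.
        q = self.tokens.unsqueeze(0).expand(cnn_feats.size(0), -1, -1)
        summary, _ = self.attn(q, cnn_feats, cnn_feats)  # (B, num_tokens, dim)
        return summary  # to be merged back into the Transformer's token stream
```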
1 code implementation • CVPR 2022 • Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang
In this paper, we focus on representation learning for imbalanced data.
1 code implementation • 30 Jun 2022 • Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, Yu-Gang Jiang
3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world.
Ranked #8 on 3D Object Detection on nuScenes Camera Only
no code implementations • 7 Jun 2022 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, Jianfeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
Detection Hub further achieves SoTA performance on the UODB benchmark, which covers a wide variety of datasets.
1 code implementation • 30 Apr 2022 • Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang
Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription.
Ranked #6 on Scene Text Recognition on ICDAR2013
no code implementations • 26 Apr 2022 • Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang
With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.
1 code implementation • 26 Apr 2022 • Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang
Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.
Ranked #13 on Image Classification on ImageNet (Number of params metric)
1 code implementation • 20 Apr 2022 • Ran Cui, Tianwen Qian, Pai Peng, Elena Daskalaki, Jingjing Chen, Xiaowei Guo, Huyang Sun, Yu-Gang Jiang
Weakly supervised methods rely only on the paired video and query, but their performance is relatively poor.
no code implementations • CVPR 2022 • Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang
Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.
no code implementations • 15 Mar 2022 • Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang
The key challenge of CD-FSL lies in the huge data shift between source and target domains, which is typically in the form of totally different visual styles.
no code implementations • 10 Mar 2022 • Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
However, exploring relationships among these suspected objects in the one-stage visual grounding paradigm is non-trivial due to two core problems: (1) no object proposals are available as the basis on which to select suspected objects and perform relationship modeling; (2) compared with those irrelevant to the text query, suspected objects are more confusing, as they may share similar semantics or be entangled with certain relationships, and can thereby more easily mislead the model's prediction.
1 code implementation • 10 Mar 2022 • Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang
3D dense captioning is a recently proposed task in which point clouds contain more geometric information than their 2D counterparts.
no code implementations • CVPR 2022 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
This paper investigates the transferability of adversarial perturbation across different modalities, i.e., leveraging adversarial perturbation generated on white-box image models to attack black-box video models.
no code implementations • 10 Dec 2021 • Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model, and a feasible way to improve both tasks is to use more data.
1 code implementation • CVPR 2022 • Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan
This design is motivated by two observations: 1) transformers learned on image datasets provide decent spatial priors that can ease the learning of video transformers, which are oftentimes computationally intensive if trained from scratch; 2) discriminative clues, i.e., spatial and temporal information, needed to make correct predictions vary among different videos due to large intra-class and inter-class variations.
Ranked #4 on Action Recognition on Diving-48
no code implementations • CVPR 2022 • Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim
To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.
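The block-level policy can be sketched as a per-input keep/skip gate relaxed with Gumbel-softmax (patch and head policies are analogous). This is a minimal illustration under assumed shapes, not AdaViT's full decision network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockGate(nn.Module):
    """Per-input keep/skip gate for one transformer block, relaxed with
    Gumbel-softmax so the policy trains end to end; at inference the hard
    decision lets the block be skipped entirely."""
    def __init__(self, dim):
        super().__init__()
        self.policy = nn.Linear(dim, 2)  # logits for [skip, keep]

    def forward(self, x, block):
        # x: (B, N, dim); decide from the mean token whether to run `block`.
        logits = self.policy(x.mean(dim=1))             # (B, 2)
        decision = F.gumbel_softmax(logits, hard=True)  # one-hot per input
        keep = decision[:, 1].view(-1, 1, 1)
        return keep * block(x) + (1.0 - keep) * x       # skip => identity
```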
1 code implementation • 23 Nov 2021 • Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang
Video transformers have achieved impressive results on major video recognition benchmarks, but they suffer from high computational cost.
1 code implementation • 22 Nov 2021 • Zejia Weng, Xitong Yang, Ang Li, Zuxuan Wu, Yu-Gang Jiang
Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available.
2 code implementations • 22 Nov 2021 • Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang
The Transformer-based encoder-decoder framework is becoming popular in scene text recognition, largely because it naturally integrates recognition clues from both visual and semantic domains.
Ranked #4 on Scene Text Recognition on IIIT5k
1 code implementation • 29 Oct 2021 • Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
On both the UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about a 90% fooling rate when attacking three mainstream video recognition models, while occluding less than 8% of the area in the video.
1 code implementation • 18 Oct 2021 • Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.
1 code implementation • 9 Oct 2021 • Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma
Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred by one given natural language expression.
no code implementations • 23 Sep 2021 • Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.
1 code implementation • 9 Sep 2021 • Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang
We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs.
no code implementations • 29 Aug 2021 • Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang
There is a growing trend in placing video advertisements on social platforms for online marketing, which demands automatic approaches to understand the contents of advertisements effectively.
1 code implementation • ICCV 2021 • Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang
We empirically demonstrate the effectiveness of our RSLAD approach over existing adversarial training and distillation methods in improving the robustness of small models against state-of-the-art attacks including the AutoAttack.
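The distillation term can be sketched as a KL loss between the student's predictions on adversarial inputs and a robust teacher's soft labels on the clean inputs. This is an assumption-laden minimal version; the full RSLAD objective combines several such terms:

```python
import torch
import torch.nn.functional as F

def robust_soft_label_loss(student, teacher, x_clean, x_adv, T=1.0):
    """Distill a robust teacher's soft labels (on clean inputs) into a
    student evaluated on adversarial inputs."""
    with torch.no_grad():
        soft = F.softmax(teacher(x_clean) / T, dim=-1)   # teacher soft labels
    logp = F.log_softmax(student(x_adv) / T, dim=-1)     # student on adv inputs
    return F.kl_div(logp, soft, reduction='batchmean')
```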
no code implementations • 10 Aug 2021 • Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang
Blind face inpainting refers to the task of reconstructing visual contents without explicitly indicating the corrupted regions in a face image.
1 code implementation • 26 Jul 2021 • Yuqian Fu, Yanwei Fu, Yu-Gang Jiang
Secondly, a novel disentangling module together with a domain classifier is proposed to extract disentangled domain-irrelevant and domain-specific features.
no code implementations • 25 Jul 2021 • Yuqian Fu, Yanwei Fu, Yu-Gang Jiang
To achieve this, we propose a novel Mesh-based Video Action Imitation (M-VAI) method.
no code implementations • CVPR 2021 • Shaoxiang Chen, Yu-Gang Jiang
Dense Event Captioning (DEC) aims to jointly localize and describe multiple events of interest in untrimmed videos, which is an advancement of the conventional video captioning task (generating a single sentence description for a trimmed video).
no code implementations • 10 Jun 2021 • Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang
Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.
1 code implementation • ICCV 2021 • Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis
In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.
no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang
Conventional video recognition pipelines typically fuse multimodal features for improved performance.
no code implementations • 20 Apr 2021 • Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang
The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.
no code implementations • 18 Jan 2021 • Shihao Zhao, Xingjun Ma, Yisen Wang, James Bailey, Bo Li, Yu-Gang Jiang
In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three different settings including natural, backdoor and adversarial.
1 code implementation • 5 Jan 2021 • Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang
WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.
no code implementations • ICCV 2021 • Shaoxiang Chen, Yu-Gang Jiang
In this paper, we aim at designing a spatial information extraction and aggregation method for video captioning without the need for external object detectors.
no code implementations • 31 Dec 2020 • Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang
However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.
1 code implementation • 20 Oct 2020 • Yuqian Fu, Li Zhang, Junke Wang, Yanwei Fu, Yu-Gang Jiang
Humans can easily recognize actions with only a few examples given, while existing video recognition models still rely heavily on large-scale labeled data.
Ranked #1 on Few Shot Action Recognition on Kinetics-100
no code implementations • 28 Sep 2020 • Linxi Jiang, Xingjun Ma, Zejia Weng, James Bailey, Yu-Gang Jiang
Evaluating the robustness of a defense model is a challenging task in adversarial robustness research.
no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow.
no code implementations • ECCV 2020 • Shaoxiang Chen, Wenhao Jiang, Wei Liu, Yu-Gang Jiang
Inspired by the fact that there exist cross-modal interactions in the human brain, we propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos and thus improve performances on both tasks.
1 code implementation • 24 Jun 2020 • Xingjun Ma, Linxi Jiang, Hanxun Huang, Zejia Weng, James Bailey, Yu-Gang Jiang
Evaluating the robustness of a defense model is a challenging task in adversarial robustness research.
no code implementations • 26 May 2020 • Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue
Specifically, we consider that under cloth-changes, soft-biometrics such as body shape would be more reliable.
1 code implementation • CVPR 2020 • Hangyu Lin, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue
Unfortunately, the representation learned by SketchRNN is primarily for the generation tasks, rather than the other tasks of recognition and retrieval of sketches.
1 code implementation • CVPR 2020 • Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang
We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.
no code implementations • 17 Jan 2020 • Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue
Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we further evaluate several few-shot expression learning tasks on our F2ED, i.e., recognizing facial expressions given only a few training instances.
no code implementations • NeurIPS 2019 • Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis
This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.
1 code implementation • 21 Nov 2019 • Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang
To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.
no code implementations • 25 Sep 2019 • Qiang Sun, Zhinan Cheng, Yanwei Fu, Wenxuan Wang, Yu-Gang Jiang, Xiangyang Xue
Instead of learning the cross features directly, DeepEnFM adopts the Transformer encoder as a backbone to align the feature embeddings with the clues of other fields.
no code implementations • 10 Apr 2019 • Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, Yu-Gang Jiang
Using three benchmark video datasets, we demonstrate that V-BAD can craft both untargeted and targeted attacks to fool two state-of-the-art deep video recognition models.
1 code implementation • 21 Dec 2018 • Guoyun Tu, Yanwei Fu, Boyang Li, Jiarui Gao, Yu-Gang Jiang, Xiangyang Xue
However, the sparsity of emotional expressions in the videos poses an obstacle to visual emotion analysis.
no code implementations • 28 Nov 2018 • Peng Lu, Hangyu Lin, Yanwei Fu, Shaogang Gong, Yu-Gang Jiang, Xiangyang Xue
Additionally, to study the task of sketch-based hairstyle retrieval, this paper contributes a new instance-level photo-sketch dataset, the Hairstyle Photo-Sketch dataset, composed of 3600 sketches and photos and 2400 sketch-photo pairs.
no code implementations • 16 Nov 2018 • You Qiaoben, Zheng Wang, Jianguo Li, Yinpeng Dong, Yu-Gang Jiang, Jun Zhu
Binary neural networks offer great resource and computing efficiency, but suffer from long training procedures and non-negligible accuracy drops compared to their full-precision counterparts.
no code implementations • 29 Sep 2018 • Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang
This paper describes our solution for the 2nd YouTube-8M video understanding challenge organized by Google AI.
1 code implementation • 25 Sep 2018 • Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue
Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method.
2 code implementations • 19 Sep 2018 • Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang, Tat-Seng Chua
As such, the key to an item-based CF method is in the estimation of item similarities.
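For context, a textbook item-based CF scorer with fixed cosine similarities; the paper's point is that these item similarities should be estimated better (e.g., learned with attention), so treat this as a baseline sketch only:

```python
import numpy as np

def item_based_cf_scores(R):
    """Classic item-based CF on a user-item interaction matrix R of shape
    (users, items): estimate item-item cosine similarities, then score each
    (user, item) pair by a similarity-weighted sum over the user's history."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-8
    S = (R.T @ R) / (norms.T @ norms)   # (items, items) cosine similarity
    np.fill_diagonal(S, 0.0)            # an item should not vouch for itself
    return R @ S                        # (users, items) predicted scores

R = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
print(item_based_cf_scores(R).round(2))
```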
no code implementations • ECCV 2018 • Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang
In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.
no code implementations • ECCV 2018 • Minjun Li, Hao-Zhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yu-Gang Jiang
Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss.
no code implementations • ACL 2018 • Minlong Peng, Qi Zhang, Yu-Gang Jiang, Xuanjing Huang
We also introduce a small amount of labeled target-domain data for learning domain-specific information.
1 code implementation • 15 Apr 2018 • Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal
In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.
no code implementations • 12 Apr 2018 • Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian
Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.
4 code implementations • ECCV 2018 • Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, Yu-Gang Jiang
We propose an end-to-end deep learning architecture that produces a 3D shape in triangular mesh from a single color image.
Ranked #3 on 3D Object Reconstruction on Data3D−R2N2 (Avg F1 metric)
1 code implementation • 8 Feb 2018 • Chengming Xu, Yanwei Fu, Bing Zhang, Zitian Chen, Yu-Gang Jiang, Xiangyang Xue
This paper targets learning to score figure skating videos.
2 code implementations • ECCV 2018 • Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue
Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations.
no code implementations • CVPR 2018 • Changmao Cheng, Yanwei Fu, Yu-Gang Jiang, Wei Liu, Wenlian Lu, Jianfeng Feng, Xiangyang Xue
Inspired by the recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, this paper introduces a dual skipping network which carries out coarse-to-fine object categorization.
no code implementations • 13 Oct 2017 • Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal, Shaogang Gong
With the recent renaissance of deep convolutional neural networks, encouraging breakthroughs have been achieved on supervised recognition tasks, where each class has sufficient and fully annotated training data.
no code implementations • ICCV 2017 • Xuelin Qian, Yanwei Fu, Yu-Gang Jiang, Tao Xiang, Xiangyang Xue
Our model is able to learn deep discriminative feature representations at different scales and automatically determine the most suitable scales for matching.
4 code implementations • ICCV 2017 • Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue
State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to differences in both the loss functions and the category distributions between classification and detection tasks.
1 code implementation • 18 Jul 2017 • Xintong Han, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis
To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion.
no code implementations • 4 Jul 2017 • Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, Yu-Gang Jiang
This paper introduces the system we developed for the Google Cloud & YouTube-8M Video Understanding Challenge, which can be considered as a multi-label classification problem defined on top of the large scale YouTube-8M Dataset.
no code implementations • 14 Jun 2017 • Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang
More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.
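One simple way to combine the three streams is late fusion of their pooled features. The dimensions and the linear classifier below are placeholders for illustration, not the paper's fusion scheme:

```python
import torch
import torch.nn as nn

class ThreeStreamFusion(nn.Module):
    """Late fusion of appearance, motion, and audio features, each assumed
    to come from its own CNN (represented here by precomputed vectors)."""
    def __init__(self, dims=(2048, 2048, 128), num_classes=101):
        super().__init__()
        self.classifier = nn.Linear(sum(dims), num_classes)

    def forward(self, appearance, motion, audio):
        # Each input: (B, dim_i) pooled features from the matching CNN.
        fused = torch.cat([appearance, motion, audio], dim=-1)
        return self.classifier(fused)
```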
no code implementations • CVPR 2017 • Zhiqiang Shen, Jianguo Li, Zhou Su, Minjun Li, Yurong Chen, Yu-Gang Jiang, Xiangyang Xue
This paper focuses on a novel and challenging vision task, dense video captioning, which aims to automatically describe a video clip with multiple informative and diverse caption sentences.
no code implementations • 29 Mar 2017 • Zhiqiang Shen, Yu-Gang Jiang, Dequan Wang, Xiangyang Xue
On both datasets, we achieve better results than many state-of-the-art approaches, including a few using oracle (manually annotated) bounding boxes in the test images.
1 code implementation • 22 Sep 2016 • Zuxuan Wu, Ting Yao, Yanwei Fu, Yu-Gang Jiang
Accelerated by the tremendous increase in Internet bandwidth and storage space, video data has been generated, published and spread explosively, becoming an indispensable part of today's big data.
no code implementations • CVPR 2016 • Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, Leonid Sigal
Large-scale action recognition and video categorization are important problems in computer vision.
no code implementations • 21 Apr 2016 • Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah
Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos.
no code implementations • 16 Nov 2015 • Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal
Emotion is a key element in user-generated videos.
no code implementations • 21 Sep 2015 • Zuxuan Wu, Yu-Gang Jiang, Xi Wang, Hao Ye, Xiangyang Xue, Jun Wang
A multi-stream framework is proposed to fully utilize the rich multimodal information in videos.
no code implementations • 8 Apr 2015 • Hao Ye, Zuxuan Wu, Rui-Wei Zhao, Xi Wang, Yu-Gang Jiang, Xiangyang Xue
In this paper, we conduct an in-depth study to investigate important implementation options that may affect the performance of deep nets on video classification.
1 code implementation • 7 Apr 2015 • Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, Xiangyang Xue
In this paper, we propose a hybrid deep learning framework for video classification, which is able to model static spatial information, short-term motion, as well as long-term temporal clues in the videos.
no code implementations • 25 Feb 2015 • Yu-Gang Jiang, Zuxuan Wu, Jun Wang, Xiangyang Xue, Shih-Fu Chang
In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event.