no code implementations • ACL 2022 • Yubo Ma, Zehao Wang, Mukai Li, Yixin Cao, Meiqi Chen, Xinze Li, Wenqi Sun, Kunquan Deng, Kun Wang, Aixin Sun, Jing Shao
Events are fundamental building blocks of real-world happenings.
1 code implementation • 19 Jan 2023 • Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao
Recently, pure camera-based Bird's-Eye-View (BEV) perception has removed the need for expensive LiDAR sensors, making it a feasible solution for economical autonomous driving.
no code implementations • 9 Jan 2023 • Huan Peng, Fenggang Liu, Yangguang Li, Bin Huang, Jing Shao, Nong Sang, Changxin Gao
Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects.
no code implementations • 8 Dec 2022 • Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao
Inspired by the robotics community, we introduce hybrid topo-metric maps into VLN, where a topological map is used for long-term planning and a metric map for short-term reasoning.
1 code implementation • 22 Oct 2022 • Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao
Document-level natural language inference (DOCNLI) is a new challenging task in natural language processing, aiming at judging the entailment relationship between a pair of hypothesis and premise documents.
1 code implementation • 20 Oct 2022 • Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao
Multimodal ambiguity and color bleeding remain challenging in colorization.
1 code implementation • 3 Sep 2022 • Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jing Shao, Chunlei Liu, Xianglong Liu
Second, to improve the robustness of binary models with contextual dependencies, we compute the contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks.
no code implementations • 5 Aug 2022 • Ruining Tang, Zhenyu Liu, Yangguang Li, Yiguo Song, Hui Liu, Qide Wang, Jing Shao, Guifang Duan, Jianrong Tan
To alleviate this problem, a novel Task-decoupled Feature Distillation (TFD) is proposed by flexibly balancing the contributions of classification and regression tasks.
1 code implementation • 14 Jul 2022 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu
We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.
1 code implementation • 27 Jun 2022 • Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li
This has led to a new research direction in parameter-efficient transfer learning.
Ranked #10 on Action Recognition on Something-Something V2 (using extra training data)
no code implementations • 23 Jun 2022 • Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller.
no code implementations • 27 Apr 2022 • Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu
In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.
no code implementations • COLING 2022 • Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, Yan Zhang
In this paper, we propose a novel Event Relational Graph TransfOrmer (ERGO) framework for DECI, which improves on existing state-of-the-art (SOTA) methods in two aspects.
no code implementations • 12 Apr 2022 • Haonan Qiu, Siyu Chen, Bei Gan, Kun Wang, Huafeng Shi, Jing Shao, Ziwei Liu
Realistic visual media synthesis is becoming a critical societal issue with the surge of face manipulation models; new forgery approaches emerge at an unprecedented pace.
no code implementations • 16 Mar 2022 • Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao
2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer to various tasks.
2 code implementations • 15 Mar 2022 • Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu
This work thus proposes a novel active learning framework for realistic dataset annotation.
Ranked #1 on Image Classification on Food-101 (using extra training data)
1 code implementation • 11 Mar 2022 • Yufeng Cui, Lichen Zhao, Feng Liang, Yangguang Li, Jing Shao
This is because researchers do not choose consistent training recipes and even use different data, hampering the fair comparison between different methods.
1 code implementation • ACL 2022 • Yubo Ma, Zehao Wang, Yixin Cao, Mukai Li, Meiqi Chen, Kun Wang, Jing Shao
We have conducted extensive experiments on three benchmarks, including both sentence- and document-level EAE.
no code implementations • 18 Jan 2022 • Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao
Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability.
1 code implementation • 16 Jan 2022 • Hao Wang, Yangguang Li, Zhen Huang, Yong Dou, Lingpeng Kong, Jing Shao
To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE).
no code implementations • 15 Dec 2021 • Yinan He, Lu Sheng, Jing Shao, Ziwei Liu, Zhaofan Zou, Zhizhi Guo, Shan Jiang, Curitis Sun, Guosheng Zhang, Keyao Wang, Haixiao Yue, Zhibin Hong, Wanguo Wang, Zhenyu Li, Qi Wang, Zhenli Wang, Ronghao Xu, Mingwen Zhang, Zhiheng Wang, Zhenhang Huang, Tianming Zhang, Ningning Zhao
The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur.
1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao
Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.
Ranked #2 on Long-tail Learning on Places-LT (using extra training data)
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
2 code implementations • ICLR 2022 • Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.
no code implementations • 27 Jun 2021 • Bowen Yang, Jing Zhang, Zhenfei Yin, Jing Shao
In practice, given a handful of labeled samples from a new deployment scenario (target domain) and abundant labeled face images in the existing source domain, the FAS system is expected to perform well in the new scenario without sacrificing the performance on the original domain.
1 code implementation • CVPR 2021 • Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu
To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification.
1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao
It is the largest face anti-spoofing dataset in terms of both the amount of data and the number of subjects.
no code implementations • 2 Nov 2020 • ZiHao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao
Recently, deep learning has been utilized to solve the video recognition problem due to its prominent representation ability.
no code implementations • ECCV 2020 • Kun Yuan, Quanquan Li, Jing Shao, Junjie Yan
In this paper, we attempt to optimize the connectivity in neural networks.
1 code implementation • ECCV 2020 • Yuanhan Zhang, Zhenfei Yin, Yidong Li, Guojun Yin, Junjie Yan, Jing Shao, Ziwei Liu
The main reason is that current face anti-spoofing datasets are limited in both quantity and diversity.
2 code implementations • ECCV 2020 • Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, Jing Shao
As realistic facial manipulation technologies have achieved remarkable progress, social concerns about potential malicious abuse of these technologies bring out an emerging research topic of face forgery detection.
1 code implementation • 16 Jun 2020 • Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
Spatio-Temporal Action Localization, Temporal Action Localization
3 code implementations • CVPR 2021 • Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
Ranked #1 on Action Recognition on AVA v2.1
2 code implementations • 30 Nov 2019 • Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu
3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.
Ranked #6 on Point Cloud Completion on ShapeNet
1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
Semantic image synthesis aims at generating photorealistic images from semantic layouts.
no code implementations • 25 Sep 2019 • Kun Yuan, Quanquan Li, Yucong Zhou, Jing Shao, Junjie Yan
Seeking effective networks has become one of the most crucial and practical areas in deep learning.
1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Ranked #8 on Image Retrieval on Flickr30K 1K test
no code implementations • CVPR 2019 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.
Ranked #2 on Dense Captioning on Visual Genome
no code implementations • CVPR 2019 • Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao
Synthesizing photo-realistic images from text descriptions is a challenging problem.
2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.
no code implementations • 3 Mar 2019 • Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.
no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, location and interactions with surrounding regions.
1 code implementation • 16 Sep 2018 • Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan
Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.
Ranked #9 on Multi-Label Classification on NUS-WIDE
no code implementations • ECCV 2018 • Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang
It is inspired by the observation that the weights in the classification layer (called anchors) converge to the central direction of each class in hyperspace.
no code implementations • 28 Aug 2018 • Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao
Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos.
2 code implementations • 16 Aug 2018 • Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to the hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35% top-1 error rate on CIFAR-10.
no code implementations • ECCV 2018 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy
We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.
3 code implementations • CVPR 2018 • Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang
Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.
no code implementations • CVPR 2018 • Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes learning disentangled but complementary face features with minimal supervision by face identification.
no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang
The aim of image captioning is to generate captions by machine to describe image contents.
no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.
2 code implementations • ICCV 2017 • Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang
Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems.
Ranked #2 on Pedestrian Attribute Recognition on RAP
1 code implementation • CVPR 2018 • Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
Convolutional neural networks have achieved remarkable success in computer vision.
1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang
Person re-identification (ReID) is an important task in video surveillance and has various applications.
no code implementations • CVPR 2016 • Jing Shao, Chen-Change Loy, Kai Kang, Xiaogang Wang
Learning and capturing both appearance and dynamic representations are pivotal for crowd video understanding.
no code implementations • CVPR 2015 • Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang
We further measure user study performance on WWW and compare this with the proposed deep models.
no code implementations • CVPR 2014 • Jing Shao, Chen Change Loy, Xiaogang Wang
Groups are the primary entities that make up a crowd.