1 code implementation • ECCV 2020 • Zhenzhi Wang, Ziteng Gao, Li-Min Wang, Zhifeng Li, Gangshan Wu
To address these problems, we present a new boundary-aware cascade network by introducing two novel components.
Ranked #11 on Action Segmentation on GTEA
1 code implementation • 1 Mar 2023 • Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, LiMin Wang
In this paper, we propose a novel module to explicitly extract motion and appearance information via a unifying operation.
Ranked #1 on Video Frame Interpolation on MSU Video Frame Interpolation (PSNR metric)
1 code implementation • 13 Feb 2023 • Jiange Yang, Sheng Guo, Gangshan Wu, LiMin Wang
Our CoMAE presents a curriculum learning strategy to unify the two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling.
1 code implementation • 6 Feb 2023 • Yutao Cui, Cheng Jiang, Gangshan Wu, LiMin Wang
Our core design is to utilize the flexibility of attention operations, and propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration.
Ranked #1 on Visual Object Tracking on LaSOT
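A minimal sketch of the mixed-attention idea described above: template and search-region tokens are concatenated and processed by one self-attention step, so feature extraction and target-information integration happen in a single operation. Module names, dimensions, and token counts below are illustrative placeholders, not the released MixFormer implementation.

```python
# Hedged sketch: joint ("mixed") attention over target-template and search tokens.
import torch
import torch.nn as nn

class MixedAttentionSketch(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, template_tokens, search_tokens):
        # Concatenate both token sets and let self-attention mix them:
        # search tokens attend to the target template and vice versa.
        tokens = torch.cat([template_tokens, search_tokens], dim=1)
        mixed, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + mixed)
        n_t = template_tokens.shape[1]
        return tokens[:, :n_t], tokens[:, n_t:]

# Toy usage: 64 template tokens and 256 search tokens with 256-d features.
mam = MixedAttentionSketch()
t, s = mam(torch.randn(2, 64, 256), torch.randn(2, 256, 256))
print(t.shape, s.shape)  # torch.Size([2, 64, 256]) torch.Size([2, 256, 256])
```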
1 code implementation • 30 Nov 2022 • Jie Liu, Chao Chen, Jie Tang, Gangshan Wu
In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch, and then a $3\times3$ convolution is applied to process the finest details.
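A rough sketch of that intra-patch step, under my own assumptions: the feature map is partitioned into non-overlapping patches, self-attention runs inside each patch only, and a 3x3 convolution then refines the result. Patch size and channel counts are placeholders, not the paper's settings.

```python
# Hedged sketch of intra-patch self-attention followed by a 3x3 convolution.
import torch
import torch.nn as nn

class IntraPatchSelfAttentionSketch(nn.Module):
    def __init__(self, dim=64, patch=8, num_heads=4):
        super().__init__()
        self.patch = patch
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x):  # x: (B, C, H, W) with H, W divisible by patch
        b, c, h, w = x.shape
        p = self.patch
        # Partition into (B * num_patches, p*p, C) token groups.
        tokens = (x.reshape(b, c, h // p, p, w // p, p)
                    .permute(0, 2, 4, 3, 5, 1)
                    .reshape(-1, p * p, c))
        tokens, _ = self.attn(tokens, tokens, tokens)  # attention stays inside each patch
        x = (tokens.reshape(b, h // p, w // p, p, p, c)
                   .permute(0, 5, 1, 3, 2, 4)
                   .reshape(b, c, h, w))
        return self.conv(x)  # 3x3 conv processes the finest details

feat = torch.randn(1, 64, 32, 32)
print(IntraPatchSelfAttentionSketch()(feat).shape)  # torch.Size([1, 64, 32, 32])
```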
2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang
The aim was to design a network for single image super-resolution that improved efficiency, measured by several metrics including runtime, parameters, FLOPs, activations, and memory consumption, while maintaining a PSNR of at least 29.00 dB on the DIV2K validation set.
no code implementations • 2 May 2022 • Tao Lu, Chunxu Liu, Youxin Chen, Gangshan Wu, LiMin Wang
In existing work, each point in the cloud may inevitably be selected as a neighbor of multiple aggregation centers, as all centers gather neighbor features from the whole point cloud independently.
Ranked #26 on 3D Point Cloud Classification on ScanObjectNN
1 code implementation • 18 Apr 2022 • Zongcai Du, Ding Liu, Jie Liu, Jie Tang, Gangshan Wu, Lean Fu
In addition, FMEN-S achieves the lowest memory consumption and the second shortest runtime in the NTIRE 2022 challenge on efficient super-resolution.
1 code implementation • CVPR 2022 • Yutao Cui, Cheng Jiang, LiMin Wang, Gangshan Wu
Our core design is to utilize the flexibility of attention operations, and propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration.
Ranked #4 on Visual Object Tracking on GOT-10k
no code implementations • 1 Mar 2022 • Jing Tan, Yuhong Wang, Gangshan Wu, LiMin Wang
Instead, in this paper, we present Temporal Perceiver, a general Transformer-based architecture that offers a unified solution to the detection of arbitrary generic boundaries, ranging from shot-level and event-level to scene-level GBDs.
1 code implementation • 31 Dec 2021 • Bin-Cheng Yang, Gangshan Wu
By introducing dual path connections inspired by Dual Path Networks, EMSRDPN uses residual connections and dense connections in an integrated way in most network layers.
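A hedged sketch of such a dual-path connection: each block keeps a residual path (element-wise addition, feature re-use) and a dense path (channel concatenation, new feature exploration), in the spirit of Dual Path Networks. The channel splits and block layout are illustrative assumptions, not EMSRDPN's actual configuration.

```python
# Sketch of a dual-path block combining residual and dense connections.
import torch
import torch.nn as nn

class DualPathBlockSketch(nn.Module):
    def __init__(self, in_ch, res_ch=64, dense_ch=16):
        super().__init__()
        self.res_ch = res_ch
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, res_ch + dense_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(res_ch + dense_ch, res_ch + dense_ch, 3, padding=1),
        )

    def forward(self, res, dense):
        out = self.body(torch.cat([res, dense], dim=1))  # block sees both paths
        res_out, dense_out = out[:, :self.res_ch], out[:, self.res_ch:]
        res = res + res_out                               # residual path: feature re-use
        dense = torch.cat([dense, dense_out], dim=1)      # dense path: growing features
        return res, dense

res, dense = torch.randn(1, 64, 24, 24), torch.zeros(1, 0, 24, 24)
blocks = [DualPathBlockSketch(64 + i * 16) for i in range(3)]
for blk in blocks:
    res, dense = blk(res, dense)
print(res.shape, dense.shape)  # (1, 64, 24, 24) and (1, 48, 24, 24)
```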
1 code implementation • 27 Nov 2021 • Jie Liu, Jie Tang, Gangshan Wu
We find that the standard deviation of the residual feature shrinks markedly after normalization layers, which causes performance degradation in SR networks.
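A toy illustration of that statistic (my own demonstration, not the paper's experiment): a residual branch carrying a wide range of magnitudes ends up with a much smaller standard deviation after a BatchNorm layer, since training-mode normalization rescales the batch toward unit variance per channel.

```python
# Toy demo: standard deviation of a residual feature before and after BatchNorm.
import torch
import torch.nn as nn

torch.manual_seed(0)
residual_feat = torch.randn(8, 64, 32, 32) * 5.0   # residual branch with large variance
bn = nn.BatchNorm2d(64)                             # training mode normalizes batch statistics
normalized = bn(residual_feat)
print(f"std before BN: {residual_feat.std().item():.3f}")  # ~5.0
print(f"std after  BN: {normalized.std().item():.3f}")     # ~1.0 (shrunk toward unit variance)
```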
1 code implementation • 24 Oct 2021 • Zhenxi Zhu, LiMin Wang, Sheng Guo, Gangshan Wu
In this paper, we aim to present an in-depth study on few-shot video classification by making three contributions.
no code implementations • ICCV 2021 • Ziteng Gao, LiMin Wang, Gangshan Wu
In this paper, we break the convention of using the same training samples for these two heads in dense detectors and explore a novel supervisory paradigm, termed Mutual Supervision (MuSu), which mutually assigns training samples to the classification and regression heads to ensure this consistency.
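A loose sketch of one way such mutual assignment could look; this is my reading of the idea, not the official MuSu code: anchors used to train the classification head are selected by the regression head's localization quality, while anchors used to train the regression head are selected by the classification head's confidence.

```python
# Hedged sketch: mutual sample assignment between classification and regression heads.
import torch

def mutual_assign(cls_scores, box_ious, k=9):
    """cls_scores, box_ious: (num_anchors,) scores for one ground-truth object."""
    cls_train_idx = box_ious.topk(k).indices    # best-localized anchors supervise classification
    reg_train_idx = cls_scores.topk(k).indices  # most-confident anchors supervise regression
    return cls_train_idx, reg_train_idx

scores, ious = torch.rand(100), torch.rand(100)
cls_idx, reg_idx = mutual_assign(scores, ious)
print(cls_idx.shape, reg_idx.shape)  # torch.Size([9]) torch.Size([9])
```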
1 code implementation • 10 Sep 2021 • Zhenzhi Wang, LiMin Wang, Tao Wu, TianHao Li, Gangshan Wu
Instead, viewing temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN) to directly model the similarity between language queries and video moments in a joint embedding space.
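A minimal sketch of the metric-learning view, assuming generic moment and query encoders: both modalities are projected into a shared space, moments are scored by cosine similarity, and a contrastive (InfoNCE-style) loss pulls matched pairs together. Encoder choices, dimensions, and the temperature are placeholders, not the released MMN code.

```python
# Hedged sketch: joint embedding space for language queries and video moments.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingSketch(nn.Module):
    def __init__(self, video_dim=512, text_dim=300, joint_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, moment_feats, query_feats):
        v = F.normalize(self.video_proj(moment_feats), dim=-1)  # (M, D)
        t = F.normalize(self.text_proj(query_feats), dim=-1)    # (Q, D)
        return t @ v.t()                                         # (Q, M) similarity matrix

model = JointEmbeddingSketch()
sim = model(torch.randn(20, 512), torch.randn(4, 300))  # 20 candidate moments, 4 queries
# Treat the matching moment of each query as the positive class.
targets = torch.randint(0, 20, (4,))
loss = F.cross_entropy(sim / 0.07, targets)
print(sim.shape, loss.item())
```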
1 code implementation • ICCV 2021 • TianHao Li, LiMin Wang, Gangshan Wu
In this paper, we show that soft label can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition.
Ranked #32 on Long-tail Learning on CIFAR-100-LT (ρ=100)
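One way soft labels can enter a multi-stage long-tailed pipeline, sketched under my own assumptions: a first-stage model's softened predictions replace one-hot targets when re-training the classifier, preserving correlations between head and tail classes. The blending weight and temperature below are arbitrary, not the paper's values.

```python
# Hedged sketch: blending soft labels from a stage-1 model with hard labels.
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, hard_targets,
                    temperature=2.0, alpha=0.5):
    """Combine a KL term against stage-1 soft labels with the usual CE term."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, hard_targets)
    return alpha * kl + (1 - alpha) * ce

logits_s, logits_t = torch.randn(8, 100), torch.randn(8, 100)
print(soft_label_loss(logits_s, logits_t, torch.randint(0, 100, (8,))).item())
```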
1 code implementation • ICCV 2021 • Yao Teng, LiMin Wang, Zhifeng Li, Gangshan Wu
Specifically, we design an efficient method for frame-level VidSGG, termed Target Adaptive Context Aggregation Network (TRACE), with a focus on capturing spatio-temporal context information for relation recognition.
1 code implementation • CVPR 2021 • Tao Lu, LiMin Wang, Gangshan Wu
Previous point cloud semantic segmentation networks use the same process to aggregate features from neighbors of the same category and different categories.
Ranked #1 on Semantic Segmentation on SYNTHIA
1 code implementation • 6 Jun 2021 • Zeyu Ruan, Changqing Zou, Longhai Wu, Gangshan Wu, LiMin Wang
Dense three-dimensional face alignment and reconstruction in the wild is a challenging problem, as partial facial information is commonly missing in occluded and large-pose face images.
Ranked #1 on 3D Face Reconstruction on AFLW2000-3D
2 code implementations • 20 May 2021 • Zongcai Du, Jie Liu, Jie Tang, Gangshan Wu
With the rapid development of real-world applications, the demands on both the accuracy and efficiency of image super-resolution (SR) keep increasing.
1 code implementation • ICCV 2021 • Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, LiMin Wang
Spatio-temporal action detection is an important and challenging problem in video understanding.
1 code implementation • ICCV 2021 • Yuan Zhi, Zhan Tong, LiMin Wang, Gangshan Wu
First, we present two different motion representations to enable us to efficiently distinguish the motion-salient frames from the background.
1 code implementation • 1 Apr 2021 • Yutao Cui, Cheng Jiang, LiMin Wang, Gangshan Wu
Accurate tracking is still a challenging task due to appearance variations, pose and view changes, and geometric deformations of the target in videos.
Ranked #1 on Visual Object Tracking on VOT2019
2 code implementations • ICCV 2021 • Jing Tan, Jiaqi Tang, LiMin Wang, Gangshan Wu
Extensive experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net on both temporal action proposal generation and temporal action detection.
no code implementations • 1 Jan 2021 • LiMin Wang, Bin Ji, Zhan Tong, Gangshan Wu
To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition.
1 code implementation • CVPR 2021 • LiMin Wang, Zhan Tong, Bin Ji, Gangshan Wu
To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition.
Ranked #11 on Action Recognition on Something-Something V1
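A minimal sketch of the temporal-difference idea behind TDN: explicit differences between neighbouring frames supply short-term motion cues that are fused with the appearance features of the sampled frame. Layer sizes, the fusion by addition, and the difference aggregation are my simplifying assumptions, not the released TDN code.

```python
# Hedged sketch: fusing frame-difference motion cues with appearance features.
import torch
import torch.nn as nn

class TemporalDifferenceSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.appearance = nn.Conv2d(3, channels, 3, padding=1)
        self.motion = nn.Conv2d(3, channels, 3, padding=1)

    def forward(self, frames):                    # frames: (B, T, 3, H, W)
        center = frames[:, frames.shape[1] // 2]  # appearance from the center frame
        diffs = frames[:, 1:] - frames[:, :-1]    # (B, T-1, 3, H, W) frame differences
        motion = self.motion(diffs.mean(dim=1))   # aggregate differences, then embed
        return self.appearance(center) + motion   # fuse appearance and motion cues

clip = torch.randn(2, 5, 3, 56, 56)
print(TemporalDifferenceSketch()(clip).shape)  # torch.Size([2, 64, 56, 56])
```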
2 code implementations • 24 Sep 2020 • Jie Liu, Jie Tang, Gangshan Wu
Thanks to FDC, we can rethink the information multi-distillation network (IMDN) and propose a lightweight and accurate SISR model called residual feature distillation network (RFDN).
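A rough sketch of a feature-distillation block in the spirit of RFDN: at each step a 1x1 convolution "distills" a slim set of features that are kept aside, while a 3x3 residual convolution refines the rest for the next step; the distilled features are finally concatenated and fused. Channel counts and the number of steps are illustrative, not RFDN's exact configuration.

```python
# Hedged sketch of a residual feature distillation block.
import torch
import torch.nn as nn

class FeatureDistillationBlockSketch(nn.Module):
    def __init__(self, ch=48, distilled=24, steps=3):
        super().__init__()
        self.distill = nn.ModuleList(nn.Conv2d(ch, distilled, 1) for _ in range(steps))
        self.refine = nn.ModuleList(nn.Conv2d(ch, ch, 3, padding=1) for _ in range(steps))
        self.fuse = nn.Conv2d(distilled * steps, ch, 1)
        self.act = nn.LeakyReLU(0.05, inplace=True)

    def forward(self, x):
        feats, cur = [], x
        for d, r in zip(self.distill, self.refine):
            feats.append(self.act(d(cur)))             # slim "distilled" features kept aside
            cur = self.act(r(cur)) + cur                # residual refinement for the next step
        return self.fuse(torch.cat(feats, dim=1)) + x  # fuse distilled features, residual output

print(FeatureDistillationBlockSketch()(torch.randn(1, 48, 32, 32)).shape)
```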
3 code implementations • 15 Sep 2020 • Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Maitreya Suin, Kuldeep Purohit, A. N. Rajagopalan, Xiaochuan Li, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Abdul Muqeet, Jiwon Hwang, Subin Yang, JungHeum Kang, Sung-Ho Bae, Yongwoo Kim, Geun-Woo Jeon, Jun-Ho Choi, Jun-Hyuk Kim, Jong-Seok Lee, Steven Marty, Eric Marty, Dongliang Xiong, Siang Chen, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Haicheng Wang, Vineeth Bhaskara, Alex Levinshtein, Stavros Tsogkas, Allan Jepson, Xiangzhen Kong, Tongtong Zhao, Shanshan Zhao, Hrishikesh P. S, Densen Puthussery, Jiji C. V, Nan Nan, Shuai Liu, Jie Cai, Zibo Meng, Jiaming Ding, Chiu Man Ho, Xuehui Wang, Qiong Yan, Yuzhi Zhao, Long Chen, Jiangtao Zhang, Xiaotong Luo, Liang Chen, Yanyun Qu, Long Sun, Wenhao Wang, Zhenbing Liu, Rushi Lan, Rao Muhammad Umer, Christian Micheloni
This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results.
1 code implementation • ECCV 2020 • Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu
In this work, we first empirically find the recognition accuracy is highly correlated with the bounding box size of an actor, and thus higher resolution of actors contributes to better performance.
2 code implementations • 15 Apr 2020 • Yutao Cui, Cheng Jiang, Li-Min Wang, Gangshan Wu
To tackle this issue, we present the fully convolutional online tracking framework, coined as FCOT, and focus on enabling online learning for both classification and regression branches by using a target filter based tracking paradigm.
2 code implementations • ECCV 2020 • Yixuan Li, Zixu Wang, Li-Min Wang, Gangshan Wu
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.
Ranked #1 on Action Detection on UCF101-24
1 code implementation • 23 Nov 2019 • Zhe Zhang, Jie Tang, Gangshan Wu
Specifically, our LPN-50 achieves 68.7 AP on the COCO test-dev set with only 2.7M parameters and 1.0 GFLOPs, while running at 17 FPS on an Intel i7-8700K CPU machine.
1 code implementation • ICCV 2019 • Ziteng Gao, Li-Min Wang, Gangshan Wu
Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and less memory consumption.
Ranked #134 on Object Detection on COCO minival
no code implementations • 27 May 2019 • Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Li-Min Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao
To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation.
no code implementations • CVPR 2019 • Dapeng Du, Li-Min Wang, Huiling Wang, Kai Zhao, Gangshan Wu
Empirically, we verify that this new semi-supervised setting is able to further enhance the performance of the recognition network.
2 code implementations • CVPR 2019 • Jianchao Wu, Li-Min Wang, Li Wang, Jie Guo, Gangshan Wu
To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relation between actors.
Ranked #3 on Group Activity Recognition on Collective Activity
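A simplified single relation-graph step, sketched under my own assumptions rather than the full ARG model: relation weights between actors come from appearance similarity (an embedded dot product) modulated by relative position, and actor features are updated by weighted message passing over that graph.

```python
# Hedged sketch: one actor-relation message-passing step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorRelationSketch(nn.Module):
    def __init__(self, dim=256, emb=128, sigma=0.3):
        super().__init__()
        self.q, self.k = nn.Linear(dim, emb), nn.Linear(dim, emb)
        self.update = nn.Linear(dim, dim)
        self.sigma = sigma

    def forward(self, actor_feats, centers):  # (N, dim) features, (N, 2) normalized box centers
        appearance = self.q(actor_feats) @ self.k(actor_feats).t()  # (N, N) appearance relation
        dist = torch.cdist(centers, centers)                        # pairwise position distances
        relation = F.softmax(appearance, dim=-1) * torch.exp(-dist / self.sigma)
        relation = relation / relation.sum(dim=-1, keepdim=True)    # renormalize rows
        return actor_feats + F.relu(self.update(relation @ actor_feats))

feats, centers = torch.randn(6, 256), torch.rand(6, 2)
print(ActorRelationSketch()(feats, centers).shape)  # torch.Size([6, 256])
```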
no code implementations • ICCV 2015 • Ran Ju, Tongwei Ren, Gangshan Wu
We also demonstrate in a few applications how our method can be used as a basic tool for stereo image editing.