1 code implementation • ECCV 2020 • Zhenzhi Wang, Ziteng Gao, Li-Min Wang, Zhifeng Li, Gangshan Wu
To address these problems, we present a new boundary-aware cascade network by introducing two novel components.
Ranked #14 on Action Segmentation on GTEA
1 code implementation • 28 Dec 2020 • Li-Min Wang, Hsing-Yi Lai, Sun-Ting Tsai, Chen Siang Ng, Shan-Jyun Wu, Meng-Xue Tsai, Yi-Ching Su, Daw-Wei Wang, Tzay-Ming Hong
Complex systems, such as life and languages, are governed by principles of evolution.
no code implementations • 30 Aug 2020 • Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Li-Min Wang, Shugong Xu
The task of spatial-temporal action detection has attracted increasing attention among researchers.
Ranked #3 on Action Detection on UCF Sports (Video-mAP 0.2 metric)
3 code implementations • ECCV 2020 • Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu
In this work, we first empirically find that recognition accuracy is highly correlated with the bounding box size of an actor, and thus higher-resolution actors contribute to better performance.
no code implementations • 28 Jun 2020 • Yin-Dong Zheng, Zhao-Yang Liu, Tong Lu, Li-Min Wang
Existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained on randomly selected clips and applied to densely sampled clips at test time.
Ranked #9 on Action Recognition on ActivityNet
2 code implementations • ICCV 2021 • Zhao-Yang Liu, Li-Min Wang, Wayne Wu, Chen Qian, Tong Lu
Video data exhibits complex temporal dynamics arising from factors such as camera motion, speed variation, and diverse activities.
1 code implementation • 5 May 2020 • Li-Min Wang, Sun-Ting Tsai, Shan-Jyun Wu, Meng-Xue Tsai, Daw-Wei Wang, Yi-Ching Su, Tzay-Ming Hong
One of the ultimate goals for linguists is to find universal properties in human languages.
no code implementations • ICLR 2020 • Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang
Most existing 3D CNN structures for video representation learning are clip-based methods, and do not consider video-level temporal evolution of spatio-temporal features.
2 code implementations • 15 Apr 2020 • Yutao Cui, Cheng Jiang, Li-Min Wang, Gangshan Wu
To tackle this issue, we present a fully convolutional online tracking framework, coined FCOT, and focus on enabling online learning for both the classification and regression branches using a target-filter-based tracking paradigm.
no code implementations • CVPR 2020 • Yan Li, Bin Ji, Xintian Shi, Jian-Guo Zhang, Bin Kang, Li-Min Wang
Temporal modeling is key for action recognition in videos.
2 code implementations • CVPR 2020 • Chengying Gao, Qi Liu, Qi Xu, Li-Min Wang, Jianzhuang Liu, Changqing Zou
We introduce the first method for automatic image generation from scene-level freehand sketches.
Ranked #2 on Sketch-to-Image Translation on SketchyCOCO
no code implementations • 18 Feb 2020 • Shiwen Zhang, Sheng Guo, Li-Min Wang, Weilin Huang, Matthew R. Scott
We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of humans and scenes for action recognition.
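As a rough illustration of such a multi-branch design, the sketch below shares one backbone across a main action head and two auxiliary heads; all layer sizes, class counts, and head designs here are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class ThreeBranchNet(nn.Module):
    """Minimal sketch: one shared trunk, a main action head, and two
    auxiliary heads (human parsing, scene recognition). Illustrative only."""
    def __init__(self, num_actions=400, num_parsing=20, num_scenes=365):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a real CNN trunk
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.action_head = nn.Linear(256, num_actions)       # main branch
        self.parsing_head = nn.Conv2d(256, num_parsing, 1)   # auxiliary: human parsing
        self.scene_head = nn.Linear(256, num_scenes)         # auxiliary: scene recognition

    def forward(self, x):
        feat = self.backbone(x)            # (B, 256, H', W')
        pooled = feat.mean(dim=(2, 3))     # global average pooling
        return (self.action_head(pooled),  # action logits
                self.parsing_head(feat),   # per-pixel parsing logits
                self.scene_head(pooled))   # scene logits
```

In training, the three losses would typically be combined as a weighted sum, with the auxiliary terms acting as regularizers on the shared features.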
1 code implementation • 18 Feb 2020 • Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang
Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features.
1 code implementation • 16 Jan 2020 • Tianhao Li, Li-Min Wang
In addition, our CPD model yields a new state of the art for zero-shot action recognition on UCF101 by directly utilizing the learnt visual-textual embeddings.
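A minimal sketch of how zero-shot recognition with a shared visual-textual embedding space can work: classify each video by the nearest class-name embedding. The function and tensor shapes below are illustrative assumptions, not the CPD code.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(video_emb, class_text_embs):
    """video_emb: (B, D) visual embeddings; class_text_embs: (C, D) text
    embeddings of class names, assumed to share one embedding space."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(class_text_embs, dim=-1)
    sims = v @ t.t()             # (B, C) cosine similarities
    return sims.argmax(dim=-1)   # predicted class index per video
```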
2 code implementations • ECCV 2020 • Yixuan Li, Zixu Wang, Li-Min Wang, Gangshan Wu
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.
Ranked #5 on Action Detection on UCF101-24
no code implementations • 21 Nov 2019 • Zhao-Yang Liu, Donghao Luo, Yabiao Wang, Li-Min Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Tong Lu
To address this problem, we propose an efficient temporal module, termed Temporal Enhancement-and-Interaction (TEI Module), which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet).
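The sketch below shows one way a plug-in temporal module for 2D CNNs can look: a channel-wise (depthwise) 1D convolution over the time axis, inserted between spatial conv blocks. This is a loose reading of the idea; the actual TEI Module has a different internal design.

```python
import torch
import torch.nn as nn

class TemporalInteraction(nn.Module):
    """Sketch of a plug-in temporal module: depthwise 1D convolution along
    the time axis of frame features from a 2D CNN. Illustrative only."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size,
                                       padding=kernel_size // 2, groups=channels)

    def forward(self, x, num_frames):
        # x: (B*T, C, H, W) per-frame features
        bt, c, h, w = x.shape
        b = bt // num_frames
        y = x.view(b, num_frames, c, h * w).permute(0, 3, 2, 1)  # (B, HW, C, T)
        y = y.reshape(b * h * w, c, num_frames)
        y = self.temporal_conv(y)             # mix along time, per channel
        y = y.view(b, h * w, c, num_frames).permute(0, 3, 2, 1)  # (B, T, C, HW)
        return y.reshape(bt, c, h, w)
```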
1 code implementation • ICCV 2019 • Ziteng Gao, Li-Min Wang, Gangshan Wu
Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and less memory consumption.
Ranked #147 on Object Detection on COCO test-dev (using extra training data)
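This entry concerns importance-weighted downsampling; a minimal sketch under my reading is below: each pooling window averages features with learned, exponentiated importance weights instead of uniform (average) or hard-max weights. The layer choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImportancePool(nn.Module):
    """Sketch of local importance-based downsampling: a weighted average
    inside each pooling window, with learned positive weights."""
    def __init__(self, channels, kernel_size=2, stride=2):
        super().__init__()
        self.logit = nn.Conv2d(channels, channels, 1)  # per-position importance logits
        self.k, self.s = kernel_size, stride

    def forward(self, x):
        w = torch.exp(self.logit(x))                  # positive importance weights
        num = F.avg_pool2d(x * w, self.k, self.s)     # weighted sum (up to a constant)
        den = F.avg_pool2d(w, self.k, self.s) + 1e-6  # normalizer
        return num / den
```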
no code implementations • 27 May 2019 • Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Li-Min Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao
To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation.
1 code implementation • CVPR 2019 • Dapeng Du, Li-Min Wang, Huiling Wang, Kai Zhao, Gangshan Wu
Empirically, we verify that this new semi-supervised setting is able to further enhance the performance of the recognition network.
2 code implementations • CVPR 2019 • Jianchao Wu, Li-Min Wang, Li Wang, Jie Guo, Gangshan Wu
To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relation between actors.
Ranked #3 on Group Activity Recognition on Collective Activity
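A minimal sketch of one relation-graph layer over actor features, assuming dot-product appearance relations and a residual update; the position relations and multi-graph details of ARG are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorRelationLayer(nn.Module):
    """Sketch of one actor-relation graph layer: pairwise relation weights
    from actor appearance features, then aggregation over the graph."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, actors):
        # actors: (N, D) features of the N detected actors in a frame
        rel = self.q(actors) @ self.k(actors).t()           # (N, N) appearance relations
        rel = F.softmax(rel / actors.shape[-1] ** 0.5, dim=-1)
        return actors + F.relu(self.update(rel @ actors))   # residual graph update
```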
8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen
In this paper, in contrast to the existing CNN+RNN or pure 3D-convolution-based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.
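One way to sketch the local spatial-temporal modeling described here is to stack short groups of frames along the channel axis so that a 2D convolution sees "super-images"; the helper below is an illustrative assumption of that step only, not the full StNet.

```python
import torch
import torch.nn as nn

def make_super_images(video, frames_per_group=5):
    """Split T frames into groups and stack each group channel-wise, so a
    2D CNN sees 3*N-channel 'super-images'.
    video: (B, T, 3, H, W) with T divisible by frames_per_group."""
    b, t, c, h, w = video.shape
    g = t // frames_per_group
    return video.view(b * g, frames_per_group * c, h, w)  # (B*G, 3N, H, W)

# a 2D conv whose input channels match the stacked group: 5 frames -> 15 channels
local_block = nn.Conv2d(15, 64, kernel_size=3, padding=1)
out = local_block(make_super_images(torch.randn(2, 10, 3, 112, 112)))
```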
1 code implementation • ECCV 2018 • Jie Guo, Zuojian Zhou, Li-Min Wang
We propose a sparse and low-rank reflection model for specular highlight detection and removal using a single input image.
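As a toy illustration of a sparse-plus-low-rank split (diffuse shading as the low-rank part, specular highlights as the sparse part), the sketch below alternates singular-value thresholding and entrywise soft-thresholding; the paper's actual model and solver are more elaborate.

```python
import numpy as np

def lowrank_sparse_split(M, lam=0.05, iters=50):
    """Toy split M ~ L + S: shrink singular values for the low-rank part L
    and shrink entries for the sparse part S, alternately."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * np.maximum(sig - lam, 0)) @ Vt                  # singular-value shrinkage
        S = np.sign(M - L) * np.maximum(np.abs(M - L) - lam, 0)  # entrywise shrinkage
    return L, S
```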
1 code implementation • 24 Jan 2018 • Zhe Wang, Xiaoyi Liu, Liangjian Chen, Li-Min Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes
Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision.
no code implementations • 9 Aug 2017 • Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Luc van Gool
Our new WebVision database and the accompanying studies are intended to advance the learning of state-of-the-art visual models from web data with minimal supervision.
no code implementations • 16 May 2017 • Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc van Gool
The 2017 WebVision challenge consists of two tracks, the image classification task on WebVision test set, and the transfer learning task on PASCAL VOC 2012 dataset.
6 code implementations • ICCV 2017 • Yue Zhao, Yuanjun Xiong, Li-Min Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin
Detecting actions in untrimmed videos is an important yet challenging task.
Ranked #6 on Action Recognition on THUMOS’14
no code implementations • CVPR 2017 • Jie Song, Li-Min Wang, Luc van Gool, Otmar Hilliges
Temporal information can provide additional cues about the location of body joints and help to alleviate these issues.
Ranked #4 on Pose Estimation on UPenn Action
1 code implementation • 8 Mar 2017 • Yuanjun Xiong, Yue Zhao, Li-Min Wang, Dahua Lin, Xiaoou Tang
Detecting activities in untrimmed videos is an important but challenging task.
Ranked #28 on Temporal Action Localization on ActivityNet-1.3
1 code implementation • 1 Sep 2016 • Zhe Wang, Li-Min Wang, Yali Wang, Bo-Wen Zhang, Yu Qiao
In this paper, we propose a hybrid representation that leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schemes for image recognition, with a focus on scene recognition.
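A minimal sketch of the descriptor-encoding half of such a hybrid pipeline: VLAD-style aggregation of local CNN features around a codebook (e.g., from k-means). Shapes and names are illustrative assumptions.

```python
import numpy as np

def vlad_encode(local_feats, codebook):
    """VLAD-style encoding: accumulate residuals of local features around
    their nearest codewords, then L2-normalize the flattened result.
    local_feats: (N, D) local CNN features; codebook: (K, D) codewords."""
    d2 = ((local_feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    assign = d2.argmin(axis=1)                    # nearest codeword per feature
    vlad = np.zeros_like(codebook)
    for k in range(codebook.shape[0]):
        members = local_feats[assign == k]
        if len(members):
            vlad[k] = (members - codebook[k]).sum(axis=0)  # residual accumulation
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)  # global L2 normalization
```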
1 code implementation • 2 Aug 2016 • Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang
This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.
1 code implementation • CVPR 2016 • Bowen Zhang, Li-Min Wang, Zhe Wang, Yu Qiao, Hanli Wang
The deep two-stream architecture has exhibited excellent performance on video-based action recognition.
Ranked #74 on Action Recognition on UCF101
no code implementations • 27 Jan 2016 • Sheng Guo, Weilin Huang, Li-Min Wang, Yu Qiao
Secondly, we propose a new Local Convolutional Supervision (LCS) layer to enhance the local structure of the image by directly propagating the label information to the convolutional layers.
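A loose sketch of supervising a convolutional layer directly with the image label: attach a 1x1-conv classifier to the intermediate feature map and pool its scores into auxiliary logits. The exact LCS design may differ.

```python
import torch
import torch.nn as nn

class LocalConvSupervision(nn.Module):
    """Sketch of auxiliary label supervision on an intermediate conv layer:
    a 1x1 conv maps the feature map to class scores, pooled into logits
    that are trained with the same labels as the main head."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.score = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, feat):
        # feat: (B, C, H, W) from an intermediate conv block
        return self.score(feat).mean(dim=(2, 3))  # (B, num_classes) auxiliary logits

# training would add: total_loss = main_loss + alpha * ce(aux_logits, labels)
```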
no code implementations • CVPR 2014 • Zhuowei Cai, Li-Min Wang, Xiaojiang Peng, Yu Qiao
Kernel averaging is then applied to these components to produce the recognition result.
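As a small illustration of kernel averaging, assuming one precomputed Gram matrix per feature component, one can take their mean and feed it to a precomputed-kernel SVM:

```python
import numpy as np
from sklearn.svm import SVC

def average_kernel(kernels):
    """Combine per-component kernel (Gram) matrices by a simple mean."""
    return np.mean(np.stack(kernels), axis=0)

# kernels: list of (n_train, n_train) Gram matrices, one per feature component
# clf = SVC(kernel="precomputed").fit(average_kernel(kernels), labels)
```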
no code implementations • 18 May 2014 • Xiaojiang Peng, Li-Min Wang, Xingxing Wang, Yu Qiao
Many efforts have been made on each step independently and in different scenarios, but their effect on action recognition remains unclear.
no code implementations • CVPR 2013 • Li-Min Wang, Yu Qiao, Xiaoou Tang
We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability.