Search Results for author: Li-Min Wang

Found 36 papers, 21 papers with code

Finding Action Tubes with a Sparse-to-Dense Framework

no code implementations 30 Aug 2020 Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Li-Min Wang, Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers.

Ranked #3 on Action Detection on UCF Sports (Video-mAP 0.2 metric)

Action Detection

Context-Aware RCNN: A Baseline for Action Detection in Videos

3 code implementations ECCV 2020 Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu

In this work, we first empirically find that recognition accuracy is highly correlated with an actor's bounding box size, and thus that higher actor resolution contributes to better performance.

Action Detection Action Recognition

Dynamic Sampling Networks for Efficient Action Recognition in Videos

no code implementations 28 Jun 2020 Yin-Dong Zheng, Zhao-Yang Liu, Tong Lu, Li-Min Wang

Existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained on randomly selected clips and applied to densely sampled clips during testing.

Action Recognition In Videos

TAM: Temporal Adaptive Module for Video Recognition

2 code implementations ICCV 2021 Zhao-Yang Liu, Li-Min Wang, Wayne Wu, Chen Qian, Tong Lu

Video data exhibits complex temporal dynamics due to factors such as camera motion, speed variation, and diverse activities.

Action Recognition Video Recognition
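As a rough illustration of the idea behind a temporal adaptive module — generating a video-specific temporal kernel and applying it along the time axis — here is a toy NumPy sketch. The function name, shapes, and the way the kernel logits are supplied are all assumptions for illustration; in TAM the kernel is produced by a small learned global branch, not passed in directly.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def temporal_adaptive_conv(feat, kernel_logits):
    """Apply a video-specific temporal kernel, per channel.

    feat:          (C, T) channel-by-time features for one video
    kernel_logits: (C, K) per-channel kernel logits; softmax turns each
                   row into a convex combination over a temporal window.
    """
    C, T = feat.shape
    K = kernel_logits.shape[1]
    pad = K // 2
    padded = np.pad(feat, ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros_like(feat)
    for c in range(C):
        k = softmax(kernel_logits[c])
        for t in range(T):
            # weighted average of a length-K temporal neighbourhood
            out[c, t] = (padded[c, t:t + K] * k).sum()
    return out
```

With a logit spike at the centre tap the module reduces to identity; with uniform logits it becomes temporal average smoothing — the adaptivity lies in choosing the kernel per video.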

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

no code implementations ICLR 2020 Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang

Most existing 3D CNN structures for video representation learning are clip-based methods, and do not consider video-level temporal evolution of spatio-temporal features.

Representation Learning Video Recognition

Fully Convolutional Online Tracking

2 code implementations 15 Apr 2020 Yutao Cui, Cheng Jiang, Li-Min Wang, Gangshan Wu

To tackle this issue, we present the fully convolutional online tracking framework, coined as FCOT, and focus on enabling online learning for both classification and regression branches by using a target filter based tracking paradigm.

Real-Time Visual Tracking regression

Knowledge Integration Networks for Action Recognition

no code implementations 18 Feb 2020 Shiwen Zhang, Sheng Guo, Li-Min Wang, Weilin Huang, Matthew R. Scott

We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode the knowledge of human and scene for action recognition.

Action Recognition Human Parsing +2

V4D: 4D Convolutional Neural Networks for Video-level Representation Learning

1 code implementation 18 Feb 2020 Shiwen Zhang, Sheng Guo, Weilin Huang, Matthew R. Scott, Li-Min Wang

Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features.

Long-range modeling Representation Learning +1

Learning Spatiotemporal Features via Video and Text Pair Discrimination

1 code implementation 16 Jan 2020 Tianhao Li, Li-Min Wang

In addition, our CPD model yields a new state of the art for zero-shot action recognition on UCF101 by directly utilizing the learnt visual-textual embeddings.

Action Classification Action Recognition +2

Actions as Moving Points

2 code implementations ECCV 2020 Yixuan Li, Zixu Wang, Li-Min Wang, Gangshan Wu

The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.

Action Detection Action Recognition

TEINet: Towards an Efficient Architecture for Video Recognition

no code implementations 21 Nov 2019 Zhao-Yang Liu, Donghao Luo, Yabiao Wang, Li-Min Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Tong Lu

To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) Module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet).

Action Recognition Video Recognition
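The enhancement-and-interaction idea above can be sketched in two steps on temporally pooled features: emphasize channels with strong frame-to-frame motion, then mix each frame with its temporal neighbours. This is a toy NumPy sketch under loose assumptions — the fixed smoothing kernel and the attention-from-differences rule stand in for the learned sub-modules, and `tei_module` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tei_module(feat):
    """Toy enhancement-then-interaction pass on (C, T) pooled features."""
    C, T = feat.shape
    # Motion enhancement: channel attention from mean absolute frame difference,
    # so channels that change over time are weighted up.
    diff = np.abs(np.diff(feat, axis=1)).mean(axis=1, keepdims=True)  # (C, 1)
    enhanced = feat * sigmoid(diff)
    # Temporal interaction: depthwise temporal convolution mixing neighbours
    # (fixed [0.25, 0.5, 0.25] here; learned per channel in the real module).
    k = np.array([0.25, 0.5, 0.25])
    padded = np.pad(enhanced, ((0, 0), (1, 1)), mode="edge")
    return np.stack([np.convolve(padded[c], k, mode="valid") for c in range(C)])
```

The design point is that both steps operate channel-wise on an existing 2D backbone's features, which keeps the added cost small compared to full 3D convolution.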

LIP: Local Importance-based Pooling

1 code implementation ICCV 2019 Ziteng Gao, Li-Min Wang, Gangshan Wu

Spatial downsampling layers are favored in convolutional neural networks (CNNs) to downscale feature maps for larger receptive fields and less memory consumption.

Ranked #147 on Object Detection on COCO test-dev (using extra training data)

Image Classification Object Detection
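The LIP entry above describes downsampling by local importance rather than plain averaging or max-taking; the core operation is a window-wise average weighted by the exponential of learned importance logits. A minimal NumPy sketch, with the logit map passed in as an array (in LIP it comes from a small learned sub-network) and `lip_pool` as a hypothetical name:

```python
import numpy as np

def lip_pool(x, logits, window=2, stride=2):
    """Local importance-based pooling over a 2D feature map.

    x:      (H, W) feature map
    logits: (H, W) importance logits for each location
    Each output cell is the importance-weighted average of its window:
    sum(exp(logits) * x) / sum(exp(logits)).
    """
    H, W = x.shape
    w = np.exp(logits)  # non-negative importance weights
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            ys, xs = i * stride, j * stride
            px = x[ys:ys + window, xs:xs + window]
            pw = w[ys:ys + window, xs:xs + window]
            out[i, j] = (pw * px).sum() / pw.sum()
    return out
```

Uniform logits recover average pooling, while a dominant logit at one location approaches picking that value, so the scheme interpolates between the two classical poolings depending on what the logits learn to highlight.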

Dynamically Visual Disambiguation of Keyword-based Image Search

no code implementations 27 May 2019 Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Li-Min Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao

To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation.

General Classification Image Retrieval

Translate-to-Recognize Networks for RGB-D Scene Recognition

1 code implementation CVPR 2019 Dapeng Du, Li-Min Wang, Huiling Wang, Kai Zhao, Gangshan Wu

Empirically, we verify that this new semi-supervised setting is able to further enhance the performance of the recognition network.

Scene Recognition Translation

Learning Actor Relation Graphs for Group Activity Recognition

2 code implementations CVPR 2019 Jianchao Wu, Li-Min Wang, Li Wang, Jie Guo, Gangshan Wu

To this end, we propose to build a flexible and efficient Actor Relation Graph (ARG) to simultaneously capture the appearance and position relation between actors.

Action Recognition Group Activity Recognition +1
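The ARG entry above describes a graph whose edges combine appearance similarity with actor positions, followed by graph-based feature aggregation. A toy NumPy sketch of one such graph under simplifying assumptions — raw dot-product similarity instead of learned embeddings, a hard distance threshold for the position relation, and a single aggregation round (the paper uses learned embeddings and multiple graphs); `actor_relation_graph` is a hypothetical name:

```python
import numpy as np

def actor_relation_graph(app_feat, positions, dist_thresh=0.5):
    """Build a relation matrix over N actors and aggregate features once.

    app_feat:  (N, D) appearance features per actor
    positions: (N, 2) normalized box centers
    """
    N, D = app_feat.shape
    sim = app_feat @ app_feat.T / np.sqrt(D)        # appearance relation
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    sim = np.where(dist < dist_thresh, sim, -np.inf)  # position relation mask
    sim = sim - sim.max(axis=1, keepdims=True)        # row-wise softmax
    w = np.exp(sim)
    G = w / w.sum(axis=1, keepdims=True)              # row-stochastic graph
    return G @ app_feat                               # one aggregation round
```

Each actor's aggregated feature is a convex combination of nearby, visually related actors' features, which is what lets the model reason about interactions for group activity recognition.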

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

8 code implementations 5 Nov 2018 Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition Temporal Action Localization

Single Image Highlight Removal with a Sparse and Low-Rank Reflection Model

1 code implementation ECCV 2018 Jie Guo, Zuojian Zhou, Li-Min Wang

We propose a sparse and low-rank reflection model for specular highlight detection and removal using a single input image.

Highlight Detection highlight removal

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

1 code implementation 24 Jan 2018 Zhe Wang, Xiaoyi Liu, Liangjian Chen, Li-Min Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes

Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision.

Multiple-choice POS +3

WebVision Database: Visual Learning and Understanding from Web Data

no code implementations 9 Aug 2017 Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Luc van Gool

Our new WebVision database and the relevant studies in this work should benefit the advancement of learning state-of-the-art visual models from web data with minimal supervision.

Domain Adaptation

WebVision Challenge: Visual Learning and Understanding With Web Data

no code implementations 16 May 2017 Wen Li, Li-Min Wang, Wei Li, Eirikur Agustsson, Jesse Berent, Abhinav Gupta, Rahul Sukthankar, Luc van Gool

The 2017 WebVision challenge consists of two tracks, the image classification task on WebVision test set, and the transfer learning task on PASCAL VOC 2012 dataset.

Benchmarking Image Classification +1

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition

1 code implementation 1 Sep 2016 Zhe Wang, Li-Min Wang, Yali Wang, Bo-Wen Zhang, Yu Qiao

In this paper, we propose a hybrid representation, which leverages the discriminative capacity of CNNs and the simplicity of descriptor encoding schema for image recognition, with a focus on scene recognition.

Scene Recognition

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

1 code implementation 2 Aug 2016 Yuanjun Xiong, Li-Min Wang, Zhe Wang, Bo-Wen Zhang, Hang Song, Wei Li, Dahua Lin, Yu Qiao, Luc van Gool, Xiaoou Tang

This paper presents the method that underlies our submission to the untrimmed video classification task of ActivityNet Challenge 2016.

General Classification Video Classification

Locally-Supervised Deep Hybrid Model for Scene Recognition

no code implementations 27 Jan 2016 Sheng Guo, Weilin Huang, Li-Min Wang, Yu Qiao

Secondly, we propose a new Local Convolutional Supervision (LCS) layer to enhance the local structure of the image by directly propagating the label information to the convolutional layers.

General Classification Image Classification +1

Motionlets: Mid-level 3D Parts for Human Motion Recognition

no code implementations CVPR 2013 Li-Min Wang, Yu Qiao, Xiaoou Tang

We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability.

Action Recognition Temporal Action Localization
