Search Results for author: Tong Lu

Found 62 papers, 44 papers with code

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

1 code implementation • 25 Apr 2024 • Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k Language Modelling +3

3,015

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation • 14 Mar 2024 • Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Ranked #1 on Temporal Action Localization on FineAction

Moment Retrieval Temporal Action Localization +1

162

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

1 code implementation • 4 Mar 2024 • Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification while running significantly faster and using less memory when processing high-resolution inputs.

Image Classification

258

PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into low-frequency (LF) and high-frequency (HF) prompts.

Reflection Removal

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

1 code implementation • 18 Jan 2024 • Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai

Developing generative models for interleaved image-text data has both research and practical value.

161

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

1 code implementation • 11 Jan 2024 • Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.

Image Classification Image Generation +1

361

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

1 code implementation • 3 Jan 2024 • Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu

Point cloud completion is an indispensable task for recovering complete point clouds from inputs left incomplete by occlusion, limited sensor resolution, etc.

Point Cloud Completion

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations • 21 Dec 2023 • Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +11

3,015

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

1 code implementation • 5 Dec 2023 • Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.

Autonomous Driving

101

Deep Video Restoration for Under-Display Camera

no code implementations • 9 Sep 2023 • Xuanxi Chen, Tao Wang, Ziqian Shao, Kaihao Zhang, Wenhan Luo, Tong Lu, Zikun Liu, Tae-Kyun Kim, Hongdong Li

With the pipeline, we build the first large-scale UDC video restoration dataset called PexelsUDC, which includes two subsets named PexelsUDC-T and PexelsUDC-P corresponding to different displays for UDC.

Video Restoration

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation • ICCV 2023 • Jiahao Wang, Guo Chen, Yifei Huang, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Ranked #1 on Action Detection on THUMOS' 14

Action Understanding Online Action Detection

FB-BEV: BEV Representation from Forward-Backward View Transformations

1 code implementation • ICCV 2023 • Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez

Currently, the two most prominent view transformation module (VTM) paradigms are forward projection and backward projection.

571

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

1 code implementation • 3 Aug 2023 • Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao

We present the All-Seeing (AS) project: a large-scale dataset and model for recognizing and understanding everything in the open world.

Question Answering Retrieval +1

390

AVSegFormer: Audio-Visual Segmentation with Transformer

1 code implementation • 3 Jul 2023 • Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this paper, we propose AVSegFormer, a novel framework for audio-visual segmentation (AVS) tasks that leverages the transformer architecture.

Decoder Scene Understanding +1

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

no code implementations • 29 May 2023 • Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, Hongdong Li

Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer.

Image Restoration Rain Removal

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation • 22 May 2023 • Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Decoder Video Understanding

152

Graph Propagation Transformer for Graph Representation Learning

1 code implementation • 19 May 2023 • Zhe Chen, Hao Tan, Tao Wang, Tianrun Shen, Tong Lu, Qiuying Peng, Cheng Cheng, Yue Qi

The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks.

Ranked #2 on Graph Regression on PCQM4M-LSC (Validation MAE metric)

Graph Learning Graph Property Prediction +3

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

2 code implementations • NeurIPS 2023 • Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai

We hope this model can set a new baseline for generalist vision and language models.

Decoder Language Modelling +1

3,145

MRSN: Multi-Relation Support Network for Video Action Detection

no code implementations • 24 Apr 2023 • Yin-Dong Zheng, Guo Chen, Minglei Yuan, Tong Lu

Action detection is a challenging video understanding task that requires modeling spatio-temporal and interaction relations.

Action Detection Relation +1

DDP: Diffusion Model for Dense Visual Prediction

1 code implementation • ICCV 2023 • Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.

Ranked #2 on Monocular Depth Estimation on SUN-RGBD

Denoising Monocular Depth Estimation +2

152

Champion Solution for the WSDM2023 Toloka VQA Challenge

1 code implementation • 22 Jan 2023 • Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

Question Answering Visual Grounding +1

1,147

Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method

1 code implementation • 22 Dec 2022 • Tao Wang, Kaihao Zhang, Tianrun Shen, Wenhan Luo, Bjorn Stenger, Tong Lu

In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.

4k 8k +3

152

Restoring Vision in Hazy Weather with Hierarchical Contrastive Learning

no code implementations • 22 Dec 2022 • Tao Wang, Guangpin Tao, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Xiaoqin Zhang, Tong Lu

HCD (hierarchical contrastive dehazing) consists of a hierarchical dehazing network (HDN) and a novel hierarchical contrastive loss (HCL).

Contrastive Learning Image Dehazing +3

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

2 code implementations • 17 Nov 2022 • Guo Chen, Sen Xing, Zhe Chen, Yi Wang, Kunchang Li, Yizhuo Li, Yi Liu, Jiahao Wang, Yin-Dong Zheng, Bingkun Huang, Zhiyu Zhao, Junting Pan, Yifei Huang, Zun Wang, Jiashuo Yu, Yinan He, Hongjie Zhang, Tong Lu, Yali Wang, LiMin Wang, Yu Qiao

In this report, we present our champion solutions to five tracks of the Ego4D challenge.

Ranked #1 on State Change Object Detection on Ego4D

Future Hand Prediction Moment Queries +7

Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022

no code implementations • 16 Nov 2022 • Yin-Dong Zheng, Guo Chen, Jiahao Wang, Tong Lu, LiMin Wang

Our method achieves an accuracy of 0.796 on OSCC (object state change classification) and an absolute temporal localization error of 0.516 on PNR (point-of-no-return) localization.

Human-Object Interaction Detection Object +3

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification Image Classification +3

2,357

A Survey of Deep Face Restoration: Denoise, Super-Resolution, Deblur, Artifact Removal

1 code implementation • 5 Nov 2022 • Tao Wang, Kaihao Zhang, Xuanxi Chen, Wenhan Luo, Jiankang Deng, Tong Lu, Xiaochun Cao, Wei Liu, Hongdong Li, Stefanos Zafeiriou

Second, we discuss the challenges of face restoration.

Image Restoration Super-Resolution

381

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

2 code implementations • 23 Sep 2022 • Ruo-Ze Liu, Zhen-Jia Pang, Zhou-Yu Meng, Wenhai Wang, Yang Yu, Tong Lu

In this work, we investigate a set of RL techniques for the full-length game of StarCraft II.

reinforcement-learning Reinforcement Learning (RL) +3

295

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

no code implementations • 26 Jul 2022 • Guangchen Shi, Yirui Wu, Jun Liu, Shaohua Wan, Wenhai Wang, Tong Lu

Second, to resist overfitting caused by having few training samples, a hyper-class embedding is learned by clustering all category embeddings for initialization and is then aligned with the category embedding of the new class for enhancement. Previously learned knowledge thus assists in learning new knowledge, which reduces the dependence of performance on the scale of the training data.

Few-Shot Semantic Segmentation Segmentation +1

SeedFormer: Patch Seeds based Point Cloud Completion with Upsample Transformer

1 code implementation • 21 Jul 2022 • Haoran Zhou, Yun Cao, Wenqing Chu, Junwei Zhu, Tong Lu, Ying Tai, Chengjie Wang

Point cloud completion has become increasingly popular among generation tasks of 3D point clouds, as it is a challenging yet indispensable problem to recover the complete shape of a 3D object from its partial observation.

Ranked #7 on Point Cloud Completion on Completion3D

Point Cloud Completion

Vision Transformer Adapter for Dense Predictions

1 code implementation • 17 May 2022 • Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT).

Ranked #4 on Semantic Segmentation on PASCAL Context

Instance Segmentation Panoptic Segmentation +1

1,147

Uncertainty-based Network for Few-shot Image Classification

no code implementations • 17 May 2022 • Minglei Yuan, Qian Xu, Chunhao Cai, Yin-Dong Zheng, Tao Wang, Tong Lu

Specifically, we first apply data augmentation to the query instance, classify the augmented instances, and calculate the mutual information of these classification scores.

Classification Few-Shot Image Classification +1

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

2 code implementations • 5 May 2022 • Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, LiMin Wang

Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms previous methods on the THUMOS14 and FineAction datasets.

Ranked #1 on Temporal Action Localization on THUMOS14

Action Detection object-detection +3

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

Ranked #2 on Bird's-Eye View Semantic Segmentation on Lyft Level 5

3D Object Detection Autonomous Driving +2

2,984

Refine-Net: Normal Refinement Neural Network for Noisy Point Clouds

1 code implementation • 23 Mar 2022 • Haoran Zhou, Honghua Chen, Yingkui Zhang, Mingqiang Wei, Haoran Xie, Jun Wang, Tong Lu, Jing Qin, Xiao-Ping Zhang

In contrast, our network is designed to refine the initial normal of each point by extracting additional information from multiple feature representations.

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

1 code implementation • 7 Dec 2021 • Guo Chen, Yin-Dong Zheng, LiMin Wang, Tong Lu

Specifically, we design the Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation at the boundary level and precise evaluation of boundaries.

Ranked #19 on Temporal Action Localization on ActivityNet-1.3

Action Detection Temporal Action Localization

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

2 code implementations • 3 Nov 2021 • Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, Tong Lu

We propose an accurate and efficient scene text detection framework, termed FAST (i.e., faster arbitrarily-shaped text detector).

Ranked #2 on Scene Text Detection on MSRA-TD500

Image Classification Scene Text Detection +1

435

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

no code implementations • NeurIPS 2021 • Guangpin Tao, Xiaozhong Ji, Wenzhuo Wang, Shuo Chen, Chuming Lin, Yun Cao, Tong Lu, Donghao Luo, Ying Tai

In this paper, we propose a novel blind super-resolution (SR) framework to super-resolve low-resolution (LR) images degraded by arbitrary blur kernels, with accurate kernel estimation in the frequency domain.

Image Super-Resolution Translation

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu

Recent approaches for end-to-end text spotting have achieved promising results.

Text Detection Text Spotting

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

2 code implementations • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu

Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.

Ranked #4 on Panoptic Segmentation on COCO test-dev

Decoder Instance Segmentation +2

200

Learning Class-level Prototypes for Few-shot Learning

no code implementations • 25 Aug 2021 • Minglei Yuan, Wenhai Wang, Tao Wang, Chunhao Cai, Qian Xu, Tong Lu

Few-shot learning aims to recognize new categories using very few labeled samples.

Few-Shot Learning

Adaptive Graph Convolution for Point Cloud Analysis

1 code implementation • ICCV 2021 • Haoran Zhou, Yidan Feng, Mingsheng Fang, Mingqiang Wei, Jing Qin, Tong Lu

Convolution on 3D point clouds, generalized from 2D grid-like domains, is widely researched yet far from perfect.

Ranked #10 on 3D Point Cloud Classification on IntrA

3D Point Cloud Classification Point Cloud Classification

103

PVT v2: Improved Baselines with Pyramid Vision Transformer

16 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

We hope this work will facilitate state-of-the-art Transformer research in computer vision.

Ranked #23 on Object Detection on COCO-O

Image Classification Object Detection +1

30,231

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well.

Scene Text Detection Text Detection +1

435

An Introduction of mini-AlphaStar

1 code implementation • 14 Apr 2021 • Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu

StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against their opponents' units.

Starcraft Starcraft II

295

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

1 code implementation • 22 Mar 2021 • Zhe Chen, Wenhai Wang, Enze Xie, Tong Lu, Ping Luo

(1) We divide the input image into small patches and adopt Thumbnail Instance Normalization (TIN), successfully transferring image style at arbitrarily high resolution.
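A minimal sketch of the thumbnail-normalization idea, under simplifying assumptions that are not the authors' code. The image is taken as single-channel and stored as flat lists of floats, and only the normalization step is shown (no style affine). The mean and standard deviation are computed once on a downsampled thumbnail and shared by every patch, so the same pixel value normalizes identically in every patch. Per-patch instance normalization does not have this property, which is what causes visible seams between stylized patches. All function names here are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def stats(values: Sequence[float], eps: float = 1e-5) -> Tuple[float, float]:
    """Mean and epsilon-stabilized standard deviation of one channel."""
    mean = statistics.fmean(values)
    std = math.sqrt(statistics.pvariance(values, mu=mean) + eps)
    return mean, std


def normalize(patch: Sequence[float], mean: float, std: float) -> List[float]:
    """Shift and scale a patch with externally supplied statistics."""
    return [(v - mean) / std for v in patch]


def tin(patches: Sequence[Sequence[float]], thumbnail: Sequence[float]) -> List[List[float]]:
    """Thumbnail-style normalization: every patch uses the thumbnail's statistics."""
    mean, std = stats(thumbnail)
    return [normalize(p, mean, std) for p in patches]


def per_patch_in(patches: Sequence[Sequence[float]]) -> List[List[float]]:
    """Plain instance normalization: each patch uses its own statistics."""
    return [normalize(p, *stats(p)) for p in patches]


if __name__ == "__main__":
    thumbnail = [0.0, 2.0, 4.0]          # hypothetical downsampled image
    patches = [[2.0, 4.0], [2.0, 0.0]]   # the value 2.0 appears in both patches
    print(tin(patches, thumbnail))       # 2.0 maps to the same output in both patches
    print(per_patch_in(patches))         # 2.0 maps to different outputs per patch
```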

Style Transfer

177

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao

Unlike the recently proposed Transformer model (e.g., ViT), which is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.

Ranked #5 on Semantic Segmentation on SynPASS

Image Classification Instance Segmentation +3

28,164

Frequency Consistent Adaptation for Real World Super Resolution

no code implementations • 18 Dec 2020 • Xiaozhong Ji, Guangpin Tao, Yun Cao, Ying Tai, Tong Lu, Chengjie Wang, Jilin Li, Feiyue Huang

From this point of view, we design a novel Frequency Consistent Adaptation (FCA) that ensures the frequency domain consistency when applying existing SR methods to the real scene.

Super-Resolution

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Dynamic Sampling Networks for Efficient Action Recognition in Videos

no code implementations • 28 Jun 2020 • Yin-Dong Zheng, Zhao-Yang Liu, Tong Lu, Li-Min Wang

The existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained from the randomly selected clips and applied to densely sampled clips during testing.

Ranked #9 on Action Recognition on ActivityNet

Action Recognition In Videos

Channel Relationship Prediction with Forget-Update Module for Few-shot Classification

no code implementations • 16 Jun 2020 • Minglei Yuan, Cunhao Cai, Tong Lu

The proposed pipeline, which consists of a channel vector sequence construction module and a forget-update module, can infer the relationship between the query sample and the support samples in the few-shot classification scenario.

General Classification

A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

no code implementations • 26 May 2020 • Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein

The proposed method fuses gradient magnitude and direction coherence of text pixels in a new way for detecting candidate regions.

Clustering Text Detection

TAM: Temporal Adaptive Module for Video Recognition

2 code implementations • ICCV 2021 • Zhao-Yang Liu, Li-Min Wang, Wayne Wu, Chen Qian, Tong Lu

Video data has complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities.

Action Recognition Video Recognition

193

TEINet: Towards an Efficient Architecture for Video Recognition

no code implementations • 21 Nov 2019 • Zhao-Yang Liu, Donghao Luo, Yabiao Wang, Li-Min Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Tong Lu

To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet).

Action Recognition Video Recognition

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Ranked #8 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection Segmentation +1

4,131

Shape Robust Text Detection with Progressive Scale Expansion Network

19 code implementations • CVPR 2019 • Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao

Because there are large geometrical margins among the minimal-scale kernels, our method effectively splits close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.

Ranked #12 on Scene Text Detection on SCUT-CTW1500

Optical Character Recognition (OCR) Scene Text Detection +1

39,309

Efficient Reinforcement Learning for StarCraft by Abstract Forward Models and Transfer Learning

1 code implementation • 2 Mar 2019 • Ruo-Ze Liu, Haifeng Guo, Xiaozhong Ji, Yang Yu, Zhen-Jia Pang, Zitai Xiao, Yuzhou Wu, Tong Lu

Injecting human knowledge is an effective way to accelerate reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL) +3

On Reinforcement Learning for Full-length Game of StarCraft

no code implementations • 23 Sep 2018 • Zhen-Jia Pang, Ruo-Ze Liu, Zhou-Yu Meng, Yi Zhang, Yang Yu, Tong Lu

The reinforcement training algorithm for this architecture is also investigated.

Hierarchical Reinforcement Learning reinforcement-learning +4

A New COLD Feature based Handwriting Analysis for Ethnicity/Nationality Identification

no code implementations • 19 Jun 2018 • Sauradip Nag, Palaiahnakote Shivakumara, Wu Yirui, Umapada Pal, Tong Lu

For each line segment, the proposed method estimates angle and length, which gives a point in polar domain.
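The snippet does not say how the angle is measured. The sketch below therefore assumes that each segment is given by its two endpoints and takes the undirected angle in [0, π), so that (length, angle) is the segment's point in the polar domain. `segment_to_polar` is a hypothetical name, not the paper's code.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def segment_to_polar(p: Point, q: Point) -> Tuple[float, float]:
    """Map a line segment p-q to (length, angle), i.e. one point in the polar domain.

    The angle is in radians and folded into [0, pi), because a segment has no
    direction: p-q and q-p map to the same polar point.
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % math.pi
    return length, angle


if __name__ == "__main__":
    # A 3-4-5 segment in either direction yields the same (length, angle) pair.
    print(segment_to_polar((0.0, 0.0), (3.0, 4.0)))
    print(segment_to_polar((3.0, 4.0), (0.0, 0.0)))
```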

Shape Robust Text Detection with Progressive Scale Expansion Network

9 code implementations • 7 Jun 2018 • Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang

To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.

Ranked #12 on Scene Text Detection on ICDAR 2017 MLT

Curved Text Detection Text Detection

1,164

Mixed Link Networks

1 code implementation • 6 Feb 2018 • Wenhai Wang, Xiang Li, Jian Yang, Tong Lu

By analyzing modern networks and revealing their equivalence, we find that both ResNet and DenseNet are essentially derived from the same "dense topology" and differ only in the form of connection: addition (dubbed "inner link") vs. concatenation (dubbed "outer link").
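The toy sketch below makes the addition-versus-concatenation distinction concrete, using plain Python lists as stand-ins for feature channels. `toy_block` is a hypothetical stand-in for a convolutional layer. The sketch only illustrates the two connection forms the snippet names. It is not the Mixed Link block itself, which the paper builds by combining both forms.

```python
from typing import List

Features = List[float]


def toy_block(x: Features, width: int) -> Features:
    """Hypothetical stand-in for a conv layer: emits `width` new channels."""
    avg = sum(x) / len(x)
    return [avg] * width


def inner_link(x: Features, new: Features) -> Features:
    """ResNet-style 'inner link': element-wise addition, width unchanged."""
    if len(x) != len(new):
        raise ValueError("addition needs feature maps of equal width")
    return [a + b for a, b in zip(x, new)]


def outer_link(x: Features, new: Features) -> Features:
    """DenseNet-style 'outer link': concatenation, width grows by len(new)."""
    return x + new


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(inner_link(x, toy_block(x, len(x))))  # [3.0, 4.0, 5.0]
    print(outer_link(x, toy_block(x, 2)))       # [1.0, 2.0, 3.0, 2.0, 2.0]
```

Stacking k such blocks keeps the width fixed under addition. Under concatenation the width grows to w + k*g, where g is the number of new channels each block emits. This is the usual trade-off between the two topologies.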

Representation Learning

Temporal Action Localization by Structured Maximal Sums

no code implementations • CVPR 2017 • Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng

We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores.
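Assume, as a simplification, that a window's score is just the sum of its frame-wise scores; the paper's structured formulation is richer than this. Under that assumption, finding the best single window reduces to the maximum-sum contiguous subarray problem, which Kadane's algorithm solves in linear time. Frame scores must be able to go negative, for example class probabilities minus a threshold. Otherwise the whole video always wins. The function name and the 0.5 threshold are illustrative assumptions.

```python
from typing import Sequence, Tuple


def best_window(scores: Sequence[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the half-open window [start, end)
    whose summed frame-wise score is maximal (Kadane's algorithm)."""
    if not scores:
        raise ValueError("need at least one frame score")
    best_start, best_end, best_total = 0, 1, scores[0]
    cur_start, cur_total = 0, 0.0
    for i, s in enumerate(scores):
        if cur_total <= 0:
            # A non-positive running sum can only lower the total: restart at frame i.
            cur_start, cur_total = i, s
        else:
            cur_total += s
        if cur_total > best_total:
            best_start, best_end, best_total = cur_start, i + 1, cur_total
    return best_start, best_end, best_total


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.9, 0.8, 0.2, 0.6]   # hypothetical per-frame action probabilities
    scores = [p - 0.5 for p in probs]        # threshold so background frames score < 0
    print(best_window(scores))               # frames 1..3, i.e. (1, 4, ~0.9)
```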

Action Detection General Classification +2
