Search Results for author: Tong He

Found 87 papers, 45 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation


PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

no code implementations • 5 May 2024 • Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.

3D Object Detection Object +1


STT: Stateful Tracking with Transformers for Autonomous Driving

no code implementations • 30 Apr 2024 • Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, CongCong Li

In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately.

Autonomous Driving


Hallucination of Multimodal Large Language Models: A Survey

1 code implementation • 29 Apr 2024 • Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.

Hallucination

210


Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

no code implementations • 22 Mar 2024 • Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.

Novel View Synthesis


DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

no code implementations • 19 Mar 2024 • Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3


GVGEN: Text-to-3D Generation with Volumetric Representation

no code implementations • 19 Mar 2024 • Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline.

3D Generation 3D Reconstruction +1


Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations • 18 Mar 2024 • Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason about the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding


BloomGML: Graph Machine Learning through the Lens of Bilevel Optimization

1 code implementation • 7 Mar 2024 • Amber Yijia Zheng, Tong He, Yixuan Qiu, Minjie Wang, David Wipf

These optimal features typically depend on tunable parameters of the lower-level energy in such a way that the entire bilevel pipeline can be trained end-to-end.

Bilevel Optimization Graph Learning +1


Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

no code implementations • 4 Feb 2024 • Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.

Zero-shot Generalization


Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

1 code implementation • 31 Jan 2024 • Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

The Segment Anything Model (SAM) stands as a foundational framework for image segmentation.

Image Segmentation Segmentation +2

7,246


CaMML: Context-Aware Multimodal Learner for Large Models

no code implementations • 6 Jan 2024 • Yixin Chen, Shuai Zhang, Boran Han, Tong He, Bo Li

In this work, we introduce Context-Aware MultiModal Learner (CaMML), for tuning large multimodal models (LMMs).

Ranked #51 on Visual Question Answering on MM-Vet

Visual Question Answering


Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

no code implementations • 25 Dec 2023 • Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.


Point Transformer V3: Simpler, Faster, Stronger

3 code implementations • 15 Dec 2023 • Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao

This paper is not motivated to seek innovation within the attention mechanism.

Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)

3D Semantic Segmentation LIDAR Semantic Segmentation +1

2,005


DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

no code implementations • 6 Dec 2023 • Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu

However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.

3D Object Reconstruction Novel View Synthesis +1


Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

Ranked #1 on Pedestrian Image Caption on CUHK-PEDES

3D Human Pose Estimation Action Recognition +8

211


Consistent Video-to-Video Transfer Using Synthetic Dataset

1 code implementation • 1 Nov 2023 • Jiaxin Cheng, Tianjun Xiao, Tong He

We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning.

Video Editing


GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

128


Compatible Transformer for Irregularly Sampled Multivariate Time Series

1 code implementation • 17 Oct 2023 • Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen

To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.

Time Series


UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

1 code implementation • 12 Oct 2023 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.

3D Object Detection 3D Semantic Segmentation +3

134


PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)

3D Object Detection 3D Reconstruction +5

300


LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

no code implementations • 28 Sep 2023 • Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan

We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.

3D Object Detection Object +1


Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

1 code implementation • ICCV 2023 • Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.

Object Video Segmentation +1


Unsupervised Open-Vocabulary Object Localization in Videos

no code implementations • ICCV 2023 • Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.

Object Object Localization +1


Object-Centric Multiple Object Tracking

1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking Object +3


Coarse-to-Fine Amodal Segmentation with Shape Prior

1 code implementation • ICCV 2023 • Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.

Object Segmentation +1


Boosting Residual Networks with Group Knowledge

1 code implementation • 26 Aug 2023 • Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.

Knowledge Distillation


Experts Weights Averaging: A New General Training Scheme for Vision Transformers

no code implementations • 11 Aug 2023 • Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may ask whether a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost.


When Hyperspectral Image Classification Meets Diffusion Models: An Unsupervised Feature Learning Framework

no code implementations • 15 Jun 2023 • Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen

Learning effective spectral-spatial features is important for the hyperspectral image (HSI) classification task, but the majority of existing HSI classification methods still suffer from modeling complex spectral-spatial relations and characterizing low-level details and high-level semantics comprehensively.

Classification Hyperspectral Image Classification


SAM3D: Segment Anything in 3D Scenes

1 code implementation • 6 Jun 2023 • Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.

Segmentation

874


Learning for Transductive Threshold Calibration in Open-World Recognition

no code implementations • 19 May 2023 • Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto

In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR).

Metric Learning Open Set Learning


PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

1 code implementation • CVPR 2023 • Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang

The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.

Autonomous Driving Quantization


Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

no code implementations • 4 May 2023 • Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.


Learning Manifold Dimensions with Conditional Variational Autoencoders

1 code implementation • 23 Feb 2023 • Yijia Zheng, Tong He, Yixuan Qiu, David Wipf

Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold.


LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

no code implementations • 16 Feb 2023 • Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.

Layout-to-Image Generation


$β$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search

1 code implementation • 16 Jan 2023 • Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.

Neural Architecture Search


Crossing the Gap: Domain Generalization for Image Captioning

no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1


Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations • ICCV 2023 • Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2


OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

1 code implementation • 20 Dec 2022 • Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai

Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training.

Monocular 3D Object Detection object-detection


MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations • CVPR 2023 • Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, which requires learning consistent representations from unmasked areas.

object-detection Object Detection +2


EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder

2 code implementations • 8 Dec 2022 • Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang

These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation.

Few-Shot Learning Segmentation +1

270


GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

1 code implementation • CVPR 2023 • Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.

Decoder

103


Reconstructing Hand-Held Objects from Monocular Video

no code implementations • 30 Nov 2022 • Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation Object


3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

no code implementations • 17 Nov 2022 • Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and the tendency towards a lack of robustness.

3D Instance Segmentation Segmentation +1


LidarAugment: Searching for Scalable 3D LiDAR Data Augmentations

no code implementations • 24 Oct 2022 • Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

Data augmentations are important in training high-performance 3D object detectors for point clouds.

3D Object Detection Data Augmentation +1


Self-supervised Amodal Video Object Segmentation

1 code implementation • 23 Oct 2022 • Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.

Object Segmentation +6


The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition

1 code implementation • 11 Oct 2022 • Jingru Tan, Bo Li, Xin Lu, Yongqiang Yao, Fengwei Yu, Tong He, Wanli Ouyang

Long-tail distribution is widely spread in real-world applications.

Image Classification Long-tailed Object Detection +4

424


Bridging the Gap to Real-World Object-Centric Learning

3 code implementations • 29 Sep 2022 • Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Schölkopf, Thomas Brox, Francesco Locatello

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world.

Object


CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm

no code implementations • 12 Jul 2022 • Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao

Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.

Point Cloud Completion


PointInst3D: Segmenting 3D Instances by Points

no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel

The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.

3D Instance Segmentation Clustering +2


GRIN: Generative Relation and Intention Network for Multi-agent Trajectory Prediction

no code implementations • NeurIPS 2021 • Longyuan Li, Jian Yao, Li Wenliang, Tong He, Tianjun Xiao, Junchi Yan, David Wipf, Zheng Zhang

Learning the distribution of future trajectories conditioned on the past is a crucial problem for understanding multi-agent systems.

Decoder Graph Attention +2


Graph-Enhanced Exploration for Goal-oriented Reinforcement Learning

no code implementations • ICLR 2022 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Tong He, Yong Yu, Rasool Fakoor

Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning.

Continuous Control graph construction +2


ARCH++: Animation-Ready Clothed Human Reconstruction Revisited

no code implementations • ICCV 2021 • Tong He, Yuanlu Xu, Shunsuke Saito, Stefano Soatto, Tony Tung

We present ARCH++, an image-based method to reconstruct 3D avatars with arbitrary clothing styles.

Ranked #1 on 3D Object Reconstruction From A Single Image on RenderPeople (using extra training data)

3D Object Reconstruction From A Single Image Image-to-Image Translation


Progressive Coordinate Transforms for Monocular 3D Object Detection

1 code implementation • NeurIPS 2021 • Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.

Monocular 3D Object Detection Object +2


Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

117


Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering

13,099


HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding

no code implementations • CVPR 2021 • Ruibo Li, Guosheng Lin, Tong He, Fayao Liu, Chunhua Shen

Scene flow in 3D point clouds plays an important role in understanding dynamic environments.

Position


ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Ranked #7 on Text Spotting on Inverse-Text

Text Spotting


Explore with Dynamic Map: Graph Structured Reinforcement Learning

no code implementations • 1 Jan 2021 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola

In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.

reinforcement-learning Reinforcement Learning (RL)


DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

117


Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction

1 code implementation • NeurIPS 2020 • Tong He, John Collomosse, Hailin Jin, Stefano Soatto

We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person.

109


FCOS: A simple and strong anchor-free object detector

no code implementations • 14 Jun 2020 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He

In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.

Object Object Detection +1


Improving Semantic Segmentation via Self-Training

no code implementations • 30 Apr 2020 • Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Segmentation +1


ResNeSt: Split-Attention Networks

35 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

It is well known that featuremap attention and multi-path representation are important for visual recognition.

Ranked #8 on Instance Segmentation on COCO test-dev (APM metric)

Image Classification Instance Segmentation +3

30,218


ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.

Ranked #9 on Text Spotting on Inverse-Text

Scene Text Detection Text Detection +1

3,332


Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation

no code implementations • ECCV 2020 • Tong He, Dong Gong, Zhi Tian, Chunhua Shen

3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.

Ranked #28 on 3D Instance Segmentation on ScanNet(v2)

3D Instance Segmentation Scene Understanding +1


Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition

no code implementations • 24 Dec 2019 • Jialin Gao, Tong He, Xi Zhou, Shiming Ge

A collection of approaches based on graph convolutional networks have proven successful in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.

Ranked #36 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition Skeleton Based Action Recognition


Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Ranked #3 on Scene Text Detection on ICDAR 2017 MLT

Scene Text Detection Text Detection

272


SAM: Squeeze-and-Mimic Networks for Conditional Visual Driving Policy Learning

1 code implementation • 6 Dec 2019 • Albert Zhao, Tong He, Yitao Liang, Haibin Huang, Guy Van Den Broeck, Stefano Soatto

To learn this representation, we train a squeeze network to drive using annotations for the side task as input.

Semantic Segmentation


Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks

7 code implementations • 3 Sep 2019 • Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, Zheng Zhang

Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs.

Ranked #35 on Node Classification on Cora

Graph Learning Node Classification

13,095


GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

3 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

2,552


Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification object-detection +3


FCOS: Fully Convolutional One-Stage Object Detection

86 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Ranked #4 on Pedestrian Detection on TJU-Ped-campus

Object Object Detection +2

28,139


Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation • CVPR 2019 • Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation Segmentation +1

1,293

Paper
Code

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

no code implementations • CVPR 2019 • Zhi Tian, Tong He, Chunhua Shen, Youliang Yan

In this work, we propose data-dependent upsampling (DUpsampling) to replace bilinear upsampling; it takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.

Ranked #46 on Semantic Segmentation on PASCAL Context

Decoder Segmentation +1

Paper
Add Code

Bag of Freebies for Training Object Detection Neural Networks

3 code implementations • 11 Feb 2019 • Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Training heuristics greatly improve various image classification model accuracies (He et al., 2018).

General Classification Image Classification +2

5,762

Paper
Code

Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors

no code implementations • 11 Jan 2019 • Tong He, Stefano Soatto

We present a method to infer 3D pose and shape of vehicles from a single image.

Paper
Add Code

GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

no code implementations • CVPR 2019 • Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai

In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild.

Quantization

Paper
Add Code

GeoNet: Deep Geodesic Networks for Point Cloud Analysis

no code implementations • CVPR 2019 • Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chi-Hao Wu, Jue Wang, Stefano Soatto

Surface-based geodesic topology provides strong cues for object semantic analysis and geometric modeling.

General Classification

Paper
Add Code

Bag of Tricks for Image Classification with Convolutional Neural Networks

27 code implementations • CVPR 2019 • Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.

Ranked #38 on Domain Generalization on VizWiz-Classification

Domain Generalization General Classification +4

39,275

Paper
Code

FashionNet: Personalized Outfit Recommendation with Deep Neural Network

no code implementations • 4 Oct 2018 • Tong He, Yang Hu

Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation.

Recommendation Systems

Paper
Add Code

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances.

Text Detection

324

Paper
Code

Single Shot Text Detector with Regional Attention

1 code implementation • ICCV 2017 • Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li

Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28].

Ranked #4 on Scene Text Detection on COCO-Text

Scene Text Detection

212

Paper
Code

Detecting Text in Natural Image with Connectionist Text Proposal Network

27 code implementations • 12 Sep 2016 • Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images.

Scene Text Detection

3,416

Paper
Code

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

1 code implementation • 31 Mar 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization.

Scene Text Detection Text Detection

Paper
Code

Text-attentional convolutional neural network for scene text detection

no code implementations • IEEE Trans. on Image Processing, 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images.

Multi-Task Learning Scene Text Detection +3

Paper
Add Code

Text-Attentional Convolutional Neural Networks for Scene Text Detection

no code implementations • 12 Oct 2015 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

The rich supervision information gives the Text-CNN a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.

Multi-Task Learning Scene Text Detection +3

Paper
Add Code
