Search Results for author: Tong He

Found 74 papers, 39 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations ECCV 2020 Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

DreamComposer: Controllable 3D Object Generation via Multi-View Conditions

no code implementations6 Dec 2023 Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu

However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.

3D Object Reconstruction Novel View Synthesis

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

1 code implementation4 Dec 2023 Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e. g., human mesh recovery, pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

Action Recognition Human Mesh Recovery +3

Consistent Video-to-Video Transfer Using Synthetic Dataset

1 code implementation1 Nov 2023 Jiaxin Cheng, Tianjun Xiao, Tong He

We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning.

Video Editing

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation24 Oct 2023 Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

Compatible Transformer for Irregularly Sampled Multivariate Time Series

1 code implementation17 Oct 2023 Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen

To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.

Time Series

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

2 code implementations12 Oct 2023 Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang

In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models.

 Ranked #1 on Semantic Segmentation on S3DIS (using extra training data)

3D Reconstruction 3D Semantic Segmentation +2

LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

no code implementations28 Sep 2023 Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan

We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.

3D Object Detection object-detection

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

1 code implementation ICCV 2023 Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.

Video Segmentation Video Semantic Segmentation

Unsupervised Open-Vocabulary Object Localization in Videos

no code implementations ICCV 2023 Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.

Object Localization Representation Learning

Object-Centric Multiple Object Tracking

1 code implementation ICCV 2023 Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking object-detection +2

Coarse-to-Fine Amodal Segmentation with Shape Prior

no code implementations ICCV 2023 Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.

Segmentation Semantic Segmentation

Boosting Residual Networks with Group Knowledge

no code implementations26 Aug 2023 Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.

Knowledge Distillation

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

no code implementations11 Aug 2023 Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost?

When Hyperspectral Image Classification Meets Diffusion Models: An Unsupervised Feature Learning Framework

no code implementations15 Jun 2023 Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen

Learning effective spectral-spatial features is important for the hyperspectral image (HSI) classification task, but the majority of existing HSI classification methods still suffer from modeling complex spectral-spatial relations and characterizing low-level details and high-level semantics comprehensively.

Classification Hyperspectral Image Classification

SAM3D: Segment Anything in 3D Scenes

1 code implementation6 Jun 2023 Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.


Learning for Open-World Calibration with Graph Neural Networks

no code implementations19 May 2023 Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing

We tackle the problem of threshold calibration for open-world recognition by incorporating representation compactness measures into clustering.

Open Set Learning

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

1 code implementation CVPR 2023 Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang

The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.

Autonomous Driving Quantization

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

no code implementations4 May 2023 Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.

Learning Manifold Dimensions with Conditional Variational Autoencoders

1 code implementation23 Feb 2023 Yijia Zheng, Tong He, Yixuan Qiu, David Wipf

Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold.

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

no code implementations16 Feb 2023 Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.

Layout-to-Image Generation

$β$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search

1 code implementation16 Jan 2023 Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.

Neural Architecture Search

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations CVPR 2023 Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations ICCV 2023 Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations CVPR 2023 Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring to learn the consistent representations from unmasked areas.

object-detection Object Detection +2

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

1 code implementation20 Dec 2022 Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai

In contrast to the original hard depth labels, such soft pseudo labels with quality scores allow the network to learn a reasonable depth range, boosting training stability and thus improving final performance.

Monocular 3D Object Detection object-detection

Frozen CLIP Model is An Efficient Point Cloud Backbone

1 code implementation8 Dec 2022 Xiaoshui Huang, Sheng Li, Wentao Qu, Tong He, Yifan Zuo, Wanli Ouyang

This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model.

Few-Shot Learning Semantic Segmentation

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

1 code implementation CVPR 2023 Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.

Reconstructing Hand-Held Objects from Monocular Video

no code implementations30 Nov 2022 Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

no code implementations17 Nov 2022 Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and the tendency towards a lack of robustness.

3D Instance Segmentation Segmentation +1

Self-supervised Amodal Video Object Segmentation

1 code implementation23 Oct 2022 Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.

Segmentation Self-Supervised Learning +4

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm

no code implementations12 Jul 2022 Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao

Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.

Point Cloud Completion

PointInst3D: Segmenting 3D Instances by Points

no code implementations25 Apr 2022 Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel

The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.

3D Instance Segmentation Clustering +2

Graph-Enhanced Exploration for Goal-oriented Reinforcement Learning

no code implementations ICLR 2022 Jiarui Jin, Sijin Zhou, Weinan Zhang, Tong He, Yong Yu, Rasool Fakoor

Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning.

Continuous Control graph construction +2

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation18 Jul 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations ICCV 2021 Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation8 May 2021 Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Text Spotting

Explore with Dynamic Map: Graph Structured Reinforcement Learning

no code implementations1 Jan 2021 Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola

In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.

reinforcement-learning Reinforcement Learning (RL)

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation CVPR 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

FCOS: A simple and strong anchor-free object detector

no code implementations14 Jun 2020 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.

Object Detection Semantic Segmentation

Improving Semantic Segmentation via Self-Training

no code implementations30 Apr 2020 Yi Zhu, Zhongyue Zhang, Chongruo wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Semantic Segmentation

Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition

no code implementations24 Dec 2019 Jialin Gao, Tong He, Xi Zhou, Shiming Ge

A collection of approaches based on graph convolutional networks have proven success in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.

Action Recognition Skeleton Based Action Recognition

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation20 Dec 2019 Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Scene Text Detection Text Detection

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

4 code implementations9 Jul 2019 Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations26 Apr 2019 Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification object-detection +3

FCOS: Fully Convolutional One-Stage Object Detection

83 code implementations ICCV 2019 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Object Detection Pedestrian Detection +1

Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation CVPR 2019 Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation Segmentation +2

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

no code implementations CVPR 2019 Zhi Tian, Tong He, Chunhua Shen, Youliang Yan

In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.

Segmentation Semantic Segmentation

Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors

no code implementations11 Jan 2019 Tong He, Stefano Soatto

We present a method to infer 3D pose and shape of vehicles from a single image.

GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

no code implementations CVPR 2019 Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai

In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild.


Bag of Tricks for Image Classification with Convolutional Neural Networks

26 code implementations CVPR 2019 Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.

Domain Generalization General Classification +4

FashionNet: Personalized Outfit Recommendation with Deep Neural Network

no code implementations4 Oct 2018 Tong He, Yang Hu

Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation.

Recommendation Systems

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations CVPR 2018 Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by shar- ing convolutional features, which is critical to identify challenging text instances.

Text Detection

Single Shot Text Detector with Regional Attention

1 code implementation ICCV 2017 Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li

Our text detector achieves an F-measure of 77% on the ICDAR 2015 bench- mark, advancing the state-of-the-art results in [18, 28].

Scene Text Detection

Detecting Text in Natural Image with Connectionist Text Proposal Network

22 code implementations12 Sep 2016 Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image.

Scene Text Detection

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

1 code implementation31 Mar 2016 Tong He, Weilin Huang, Yu Qiao, Jian Yao

We propose a novel Cascaded Convolutional Text Network (CCTN) that joints two customized convolutional networks for coarse-to-fine text localization.

Scene Text Detection Text Detection

Text-Attentional Convolutional Neural Networks for Scene Text Detection

no code implementations12 Oct 2015 Tong He, Weilin Huang, Yu Qiao, Jian Yao

The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.

Multi-Task Learning Scene Text Detection +3

Cannot find the paper you are looking for? You can Submit a new open access paper.