no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun
However, these methods are unaware of the instance context and fail to capture the boundary and geometric information of an instance, which is critical for separating adjacent objects.
no code implementations • 6 Dec 2023 • Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu
However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.
1 code implementation • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang
Human-centric perception tasks, e.g., human mesh recovery, pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.
1 code implementation • 1 Nov 2023 • Jiaxin Cheng, Tianjun Xiao, Tong He
We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning.
1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
1 code implementation • 17 Oct 2023 • Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen
To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.
1 code implementation • 12 Oct 2023 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang
In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.
2 code implementations • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Wanli Ouyang
In this paper, we introduce a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundational models.
Ranked #1 on Semantic Segmentation on S3DIS (using extra training data)
no code implementations • 28 Sep 2023 • Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
1 code implementation • ICCV 2023 • Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
Furthermore, we propose a temporal module based on a multi-view fusion layer, which is equipped with a set of object slots and interacts with features from different views via an attention mechanism to complete the object representation.
no code implementations • ICCV 2023 • Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
no code implementations • ICCV 2023 • Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.
no code implementations • 26 Aug 2023 • Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang
Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.
no code implementations • 11 Aug 2023 • Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang
As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may ask whether a training scheme designed specifically for ViTs exists that can also improve performance without increasing inference cost.
no code implementations • 15 Jun 2023 • Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen
Learning effective spectral-spatial features is important for the hyperspectral image (HSI) classification task, but the majority of existing HSI classification methods still suffer from modeling complex spectral-spatial relations and characterizing low-level details and high-level semantics comprehensively.
1 code implementation • 6 Jun 2023 • Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu
In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.
no code implementations • 19 May 2023 • Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing
We tackle the problem of threshold calibration for open-world recognition by incorporating representation compactness measures into clustering.
1 code implementation • CVPR 2023 • Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang
The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.
no code implementations • 4 May 2023 • Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang
In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.
1 code implementation • 23 Feb 2023 • Yijia Zheng, Tong He, Yixuan Qiu, David Wipf
Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold.
no code implementations • 16 Feb 2023 • Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li
Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.
1 code implementation • 16 Jan 2023 • Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang
To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.
no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang
In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.
no code implementations • ICCV 2023 • Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.
no code implementations • CVPR 2023 • Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao
Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, which requires learning consistent representations from unmasked areas.
1 code implementation • 20 Dec 2022 • Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai
In contrast to the original hard depth labels, such soft pseudo labels with quality scores allow the network to learn a reasonable depth range, boosting training stability and thus improving final performance.
1 code implementation • 8 Dec 2022 • Xiaoshui Huang, Sheng Li, Wentao Qu, Tong He, Yifan Zuo, Wanli Ouyang
This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model.
1 code implementation • CVPR 2023 • Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang
In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.
no code implementations • 30 Nov 2022 • Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou
The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.
no code implementations • 17 Nov 2022 • Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang
Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and tend to lack robustness.
no code implementations • 24 Oct 2022 • Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan
Data augmentations are important in training high-performance 3D object detectors for point clouds.
1 code implementation • 23 Oct 2022 • Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang
The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.
1 code implementation • 11 Oct 2022 • Jingru Tan, Bo Li, Xin Lu, Yongqiang Yao, Fengwei Yu, Tong He, Wanli Ouyang
Long-tailed distributions are widespread in real-world applications.
1 code implementation • 29 Sep 2022 • Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Schölkopf, Thomas Brox, Francesco Locatello
Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world.
no code implementations • 12 Jul 2022 • Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao
Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.
no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel
The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite its tendency towards heuristics and greedy algorithms and its lack of robustness to changes in data statistics.
no code implementations • NeurIPS 2021 • Longyuan Li, Jian Yao, Li Wenliang, Tong He, Tianjun Xiao, Junchi Yan, David Wipf, Zheng Zhang
Learning the distribution of future trajectories conditioned on the past is a crucial problem for understanding multi-agent systems.
no code implementations • ICLR 2022 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Tong He, Yong Yu, Rasool Fakoor
Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning.
no code implementations • ICCV 2021 • Tong He, Yuanlu Xu, Shunsuke Saito, Stefano Soatto, Tony Tung
We present ARCH++, an image-based method to reconstruct 3D avatars with arbitrary clothing styles.
Ranked #1 on 3D Object Reconstruction From A Single Image on RenderPeople (using extra training data)
Tasks: 3D Object Reconstruction From A Single Image, Image-to-Image Translation
1 code implementation • NeurIPS 2021 • Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue
Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.
1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
2 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
no code implementations • CVPR 2021 • Ruibo Li, Guosheng Lin, Tong He, Fayao Liu, Chunhua Shen
Scene flow in 3D point clouds plays an important role in understanding dynamic environments.
1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen
Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.
Ranked #7 on Text Spotting on Inverse-Text
no code implementations • 1 Jan 2021 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola
In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.
1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.
1 code implementation • NeurIPS 2020 • Tong He, John Collomosse, Hailin Jin, Stefano Soatto
We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person.
no code implementations • 14 Jun 2020 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.
no code implementations • 30 Apr 2020 • Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola
In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.
33 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
It is well known that featuremap attention and multi-path representation are important for visual recognition.
Ranked #8 on Instance Segmentation on COCO test-dev (APM metric)
11 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Ranked #9 on Text Spotting on Inverse-Text
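As an illustration of what a parameterized Bezier curve is, the sketch below evaluates a curve from its control points using Bernstein polynomials. It is a generic evaluation routine, not the paper's fitting procedure: the curve degree, the control points, and the function names `bezier_point` and `sample_curve` are assumptions chosen for the example.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    control_points is a list of (x, y) tuples; a list of n + 1 points
    defines a curve of degree n (4 points give a cubic curve).
    """
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        weight = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += weight * px
        y += weight * py
    return (x, y)


def sample_curve(control_points, num=5):
    """Sample num evenly spaced parameter values along the curve."""
    return [bezier_point(control_points, k / (num - 1)) for k in range(num)]


# Hypothetical control points for one side of a curved text region.
example_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
samples = sample_curve(example_points)
```

The curve always starts at the first control point and ends at the last one, so a small number of control points is enough to describe a smoothly bending boundary.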
no code implementations • ECCV 2020 • Tong He, Dong Gong, Zhi Tian, Chunhua Shen
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Ranked #24 on 3D Instance Segmentation on ScanNet(v2)
no code implementations • 24 Dec 2019 • Jialin Gao, Tong He, Xi Zhou, Shiming Ge
A collection of approaches based on graph convolutional networks have proven successful in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.
Ranked #32 on Skeleton Based Action Recognition on NTU RGB+D
1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.
Ranked #3 on Scene Text Detection on ICDAR 2017 MLT
1 code implementation • 6 Dec 2019 • Albert Zhao, Tong He, Yitao Liang, Haibin Huang, Guy Van Den Broeck, Stefano Soatto
To learn this representation, we train a squeeze network to drive using annotations for the side task as input.
7 code implementations • 3 Sep 2019 • Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, Zheng Zhang
Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs.
Ranked #34 on Node Classification on Cora
4 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu
We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).
2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li
One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.
83 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.
Ranked #5 on Pedestrian Detection on TJU-Ped-campus
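To make the anchor-free idea concrete, the sketch below computes a per-location regression target directly from a ground-truth box, with no anchor boxes and no overlap computation. It assumes the common formulation in which each location inside a box regresses its distances to the four box sides; the function name `ltrb_targets` and the box layout `(x0, y0, x1, y1)` are illustrative choices, not taken from the listing.

```python
def ltrb_targets(location, box):
    """Return (left, top, right, bottom) distances from a location to the box sides.

    location is (x, y); box is (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
    Returns None when the location is not strictly inside the box, in which
    case this box would not treat the location as a positive sample.
    """
    x, y = location
    x0, y0, x1, y1 = box
    left, top, right, bottom = x - x0, y - y0, x1 - x, y1 - y
    if min(left, top, right, bottom) <= 0:
        return None
    return (left, top, right, bottom)


# Hypothetical example: a location inside a box, and one outside it.
inside = ltrb_targets((5, 4), (2, 1, 10, 9))
outside = ltrb_targets((0, 0), (2, 1, 10, 9))
```

Because the target depends only on the location and the box coordinates, no anchor-to-box overlap has to be calculated when assigning training targets.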
1 code implementation • CVPR 2019 • Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan
To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.
no code implementations • CVPR 2019 • Zhi Tian, Tong He, Chunhua Shen, Youliang Yan
In this work, we propose data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.
Ranked #46 on Semantic Segmentation on PASCAL Context
3 code implementations • 11 Feb 2019 • Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
Training heuristics greatly improve the accuracy of various image classification models (He et al., 2018).
no code implementations • 11 Jan 2019 • Tong He, Stefano Soatto
We present a method to infer 3D pose and shape of vehicles from a single image.
no code implementations • CVPR 2019 • Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai
In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild.
no code implementations • CVPR 2019 • Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chi-Hao Wu, Jue Wang, Stefano Soatto
Surface-based geodesic topology provides strong cues for object semantic analysis and geometric modeling.
26 code implementations • CVPR 2019 • Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.
Ranked #44 on Domain Generalization on VizWiz-Classification
no code implementations • 4 Oct 2018 • Tong He, Yang Hu
Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation.
2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun
This allows the two tasks to work collaboratively by sharing convolutional features, which is critical for identifying challenging text instances.
1 code implementation • ICCV 2017 • Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li
Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28].
Ranked #4 on Scene Text Detection on COCO-Text
22 code implementations • 12 Sep 2016 • Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images.
1 code implementation • 31 Mar 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization.
no code implementations • IEEE Trans. on Image Processing, 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images.
no code implementations • 12 Oct 2015 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
The rich supervision information gives the Text-CNN a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.