no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun
However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.
1 code implementation • 7 Mar 2025 • Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin
Furthermore, GoalFlow employs an efficient generative method, Flow Matching, to generate multimodal trajectories, and incorporates a refined scoring mechanism to select the optimal trajectory from the candidates.
1 code implementation • 24 Feb 2025 • Canyu Zhao, MingYu Liu, Huanyi Zheng, Muzhi Zhu, Zhiyue Zhao, Hao Chen, Tong He, Chunhua Shen
We achieve results on par with SAM-vit-h using only 0. 06% of their data (e. g., 600K vs. 1B pixel-level annotated images).
no code implementations • 23 Dec 2024 • Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen Zhang, Tong He, Guofeng Zhang, Junwei Han
3D Gaussian Splatting (3DGS) has demonstrated impressive performance in scene reconstruction.
no code implementations • 19 Dec 2024 • Zhiqiang Tang, Zihan Zhong, Tong He, Gerald Friedland
We curate a benchmark comprising 22 multimodal datasets from diverse real-world applications, encompassing all 4 combinations of the 3 modalities.
1 code implementation • 6 Dec 2024 • Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Kaipeng Zhang, LiMin Wang, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
We introduce InternVL 2. 5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2. 0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data quality.
Ranked #1 on
Video Question Answering
on NExT-QA
no code implementations • 1 Dec 2024 • Hongrui Lu, Yuxiong Huang, Tong He, Gengfeng Li
Unit maintenance and unit commitment are two critical and interrelated aspects of electric power system operation, both of which face the challenge of coordinating efforts to enhance reliability and economic performance.
no code implementations • 25 Nov 2024 • Zechen Bai, Jianxiong Gao, Ziteng Gao, Pichao Wang, Zheng Zhang, Tong He, Mike Zheng Shou
Visual tokenizers are fundamental to image generation.
no code implementations • 20 Nov 2024 • Weicai Ye, Xinyu Chen, Ruohao Zhan, Di Huang, Xiaoshui Huang, Haoyi Zhu, Hujun Bao, Wanli Ouyang, Tong He, Guofeng Zhang
To tackle these challenges, we propose a dynamic-aware tracking any point (DATAP) method that leverages consistent video depth and point tracking.
1 code implementation • 31 Oct 2024 • Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, Guofeng Zhang
Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
no code implementations • 30 Oct 2024 • Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, Yin Zhou, James Guo, Dragomir Anguelov, Mingxing Tan
We show that co-training EMMA with planner trajectories, object detection, and road graph tasks yields improvements across all three domains, highlighting EMMA's potential as a generalist model for autonomous driving applications.
no code implementations • 24 Oct 2024 • Junyi Chen, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
1 code implementation • 14 Oct 2024 • Honghui Yang, Di Huang, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, Tong He
Video depth estimation has long been hindered by the scarcity of consistent and scalable ground truth data, leading to inconsistent and unreliable results.
no code implementations • 11 Oct 2024 • Pinxue Guo, Zixu Zhao, Jianxiong Gao, Chongruo wu, Tong He, Zheng Zhang, Tianjun Xiao, Wenqiang Zhang
Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical.
1 code implementation • 10 Oct 2024 • Haoyi Zhu, Honghui Yang, Yating Wang, Jiange Yang, LiMin Wang, Tong He
The results are compelling: SPA consistently outperforms more than 10 state-of-the-art representation methods, including those specifically designed for embodied AI, vision-centric tasks, and multi-modal applications, while using less training data.
1 code implementation • 29 Sep 2024 • Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou
We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos.
no code implementations • 10 Sep 2024 • Junyi Chen, Weicai Ye, Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He
To this end, we propose GigaGS, the first work for high-quality surface reconstruction for large-scale scenes using 3DGS.
1 code implementation • 7 Sep 2024 • Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang
Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing.
1 code implementation • 26 Aug 2024 • Weiwei Cai, Weicai Ye, Peng Ye, Tong He, Tao Chen
Extensive experiments demonstrate that DynaSurfGS surpasses state-of-the-art methods in both high-fidelity surface reconstruction and photorealistic rendering.
1 code implementation • 22 Aug 2024 • Ziyu Tang, Weicai Ye, Yifan Wang, Di Huang, Hujun Bao, Tong He, Guofeng Zhang
Neural implicit reconstruction via volume rendering has demonstrated its effectiveness in recovering dense 3D surfaces.
no code implementations • 19 Aug 2024 • Yifan Wang, Di Huang, Weicai Ye, Guofeng Zhang, Wanli Ouyang, Tong He
Signed Distance Function (SDF)-based volume rendering has demonstrated significant capabilities in surface reconstruction.
1 code implementation • 15 Aug 2024 • Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, PengFei Liu, Yue Zhang, Zheng Zhang
Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements.
no code implementations • 14 Aug 2024 • Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou
This paper introduces a novel data-centric approach, Generalized Query Expansion (GQE), to address the inherent information imbalance between text and video, enhancing the effectiveness of text-video retrieval systems.
1 code implementation • 25 Jul 2024 • YiFan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He
On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse representation and interpretable, providing exact matches for individual words.
no code implementations • 21 Jul 2024 • Xiaoyang Wu, Xiang Xu, Lingdong Kong, Liang Pan, Ziwei Liu, Tong He, Wanli Ouyang, Hengshuang Zhao
In this technical report, we detail our first-place solution for the 2024 Waymo Open Dataset Challenge's semantic segmentation track.
1 code implementation • CVPR 2024 • Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang
To fully exploit rich information hidden in long-term temporal point clouds and images, we present the Temporal Aggregation Network, termed TASeg.
no code implementations • 12 Jul 2024 • Xiangkun Hu, Tong He, David Wipf
Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences.
1 code implementation • 11 Jul 2024 • Zidong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai
In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks.
no code implementations • 17 Jun 2024 • YuAn Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, Dan Xu
In this paper, we introduce a novel path to $\textit{general}$ human motion generation by focusing on 2D space.
1 code implementation • 14 Jun 2024 • YiWen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang
Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement.
1 code implementation • CVPR 2024 • Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang
Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research.
no code implementations • 5 Jun 2024 • Minglei Li, Peng Ye, Yongqi Huang, Lin Zhang, Tao Chen, Tong He, Jiayuan Fan, Wanli Ouyang
Parameter-efficient fine-tuning (PEFT) has become increasingly important as foundation models continue to grow in both popularity and size.
1 code implementation • 23 May 2024 • Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang
In this paper, we rethink and analyze the existing model merging paradigm.
no code implementations • 5 May 2024 • Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan
3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars.
no code implementations • 30 Apr 2024 • Longlong Jing, Ruichi Yu, Xu Chen, Zhengli Zhao, Shiwei Sheng, Colin Graber, Qi Chen, Qinru Li, Shangxuan Wu, Han Deng, Sangjin Lee, Chris Sweeney, Qiurui He, Wei-Chih Hung, Tong He, Xingyi Zhou, Farshid Moussavi, Zijian Guo, Yin Zhou, Mingxing Tan, Weilong Yang, CongCong Li
In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately.
1 code implementation • 29 Apr 2024 • Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.
no code implementations • 22 Mar 2024 • Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.
no code implementations • 19 Mar 2024 • Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He
To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization.
1 code implementation • 19 Mar 2024 • Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Philip Torr, Jian Wu
We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.
no code implementations • 18 Mar 2024 • Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang
The ability to understand and reason the 3D real world is a crucial milestone towards artificial general intelligence.
1 code implementation • 7 Mar 2024 • Amber Yijia Zheng, Tong He, Yixuan Qiu, Minjie Wang, David Wipf
These optimal features typically depend on tunable parameters of the lower-level energy in such a way that the entire bilevel pipeline can be trained end-to-end.
2 code implementations • 4 Feb 2024 • Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He
These outcomes suggest that the 3D point cloud is a valuable observation modality for intricate robotic tasks.
1 code implementation • 31 Jan 2024 • Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan
The Segment Anything Model (SAM) stands as a foundational framework for image segmentation.
1 code implementation • 6 Jan 2024 • Yixin Chen, Shuai Zhang, Boran Han, Tong He, Bo Li
In this work, we introduce Context-Aware MultiModal Learner (CaMML), for tuning large multimodal models (LMMs).
Ranked #141 on
Visual Question Answering
on MM-Vet
1 code implementation • CVPR 2024 • Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao
This paper is not motivated to seek innovation within the attention mechanism.
no code implementations • 25 Dec 2023 • Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang
We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e. g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.
3 code implementations • 15 Dec 2023 • Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao
This paper is not motivated to seek innovation within the attention mechanism.
Ranked #1 on
3D Semantic Segmentation
on ScanNet++
(using extra training data)
1 code implementation • CVPR 2024 • Yunhan Yang, Yukun Huang, Xiaoyang Wu, Yuan-Chen Guo, Song-Hai Zhang, Hengshuang Zhao, Tong He, Xihui Liu
However, due to the lack of information from multiple views, these works encounter difficulties in generating controllable novel views.
2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang
Human-centric perception tasks, e. g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.
Ranked #1 on
Pedestrian Image Caption
on CUHK-PEDES
1 code implementation • 1 Nov 2023 • Jiaxin Cheng, Tianjun Xiao, Tong He
We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning.
1 code implementation • 24 Oct 2023 • Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang
It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.
1 code implementation • 17 Oct 2023 • Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen
To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.
1 code implementation • CVPR 2024 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang
In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.
1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang
In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.
Ranked #2 on
Semantic Segmentation
on S3DIS
(using extra training data)
no code implementations • 28 Sep 2023 • Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan
We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.
1 code implementation • ICCV 2023 • Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.
1 code implementation • ICCV 2023 • Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • ICCV 2023 • Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu
To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.
1 code implementation • 26 Aug 2023 • Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang
Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.
no code implementations • 11 Aug 2023 • Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang
As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost?
1 code implementation • 15 Jun 2023 • Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen
To address this issue, we propose a novel diffusion-based feature learning framework that explores Multi-Timestep Multi-Stage Diffusion features for HSI classification for the first time, called MTMSD.
1 code implementation • 6 Jun 2023 • Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu
In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.
no code implementations • CVPR 2024 • Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto
In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR).
1 code implementation • CVPR 2023 • Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang
The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.
1 code implementation • 4 May 2023 • Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang
In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.
1 code implementation • 23 Feb 2023 • Yijia Zheng, Tong He, Yixuan Qiu, David Wipf
Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold.
no code implementations • 16 Feb 2023 • Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li
Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.
1 code implementation • 16 Jan 2023 • Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang
To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.
no code implementations • CVPR 2023 • Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang
In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.
no code implementations • ICCV 2023 • Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.
1 code implementation • 20 Dec 2022 • Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai
Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training.
no code implementations • CVPR 2023 • Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao
Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring to learn the consistent representations from unmasked areas.
2 code implementations • 8 Dec 2022 • Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang
These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation.
1 code implementation • CVPR 2023 • Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang
In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.
no code implementations • 30 Nov 2022 • Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou
The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.
no code implementations • 17 Nov 2022 • Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang
Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and the tendency towards a lack of robustness.
no code implementations • 24 Oct 2022 • Zhaoqi Leng, Guowang Li, Chenxi Liu, Ekin Dogus Cubuk, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan
Data augmentations are important in training high-performance 3D object detectors for point clouds.
1 code implementation • 23 Oct 2022 • Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang
The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.
1 code implementation • 11 Oct 2022 • Jingru Tan, Bo Li, Xin Lu, Yongqiang Yao, Fengwei Yu, Tong He, Wanli Ouyang
Long-tail distribution is widely spread in real-world applications.
4 code implementations • 29 Sep 2022 • Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Schölkopf, Thomas Brox, Francesco Locatello
Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world.
no code implementations • 12 Jul 2022 • Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao
Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.
no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel
The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.
no code implementations • NeurIPS 2021 • Longyuan Li, Jian Yao, Li Wenliang, Tong He, Tianjun Xiao, Junchi Yan, David Wipf, Zheng Zhang
Learning the distribution of future trajectories conditioned on the past is a crucial problem for understanding multi-agent systems.
no code implementations • ICLR 2022 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Tong He, Yong Yu, Rasool Fakoor
Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning.
no code implementations • ICCV 2021 • Tong He, Yuanlu Xu, Shunsuke Saito, Stefano Soatto, Tony Tung
We present ARCH++, an image-based method to reconstruct 3D avatars with arbitrary clothing styles.
Ranked #1 on
3D Object Reconstruction From A Single Image
on RenderPeople
(using extra training data)
3D Object Reconstruction From A Single Image
Image-to-Image Translation
1 code implementation • NeurIPS 2021 • Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, xiangyang xue
Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.
1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
3 code implementations • ICCV 2021 • Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
no code implementations • CVPR 2021 • Ruibo Li, Guosheng Lin, Tong He, Fayao Liu, Chunhua Shen
Scene flow in 3D point clouds plays an important role in understanding dynamic environments.
1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen
Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.
Ranked #7 on
Text Spotting
on Inverse-Text
no code implementations • 1 Jan 2021 • Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola
In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.
1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.
1 code implementation • NeurIPS 2020 • Tong He, John Collomosse, Hailin Jin, Stefano Soatto
We propose Geo-PIFu, a method to recover a 3D mesh from a monocular color image of a clothed person.
1 code implementation • 14 Jun 2020 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.
no code implementations • 30 Apr 2020 • Yi Zhu, Zhongyue Zhang, Chongruo wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola
In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.
36 code implementations • 19 Apr 2020 • Hang Zhang, Chongruo wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola
It is well known that featuremap attention and multi-path representation are important for visual recognition.
Ranked #10 on
Instance Segmentation
on COCO test-dev
(APM metric)
15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Ranked #9 on
Text Spotting
on Inverse-Text
no code implementations • ECCV 2020 • Tong He, Dong Gong, Zhi Tian, Chunhua Shen
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Ranked #29 on
3D Instance Segmentation
on ScanNet(v2)
no code implementations • 24 Dec 2019 • Jialin Gao, Tong He, Xi Zhou, Shiming Ge
A collection of approaches based on graph convolutional networks have proven success in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.
Ranked #46 on
Skeleton Based Action Recognition
on NTU RGB+D
1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.
Ranked #3 on
Scene Text Detection
on ICDAR 2017 MLT
1 code implementation • 6 Dec 2019 • Albert Zhao, Tong He, Yitao Liang, Haibin Huang, Guy Van Den Broeck, Stefano Soatto
To learn this representation, we train a squeeze network to drive using annotations for the side task as input.
7 code implementations • 3 Sep 2019 • Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, Zheng Zhang
Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs.
Ranked #34 on
Node Classification
on Cora
3 code implementations • 9 Jul 2019 • Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu
We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).
2 code implementations • 26 Apr 2019 • Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li
One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.
87 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.
Ranked #4 on
Pedestrian Detection
on TJU-Ped-campus
1 code implementation • CVPR 2019 • Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan
To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.
no code implementations • CVPR 2019 • Zhi Tian, Tong He, Chunhua Shen, Youliang Yan
In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.
Ranked #49 on
Semantic Segmentation
on PASCAL Context
3 code implementations • 11 Feb 2019 • Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
Training heuristics greatly improve various image classification model accuracies~\cite{he2018bag}.
no code implementations • 11 Jan 2019 • Tong He, Stefano Soatto
We present a method to infer 3D pose and shape of vehicles from a single image.
no code implementations • CVPR 2019 • Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai
In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild.
no code implementations • CVPR 2019 • Tong He, Haibin Huang, Li Yi, Yuqian Zhou, Chi-Hao Wu, Jue Wang, Stefano Soatto
Surface-based geodesic topology provides strong cues for object semantic analysis and geometric modeling.
27 code implementations • CVPR 2019 • Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li
Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.
Ranked #38 on
Domain Generalization
on VizWiz-Classification
no code implementations • 4 Oct 2018 • Tong He, Yang Hu
Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation.
2 code implementations • CVPR 2018 • Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun
This allows the two tasks to work collaboratively by shar- ing convolutional features, which is critical to identify challenging text instances.
1 code implementation • ICCV 2017 • Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li
Our text detector achieves an F-measure of 77% on the ICDAR 2015 bench- mark, advancing the state-of-the-art results in [18, 28].
Ranked #4 on
Scene Text Detection
on COCO-Text
27 code implementations • 12 Sep 2016 • Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image.
1 code implementation • 31 Mar 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
We propose a novel Cascaded Convolutional Text Network (CCTN) that joints two customized convolutional networks for coarse-to-fine text localization.
no code implementations • IEEE Trans. on Image Processing, 2016 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images.
no code implementations • 12 Oct 2015 • Tong He, Weilin Huang, Yu Qiao, Jian Yao
The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.