Search Results for author: Tong He

Found 84 papers, 44 papers with code

Bag of Tricks for Image Classification with Convolutional Neural Networks

26 code implementations CVPR 2019 Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods.

Domain Generalization General Classification +4

FCOS: Fully Convolutional One-Stage Object Detection

85 code implementations ICCV 2019 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Object Object Detection +2

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations ICCV 2021 Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering

Detecting Text in Natural Image with Connectionist Text Proposal Network

26 code implementations 12 Sep 2016 Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image.

Scene Text Detection

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

4 code implementations 9 Jul 2019 Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng, Yi Zhu

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating).

Knowledge Adaptation for Efficient Semantic Segmentation

1 code implementation CVPR 2019 Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan

To tackle this dilemma, we propose a knowledge distillation method tailored for semantic segmentation to improve the performance of the compact FCNs with large overall stride.

Knowledge Distillation Segmentation +1

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

2 code implementations 12 Oct 2023 Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)

3D Object Detection 3D Reconstruction +5

SAM3D: Segment Anything in 3D Scenes

1 code implementation 6 Jun 2023 Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.

Segmentation

An end-to-end TextSpotter with Explicit Alignment and Attention

2 code implementations CVPR 2018 Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun

This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances.

Text Detection

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation 20 Dec 2019 Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Scene Text Detection Text Detection

EPCL: Frozen CLIP Transformer is An Efficient Point Cloud Encoder

2 code implementations 8 Dec 2022 Xiaoshui Huang, Zhou Huang, Sheng Li, Wentao Qu, Tong He, Yuenan Hou, Yifan Zuo, Wanli Ouyang

These token embeddings are concatenated with a task token and fed into the frozen CLIP transformer to learn point cloud representation.

Few-Shot Learning Segmentation +1

Single Shot Text Detector with Regional Attention

1 code implementation ICCV 2017 Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li

Our text detector achieves an F-measure of 77% on the ICDAR 2015 benchmark, advancing the state-of-the-art results in [18, 28].

Scene Text Detection

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations 4 Dec 2023 Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as metaverse and sports analysis.

3D Human Pose Estimation Action Recognition +8

GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection

1 code implementation 24 Oct 2023 Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Tong He, Yonghui Li, Wanli Ouyang

It models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of the end-to-end model learning.

Monocular 3D Object Detection object-detection

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation CVPR 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation 18 Jul 2021 Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds

1 code implementation CVPR 2023 Honghui Yang, Tong He, Jiaheng Liu, Hua Chen, Boxi Wu, Binbin Lin, Xiaofei He, Wanli Ouyang

In contrast to previous 3D MAE frameworks, which either design a complex decoder to infer masked information from maintained regions or adopt sophisticated masking strategies, we instead propose a much simpler paradigm.

$β$-DARTS++: Bi-level Regularization for Proxy-robust Differentiable Architecture Search

1 code implementation 16 Jan 2023 Peng Ye, Tong He, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

To address the robustness problem, we first benchmark different NAS methods under a wide range of proxy data, proxy channels, proxy layers and proxy epochs, since the robustness of NAS under different kinds of proxies has not been explored before.

Neural Architecture Search

Progressive Coordinate Transforms for Monocular 3D Object Detection

1 code implementation NeurIPS 2021 Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment.

Monocular 3D Object Detection Object +2

Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources

2 code implementations 26 Apr 2019 Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li

One difficulty we observe is that the noise in the stochastic momentum estimation is accumulated over time and will have delayed effects when the batch size changes.

Image Classification object-detection +3

Consistent Video-to-Video Transfer Using Synthetic Dataset

1 code implementation 1 Nov 2023 Jiaxin Cheng, Tianjun Xiao, Tong He

We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning.

Video Editing

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation 8 May 2021 Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Text Spotting

PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer

1 code implementation CVPR 2023 Honghui Yang, Wenxiao Wang, Minghao Chen, Binbin Lin, Tong He, Hua Chen, Xiaofei He, Wanli Ouyang

The key to associating the two different representations is our introduced input-dependent Query Initialization module, which could efficiently generate reference points and content queries.

Autonomous Driving Quantization

Coarse-to-Fine Amodal Segmentation with Shape Prior

1 code implementation ICCV 2023 Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.

Object Segmentation +1

Object-Centric Multiple Object Tracking

1 code implementation ICCV 2023 Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking Object +3

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

1 code implementation 31 Mar 2016 Tong He, Weilin Huang, Yu Qiao, Jian Yao

We propose a novel Cascaded Convolutional Text Network (CCTN) that joints two customized convolutional networks for coarse-to-fine text localization.

Scene Text Detection Text Detection

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

1 code implementation ICCV 2023 Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.

Object Video Segmentation +1

Learning Manifold Dimensions with Conditional Variational Autoencoders

1 code implementation 23 Feb 2023 Yijia Zheng, Tong He, Yixuan Qiu, David Wipf

Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold.

SAM: Squeeze-and-Mimic Networks for Conditional Visual Driving Policy Learning

1 code implementation 6 Dec 2019 Albert Zhao, Tong He, Yitao Liang, Haibin Huang, Guy Van Den Broeck, Stefano Soatto

To learn this representation, we train a squeeze network to drive using annotations for the side task as input.

Semantic Segmentation

Compatible Transformer for Irregularly Sampled Multivariate Time Series

1 code implementation 17 Oct 2023 Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen

To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged.

Time Series

OBMO: One Bounding Box Multiple Objects for Monocular 3D Object Detection

1 code implementation 20 Dec 2022 Chenxi Huang, Tong He, Haidong Ren, Wenxiao Wang, Binbin Lin, Deng Cai

Unfortunately, the network cannot accurately distinguish different depths from such non-discriminative visual features, resulting in unstable depth training.

Monocular 3D Object Detection object-detection

Self-supervised Amodal Video Object Segmentation

1 code implementation 23 Oct 2022 Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.

Object Segmentation +6

Boosting Residual Networks with Group Knowledge

1 code implementation 26 Aug 2023 Shengji Tang, Peng Ye, Baopu Li, Weihao Lin, Tao Chen, Tong He, Chong Yu, Wanli Ouyang

Specifically, we implicitly divide all subnets into hierarchical groups by subnet-in-subnet sampling, aggregate the knowledge of different subnets in each group during training, and exploit upper-level group knowledge to supervise lower-level subnet groups.

Knowledge Distillation

Text-Attentional Convolutional Neural Networks for Scene Text Detection

no code implementations 12 Oct 2015 Tong He, Weilin Huang, Yu Qiao, Jian Yao

The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.

Multi-Task Learning Scene Text Detection +3

FashionNet: Personalized Outfit Recommendation with Deep Neural Network

no code implementations 4 Oct 2018 Tong He, Yang Hu

Our system, dubbed FashionNet, consists of two components, a feature network for feature extraction and a matching network for compatibility computation.

Recommendation Systems

GIF2Video: Color Dequantization and Temporal Interpolation of GIF images

no code implementations CVPR 2019 Yang Wang, Haibin Huang, Chuan Wang, Tong He, Jue Wang, Minh Hoai

In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild.

Quantization

Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors

no code implementations 11 Jan 2019 Tong He, Stefano Soatto

We present a method to infer 3D pose and shape of vehicles from a single image.

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

no code implementations CVPR 2019 Zhi Tian, Tong He, Chunhua Shen, Youliang Yan

In this work, we propose data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs.

Segmentation Semantic Segmentation

Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks for Skeleton-based Action Recognition

no code implementations 24 Dec 2019 Jialin Gao, Tong He, Xi Zhou, Shiming Ge

A collection of approaches based on graph convolutional networks have proven success in skeleton-based action recognition by exploring neighborhood information and dense dependencies between intra-frame joints.

Action Recognition Skeleton Based Action Recognition

Improving Semantic Segmentation via Self-Training

no code implementations 30 Apr 2020 Yi Zhu, Zhongyue Zhang, Chongruo Wu, Zhi Zhang, Tong He, Hang Zhang, R. Manmatha, Mu Li, Alexander Smola

In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models.

Domain Generalization Segmentation +1

FCOS: A simple and strong anchor-free object detector

no code implementations 14 Jun 2020 Zhi Tian, Chunhua Shen, Hao Chen, Tong He

In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.

Object Object Detection +1

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations ECCV 2020 Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

Explore with Dynamic Map: Graph Structured Reinforcement Learning

no code implementations 1 Jan 2021 Jiarui Jin, Sijin Zhou, Weinan Zhang, Rasool Fakoor, David Wipf, Tong He, Yong Yu, Zheng Zhang, Alex Smola

In reinforcement learning, a map with states and transitions built based on historical trajectories is often helpful in exploration and exploitation.

reinforcement-learning Reinforcement Learning (RL)

Graph-Enhanced Exploration for Goal-oriented Reinforcement Learning

no code implementations ICLR 2022 Jiarui Jin, Sijin Zhou, Weinan Zhang, Tong He, Yong Yu, Rasool Fakoor

Goal-oriented Reinforcement Learning (GoRL) is a promising approach for scaling up RL techniques on sparse reward environments requiring long horizon planning.

Continuous Control graph construction +2

PointInst3D: Segmenting 3D Instances by Points

no code implementations 25 Apr 2022 Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel

The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.

3D Instance Segmentation Clustering +2

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm

no code implementations 12 Jul 2022 Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao

Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.

Point Cloud Completion

3D-QueryIS: A Query-based Framework for 3D Instance Segmentation

no code implementations 17 Nov 2022 Jiaheng Liu, Tong He, Honghui Yang, Rui Su, Jiayi Tian, Junran Wu, Hongcheng Guo, Ke Xu, Wanli Ouyang

Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and the tendency towards a lack of robustness.

3D Instance Segmentation Segmentation +1

Reconstructing Hand-Held Objects from Monocular Video

no code implementations 30 Nov 2022 Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation Object

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations CVPR 2023 Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring to learn the consistent representations from unmasked areas.

object-detection Object Detection +2

Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations ICCV 2023 Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

no code implementations 16 Feb 2023 Jiaxin Cheng, Xiao Liang, Xingjian Shi, Tong He, Tianjun Xiao, Mu Li

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts.

Layout-to-Image Generation

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

no code implementations 4 May 2023 Peng Ye, Tong He, Shengji Tang, Baopu Li, Tao Chen, Lei Bai, Wanli Ouyang

In this work, we aim to re-investigate the training process of residual networks from a novel social psychology perspective of loafing, and further propose a new training scheme as well as three improved strategies for boosting residual networks beyond their performance limits.

Crossing the Gap: Domain Generalization for Image Captioning

no code implementations CVPR 2023 Yuchen Ren, Zhendong Mao, Shancheng Fang, Yan Lu, Tong He, Hao Du, Yongdong Zhang, Wanli Ouyang

In this paper, we introduce a new setting called Domain Generalization for Image Captioning (DGIC), where the data from the target domain is unseen in the learning process.

Domain Generalization Image Captioning +1

Learning for Transductive Threshold Calibration in Open-World Recognition

no code implementations 19 May 2023 Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto

In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR).

Metric Learning Open Set Learning

When Hyperspectral Image Classification Meets Diffusion Models: An Unsupervised Feature Learning Framework

no code implementations 15 Jun 2023 Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen

Learning effective spectral-spatial features is important for the hyperspectral image (HSI) classification task, but the majority of existing HSI classification methods still suffer from modeling complex spectral-spatial relations and characterizing low-level details and high-level semantics comprehensively.

Classification Hyperspectral Image Classification

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

no code implementations 11 Aug 2023 Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Tong He, Wanli Ouyang

As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost?

Unsupervised Open-Vocabulary Object Localization in Videos

no code implementations ICCV 2023 Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.

Object Object Localization +1

LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

no code implementations 28 Sep 2023 Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan

We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds.

3D Object Detection Object +1

Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers

no code implementations 25 Dec 2023 Peng Ye, Yongqi Huang, Chongjun Tu, Minglei Li, Tao Chen, Tong He, Wanli Ouyang

We first validate eight manually-defined partial fine-tuning strategies across kinds of datasets and vision transformer architectures, and find that some partial fine-tuning strategies (e.g., ffn only or attention only) can achieve better performance with fewer tuned parameters than full fine-tuning, and selecting appropriate layers is critical to partial fine-tuning.

CaMML: Context-Aware Multimodal Learner for Large Models

no code implementations 6 Jan 2024 Yixin Chen, Shuai Zhang, Boran Han, Tong He, Bo Li

In this work, we introduce Context-Aware MultiModal Learner (CaMML), for tuning large multimodal models (LMMs).

Visual Question Answering

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

no code implementations 4 Feb 2024 Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.

Zero-shot Generalization

BloomGML: Graph Machine Learning through the Lens of Bilevel Optimization

1 code implementation 7 Mar 2024 Amber Yijia Zheng, Tong He, Yixuan Qiu, Minjie Wang, David Wipf

These optimal features typically depend on tunable parameters of the lower-level energy in such a way that the entire bilevel pipeline can be trained end-to-end.

Bilevel Optimization Graph Learning +1

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations 18 Mar 2024 Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM

no code implementations 19 Mar 2024 Yixuan Wu, Yizhou Wang, Shixiang Tang, Wenhao Wu, Tong He, Wanli Ouyang, Jian Wu, Philip Torr

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object detection ability of multimodal large language models (MLLMs), such as GPT-4V and Gemini.

Object object-detection +3

GVGEN: Text-to-3D Generation with Volumetric Representation

no code implementations 19 Mar 2024 Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline.

3D Reconstruction Text to 3D

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

no code implementations 22 Mar 2024 Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.

Novel View Synthesis
