Search Results for author: Hao Zhao

Found 69 papers, 46 papers with code

Network Sketching: Exploiting Binary Structure in Deep CNNs

no code implementations • CVPR 2017 • Yiwen Guo, Anbang Yao, Hao Zhao, Yurong Chen

Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks.

Physics Inspired Optimization on Semantic Transfer Features: An Alternative Method for Room Layout Estimation

no code implementations • CVPR 2017 • Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang

In this paper, we propose an alternative method to estimate room layouts of cluttered indoor scenes.

Room Layout Estimation

Decoder Network Over Lightweight Reconstructed Feature for Fast Semantic Style Transfer

no code implementations • ICCV 2017 • Ming Lu, Hao Zhao, Anbang Yao, Feng Xu, Yurong Chen, Li Zhang

Our method decomposes the semantic style transfer problem into feature reconstruction part and feature decoder part.

Style Transfer

SnapQuant: A Probabilistic and Nested Parameterization for Binary Networks

no code implementations • 27 Sep 2018 • Kuan Wang, Hao Zhao, Anbang Yao, Aojun Zhou, Dawei Sun, Yurong Chen

During the training phase, we generate binary weights on-the-fly since what we actually maintain is the policy network, and all the binary weights are used in a burn-after-reading style.

A Closed-form Solution to Universal Style Transfer

2 code implementations • ICCV 2019 • Ming Lu, Hao Zhao, Anbang Yao, Yurong Chen, Feng Xu, Li Zhang

Although plenty of methods have been proposed, a theoretical analysis of feature transform is still missing.

Style Transfer

Deeply-supervised Knowledge Synergy

1 code implementation • CVPR 2019 • Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao

Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet.

General Classification Image Classification

Efficient Semantic Scene Completion Network with Spatial Group Convolution

1 code implementation • ECCV 2018 • Jiahui Zhang, Hao Zhao, Anbang Yao, Yurong Chen, Li Zhang, Hongen Liao

We introduce Spatial Group Convolution (SGC) for accelerating the computation of 3D dense prediction tasks.

Ranked #9 on 3D Semantic Scene Completion on SemanticKITTI

3D Semantic Scene Completion valid

Constrained R-CNN: A general image manipulation detection model

no code implementations • 19 Nov 2019 • Chao Yang, Huizhou Li, Fangting Lin, Bin Jiang, Hao Zhao

Finally, the coarse localization information guides the model to further learn the finer local features and segment out the tampered region.

Ranked #5 on Image Manipulation Localization on COVERAGE

General Classification Image Forensics +4

LID 2020: The Learning from Imperfect Data Challenge Results

no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He

The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate the research in developing novel approaches that would harness the imperfect data and improve the data-efficiency during training.

object-detection Object Detection +5

Federated Meta Learning Enhanced Acoustic Radio Cooperative Framework for Ocean of Things Underwater Acoustic Communications

no code implementations • 24 May 2021 • Hao Zhao, Fei Ji, Quansheng Guan, Qiang Li, Shuai Wang, Hefeng Dong, Miaowen Wen

In summary, the proposed ARC/FML for OoT is a promising scheme for information exchange across water and air.

Federated Learning Meta-Learning +1

PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds

1 code implementation • 12 Sep 2021 • Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

Such a scheme has two limitations: 1) Storing and running several networks for different tasks is expensive for typical robotic platforms.

object-detection Object Detection +2

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck

1 code implementation • 17 Sep 2021 • Liyi Luo, Beiwen Tian, Hao Zhao, Guyue Zhou

Semantic understanding of 3D point clouds is important for various robotics applications.

Contrastive Learning Representation Learning +1

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

1 code implementation • CVPR 2022 • Xiaoxue Chen, Tianyu Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

Multi-task indoor scene understanding is widely considered as an intriguing formulation, as the affinity of different tasks may lead to improved performance.

Ranked #51 on Semantic Segmentation on NYU Depth v2

Attribute Scene Understanding +2

Semi-supervised Implicit Scene Completion from Sparse LiDAR

1 code implementation • 29 Nov 2021 • Pengfei Li, Yongliang Shi, Tianyu Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

Recent advances show that semi-supervised implicit representation learning can be achieved through physical constraints like Eikonal equations.

Representation Learning

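The Eikonal constraint mentioned above requires a signed distance function f to have unit gradient norm, |∇f| = 1, away from its singular points. The sketch below is a generic illustration, not the paper's training code. It assumes an analytic sphere SDF and central finite differences in place of a learned network and automatic differentiation, and it evaluates the usual penalty mean((|∇f| - 1)^2). The helper names are hypothetical.

```python
import math
import random

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p = (x, y, z) to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def grad_norm(f, p, h=1e-4):
    """Norm of the gradient of f at p, estimated with central differences."""
    g = []
    for i in range(3):
        forward = list(p)
        forward[i] += h
        backward = list(p)
        backward[i] -= h
        g.append((f(forward) - f(backward)) / (2 * h))
    return math.sqrt(sum(c * c for c in g))

def eikonal_loss(f, points):
    """Mean squared deviation of |grad f| from 1 over the sample points."""
    return sum((grad_norm(f, p) - 1.0) ** 2 for p in points) / len(points)

def scaled_sdf(p):
    """Same zero level set as the sphere, but gradient norm 2 (not a true SDF)."""
    return 2.0 * sphere_sdf(p)

random.seed(0)
pts = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(256)]
print(eikonal_loss(sphere_sdf, pts))   # approximately 0
print(eikonal_loss(scaled_sdf, pts))   # approximately 1
```

The penalty needs no ground-truth distances, only the field's own gradients at sampled points. This is presumably why it can serve as a training signal where labels are sparse. The scaled field has exactly the right surface, yet it is still penalized.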

Transferable End-to-end Room Layout Estimation via Implicit Encoding

no code implementations • 21 Dec 2021 • Hao Zhao, Rene Ranftl, Yurong Chen, Hongbin Zha

Here we propose an end-to-end method that directly predicts parametric layouts from an input panorama image.

Room Layout Estimation

High-Fidelity Human Avatars From a Single RGB Camera

no code implementations • CVPR 2022 • Hao Zhao, Jinsong Zhang, Yu-Kun Lai, Zerong Zheng, Yingdi Xie, Yebin Liu, Kun Li

To cope with the complexity of textures and generate photo-realistic results, we propose a reference-based neural rendering network and exploit a bottom-up sharpening-guided fine-tuning strategy to obtain detailed textures.

Neural Rendering Vocal Bursts Intensity Prediction

SNAKE: Shape-aware Neural 3D Keypoint Field

1 code implementation • 3 Jun 2022 • Chengliang Zhong, Peixing You, Xiaoxue Chen, Hao Zhao, Fuchun Sun, Guyue Zhou, Xiaodong Mu, Chuang Gan, Wenbing Huang

Detecting 3D keypoints from point clouds is important for shape reconstruction, while this work investigates the dual question: can shape reconstruction benefit 3D keypoint detection?

Keypoint Detection

Model-Driven Based Deep Unfolding Equalizer for Underwater Acoustic OFDM Communications

no code implementations • 10 Jul 2022 • Hao Zhao, Cui Yang, Yalu Xu, Fei Ji, Miaowen Wen, Yankun Chen

Each layer of UDNet is designed according to the classical minimum mean square error (MMSE) equalizer.

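For reference, here is the classical MMSE equalizer that UDNet's layers are modeled on. For OFDM with a one-tap channel per subcarrier, it reduces to X̂_k = conj(H_k) Y_k / (|H_k|^2 + σ^2/E_s). The minimal sketch below assumes perfect channel knowledge, unit-energy QPSK symbols and a randomly drawn hypothetical channel. It shows the classical equalizer only, not the unfolded network.

```python
import cmath
import random

def mmse_equalize(y, h, noise_var, symbol_energy=1.0):
    """One-tap linear MMSE equalization per OFDM subcarrier.

    y: received frequency-domain samples, h: channel frequency response,
    noise_var: noise variance per subcarrier.
    """
    reg = noise_var / symbol_energy
    return [hk.conjugate() * yk / (abs(hk) ** 2 + reg) for yk, hk in zip(y, h)]

def zf_equalize(y, h):
    """Zero-forcing equalizer for comparison (ignores the noise)."""
    return [yk / hk for yk, hk in zip(y, h)]

def mse(est, ref):
    return sum(abs(e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

random.seed(1)
QPSK = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]
n_sc = 64
x = [random.choice(QPSK) for _ in range(n_sc)]
# Hypothetical frequency-selective channel: random gain and phase per subcarrier.
h = [cmath.rect(random.uniform(0.05, 1.5), random.uniform(-cmath.pi, cmath.pi))
     for _ in range(n_sc)]
noise_var = 0.05
sd = (noise_var / 2) ** 0.5
noise = [complex(random.gauss(0, sd), random.gauss(0, sd)) for _ in range(n_sc)]
y = [hk * xk + nk for hk, xk, nk in zip(h, x, noise)]

print("MMSE:", mse(mmse_equalize(y, h, noise_var), x))
print("ZF:  ", mse(zf_equalize(y, h), x))
```

The regularizer σ^2/E_s is the only difference from zero-forcing. On a deeply faded subcarrier, ZF divides by a near-zero H_k and amplifies the noise, whereas MMSE shrinks the estimate toward zero.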

Language-guided Semantic Style Transfer of 3D Indoor Scenes

1 code implementation • 16 Aug 2022 • Bu Jin, Beiwen Tian, Hao Zhao, Guyue Zhou

We address the new problem of language-guided semantic style transfer of 3D indoor scenes.

Style Transfer

Distance-Aware Occlusion Detection with Focused Attention

1 code implementation • 23 Aug 2022 • Yang Li, Yucheng Tu, Xiaoxue Chen, Hao Zhao, Guyue Zhou

In this work, 1) we propose a novel three-decoder architecture as the infrastructure for focused attention; 2) we use the generalized intersection box prediction task to effectively guide our model to focus on occlusion-specific regions; 3) our model achieves a new state-of-the-art performance on distance-aware relationship detection.

Human-Object Interaction Detection Relationship Detection +1

SOM-Net: Unrolling the Subspace-based Optimization for Solving Full-wave Inverse Scattering Problems

no code implementations • 8 Sep 2022 • Yu Liu, Hao Zhao, Rencheng Song, Xudong Chen, Chang Li, Xun Chen

The final output of the SOM-Net is the full predicted induced current, from which the scattered field and the permittivity image can also be deduced analytically.

Rolling Shutter Correction

LATITUDE: Robotic Global Localization with Truncated Dynamic Low-pass Filter in City-scale NeRF

1 code implementation • 18 Sep 2022 • Zhenxin Zhu, Yuantao Chen, Zirui Wu, Chao Hou, Yongliang Shi, Chuxuan Li, Pengfei Li, Hao Zhao, Guyue Zhou

In this paper, we present LATITUDE: Global Localization with Truncated Dynamic Low-pass Filter, which introduces a two-stage localization mechanism in city-scale NeRF.

Pose Prediction

City-scale Incremental Neural Mapping with Three-layer Sampling and Panoptic Representation

no code implementations • 28 Sep 2022 • Yongliang Shi, Runyi Yang, Pengfei Li, Zirui Wu, Hao Zhao, Guyue Zhou

Neural implicit representations are drawing a lot of attention from the robotics community recently, as they are expressive, continuous and compact.

Planning Assembly Sequence with Graph Transformer

1 code implementation • 11 Oct 2022 • Lin Ma, Jiangtao Gong, Hao Xu, Hao Chen, Hao Zhao, Wenbing Huang, Guyue Zhou

In this paper, we present a graph-transformer based framework for the ASP problem which is trained and demonstrated on a self-collected ASP database.

Understanding Embodied Reference with Touch-Line Transformer

1 code implementation • 11 Oct 2022 • Yang Li, Xiaoxue Chen, Hao Zhao, Jiangtao Gong, Guyue Zhou, Federico Rossano, Yixin Zhu

Human studies have revealed that objects referred to or pointed to do not lie on the elbow-wrist line, a common misconception; instead, they lie on the so-called virtual touch line.

TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation

1 code implementation • 19 Oct 2022 • Pengfei Li, Beiwen Tian, Yongliang Shi, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

As such, we study the challenging problem of task oriented detection, which aims to find objects that best afford an action indicated by verbs like sit comfortably on.

Instance Segmentation Referring Expression +2

VIBUS: Data-efficient 3D Scene Parsing with VIewpoint Bottleneck and Uncertainty-Spectrum Modeling

1 code implementation • 20 Oct 2022 • Beiwen Tian, Liyi Luo, Hao Zhao, Guyue Zhou

In the first stage, we perform self-supervised representation learning on unlabeled points with the proposed Viewpoint Bottleneck loss function.

Representation Learning Scene Parsing

SC-wLS: Towards Interpretable Feed-forward Camera Re-localization

1 code implementation • 23 Oct 2022 • Xin Wu, Hao Zhao, Shunkai Li, Yingdian Cao, Hongbin Zha

Visual re-localization aims to recover camera poses in a known environment, which is vital for applications like robotics or augmented reality.

regression Test-time Adaptation

Self-Aligning Depth-regularized Radiance Fields for Asynchronous RGB-D Sequences

no code implementations • 14 Nov 2022 • Yuxin Huang, Andong Yang, Zirui Wu, Yuantao Chen, Runyi Yang, Zhenxin Zhu, Chao Hou, Hao Zhao, Guyue Zhou

It has been shown that learning radiance fields with depth rendering and depth supervision can effectively promote the quality and convergence of view synthesis.

Autonomous Driving Benchmarking

INT2: Interactive Trajectory Prediction at Intersections

1 code implementation • ICCV 2023 • Zhijie Yan, Pengfei Li, Zheng Fu, Shaocong Xu, Yongliang Shi, Xiaoxue Chen, Yuhang Zheng, Yang Li, Tianyu Liu, Chuxuan Li, Nairui Luo, Xu Gao, Yilun Chen, Zuoxu Wang, Yifeng Shi, Pengfei Huang, Zhengxiao Han, Jirui Yuan, Jiangtao Gong, Guyue Zhou, Hang Zhao, Hao Zhao

One of the most challenging problems in motion forecasting is interactive trajectory prediction, whose goal is to jointly forecast the future trajectories of interacting agents.

Motion Forecasting Trajectory Prediction

From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds

1 code implementation • 31 Jan 2023 • Huan-ang Gao, Beiwen Tian, Pengfei Li, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yurong Chen, Hongbin Zha

But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward.

Motion Planning Pseudo Label +2

ADAPT: Action-aware Driving Caption Transformer

1 code implementation • 1 Feb 2023 • Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving Decision Making

STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

1 code implementation • 2 Feb 2023 • Yupeng Zheng, Chengliang Zhong, Pengfei Li, Huan-ang Gao, Yuhang Zheng, Bu Jin, Ling Wang, Hao Zhao, Guyue Zhou, Qichao Zhang, Dongbin Zhao

By fitting a bridge-shaped curve to the illumination map distribution, both regions are suppressed and two tasks are bridged naturally.

Depth Estimation Image Enhancement

LODE: Locally Conditioned Eikonal Implicit Scene Completion from Sparse LiDAR

1 code implementation • 27 Feb 2023 • Pengfei Li, Ruowen Zhao, Yongliang Shi, Hao Zhao, Jirui Yuan, Guyue Zhou, Ya-Qin Zhang

In this paper, we propose a novel Eikonal formulation that conditions the implicit representation on localized shape priors which function as dense boundary value constraints, and demonstrate it works on SemanticKITTI and SemanticPOSS.

Autonomous Driving Representation Learning

DPF: Learning Dense Prediction Fields with Weak Supervision

1 code implementation • CVPR 2023 • Xiaoxue Chen, Yuhang Zheng, Yupeng Zheng, Qiang Zhou, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

We showcase the effectiveness of DPFs using two substantially different tasks: high-level semantic parsing and low-level intrinsic image decomposition.

Intrinsic Image Decomposition Scene Understanding +1

STRAP: Structured Object Affordance Segmentation with Point Supervision

1 code implementation • 17 Apr 2023 • Leiyao Cui, Xiaoxue Chen, Hao Zhao, Guyue Zhou, Yixin Zhu

By label affinity, we refer to affordance segmentation as a multi-label prediction problem: A plate can be both holdable and containable.

Object Scene Understanding

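Casting affordance segmentation as multi-label prediction means each affordance gets its own sigmoid and binary loss instead of competing inside a single softmax. The label set, logits and 0.5 threshold below are hypothetical, and the snippet does not describe STRAP's actual prediction head. The sketch only contrasts the two decision rules for one point on a plate.

```python
import math

AFFORDANCES = ["holdable", "containable", "sittable"]  # hypothetical label set

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_predict(logits, threshold=0.5):
    """Independent per-label decisions: any subset of labels can be active."""
    return [name for name, z in zip(AFFORDANCES, logits) if sigmoid(z) >= threshold]

def multiclass_predict(logits):
    """Mutually exclusive decision: exactly one label wins."""
    probs = softmax(logits)
    return AFFORDANCES[probs.index(max(probs))]

def bce(logits, targets):
    """Per-label binary cross-entropy, averaged over labels."""
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss / len(logits)

plate_logits = [2.0, 1.5, -3.0]   # hypothetical scores for one plate point
print(multilabel_predict(plate_logits))  # ['holdable', 'containable']
print(multiclass_predict(plate_logits))  # holdable
print(bce(plate_logits, [1, 1, 0]))
```

The softmax rule is forced to drop "containable" even though the model scores it highly. The per-label sigmoid keeps both affordances, which matches the plate example in the abstract.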

Delving into Shape-aware Zero-shot Semantic Segmentation

1 code implementation • CVPR 2023 • Xinyu Liu, Beiwen Tian, Zhen Wang, Rui Wang, Kehua Sheng, Bo Zhang, Hao Zhao, Guyue Zhou

Thanks to the impressive progress of large-scale vision-language pretraining, recent recognition models can classify arbitrary objects in a zero-shot and open-set manner, with a surprisingly high accuracy.

Image Segmentation Segmentation +2

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

1 code implementation • ICCV 2023 • Huan-ang Gao, Beiwen Tian, Pengfei Li, Hao Zhao, Guyue Zhou

While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching.

3D Object Detection object-detection +1

Simulate Bumblebee and Extend It to Support LE Coded PHY in BLE version 5

no code implementations • 17 May 2023 • Hao Zhao

Subsequently, we extend Bumblebee to support LE Coded PHY in BLE version 5 and conduct experiments to verify its performance.

On Pitfalls of Test-Time Adaptation

1 code implementation • 6 Jun 2023 • Hao Zhao, Yuejiang Liu, Alexandre Alahi, Tao Lin

Test-Time Adaptation (TTA) has recently emerged as a promising approach for tackling the robustness challenge under distribution shifts.

Model Selection Test-time Adaptation

Car-Studio: Learning Car Radiance Fields from Single-View and Endless In-the-wild Images

1 code implementation • 26 Jul 2023 • Tianyu Liu, Hao Zhao, Yang Yu, Guyue Zhou, Ming Liu

However, previous studies learned within a sequence of autonomous driving datasets, resulting in unsatisfactory blurring when rotating the car in the simulator.

Autonomous Driving

MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving

1 code implementation • 27 Jul 2023 • Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, Yuxin Huang, Xiaoyu Ye, Zike Yan, Yongliang Shi, Yiyi Liao, Hao Zhao

We expect this modular design to boost academic progress and industrial deployment of NeRF-based autonomous driving simulation.

Autonomous Driving

ECT: Fine-grained Edge Detection with Learned Cause Tokens

1 code implementation • 6 Aug 2023 • Shaocong Xu, Xiaoxue Chen, Yuhang Zheng, Guyue Zhou, Yurong Chen, Hongbin Zha, Hao Zhao

To address these three issues, we propose a two-stage transformer-based network sequentially predicting generic edges and fine-grained edges, which has a global receptive field thanks to the attention mechanism.

Edge Detection

3D Implicit Transporter for Temporally Consistent Keypoint Discovery

1 code implementation • ICCV 2023 • Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao

To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information.

Learning Point-wise Abstaining Penalty for Point Cloud Anomaly Detection

1 code implementation • 19 Sep 2023 • Shaocong Xu, Pengfei Li, Xinyu Liu, Qianpu Sun, Yang Li, Shihui Guo, Zhen Wang, Bo Jiang, Rui Wang, Kehua Sheng, Bo Zhang, Hao Zhao

We demonstrate that learning different abstaining penalties, apart from point-wise penalty, for different types of (synthesized) outliers can further improve the performance.

Anomaly Detection Autonomous Driving +1

NeRRF: 3D Reconstruction and View Synthesis for Transparent and Specular Objects with Neural Refractive-Reflective Fields

1 code implementation • 22 Sep 2023 • Xiaoxue Chen, Junchen Liu, Hao Zhao, Guyue Zhou, Ya-Qin Zhang

In this paper, we introduce the refractive-reflective field.

3D Reconstruction Object

PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection

1 code implementation • NeurIPS 2023 • Qiang Zhou, Weize Li, Lihan Jiang, Guoliang Wang, Guyue Zhou, Shanghang Zhang, Hao Zhao

Furthermore, we provide an open-source benchmark library, including dataset and baseline methods that cover 8 anomaly detection paradigms, to facilitate future research and application in this domain.

Ranked #2 on Anomaly Detection on PAD Dataset

Anomaly Detection

Prototype-based HyperAdapter for Sample-Efficient Multi-task Tuning

1 code implementation • 18 Oct 2023 • Hao Zhao, Jie Fu, Zhaofeng He

Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained language models to downstream tasks while only updating a small number of parameters.

Multi-Task Learning

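To make "a small number of parameters" concrete, consider one widely used PEFT design, the bottleneck adapter. It is a down-projection from hidden size d to r followed by an up-projection back to d, each with a bias. The arithmetic below assumes that design and a hypothetical BERT-base-like backbone: 12 layers, d = 768, about 110M frozen parameters, two adapters per layer and r = 64. It is not a description of the HyperAdapter architecture.

```python
def adapter_params(d_model, bottleneck):
    """Parameters in one bottleneck adapter: down/up projections plus their biases."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up

def peft_fraction(backbone_params, n_layers, adapters_per_layer, d_model, bottleneck):
    """Trainable parameter count and its share of the total when only adapters train."""
    trainable = n_layers * adapters_per_layer * adapter_params(d_model, bottleneck)
    return trainable, trainable / (backbone_params + trainable)

# Hypothetical BERT-base-like configuration (about 110M frozen parameters).
trainable, frac = peft_fraction(110_000_000, n_layers=12, adapters_per_layer=2,
                                d_model=768, bottleneck=64)
print(trainable)        # 2379264
print(f"{frac:.2%}")
```

Each adapter costs 2dr + r + d = 99,136 parameters. The whole set therefore stays near 2% of the model, while the backbone weights remain frozen and shared across tasks.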

ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation

no code implementations • 10 Nov 2023 • Zhide Zhong, Jiakai Cao, Songen Gu, Sirui Xie, Weibo Gao, Liyi Luo, Zike Yan, Hao Zhao, Guyue Zhou

We present ASSIST, an object-wise neural radiance field as a panoptic representation for compositional and realistic simulation.

Panoptic Segmentation

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

1 code implementation • 29 Nov 2023 • Ziqiao Peng, Wentao Hu, Yue Shi, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Jun He, Hongyan Liu, Zhaoxin Fan

A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses.

Talking Face Generation Talking Head Generation

Active Learning for Abrupt Shifts Change-point Detection via Derivative-Aware Gaussian Processes

no code implementations • 5 Dec 2023 • Hao Zhao, Rong Pan

We investigate the effectiveness of DACD method in diverse scenarios and show it outperforms other active learning change-point detection approaches.

Active Learning Change Detection +3

SlimmeRF: Slimmable Radiance Fields

1 code implementation • 15 Dec 2023 • Shiran Yuan, Hao Zhao

To this end, we present SlimmeRF, a model that allows for instant test-time trade-offs between model size and accuracy through slimming, thus making the model simultaneously suitable for scenarios with different computing budgets.

3D Scene Reconstruction Novel View Synthesis

Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

no code implementations • 10 Jan 2024 • Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao

We believe the latter is valuable as it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.

Autonomous Driving Benchmarking +2

Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction

1 code implementation • 6 Feb 2024 • Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jiawei Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Zhaofeng He, Jie Fu

Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer.

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

1 code implementation • 7 Feb 2024 • Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what are they?

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

1 code implementation • 8 Feb 2024 • Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang

Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering.

Autonomous Driving Language Modelling +2

Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

no code implementations • 8 Feb 2024 • Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang

We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context.

Depth Estimation

Key Patch Proposer: Key Patches Contain Rich Information

1 code implementation • 18 Feb 2024 • Jing Xu, Beiwen Tian, Hao Zhao

In this paper, we introduce a novel algorithm named Key Patch Proposer (KPP) designed to select key patches in an image without additional training.

Active Learning Semantic Segmentation

HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic Transfer

1 code implementation • 20 Feb 2024 • Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu

The Mixture of Experts (MoE) for language models has been proven effective in augmenting the capacity of models by dynamically routing each input token to a specific subset of experts for processing.

Multi-Task Learning

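The routing step described above is commonly implemented as top-k gating. The gate applies a softmax over per-expert logits, keeps the k largest probabilities and renormalizes them. The minimal sketch below uses hypothetical scalar experts and gate logits. It shows only this standard routing, not HyperMoE's transfer of knowledge from the unselected experts.

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the selected experts' softmax probabilities, renormalized to sum to 1.
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

def moe_forward(x, gate_logits, experts, k=2):
    """Combine only the selected experts' outputs; unselected experts are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical experts acting on a scalar token representation.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]
route = top_k_route(logits, k=2)
print(route)                                 # [(1, 0.5), (3, 0.5)]
print(moe_forward(3.0, logits, experts, k=2))  # 7.5
```

Only k experts are evaluated per token, which keeps compute roughly constant as experts are added. The experts left out contribute nothing to that token, which is the situation the paper's title points to.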

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

1 code implementation • 13 Mar 2024 • Yupeng Zheng, Xiang Li, Pengfei Li, Yuhang Zheng, Bu Jin, Chengliang Zhong, Xiaoxiao Long, Hao Zhao, Qichao Zhang

However, existing methods rely on a complex cascaded framework with relatively limited information to restore 3D scenes, including a dependency on supervision solely on the whole network's output, single-frame input, and the utilization of a small backbone.

Autonomous Vehicles

FastMAC: Stochastic Spectral Sampling of Correspondence Graph

1 code implementation • 13 Mar 2024 • Yifei Zhang, Hao Zhao, Hongyang Li, Siheng Chen

As such, the core of our method is the stochastic spectral sampling of correspondence graph.

Point Cloud Registration

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

no code implementations • 14 Mar 2024 • Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao

Semantic image synthesis (SIS) shows good promise for sensor simulation.

Image Generation

P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

no code implementations • 15 Mar 2024 • Zhou Jiang, Zhenxin Zhu, Pengfei Li, Huan-ang Gao, Tianyuan Yuan, Yongliang Shi, Hang Zhao, Hao Zhao

On the other hand, we exploit a masked autoencoder to capture the prior distribution of HDMap, which can serve as a refinement module to mitigate occlusions and artifacts.

Autonomous Vehicles

Large Language Models Powered Context-aware Motion Prediction

no code implementations • 17 Mar 2024 • Xiaoji Zheng, Lixiu Wu, Zhijie Yan, Yuanrong Tang, Hao Zhao, Chen Zhong, Bokui Chen, Jiangtao Gong

Traditional methods of motion forecasting primarily encode vector information of maps and historical trajectory data of traffic participants, lacking a comprehensive understanding of overall traffic semantics, which in turn affects the performance of prediction tasks.

Motion Forecasting motion prediction +1

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

no code implementations • 18 Mar 2024 • Mingjin Chen, JunHao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao

In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image.

3D Human Reconstruction Texture Synthesis

SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing

1 code implementation • 28 Mar 2024 • Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao

This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting.

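Super-sampling estimates a pixel's value under a box filter by averaging many sub-pixel samples. As the number of samples grows, the average converges to the integral of the signal over the pixel footprint, which is the sense in which exact integration is a limiting case of super-sampling. The generic sketch below uses a hypothetical continuous signal f(x, y) = x^2 on one unit pixel. It is not the Gaussian-specific integration used by SA-GS.

```python
def supersample_pixel(f, px, py, n):
    """Average f over an n x n grid of sub-pixel sample positions inside pixel (px, py).

    n = 1 is ordinary point sampling at the pixel centre; as n grows the
    average converges to the box-filter integral of f over the pixel.
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = px + (i + 0.5) / n
            y = py + (j + 0.5) / n
            total += f(x, y)
    return total / (n * n)

def signal(x, y):
    """Hypothetical continuous image: intensity grows quadratically with x."""
    return x * x

exact = 1.0 / 3.0   # integral of x^2 over the pixel [0, 1] x [0, 1]
for n in (1, 2, 4, 16, 64):
    approx = supersample_pixel(signal, 0, 0, n)
    print(n, approx, abs(approx - exact))
```

With midpoint sampling, the error for this signal is exactly 1/(12n^2). The single centre sample (n = 1) is off by about 0.083, while n = 64 is within about 2e-5 of the integral.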

TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

1 code implementation • 28 Mar 2024 • Bu Jin, Yupeng Zheng, Pengfei Li, Weize Li, Yuhang Zheng, Sujie Hu, Xinyu Liu, Jinwei Zhu, Zhijie Yan, Haiyang Sun, Kun Zhan, Peng Jia, Xiaoxiao Long, Yilun Chen, Hao Zhao

However, the exploration of 3D dense captioning in outdoor scenes is hindered by two major challenges: 1) the domain gap between indoor and outdoor scenes, such as dynamics and sparse visual inputs, makes it difficult to directly adapt existing indoor methods; 2) the lack of data with comprehensive box-caption pair annotations specifically tailored for outdoor scenes.

3D dense captioning Dense Captioning

PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

no code implementations • 4 Apr 2024 • Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-grasping techniques, which rely on repositioning objects and leveraging external aids like table edges, lack the adaptability across object categories and scenes.

Object

Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs

1 code implementation • 5 Apr 2024 • JunHao Chen, Xiang Li, Xiaojun Ye, Chao Li, Zhaoxin Fan, Hao Zhao

The definition of an IDEA is the composition of multimodal inputs including text, image, and 3D models.

Model Selection
