Search Results for author: Hang Zhao

Found 104 papers, 50 papers with code

Generating Comprehensive Lithium Battery Charging Data with Generative AI

no code implementations • 11 Apr 2024 • Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He

Through preprocessing data into a quasi-video format, our study achieves an integrated synthesis of electrochemical data, including voltage, current, temperature, and charging capacity, which is then processed by the RCVAE model.

Paper
Add Code

P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

no code implementations • 15 Mar 2024 • Zhou Jiang, Zhenxin Zhu, Pengfei Li, Huan-ang Gao, Tianyuan Yuan, Yongliang Shi, Hang Zhao, Hao Zhao

On the other hand, we exploit a masked autoencoder to capture the prior distribution of HDMap, which can serve as a refinement module to mitigate occlusions and artifacts.

Autonomous Vehicles

Paper
Add Code

PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors

1 code implementation • 14 Mar 2024 • Tianyuan Yuan, Yucheng Mao, Jiawei Yang, Yicheng Liu, Yue Wang, Hang Zhao

Autonomous vehicles rely extensively on perception systems to navigate and interpret their surroundings.

Autonomous Driving Navigate

Paper
Code

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

no code implementations • 19 Feb 2024 • Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities.

Autonomous Driving Scene Understanding

Paper
Add Code

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

1 code implementation • 10 Jan 2024 • Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li

As a state-of-the-art, open-source image generation model, PIXART-{\delta} offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.

Image Generation

2,081

Paper
Code

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

2 code implementations • 9 Nov 2023 • Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps.

Image Generation

4,028

Paper
Code

Large Trajectory Models are Scalable Motion Predictors and Planners

1 code implementation • 30 Oct 2023 • Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task.

Autonomous Driving Language Modelling +2

Paper
Code

LiDAR-based 4D Occupancy Completion and Forecasting

1 code implementation • 17 Oct 2023 • Xinhao Liu, Moonjun Gong, Qi Fang, Haoyu Xie, Yiming Li, Hang Zhao, Chen Feng

In this paper, we introduce a novel LiDAR perception task of Occupancy Completion and Forecasting (OCF) in the context of autonomous driving to unify these aspects into a cohesive framework.

Autonomous Driving Hallucination

Paper
Code

What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

no code implementations • 10 Oct 2023 • Siting Li, Chenzhuang Du, Yue Zhao, Yu Huang, Hang Zhao

With the growing success of multi-modal learning, research on the robustness of multi-modal models, especially when facing situations with missing modalities, is receiving increased attention.

Data Augmentation

Paper
Add Code

Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments

no code implementations • 9 Oct 2023 • Xiong-Hui Chen, Junyin Ye, Hang Zhao, Yi-Chen Li, Haoran Shi, Yu-Yan Xu, Zhihao Ye, Si-Hang Yang, Anqi Huang, Kai Xu, Zongzhang Zhang, Yang Yu

In this work, we focus on imitator learning based on only one expert demonstration.

Imitation Learning

Paper
Add Code

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

no code implementations • 8 Oct 2023 • Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao

To this end, we introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).

Optical Flow Estimation

Paper
Add Code

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

3 code implementations • 6 Oct 2023 • Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao

Inspired by Consistency Models (song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (rombach et al).

Text-to-Image Generation

4,028

Paper
Code

GPT-Driver: Learning to Drive with GPT

1 code implementation • 2 Oct 2023 • Jiageng Mao, Yuxi Qian, Junjie Ye, Hang Zhao, Yue Wang

In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs).

Ranked #1 on Motion Planning on nuScenes

Autonomous Driving Decision Making +2

178

Paper
Code

Uncertainty-Aware Decision Transformer for Stochastic Driving Environments

no code implementations • 28 Sep 2023 • Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao

Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks.

Autonomous Driving Offline RL +1

Paper
Add Code

AutoEncoding Tree for City Generation and Applications

no code implementations • 27 Sep 2023 • Wenyu Han, Congcong Wen, Lazarus Chok, Yan Liang Tan, Sheung Lung Chan, Hang Zhao, Chen Feng

Based on this dataset, we propose AETree, a tree-structured auto-encoder neural network, for city generation.

Autonomous Driving

Paper
Add Code

Boosting Offline Reinforcement Learning for Autonomous Driving with Hierarchical Latent Skills

no code implementations • 24 Sep 2023 • Zenan Li, Fan Nie, Qiao Sun, Fang Da, Hang Zhao

Learning-based vehicle planning is receiving increasing attention with the emergence of diverse driving simulators and large-scale driving datasets.

Autonomous Driving Offline RL +2

Paper
Add Code

Robot Parkour Learning

no code implementations • 11 Sep 2023 • Ziwen Zhuang, Zipeng Fu, Jianren Wang, Christopher Atkeson, Soeren Schwertfeger, Chelsea Finn, Hang Zhao

Parkour is a grand challenge for legged locomotion that requires robots to overcome various obstacles rapidly in complex environments.

Paper
Add Code

StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction

1 code implementation • 24 Aug 2023 • Tianyuan Yuan, Yicheng Liu, Yue Wang, Yilun Wang, Hang Zhao

This approach limits their stability and performance in complex scenarios such as occlusions, largely due to the absence of temporal information.

Autonomous Driving

142

Paper
Code

Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

no code implementations • 16 Aug 2023 • Running Zhao, Jiangtao Yu, Hang Zhao, Edith C. H. Ngai

Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Learning-based Control for PMSM Using Distributed Gaussian Processes with Optimal Aggregation Strategy

no code implementations • 26 Jul 2023 • Zhenxiao Yin, Xiaobing Dai, Zewen Yang, Yang shen, Georges Hattab, Hang Zhao

The growing demand for accurate control in varying and unknown environments has sparked a corresponding increase in the requirements for power supply components, including permanent magnet synchronous motors (PMSMs).

GPR

Paper
Add Code

Reconstructing Three-decade Global Fine-Grained Nighttime Light Observations by a New Super-Resolution Framework

no code implementations • 14 Jul 2023 • Jinyu Guo, Feng Zhang, Hang Zhao, Baoxiang Pan, Linlu Mei

We provide the long-term and fine-grained nighttime light observations to promote research on human activities.

Super-Resolution

Paper
Add Code

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

1 code implementation • NeurIPS 2023 • Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao

The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production.

Audio Synthesis

104

Paper
Code

BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird's-Eye-View in Dynamic Scenarios

no code implementations • 20 Jun 2023 • Yucheng Mao, Ruowen Zhao, Tianbao Zhang, Hang Zhao

Depth estimation is a cornerstone of perception in autonomous driving and robotic systems.

Autonomous Driving Depth Estimation

Paper
Add Code

A Universal Semantic-Geometric Representation for Robotic Manipulation

no code implementations • 18 Jun 2023 • Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao

To this end, we present $\textbf{Semantic-Geometric Representation} (\textbf{SGR})$, a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.

Paper
Add Code

SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views

1 code implementation • 15 Jun 2023 • Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng

Monocular scene understanding is a foundational component of autonomous systems.

3D Semantic Scene Completion 3D Semantic Scene Completion from a single 2D image

146

Paper
Code

ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory

no code implementations • 6 Jun 2023 • Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao

Large language models (LLMs) with memory are computationally universal.

Paper
Add Code

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

1 code implementation • CVPR 2023 • Xiaoyu Tian, Haoxi Ran, Yue Wang, Hang Zhao

This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations?

Multi-Object Tracking object-detection +3

Paper
Code

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

1 code implementation • 2 May 2023 • Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Tianyuan Yuan, Yue Wang, Yang Yuan, Hang Zhao

We abstract the features (i. e. learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions.

197

Paper
Code

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

1 code implementation • NeurIPS 2023 • Xiaoyu Tian, Tao Jiang, Longfei Yun, Yucheng Mao, Huitong Yang, Yue Wang, Yilun Wang, Hang Zhao

3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations.

Autonomous Driving

331

Paper
Code

What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging

1 code implementation • CVPR 2023 • Zitian Tang, Wenjie Ye, Wei-Chiu Ma, Hang Zhao

Inferring past human motion from RGB images is challenging due to the inherent uncertainty of the prediction problem.

Human-Object Interaction Detection Pose Estimation

Paper
Code

Programmatically Grounded, Compositionally Generalizable Robotic Manipulation

no code implementations • 26 Apr 2023 • Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao

Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills.

Imitation Learning

Paper
Add Code

Neural Map Prior for Autonomous Driving

no code implementations • CVPR 2023 • Xuan Xiong, Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, Hang Zhao

To the best of our knowledge, this is the first learning-based system for creating a global map prior.

Autonomous Driving Navigate

Paper
Add Code

SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer

1 code implementation • CVPR 2023 • Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han

High-resolution images enable neural networks to learn richer visual representations.

2D Semantic Segmentation Instance Segmentation +4

Paper
Code

INT2: Interactive Trajectory Prediction at Intersections

1 code implementation • ICCV 2023 • Zhijie Yan, Pengfei Li, Zheng Fu, Shaocong Xu, Yongliang Shi, Xiaoxue Chen, Yuhang Zheng, Yang Li, Tianyu Liu, Chuxuan Li, Nairui Luo, Xu Gao, Yilun Chen, Zuoxu Wang, Yifeng Shi, Pengfei Huang, Zhengxiao Han, Jirui Yuan, Jiangtao Gong, Guyue Zhou, Hang Zhao, Hao Zhao

One of the most challenging problems in motion forecasting is interactive trajectory prediction, whose goal is to jointly forecasts the future trajectories of interacting agents.

Motion Forecasting Trajectory Prediction

Paper
Code

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

no code implementations • 7 Dec 2022 • Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li

Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient, (2)the original affinity loss does not prevent trivial predictions as intended but actually accelerates this process due to the affinity loss term being symmetric.

Box-supervised Instance Segmentation Segmentation +2

Paper
Add Code

Learning Physically Realizable Skills for Online Packing of General 3D Shapes

1 code implementation • 5 Dec 2022 • Hang Zhao, Zherong Pan, Yang Yu, Kai Xu

We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems.

Action Generation Reinforcement Learning (RL)

Paper
Code

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework

1 code implementation • ICCV 2023 • Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, Changhong Fu

Visual object tracking is essential to intelligent robots.

Visual Object Tracking Visual Tracking

Paper
Code

P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving

no code implementations • 3 Nov 2022 • Qiao Sun, Xin Huang, Brian C. Williams, Hang Zhao

Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios.

Autonomous Driving Motion Planning +2

Paper
Add Code

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

no code implementations • 9 Aug 2022 • Xin Huang, Xiaoyu Tian, Junru Gu, Qiao Sun, Hang Zhao

Recently, the occupancy flow fields representation was proposed to represent joint future states of road agents through a combination of occupancy grid and flow, which supports efficient and consistent joint predictions.

Autonomous Driving

Paper
Add Code

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

1 code implementation • CVPR 2023 • Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao

In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene.

Autonomous Driving Trajectory Prediction

114

Paper
Code

Beyond Visual Field of View: Perceiving 3D Environment with Echoes and Vision

1 code implementation • 3 Jul 2022 • Lingyu Zhu, Esa Rahtu, Hang Zhao

This paper focuses on perceiving and navigating 3D environments using echoes and RGB image.

Depth Estimation Robot Navigation

Paper
Code

Augmented Imagefication: A Data-driven Fault Detection Method for Aircraft Air Data Sensors

no code implementations • 18 Jun 2022 • Hang Zhao, Jinyi Ma, Zhongzhi Li, Yiqun Dong, Jianliang Ai

In this paper, a novel data-driven approach named Augmented Imagefication for Fault detection (FD) of aircraft air data sensors (ADS) is proposed.

Fault Detection

Paper
Add Code

VectorMapNet: End-to-end Vectorized HD Map Learning

2 code implementations • 17 Jun 2022 • Yicheng Liu, Tianyuan Yuan, Yue Wang, Yilun Wang, Hang Zhao

To the best of our knowledge, VectorMapNet is the first work designed towards end-to-end vectorized map learning from onboard observations.

Ranked #1 on HD semantic map learning on nuScenes

3D Lane Detection Autonomous Driving +2

376

Paper
Code

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

2 code implementations • 13 Jun 2022 • Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications.

Knowledge Distillation Transfer Learning

Paper
Code

R4D: Utilizing Reference Objects for Long-Range Distance Estimation

no code implementations • ICLR 2022 • Yingwei Li, Tiffany Chen, Maya Kabkab, Ruichi Yu, Longlong Jing, Yurong You, Hang Zhao

An edge in the graph encodes the relative distance information between a pair of target and reference objects.

Autonomous Driving

Paper
Add Code

Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking

no code implementations • 8 Jun 2022 • Longlong Jing, Ruichi Yu, Henrik Kretzschmar, Kang Li, Charles R. Qi, Hang Zhao, Alper Ayvaci, Xu Chen, Dillon Cower, Yingwei Li, Yurong You, Han Deng, CongCong Li, Dragomir Anguelov

Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving.

Autonomous Driving Depth Estimation +1

Paper
Add Code

Learning Visual Styles from Audio-Visual Associations

no code implementations • 10 May 2022 • Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao

Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.

Image Stylization

Paper
Add Code

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation

1 code implementation • 6 May 2022 • Zui Chen, Yansen Jing, Shengcheng Yuan, Yifei Xu, Jian Wu, Hang Zhao

Synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design.

Audio Classification Audio Signal Processing

Paper
Code

MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries

1 code implementation • 2 May 2022 • Tianyuan Zhang, Xuanyao Chen, Yue Wang, Yilun Wang, Hang Zhao

In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects.

Autonomous Driving Depth Estimation

170

Paper
Code

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

no code implementations • 5 Apr 2022 • Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Multimodal fusion emerges as an appealing technique to improve model performances on many tasks.

Paper
Add Code

Egocentric Prediction of Action Target in 3D

no code implementations • CVPR 2022 • Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng

We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision.

Paper
Add Code

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

no code implementations • 22 Mar 2022 • Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal

We illustrate that randomized serialization of the segments significantly improves the performance and results in distribution over spatially-long (across-segments) and -short (within-segment) predictions which are effective for feature learning.

Representation Learning Self-Supervised Learning

Paper
Add Code

FUTR3D: A Unified Sensor Fusion Framework for 3D Detection

1 code implementation • 20 Mar 2022 • Xuanyao Chen, Tianyuan Zhang, Yue Wang, Yilun Wang, Hang Zhao

Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics.

Autonomous Driving Sensor Fusion

237

Paper
Code

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation

1 code implementation • 17 Mar 2022 • Renhao Wang, Hang Zhao, Yang Gao

Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO.

Contrastive Learning Object +1

Paper
Code

M2I: From Factored Marginal Trajectory Prediction to Interactive Prediction

no code implementations • CVPR 2022 • Qiao Sun, Xin Huang, Junru Gu, Brian C. Williams, Hang Zhao

Predicting future motions of road participants is an important task for driving autonomously in urban scenes.

Open-Ended Question Answering Trajectory Prediction

Paper
Add Code

S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification

1 code implementation • 21 Feb 2022 • Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang

To our knowledge, S3T is the first method combining the Swin Transformer with a self-supervised learning method for music classification.

Classification Data Augmentation +5

Paper
Code

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

no code implementations • 17 Jan 2022 • Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinghong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao

This map enables our model to automate the alignment of non-homogenous features in a dynamic and data-driven manner.

3D Object Detection Autonomous Driving +1

Paper
Add Code

Embracing Single Stride 3D Object Detector with Sparse Transformer

2 code implementations • CVPR 2022 • Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.

Ranked #3 on 3D Object Detection on waymo cyclist

3D Object Detection Autonomous Driving +3

733

Paper
Code

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

no code implementations • ICLR 2022 • Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, Leonidas Guibas

While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model -- inter-object functional relationships (e. g., a switch on the wall turns on or off the light, a remote control operates the TV).

Object

Paper
Add Code

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

1 code implementation • 9 Dec 2021 • Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, Junjun Jiang, Bolei Zhou, Hang Zhao

To bridge this gap, we aim to learn a spatial-aware visual representation that can describe the three-dimensional space and is more suitable and effective for these tasks.

Contrastive Learning Unsupervised Pre-training

Paper
Code

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.

Paper
Add Code

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

1 code implementation • 13 Oct 2021 • Yue Wang, Vitor Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon

This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model.

Ranked #7 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving +5

716

Paper
Code

Modality Laziness: Everybody's Business is Nobody's Business

no code implementations • 29 Sep 2021 • Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Yue Wang, Yang Yuan, Hang Zhao

We name this problem of multi-modal training, \emph{Modality Laziness}.

Paper
Add Code

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

1 code implementation • ICLR 2022 • Hang Zhao, Yang Yu, Kai Xu

PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).

3D Bin Packing

208

Paper
Code

Learning Practically Feasible Policies for Online 3D Bin Packing

2 code implementations • 31 Aug 2021 • Hang Zhao, Chenyang Zhu, Xin Xu, Hui Huang, Kai Xu

In this problem, the items are delivered to the agent without informing the full sequence information.

3D Bin Packing Collision Avoidance

249

Paper
Code

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets

2 code implementations • ICCV 2021 • Junru Gu, Chen Sun, Hang Zhao

In this work, we propose an anchor-free and end-to-end trajectory prediction model, named DenseTNT, that directly outputs a set of trajectories from dense goal candidates.

Motion Forecasting motion prediction +1

420

Paper
Code

HDMapNet: An Online HD Map Construction and Evaluation Framework

3 code implementations • 13 Jul 2021 • Qi Li, Yue Wang, Yilun Wang, Hang Zhao

By introducing the method and metrics, we invite the community to study this novel map learning problem.

Autonomous Driving HD semantic map learning

634

Paper
Code

HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

no code implementations • CVPR 2021 • Lu Mi, Hang Zhao, Charlie Nash, Xiaohan Jin, Jiyang Gao, Chen Sun, Cordelia Schmid, Nir Shavit, Yuning Chai, Dragomir Anguelov

To address this issue, we introduce a new challenging task to generate HD maps.

Graph Generation Motion Forecasting +1

Paper
Add Code

DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution

1 code implementation • 27 Jun 2021 • Junru Gu, Qiao Sun, Hang Zhao

In autonomous driving, goal-based multi-trajectory prediction methods are proved to be effective recently, where they first score goal candidates, then select a final set of goals, and finally complete trajectories based on the selected goals.

Autonomous Driving motion prediction +1

420

Paper
Code

Intrinsically Motivated Self-supervised Learning in Reinforcement Learning

no code implementations • 26 Jun 2021 • Yue Zhao, Chenzhuang Du, Hang Zhao, Tiejun Li

In vision-based reinforcement learning (RL) tasks, it is prevalent to assign auxiliary tasks with a surrogate self-supervised loss so as to obtain more semantic representations and improve sample efficiency.

Decision Making reinforcement-learning +3

Paper
Add Code

Co-advise: Cross Inductive Bias Distillation

no code implementations • CVPR 2022 • Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao

Transformers recently are adapted from the community of natural language processing as a promising substitute of convolution-based neural networks for visual learning tasks.

Inductive Bias

Paper
Add Code

Improving Multi-Modal Learning with Uni-Modal Teachers

no code implementations • 21 Jun 2021 • Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion method prevent encoders of each modality from sufficient feature learning.

Ranked #60 on Semantic Segmentation on NYU Depth v2

Image Segmentation Semantic Segmentation

Paper
Add Code

What Makes Multi-modal Learning Better than Single (Provably)

no code implementations • NeurIPS 2021 • Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

The world provides us with data of multiple modalities.

Paper
Add Code

On Feature Decorrelation in Self-Supervised Learning

1 code implementation • ICCV 2021 • Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Representation Learning Self-Supervised Learning

Paper
Code

Large Scale Interactive Motion Forecasting for Autonomous Driving : The Waymo Open Motion Dataset

no code implementations • 20 Apr 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov

Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.

Motion Forecasting Motion Planning

Paper
Add Code

Multimodal Knowledge Expansion

1 code implementation • ICCV 2021 • Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.

Ranked #63 on Semantic Segmentation on NYU Depth v2

Denoising Knowledge Distillation +1

Paper
Code

Predictive Visual Tracking: A New Benchmark and Baseline Approach

2 code implementations • 8 Mar 2021 • Bowen Li, Yiming Li, Junjie Ye, Changhong Fu, Hang Zhao

As a crucial robotic perception capability, visual tracking has been intensively studied recently.

Visual Tracking

Paper
Code

Large Scale Interactive Motion Forecasting for Autonomous Driving: The Waymo Open Motion Dataset

no code implementations • ICCV 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov

Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.

Motion Forecasting Motion Planning

Paper
Add Code

AETree: Areal Spatial Data Generation

no code implementations • 1 Jan 2021 • Congcong Wen, Wenyu Han, Hang Zhao, Chen Feng

Areal spatial data represent not only geographical locations but also sizes and shapes of physical objects such as buildings in a city.

Clustering

Paper
Add Code

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging

no code implementations • NeurIPS 2020 • Chu Zhou, Hang Zhao, Jin Han, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi

A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR).

Paper
Add Code

Unsupervised Monocular Depth Learning in Dynamic Scenes

5 code implementations • 30 Oct 2020 • Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.

Ranked #8 on Unsupervised Monocular Depth Estimation on Cityscapes

Depth Prediction Monocular Depth Estimation +2

32,717

Paper
Code

CLOUD: Contrastive Learning of Unsupervised Dynamics

no code implementations • 23 Oct 2020 • Jianren Wang, Yujie Lu, Hang Zhao

Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently.

Contrastive Learning

Paper
Add Code

LID 2020: The Learning from Imperfect Data Challenge Results

no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He

The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate the research in developing novel approaches that would harness the imperfect data and improve the data-efficiency during training.

object-detection Object Detection +5

Paper
Add Code

SEMI: Self-supervised Exploration via Multisensory Incongruity

no code implementations • 26 Sep 2020 • Jianren Wang, Ziwen Zhuang, Hang Zhao

The variance of actions is further used to measure action incongruity.

Efficient Exploration

Paper
Add Code

Multivariate Time-series Anomaly Detection via Graph Attention Network

2 code implementations • 4 Sep 2020 • Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, Qi Zhang

Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications.

Anomaly Detection Graph Attention +3

290

Paper
Code

TNT: Target-driveN Trajectory Prediction

4 code implementations • 19 Aug 2020 • Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, Cong-Cong Li, Dragomir Anguelov

Our key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states.

Ranked #2 on Trajectory Prediction on INTERACTION Dataset - Validation

Motion Forecasting Trajectory Prediction

458

Paper
Code

Online 3D Bin Packing with Constrained Deep Reinforcement Learning

1 code implementation • 26 Jun 2020 • Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, Kai Xu

We solve a challenging yet practically useful variant of 3D Bin Packing Problem (3D-BPP).

3D Bin Packing Collision Avoidance +2

249

Paper
Code

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

3 code implementations • CVPR 2020 • Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cong-Cong Li, Cordelia Schmid

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e. g. pedestrians and vehicles) and road context information (e. g. lanes, traffic lights).

Self-Driving Cars

458

Paper
Code

Music Gesture for Visual Sound Separation

no code implementations • CVPR 2020 • Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

Recent deep learning approaches have achieved impressive performance on visual sound separation tasks.

Optical Flow Estimation

Paper
Add Code

AlignNet: A Unifying Approach to Audio-Visual Alignment

1 code implementation • 12 Feb 2020 • Jianren Wang, Zhaoyuan Fang, Hang Zhao

We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments.

Paper
Code

Neural network with data augmentation in multi-objective prediction of multi-stage pump

no code implementations • 4 Feb 2020 • Hang Zhao

Finally, a neural network model based on data augmentation (NNDA) is proposed for the reason that simulation cost is too high and data is scarce in mechanical simulation field especially in CFD problems.

Data Augmentation

Paper
Add Code

Scalability in Perception for Autonomous Driving: Waymo Open Dataset

8 code implementations • CVPR 2020 • Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov

In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset.

Autonomous Driving

4,766

Paper
Code

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization us-ing just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Paper
Add Code

Active Scene Understanding via Online Semantic Reconstruction

no code implementations • 18 Jun 2019 • Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu

In our method, the exploratory robot scanning is both driven by and targeting at the recognition and segmentation of semantic objects from the scene.

Scene Understanding Semantic Segmentation

Paper
Add Code

Self-Supervised Audio-Visual Co-Segmentation

no code implementations • 18 Apr 2019 • Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Image Segmentation Segmentation +1

Paper
Add Code

The Sound of Motions

1 code implementation • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

Paper
Code

RF-based 3D skeletons

no code implementations • SIGCOMM '18 Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication 2018 • Ming-Min Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, Rumen Hristov, Zachary Kabelac, Dina Katabi, Antonio Torralba

It maintains this accuracy even in the presence of multiple people, and in new environments that it has not seen in the training set.

RF-based Pose Estimation

Paper
Add Code

Through-Wall Human Pose Estimation Using Radio Signals

no code implementations • CVPR 2018 • Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi

Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never trained on such scenarios.

RF-based Pose Estimation

Paper
Add Code

The Sound of Pixels

2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

362

Paper
Code

HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization

2 code implementations • ICCV 2019 • Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan

This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.

Ranked #10 on Temporal Action Localization on HACS

Action Classification Action Recognition +3

183

Paper
Code

Scene Parsing Through ADE20K Dataset

no code implementations • CVPR 2017 • Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Object Scene Parsing +1

Paper
Add Code

Open Vocabulary Scene Parsing

no code implementations • ICCV 2017 • Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Paper
Add Code

Semantic Understanding of Scenes through the ADE20K Dataset

21 code implementations • 18 Aug 2016 • Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Segmentation +1

4,832

Paper
Code

Loss Functions for Neural Networks for Image Processing

2 code implementations • 28 Nov 2015 • Hang Zhao, Orazio Gallo, Iuri Frosio, Jan Kautz

Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems.

Image Restoration

9,333

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.