Search Results for author: Hang Zhao

Found 70 papers, 27 papers with code

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

no code implementations 7 Dec 2022 Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li

Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient, (2) the original affinity loss does not prevent trivial predictions as intended but actually accelerates this process due to the affinity loss term being symmetric.

Box-supervised Instance Segmentation Semantic Segmentation +1

Learning Physically Realizable Skills for Online Packing of General 3D Shapes

no code implementations 5 Dec 2022 Hang Zhao, Zherong Pan, Yang Yu, Kai Xu

We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems.

Action Generation Reinforcement Learning (RL)

P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving

no code implementations 3 Nov 2022 Qiao Sun, Xin Huang, Brian C. Williams, Hang Zhao

Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios.

Autonomous Driving Motion Planning +1

VectorFlow: Combining Images and Vectors for Traffic Occupancy and Flow Prediction

no code implementations 9 Aug 2022 Xin Huang, Xiaoyu Tian, Junru Gu, Qiao Sun, Hang Zhao

Recently, the occupancy flow fields representation was proposed to represent joint future states of road agents through a combination of occupancy grid and flow, which supports efficient and consistent joint predictions.

Autonomous Driving

Augmented Imagefication: A Data-driven Fault Detection Method for Aircraft Air Data Sensors

no code implementations 18 Jun 2022 Hang Zhao, Jinyi Ma, Zhongzhi Li, Yiqun Dong, Jianliang Ai

In this paper, a novel data-driven approach named Augmented Imagefication for Fault detection (FD) of aircraft air data sensors (ADS) is proposed.

Fault Detection

VectorMapNet: End-to-end Vectorized HD Map Learning

1 code implementation 17 Jun 2022 Yicheng Liu, Yuantian Yuan, Yue Wang, Yilun Wang, Hang Zhao

To the best of our knowledge, VectorMapNet is the first work designed towards end-to-end vectorized map learning from onboard observations.

3D Lane Detection Autonomous Driving +1

The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation

1 code implementation 13 Jun 2022 Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao

Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications.

Knowledge Distillation Transfer Learning

Learning Visual Styles from Audio-Visual Associations

no code implementations 10 May 2022 Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao

Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.

Image Stylization

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation

1 code implementation 6 May 2022 Zui Chen, Yansen Jing, Shengcheng Yuan, Yifei Xu, Jian Wu, Hang Zhao

Synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design.

Audio Classification Audio Signal Processing

MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries

1 code implementation 2 May 2022 Tianyuan Zhang, Xuanyao Chen, Yue Wang, Yilun Wang, Hang Zhao

In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects.

Association Autonomous Driving +1

Training-Free Robust Multimodal Learning via Sample-Wise Jacobian Regularization

no code implementations 5 Apr 2022 Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao

Multimodal fusion emerges as an appealing technique to improve model performances on many tasks.

Egocentric Prediction of Action Target in 3D

no code implementations CVPR 2022 Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng

We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision.

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

no code implementations 22 Mar 2022 Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal

We illustrate that randomized serialization of the segments significantly improves the performance and results in distribution over spatially-long (across-segments) and -short (within-segment) predictions which are effective for feature learning.

Representation Learning Self-Supervised Learning

FUTR3D: A Unified Sensor Fusion Framework for 3D Detection

no code implementations 20 Mar 2022 Xuanyao Chen, Tianyuan Zhang, Yue Wang, Yilun Wang, Hang Zhao

Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics.

Autonomous Driving

CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation

1 code implementation 17 Mar 2022 Renhao Wang, Hang Zhao, Yang Gao

Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO.

Contrastive Learning

S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification

no code implementations 21 Feb 2022 Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang

To our knowledge, S3T is the first method combining the Swin Transformer with a self-supervised learning method for music classification.

Classification Data Augmentation +5

Embracing Single Stride 3D Object Detector with Sparse Transformer

2 code implementations CVPR 2022 Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.

3D Object Detection Autonomous Driving +2

IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

no code implementations ICLR 2022 Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, Leonidas Guibas

While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model -- inter-object functional relationships (e.g., a switch on the wall turns on or off the light, a remote control operates the TV).

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

1 code implementation 9 Dec 2021 Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, Junjun Jiang, Bolei Zhou, Hang Zhao

To bridge this gap, we aim to learn a spatial-aware visual representation that can describe the three-dimensional space and is more suitable and effective for these tasks.

Contrastive Learning Unsupervised Pre-training

Neural Dubber: Dubbing for Videos According to Scripts

no code implementations NeurIPS 2021 Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

1 code implementation 13 Oct 2021 Yue Wang, Vitor Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon

This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model.

3D Object Detection Autonomous Driving +3

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

1 code implementation ICLR 2022 Hang Zhao, Yang Yu, Kai Xu

PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).

3D Bin Packing

Learning Practically Feasible Policies for Online 3D Bin Packing

2 code implementations 31 Aug 2021 Hang Zhao, Chenyang Zhu, Xin Xu, Hui Huang, Kai Xu

In this problem, the items are delivered to the agent without informing the full sequence information.

3D Bin Packing

DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets

2 code implementations ICCV 2021 Junru Gu, Chen Sun, Hang Zhao

In this work, we propose an anchor-free and end-to-end trajectory prediction model, named DenseTNT, that directly outputs a set of trajectories from dense goal candidates.

Motion Forecasting motion prediction +1

HDMapNet: An Online HD Map Construction and Evaluation Framework

2 code implementations 13 Jul 2021 Qi Li, Yue Wang, Yilun Wang, Hang Zhao

By introducing the method and metrics, we invite the community to study this novel map learning problem.

Autonomous Driving HD semantic map learning

DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution

1 code implementation 27 Jun 2021 Junru Gu, Qiao Sun, Hang Zhao

In autonomous driving, goal-based multi-trajectory prediction methods are proved to be effective recently, where they first score goal candidates, then select a final set of goals, and finally complete trajectories based on the selected goals.

Autonomous Driving motion prediction +1

Intrinsically Motivated Self-supervised Learning in Reinforcement Learning

no code implementations 26 Jun 2021 Yue Zhao, Chenzhuang Du, Hang Zhao, Tiejun Li

In vision-based reinforcement learning (RL) tasks, it is prevalent to assign auxiliary tasks with a surrogate self-supervised loss so as to obtain more semantic representations and improve sample efficiency.

Decision Making reinforcement-learning +3

Co-advise: Cross Inductive Bias Distillation

no code implementations CVPR 2022 Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao

Transformers recently are adapted from the community of natural language processing as a promising substitute of convolution-based neural networks for visual learning tasks.

Inductive Bias

Improving Multi-Modal Learning with Uni-Modal Teachers

no code implementations 21 Jun 2021 Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao

We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion method prevent encoders of each modality from sufficient feature learning.

Image Segmentation Semantic Segmentation

On Feature Decorrelation in Self-Supervised Learning

1 code implementation ICCV 2021 Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Representation Learning Self-Supervised Learning

Multimodal Knowledge Expansion

1 code implementation ICCV 2021 Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao

The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.

Denoising Knowledge Distillation +1

Predictive Visual Tracking: A New Benchmark and Baseline Approach

1 code implementation 8 Mar 2021 Bowen Li, Yiming Li, Junjie Ye, Changhong Fu, Hang Zhao

As a crucial robotic perception capability, visual tracking has been intensively studied recently.

Visual Tracking

AETree: Areal Spatial Data Generation

no code implementations 1 Jan 2021 Congcong Wen, Wenyu Han, Hang Zhao, Chen Feng

Areal spatial data represent not only geographical locations but also sizes and shapes of physical objects such as buildings in a city.

UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging

no code implementations NeurIPS 2020 Chu Zhou, Hang Zhao, Jin Han, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi

A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR).

Unsupervised Monocular Depth Learning in Dynamic Scenes

4 code implementations 30 Oct 2020 Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.

Depth Estimation Depth Prediction +1

CLOUD: Contrastive Learning of Unsupervised Dynamics

no code implementations 23 Oct 2020 Jianren Wang, Yujie Lu, Hang Zhao

Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently.

Contrastive Learning

Multivariate Time-series Anomaly Detection via Graph Attention Network

2 code implementations 4 Sep 2020 Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, Qi Zhang

Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications.

Anomaly Detection Graph Attention +2

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation

4 code implementations CVPR 2020 Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cong-Cong Li, Cordelia Schmid

Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g. pedestrians and vehicles) and road context information (e.g. lanes, traffic lights).

Self-Driving Cars

AlignNet: A Unifying Approach to Audio-Visual Alignment

1 code implementation 12 Feb 2020 Jianren Wang, Zhaoyuan Fang, Hang Zhao

We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments.

Neural network with data augmentation in multi-objective prediction of multi-stage pump

no code implementations 4 Feb 2020 Hang Zhao

Finally, a neural network model based on data augmentation (NNDA) is proposed for the reason that simulation cost is too high and data is scarce in mechanical simulation field especially in CFD problems.

Data Augmentation

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations ICCV 2019 Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Active Scene Understanding via Online Semantic Reconstruction

no code implementations 18 Jun 2019 Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu

In our method, the exploratory robot scanning is both driven by and targeting at the recognition and segmentation of semantic objects from the scene.

Scene Understanding Semantic Segmentation

Self-Supervised Audio-Visual Co-Segmentation

no code implementations 18 Apr 2019 Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.

Image Segmentation Semantic Segmentation

The Sound of Motions

no code implementations ICCV 2019 Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba

Sounds originate from object motions and vibrations of surrounding air.

Through-Wall Human Pose Estimation Using Radio Signals

no code implementations CVPR 2018 Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi

Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never being trained on such scenarios.

RF-based Pose Estimation

The Sound of Pixels

2 code implementations ECCV 2018 Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh McDermott, Antonio Torralba

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.

Scene Parsing Through ADE20K Dataset

no code implementations CVPR 2017 Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba

A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.

Scene Parsing

Open Vocabulary Scene Parsing

no code implementations ICCV 2017 Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba

Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.

General Classification Scene Parsing

Semantic Understanding of Scenes through the ADE20K Dataset

21 code implementations 18 Aug 2016 Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba

Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.

Scene Parsing Semantic Segmentation

Loss Functions for Neural Networks for Image Processing

2 code implementations 28 Nov 2015 Hang Zhao, Orazio Gallo, Iuri Frosio, Jan Kautz

Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems.

Image Restoration
