no code implementations • 7 Dec 2022 • Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li
Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient, (2) the original affinity loss does not prevent trivial predictions as intended but actually accelerates this process due to the affinity loss term being symmetric.
Box-supervised Instance Segmentation
Semantic Segmentation
no code implementations • 5 Dec 2022 • Hang Zhao, Zherong Pan, Yang Yu, Kai Xu
We study the problem of learning online packing skills for irregular 3D shapes, which is arguably the most challenging setting of bin packing problems.
no code implementations • 21 Nov 2022 • Bowen Li, Ziyuan Huang, Junjie Ye, Yiming Li, Sebastian Scherer, Hang Zhao, Changhong Fu
Visual object tracking is essential to intelligent robots.
no code implementations • 3 Nov 2022 • Qiao Sun, Xin Huang, Brian C. Williams, Hang Zhao
Motion prediction is crucial in enabling safe motion planning for autonomous vehicles in interactive scenarios.
no code implementations • 9 Aug 2022 • Xin Huang, Xiaoyu Tian, Junru Gu, Qiao Sun, Hang Zhao
Recently, the occupancy flow fields representation was proposed to represent joint future states of road agents through a combination of occupancy grid and flow, which supports efficient and consistent joint predictions.
no code implementations • 2 Aug 2022 • Junru Gu, Chenxu Hu, Tianyuan Zhang, Xuanyao Chen, Yilun Wang, Yue Wang, Hang Zhao
In existing autonomous driving systems, perception and prediction are two separate modules.
no code implementations • 3 Jul 2022 • Lingyu Zhu, Esa Rahtu, Hang Zhao
This paper focuses on perceiving and navigating 3D environments using echoes and RGB images.
no code implementations • 18 Jun 2022 • Hang Zhao, Jinyi Ma, Zhongzhi Li, Yiqun Dong, Jianliang Ai
In this paper, a novel data-driven approach named Augmented Imagefication for Fault detection (FD) of aircraft air data sensors (ADS) is proposed.
1 code implementation • 17 Jun 2022 • Yicheng Liu, Yuantian Yuan, Yue Wang, Yilun Wang, Hang Zhao
To the best of our knowledge, VectorMapNet is the first work designed towards end-to-end vectorized map learning from onboard observations.
Ranked #1 on 3D Lane Detection on OpenLane-V2
1 code implementation • 13 Jun 2022 • Zihui Xue, Zhengqi Gao, Sucheng Ren, Hang Zhao
Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications.
no code implementations • ICLR 2022 • Yingwei Li, Tiffany Chen, Maya Kabkab, Ruichi Yu, Longlong Jing, Yurong You, Hang Zhao
An edge in the graph encodes the relative distance information between a pair of target and reference objects.
no code implementations • 8 Jun 2022 • Longlong Jing, Ruichi Yu, Henrik Kretzschmar, Kang Li, Charles R. Qi, Hang Zhao, Alper Ayvaci, Xu Chen, Dillon Cower, Yingwei Li, Yurong You, Han Deng, CongCong Li, Dragomir Anguelov
Monocular image-based 3D perception has become an active research area in recent years owing to its applications in autonomous driving.
no code implementations • 10 May 2022 • Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.
1 code implementation • 6 May 2022 • Zui Chen, Yansen Jing, Shengcheng Yuan, Yifei Xu, Jian Wu, Hang Zhao
The synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design.
1 code implementation • 2 May 2022 • Tianyuan Zhang, Xuanyao Chen, Yue Wang, Yilun Wang, Hang Zhao
In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects.
no code implementations • 5 Apr 2022 • Zhengqi Gao, Sucheng Ren, Zihui Xue, Siting Li, Hang Zhao
Multimodal fusion emerges as an appealing technique to improve model performances on many tasks.
no code implementations • CVPR 2022 • Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng
We are interested in anticipating as early as possible the target location of a person's object manipulation action in a 3D workspace from egocentric vision.
no code implementations • 22 Mar 2022 • Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal
We illustrate that randomized serialization of the segments significantly improves the performance and results in a distribution over spatially-long (across-segments) and -short (within-segment) predictions which are effective for feature learning.
no code implementations • 20 Mar 2022 • Xuanyao Chen, Tianyuan Zhang, Yue Wang, Yilun Wang, Hang Zhao
Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics.
1 code implementation • 17 Mar 2022 • Renhao Wang, Hang Zhao, Yang Gao
Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO.
no code implementations • CVPR 2022 • Qiao Sun, Xin Huang, Junru Gu, Brian C. Williams, Hang Zhao
Predicting future motions of road participants is an important task for driving autonomously in urban scenes.
no code implementations • 21 Feb 2022 • Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang
To our knowledge, S3T is the first method combining the Swin Transformer with a self-supervised learning method for music classification.
no code implementations • 17 Jan 2022 • Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinghong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao
This map enables our model to automate the alignment of non-homogenous features in a dynamic and data-driven manner.
2 code implementations • CVPR 2022 • Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Ranked #3 on 3D Object Detection on waymo pedestrian
no code implementations • ICLR 2022 • Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, Leonidas Guibas
While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model -- inter-object functional relationships (e.g., a switch on the wall turns on or off the light, a remote control operates the TV).
1 code implementation • 9 Dec 2021 • Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, Junjun Jiang, Bolei Zhou, Hang Zhao
To bridge this gap, we aim to learn a spatial-aware visual representation that can describe the three-dimensional space and is more suitable and effective for these tasks.
no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao
Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.
1 code implementation • 13 Oct 2021 • Yue Wang, Vitor Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, Justin Solomon
This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model.
1 code implementation • ICLR 2022 • Hang Zhao, Yang Yu, Kai Xu
PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL).
no code implementations • 29 Sep 2021 • Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Yue Wang, Yang Yuan, Hang Zhao
We name this problem of multi-modal training Modality Laziness.
2 code implementations • 31 Aug 2021 • Hang Zhao, Chenyang Zhu, Xin Xu, Hui Huang, Kai Xu
In this problem, the items are delivered to the agent without revealing the full sequence information.
2 code implementations • ICCV 2021 • Junru Gu, Chen Sun, Hang Zhao
In this work, we propose an anchor-free and end-to-end trajectory prediction model, named DenseTNT, that directly outputs a set of trajectories from dense goal candidates.
2 code implementations • 13 Jul 2021 • Qi Li, Yue Wang, Yilun Wang, Hang Zhao
By introducing the method and metrics, we invite the community to study this novel map learning problem.
no code implementations • CVPR 2021 • Lu Mi, Hang Zhao, Charlie Nash, Xiaohan Jin, Jiyang Gao, Chen Sun, Cordelia Schmid, Nir Shavit, Yuning Chai, Dragomir Anguelov
To address this issue, we introduce a new challenging task to generate HD maps.
1 code implementation • 27 Jun 2021 • Junru Gu, Qiao Sun, Hang Zhao
In autonomous driving, goal-based multi-trajectory prediction methods have recently proved effective: they first score goal candidates, then select a final set of goals, and finally complete trajectories based on the selected goals.
no code implementations • 26 Jun 2021 • Yue Zhao, Chenzhuang Du, Hang Zhao, Tiejun Li
In vision-based reinforcement learning (RL) tasks, it is prevalent to assign auxiliary tasks with a surrogate self-supervised loss so as to obtain more semantic representations and improve sample efficiency.
no code implementations • CVPR 2022 • Sucheng Ren, Zhengqi Gao, Tianyu Hua, Zihui Xue, Yonglong Tian, Shengfeng He, Hang Zhao
Transformers have recently been adapted from the natural language processing community as a promising substitute for convolution-based neural networks in visual learning tasks.
no code implementations • 21 Jun 2021 • Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao
We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion methods prevent the encoders of each modality from sufficient feature learning.
Ranked #33 on Semantic Segmentation on NYU Depth v2
no code implementations • NeurIPS 2021 • Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang
The world provides us with data of multiple modalities.
1 code implementation • ICCV 2021 • Tianyu Hua, Wenxiao Wang, Zihui Xue, Sucheng Ren, Yue Wang, Hang Zhao
In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.
no code implementations • 20 Apr 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov
Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.
1 code implementation • ICCV 2021 • Zihui Xue, Sucheng Ren, Zhengqi Gao, Hang Zhao
The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data.
Ranked #35 on Semantic Segmentation on NYU Depth v2
1 code implementation • 8 Mar 2021 • Bowen Li, Yiming Li, Junjie Ye, Changhong Fu, Hang Zhao
As a crucial robotic perception capability, visual tracking has been intensively studied recently.
no code implementations • 1 Jan 2021 • Congcong Wen, Wenyu Han, Hang Zhao, Chen Feng
Areal spatial data represent not only geographical locations but also sizes and shapes of physical objects such as buildings in a city.
no code implementations • ICCV 2021 • Scott Ettinger, Shuyang Cheng, Benjamin Caine, Chenxi Liu, Hang Zhao, Sabeek Pradhan, Yuning Chai, Ben Sapp, Charles R. Qi, Yin Zhou, Zoey Yang, Aurelien Chouard, Pei Sun, Jiquan Ngiam, Vijay Vasudevan, Alexander McCauley, Jonathon Shlens, Dragomir Anguelov
Furthermore, we introduce a new set of metrics that provides a comprehensive evaluation of both single agent and joint agent interaction motion forecasting models.
no code implementations • NeurIPS 2020 • Chu Zhou, Hang Zhao, Jin Han, Chang Xu, Chao Xu, Tiejun Huang, Boxin Shi
A conventional camera often suffers from over- or under-exposure when recording a real-world scene with a very high dynamic range (HDR).
4 code implementations • 30 Oct 2020 • Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, Anelia Angelova
We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.
no code implementations • 23 Oct 2020 • Jianren Wang, Yujie Lu, Hang Zhao
Developing agents that can perform complex control tasks from high dimensional observations such as pixels is challenging due to difficulties in learning dynamics efficiently.
no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He
The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate the research in developing novel approaches that would harness the imperfect data and improve the data-efficiency during training.
no code implementations • 26 Sep 2020 • Jianren Wang, Ziwen Zhuang, Hang Zhao
The variance of actions is further used to measure action incongruity.
2 code implementations • 4 Sep 2020 • Hang Zhao, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, Qi Zhang
Anomaly detection on multivariate time-series is of great importance in both data mining research and industrial applications.
3 code implementations • 19 Aug 2020 • Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, Cong-Cong Li, Dragomir Anguelov
Our key insight is that for prediction within a moderate time horizon, the future modes can be effectively captured by a set of target states.
1 code implementation • 26 Jun 2020 • Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, Kai Xu
We solve a challenging yet practically useful variant of the 3D Bin Packing Problem (3D-BPP).
4 code implementations • CVPR 2020 • Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cong-Cong Li, Cordelia Schmid
Behavior prediction in dynamic, multi-agent systems is an important problem in the context of self-driving cars, due to the complex representations and interactions of road components, including moving agents (e.g., pedestrians and vehicles) and road context information (e.g., lanes, traffic lights).
no code implementations • CVPR 2020 • Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba
Recent deep learning approaches have achieved impressive performance on visual sound separation tasks.
1 code implementation • 12 Feb 2020 • Jianren Wang, Zhaoyuan Fang, Hang Zhao
We present AlignNet, a model that synchronizes videos with reference audios under non-uniform and irregular misalignments.
no code implementations • 4 Feb 2020 • Hang Zhao
Finally, a neural network model based on data augmentation (NNDA) is proposed, because simulation cost is too high and data is scarce in the mechanical simulation field, especially in CFD problems.
8 code implementations • CVPR 2020 • Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Sheng Zhao, Shuyang Cheng, Yu Zhang, Jonathon Shlens, Zhifeng Chen, Dragomir Anguelov
In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset.
no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba
At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input.
no code implementations • 18 Jun 2019 • Lintao Zheng, Chenyang Zhu, Jiazhao Zhang, Hang Zhao, Hui Huang, Matthias Niessner, Kai Xu
In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects from the scene.
no code implementations • 18 Apr 2019 • Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh Mcdermott, Antonio Torralba
Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data.
no code implementations • ICCV 2019 • Hang Zhao, Chuang Gan, Wei-Chiu Ma, Antonio Torralba
Sounds originate from object motions and vibrations of surrounding air.
no code implementations • SIGCOMM '18 Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication 2018 • Ming-Min Zhao, Yonglong Tian, Hang Zhao, Mohammad Abu Alsheikh, Tianhong Li, Rumen Hristov, Zachary Kabelac, Dina Katabi, Antonio Torralba
It maintains this accuracy even in the presence of multiple people, and in new environments that it has not seen in the training set.
no code implementations • CVPR 2018 • Ming-Min Zhao, Tianhong Li, Mohammad Abu Alsheikh, Yonglong Tian, Hang Zhao, Antonio Torralba, Dina Katabi
Yet, unlike vision-based pose estimation, the radio-based system can estimate 2D poses through walls despite never being trained on such scenarios.
2 code implementations • ECCV 2018 • Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh Mcdermott, Antonio Torralba
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos, learns to locate image regions which produce sounds and separate the input sounds into a set of components that represents the sound from each pixel.
1 code implementation • ICCV 2019 • Hang Zhao, Antonio Torralba, Lorenzo Torresani, Zhicheng Yan
This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.
Ranked #5 on Temporal Action Localization on HACS
no code implementations • CVPR 2017 • Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.
no code implementations • ICCV 2017 • Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.
21 code implementations • 18 Aug 2016 • Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.
2 code implementations • 28 Nov 2015 • Hang Zhao, Orazio Gallo, Iuri Frosio, Jan Kautz
Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems.