1 code implementation • 14 Mar 2023 • Runsheng Xu, Xin Xia, Jinlong Li, Hanzhao Li, Shuo Zhang, Zhengzhong Tu, Zonglin Meng, Hao Xiang, Xiaoyu Dong, Rui Song, Hongkai Yu, Bolei Zhou, Jiaqi Ma
To facilitate the development of cooperative perception, we present V2V4Real, the first large-scale real-world multi-modal dataset for V2V perception.
no code implementations • 3 Mar 2023 • Zhenghai Xue, Zhenghao Peng, Quanyi Li, Zhihan Liu, Bolei Zhou
Assumed to be optimal, the teacher policy has perfect timing and capability to intervene in the learning process of the student agent, providing safety guarantees and exploration guidance.
no code implementations • 20 Jan 2023 • Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
Generative models have made huge progress in photorealistic image synthesis in recent years.
no code implementations • 12 Jan 2023 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
In this work, we demonstrate that such a generative feature learned from image synthesis exhibits great potential in solving a wide range of computer vision tasks, including both generative ones and, more importantly, discriminative ones.
no code implementations • 11 Jan 2023 • Alexander Swerdlow, Runsheng Xu, Bolei Zhou
Bird's-Eye View (BEV) Perception has received increasing attention in recent years as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications.
no code implementations • 22 Dec 2022 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov
Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.
1 code implementation • 14 Dec 2022 • Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou
Video generation requires synthesizing consistent and persistent frames with dynamic content over time.
Ranked #1 on Video Generation on YouTube Driving
1 code implementation • 27 Sep 2022 • Hao Xiang, Runsheng Xu, Xin Xia, Zhaoliang Zheng, Bolei Zhou, Jiaqi Ma
Recent advancements in Vehicle-to-Everything communication technology have enabled autonomous vehicles to share sensory information to obtain better perception performance.
no code implementations • 20 Sep 2022 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Deli Zhao, Bo Dai, Bolei Zhou
Two capacity adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator.
1 code implementation • 15 Sep 2022 • Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou
We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.
1 code implementation • 5 Jul 2022 • Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma
The extensive experiments on the V2V perception dataset, OPV2V, demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.
1 code implementation • 31 May 2022 • Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou
Inspired by the neuroscience approach to investigate the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent behavior.
no code implementations • 5 Apr 2022 • Qihang Zhang, Zhenghao Peng, Bolei Zhou
Specifically, we train an inverse dynamic model with a small amount of labeled data and use it to predict action labels for all the YouTube video frames.
1 code implementation • CVPR 2022 • Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou
To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations.
Ranked #2 on Gesture Generation on TED Gesture Dataset
no code implementations • 21 Mar 2022 • Jiankai Sun, Bolei Zhou, Michael J. Black, Arjun Chandrasekaran
An important component of this problem is 3D Temporal Action Localization (3D-TAL), which involves recognizing what actions a person is performing, and when.
no code implementations • ICLR 2022 • Quanyi Li, Zhenghao Peng, Bolei Zhou
HACO can train agents to drive in unseen traffic scenarios with a small human intervention budget and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines by a large margin.
1 code implementation • 13 Feb 2022 • Xian Liu, Rui Qian, Hang Zhou, Di Hu, Weiyao Lin, Ziwei Liu, Bolei Zhou, Xiaowei Zhou
Specifically, we observe that the previous practice of learning only a single audio representation is insufficient due to the additive nature of audio signals.
no code implementations • 19 Jan 2022 • Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, Bolei Zhou
Moreover, to enable portrait rendering in one unified neural radiance field, a Torso Deformation module is designed to stabilize the large-scale non-rigid torso motions.
no code implementations • 17 Jan 2022 • Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao
This map enables our model to automate the alignment of non-homogeneous features in a dynamic and data-driven manner.
1 code implementation • CVPR 2022 • Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou
The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.
no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin
Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.
1 code implementation • 9 Dec 2021 • Zhenyu Li, Zehui Chen, Ang Li, Liangji Fang, Qinhong Jiang, Xianming Liu, Junjun Jiang, Bolei Zhou, Hang Zhao
To bridge this gap, we aim to learn a spatial-aware visual representation that can describe the three-dimensional space and is more suitable and effective for these tasks.
1 code implementation • CVPR 2022 • Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
We further propose to align the spatial awareness of G with the attention map induced from D. In this way, we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to equilibrium, leading to better synthesis performance.
no code implementations • 18 Nov 2021 • Ceyuan Yang, Yujun Shen, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou
We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characteristics from the reference.
2 code implementations • NeurIPS 2021 • Zhenghao Peng, Quanyi Li, Ka Ming Hui, Chunxiao Liu, Bolei Zhou
Self-Driven Particles (SDP) describe a category of multi-agent systems common in everyday life, such as flocking birds and traffic flows.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 25 Oct 2021 • Rui Xu, Xiangyu Xu, Kai Chen, Bolei Zhou, Chen Change Loy
Transformers have become prevalent in computer vision, especially for high-level vision tasks.
1 code implementation • 13 Oct 2021 • Zhenghao Peng, Quanyi Li, Chunxiao Liu, Bolei Zhou
An offline RL technique is further used to learn from the partial demonstration generated by the expert.
no code implementations • 29 Sep 2021 • Yuanqi Du, Xian Liu, Shengchao Liu, Bolei Zhou
In this work, we develop a simple yet effective method to interpret the latent space of the learned generative models with various molecular properties for more interactive molecule generation and discovery.
no code implementations • 29 Sep 2021 • Zhihan Liu, Hao Sun, Bolei Zhou
To this end, we propose a novel meta-algorithm, Self-Imitation Policy Learning through Iterative Distillation (SPLID), which relies on the concept of a $\delta$-distilled policy to iteratively improve the quality of the target data, while the agent imitates the relabeled target data.
no code implementations • 29 Sep 2021 • Hao Sun, Lei Han, Jian Guo, Bolei Zhou
We verify our insight on a range of tasks: (1) In offline RL, the conservative exploitation leads to improved learning performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to trade off between exploration and exploitation, thus improving learning efficiency; (3) In online RL with a discrete action space, a negative reward shifting brings an improvement over the previous curiosity-based exploration method.
no code implementations • 29 Sep 2021 • Haoyue Bai, Ceyuan Yang, Yinghao Xu, S.-H. Gary Chan, Bolei Zhou
In this paper, we employ interpolated generative models to generate OoD samples at training time via data augmentation.
2 code implementations • 26 Sep 2021 • Quanyi Li, Zhenghao Peng, Lan Feng, Qihang Zhang, Zhenghai Xue, Bolei Zhou
Based on MetaDrive, we construct a variety of RL tasks and baselines in both single-agent and multi-agent settings, including benchmarking generalizability across unseen scenes, safe exploration, and learning multi-agent traffic.
no code implementations • 10 Aug 2021 • Bolei Zhou
Significant progress has been made by the advances in Generative Adversarial Networks (GANs) for image generation.
no code implementations • 9 Jul 2021 • Hao Sun, Ziping Xu, Meng Fang, Zhenghao Peng, Jiadong Guo, Bo Dai, Bolei Zhou
Safe exploration is crucial for the real-world application of reinforcement learning (RL).
1 code implementation • NeurIPS 2021 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou
Meanwhile, the learned instance discrimination capability from the discriminator is in turn exploited to encourage the generator for diverse generation.
Ranked #8 on Image Generation on FFHQ 256 x 256
2 code implementations • ICCV 2021 • Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye
TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization.
Weakly Supervised Object Localization
1 code implementation • CVPR 2021 • Yicheng Liu, Jinghuai Zhang, Liangji Fang, Qinhong Jiang, Bolei Zhou
Predicting multiple plausible future trajectories of the nearby vehicles is crucial for the safety of autonomous driving.
no code implementations • 13 Mar 2021 • Kaiwen Zha, Yujun Shen, Bolei Zhou
In this work, we study the image transformation problem, which aims at learning the underlying transformations (e.g., the transition of seasons) from a collection of unlabeled images.
1 code implementation • CVPR 2021 • Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin
The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes.
no code implementations • 26 Jan 2021 • Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, Li Liu
Scene classification, aiming at classifying a scene image to one of the predefined scene categories by comprehending the entire image, is a longstanding, fundamental and challenging problem in computer vision.
1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.
no code implementations • 1 Jan 2021 • Hao Sun, Ziping Xu, Meng Fang, Yuhang Song, Jiechao Xiong, Bo Dai, Zhengyou Zhang, Bolei Zhou
Despite the remarkable progress made by the policy gradient algorithms in reinforcement learning (RL), sub-optimal policies usually result from the local exploration property of the policy gradient update.
2 code implementations • 26 Dec 2020 • Quanyi Li, Zhenghao Peng, Qihang Zhang, Chunxiao Liu, Bolei Zhou
We validate that training with an increasing number of procedurally generated scenes significantly improves the generalization of the agent across scenarios of different traffic densities and road networks.
1 code implementation • 9 Dec 2020 • Shuhan Tan, Yujun Shen, Bolei Zhou
Generative Adversarial Networks (GANs) advance face synthesis through learning the underlying distribution of observed data.
no code implementations • CVPR 2021 • Rui Xu, Xintao Wang, Kai Chen, Bolei Zhou, Chen Change Loy
In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators.
1 code implementation • 28 Sep 2020 • Rui Xu, Minghao Guo, Jiaqi Wang, Xiaoxiao Li, Bolei Zhou, Chen Change Loy
By bringing together the best of both paradigms, we propose a new deep inpainting framework where texture generation is guided by a texture memory of patch samples extracted from unmasked regions.
2 code implementations • 10 Sep 2020 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Agata Lapedriza, Bolei Zhou, Antonio Torralba
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
no code implementations • ECCV 2020 • Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin
Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving on the Internet.
1 code implementation • CVPR 2021 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
Generative Adversarial Networks (GANs) have recently advanced image synthesis by learning the underlying distribution of the observed data.
9 code implementations • CVPR 2021 • Yujun Shen, Bolei Zhou
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
1 code implementation • 29 Jun 2020 • Yinghao Xu, Ceyuan Yang, Ziwei Liu, Bo Dai, Bolei Zhou
Recent attempts at unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in pose.
1 code implementation • 28 Jun 2020 • Ceyuan Yang, Yinghao Xu, Bo Dai, Bolei Zhou
Visual tempo, which describes how fast an action goes, has shown its potential in supervised action recognition.
no code implementations • 14 Jun 2020 • Zhenghao Peng, Hao Sun, Bolei Zhou
Conventional Reinforcement Learning (RL) algorithms usually have one single agent learning to solve the task independently.
no code implementations • 11 Jun 2020 • Hao Sun, Ziping Xu, Yuhang Song, Meng Fang, Jiechao Xiong, Bo Dai, Bolei Zhou
However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency.
1 code implementation • 21 May 2020 • Hao Sun, Zhenghao Peng, Bo Dai, Jian Guo, Dahua Lin, Bolei Zhou
In problem-solving, we humans can come up with multiple novel solutions to the same problem.
2 code implementations • 18 May 2020 • Yujun Shen, Ceyuan Yang, Xiaoou Tang, Bolei Zhou
In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space.
1 code implementation • 15 May 2020 • David Bau, Hendrik Strobelt, William Peebles, Jonas Wulff, Bolei Zhou, Jun-Yan Zhu, Antonio Torralba
First, it is hard for GANs to precisely reproduce an input image.
2 code implementations • 27 Apr 2020 • Hao Sun, Xinyu Pan, Bo Dai, Dahua Lin, Bolei Zhou
Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals.
no code implementations • CVPR 2020 • Liangji Fang, Qinhong Jiang, Jianping Shi, Bolei Zhou
However, it remains difficult for these methods to provide multimodal predictions as well as integrate physical constraints such as traffic rules and movable areas.
3 code implementations • CVPR 2020 • Ceyuan Yang, Yinghao Xu, Jianping Shi, Bo Dai, Bolei Zhou
Previous works often capture the visual tempo through sampling raw videos at multiple rates and constructing an input-level frame pyramid, which usually requires a costly multi-branch network to handle.
Ranked #87 on Action Recognition on Something-Something V2
3 code implementations • CVPR 2020 • Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin
Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.
no code implementations • CVPR 2020 • Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy
We present a lightweight video motion retargeting approach, TransMoMo, that is capable of realistically transferring the motion of a person in a source video to another video of a target person.
2 code implementations • ECCV 2020 • Jiapeng Zhu, Yujun Shen, Deli Zhao, Bolei Zhou
A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.
1 code implementation • CVPR 2020 • Jinjin Gu, Yujun Shen, Bolei Zhou
Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors.
Ranked #6 on Blind Face Restoration on CelebA-Test
no code implementations • 30 Nov 2019 • Junning Huang, Sirui Xie, Jiankai Sun, Qiurui Ma, Chunxiao Liu, Jianping Shi, Dahua Lin, Bolei Zhou
In this work, we propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning.
no code implementations • 28 Nov 2019 • Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo
Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.
2 code implementations • 21 Nov 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou
Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there is still insufficient understanding of what generative models have learned inside the deep generative representations and how photo-realistic images can be composed from the layer-wise stochasticity introduced in recent GANs.
1 code implementation • NeurIPS 2019 • Hao Sun, Zhizhong Li, Xiaotong Liu, Dahua Lin, Bolei Zhou
This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, enabling the learning process to proceed in a self-imitated manner so that it can be trained with supervised learning.
1 code implementation • ICCV 2019 • David Bau, Jun-Yan Zhu, Jonas Wulff, William Peebles, Hendrik Strobelt, Bolei Zhou, Antonio Torralba
Differences in statistics reveal object classes that are omitted by a GAN.
no code implementations • ICCV 2019 • Yu Xiong, Qingqiu Huang, Lingfeng Guo, Hang Zhou, Bolei Zhou, Dahua Lin
On top of this dataset, we develop a framework to perform matching between movie segments and synopsis paragraphs.
no code implementations • 25 Sep 2019 • Jiapeng Zhu, Deli Zhao, Bolei Zhou, Bo Zhang
A two-stage stochasticity-free training scheme is designed to train LIA via adversarial learning, in the sense that the decoder of LIA is first trained as a standard GAN with the invertible network and then the partial encoder is learned from an autoencoder by detaching the invertible network from LIA.
no code implementations • 25 Sep 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou
Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there is still insufficient understanding of what networks have learned inside the deep generative representations and how photo-realistic images can be composed from random noise.
no code implementations • 25 Sep 2019 • Hao Sun, Bo Dai, Jiankai Sun, Zhenghao Peng, Guodong Xu, Dahua Lin, Bolei Zhou
In this work, we incorporate social influence into the reinforcement learning scheme, enabling the agents to learn both from the environment and from their peers.
no code implementations • ICCV 2019 • Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou
The model not only finds when an action is happening and which object is being manipulated, but also identifies which part of the object is being interacted with.
4 code implementations • CVPR 2020 • Yujun Shen, Jinjin Gu, Xiaoou Tang, Bolei Zhou
In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs.
3 code implementations • 19 Jun 2019 • Jiapeng Zhu, Deli Zhao, Bo Zhang, Bolei Zhou
In this paper, we show that the entanglement of the latent space for the VAE/GAN framework poses the main challenge for encoder learning.
1 code implementation • 9 Jun 2019 • Bowen Pan, Jiankai Sun, Ho Yin Tiga Leung, Alex Andonian, Bolei Zhou
Our further experiment on a LoCoBot robot shows that our model enables the surrounding sensing capability from 2D image input.
2 code implementations • CVPR 2019 • Rui Xu, Xiaoxiao Li, Bolei Zhou, Chen Change Loy
Then the synthesized flow field is used to guide the propagation of pixels to fill up the missing regions in the video.
Ranked #7 on Video Inpainting on DAVIS
One-shot visual object segmentation
Optical Flow Estimation
no code implementations • ICLR Workshop DeepGenStruct 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
We present an analytic framework to visualize and understand GANs at the unit-, object-, and scene-level.
no code implementations • 29 Jan 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
We quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.
no code implementations • 25 Jan 2019 • Quanshi Zhang, Lixin Fan, Bolei Zhou
This is the Proceedings of the AAAI 2019 Workshop on Network Interpretability for Deep Learning.
no code implementations • 4 Dec 2018 • Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang
In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.
9 code implementations • ICLR 2019 • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, Antonio Torralba
Then, we quantify the causal effect of interpretable units by measuring the ability of interventions to control objects in the output.
1 code implementation • ECCV 2018 • Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba
Explanations of the decisions made by a deep neural network are important for human end-users to be able to understand and diagnose the trustworthiness of the system.
no code implementations • ECCV 2018 • Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun, Antonio Torralba
At inference time, our model can be easily reduced to a single stream module that performs intrinsic decomposition on a single input image.
2 code implementations • 3 Aug 2018 • Jimmy Wu, Bolei Zhou, Rebecca Russell, Vincent Kee, Syler Wagner, Mitchell Hebert, Antonio Torralba, David M. S. Johnson
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation.
18 code implementations • ECCV 2018 • Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun
In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image.
Ranked #86 on Semantic Segmentation on ADE20K val
1 code implementation • ECCV 2018 • Yikang Li, Wanli Ouyang, Bolei Zhou, Jianping Shi, Chao Zhang, Xiaogang Wang
Generating scene graphs to describe all the relations inside an image has gained increasing interest in recent years.
Ranked #1 on Scene Graph Generation on VRD
no code implementations • 7 Jun 2018 • Bolei Zhou, Yiyou Sun, David Bau, Antonio Torralba
We confirm that unit attributes such as class selectivity are a poor predictor of impact on overall accuracy, as found in recent work (Morcos et al., 2018).
1 code implementation • 31 May 2018 • Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson
We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions.
1 code implementation • 13 Mar 2018 • Jimmy Wu, Diondra Peck, Scott Hsieh, Vandana Dialani, Constance D. Lehman, Bolei Zhou, Vasilis Syrgkanis, Lester Mackey, Genevieve Patterson
This work interprets the internal representations of deep neural networks trained for classification of diseased tissue in 2D mammograms.
no code implementations • CVPR 2018 • Bowen Pan, Wuwei Lin, Xiaolin Fang, Chaoqin Huang, Bolei Zhou, Cewu Lu
Deep convolutional neural networks (CNNs) have made impressive progress in many video recognition tasks such as video pose estimation and video object detection.
4 code implementations • 9 Jan 2018 • Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.
3 code implementations • ECCV 2018 • Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
Ranked #2 on Hand Gesture Recognition on Jester test
2 code implementations • 15 Nov 2017 • Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba
In this work, we describe Network Dissection, a method that interprets networks by providing labels for the units of their deep visual representations.
no code implementations • CVPR 2018 • Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang
Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision that have so far been explored separately.
1 code implementation • ICCV 2017 • Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.
Ranked #2 on Object Detection on Visual Genome
no code implementations • CVPR 2017 • Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, Antonio Torralba
A novel network design called Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade and improve over the baselines.
1 code implementation • CVPR 2017 • David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba
Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer.
no code implementations • ICCV 2017 • Hang Zhao, Xavier Puig, Bolei Zhou, Sanja Fidler, Antonio Torralba
Recognizing arbitrary objects in the wild has been a challenging problem due to the limitations of existing classification models and datasets.
2 code implementations • 5 Mar 2017 • Jay M. Wong, Vincent Kee, Tiffany Le, Syler Wagner, Gian-Luca Mariottini, Abraham Schneider, Lei Hamilton, Rahul Chipalkatty, Mitchell Hebert, David M. S. Johnson, Jimmy Wu, Bolei Zhou, Antonio Torralba
Recent robotic manipulation competitions have highlighted that sophisticated robots still struggle to achieve fast and reliable perception of task-relevant objects in complex, realistic scenarios.
1 code implementation • CVPR 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang
Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance.
1 code implementation • 6 Oct 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, Aude Oliva
The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification at tasks such as object and scene recognition.
21 code implementations • 18 Aug 2016 • Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision.
32 code implementations • CVPR 2016 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.
Ranked #2 on Weakly-Supervised Object Localization on ILSVRC 2015
7 code implementations • 7 Dec 2015 • Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus
We describe a very simple bag-of-words baseline for visual question answering.
1 code implementation • 21 Oct 2015 • Zi Wang, Bolei Zhou, Stefanie Jegelka
Recently, there has been rising interest in Bayesian optimization -- the optimization of an unknown function with assumptions usually expressed by a Gaussian Process (GP) prior.
no code implementations • 9 Jul 2015 • Donglai Wei, Bolei Zhou, Antonio Torralba, William Freeman
Convolutional Neural Networks (CNNs) have been successful in image recognition tasks, and recent work sheds light on how CNNs separate different classes with the learned inter-class knowledge through visualization.
1 code implementation • 22 Dec 2014 • Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
With the success of new computational architectures for visual processing, such as convolutional neural networks (CNNs), and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly.
no code implementations • NeurIPS 2014 • Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, Aude Oliva
Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success.
no code implementations • CVPR 2015 • Bolei Zhou, Vignesh Jagadeesh, Robinson Piramuthu
Discovering visual knowledge from weakly labeled data is crucial for scaling up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large number of concept categories.
no code implementations • CVPR 2013 • Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields.