Search Results for author: Peter Vajda

Found 45 papers, 20 papers with code

FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining

2 code implementations • CVPR 2021 • Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Bichen Wu, Zijian He, Zhen Wei, Kan Chen, Yuandong Tian, Matthew Yu, Peter Vajda, Joseph E. Gonzalez

To address this, we present Neural Architecture-Recipe Search (NARS) to search both (a) architectures and (b) their corresponding training recipes, simultaneously.

Ranked #5 on Neural Architecture Search on ImageNet

Neural Architecture Search object-detection +1

29,713

NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

2 code implementations • ICCV 2023 • Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, RuiLong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.

3D Object Detection Depth Estimation +1

4,790

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

5 code implementations • CVPR 2019 • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer

Due to this, previous neural architecture search (NAS) methods are computationally expensive.

Ranked #890 on Image Classification on ImageNet

Image Classification Neural Architecture Search

890

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

1 code implementation • CVPR 2019 • Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors.

Bayesian Optimization Efficient Neural Network +1

890

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

1 code implementation • CVPR 2020 • Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez

We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands.

Ranked #68 on Neural Architecture Search on ImageNet

Neural Architecture Search

890

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

1 code implementation • CVPR 2023 • Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu

To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.

Ranked #6 on Open Vocabulary Semantic Segmentation on PascalVOC-20

Image Captioning Open Vocabulary Semantic Segmentation +1

627

SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation

3 code implementations • ECCV 2020 • Chenfeng Xu, Bichen Wu, Zining Wang, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

Using standard convolutions to process such LiDAR images is problematic, as convolution filters pick up local features that are only active in specific regions in the image.

Ranked #24 on 3D Semantic Segmentation on SemanticKITTI

3D Semantic Segmentation Point Cloud Segmentation +1

534

One Shot 3D Photography

1 code implementation • 27 Aug 2020 • Johannes Kopf, Kevin Matzen, Suhib Alsisan, Ocean Quigley, Francis Ge, Yangming Chong, Josh Patterson, Jan-Michael Frahm, Shu Wu, Matthew Yu, Peizhao Zhang, Zijian He, Peter Vajda, Ayush Saraf, Michael Cohen

3D photos are static in time, like traditional photos, but are displayed with interactive parallax on mobile or desktop screens, as well as on Virtual Reality devices, where viewing them also includes stereo.

Monocular Depth Estimation

468

Unbiased Teacher for Semi-Supervised Object Detection

4 code implementations • ICLR 2021 • Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Ranked #2 on Semi-Supervised Person Bounding Box Detection on COCO 1% labeled data

Image Classification Object +4

411

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

8 code implementations • 5 Jun 2020 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.

General Classification Image Classification +1

178

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation Domain Adaptation +3

170

Learning to Generate Grounded Visual Captions without Localization Supervision

2 code implementations • 1 Jun 2019 • Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is, whether the model uses the correct image regions to output particular words or is hallucinating based on priors in the dataset and/or the language model.

Image Captioning Language Modelling +2

154

Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models

1 code implementation • 8 Jun 2021 • Chenfeng Xu, Shijia Yang, Tomer Galanti, Bichen Wu, Xiangyu Yue, Bohan Zhai, Wei Zhan, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

We discover that we can indeed use the same architecture and pretrained weights of a neural net model to understand both images and point-clouds.

3D Point Cloud Classification Point Cloud Classification +1

116

AVID: Any-Length Video Inpainting with Diffusion Model

1 code implementation • 6 Dec 2023 • Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu

Given a video, a masked region at its initial frame, and an editing prompt, it requires a model to do infilling at each frame following the editing guidance while keeping the out-of-mask region intact.

Image Inpainting Video Inpainting

Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation

1 code implementation • ICLR 2022 • Bichen Wu, Ruizhe Cheng, Peizhao Zhang, Tianren Gao, Peter Vajda, Joseph E. Gonzalez

Traditional computer vision models are trained to predict a fixed set of predefined categories.

Contrastive Learning Knowledge Distillation +1

Deep Space-Time Video Upsampling Networks

1 code implementation • ECCV 2020 • Jaeyeon Kang, Younghyun Jo, Seoung Wug Oh, Peter Vajda, Seon Joo Kim

Video super-resolution (VSR) and frame interpolation (FI) are traditional computer vision problems, and their performance has recently been improving through the incorporation of deep learning.

Motion Compensation Video Super-Resolution

Tackling the Ill-Posedness of Super-Resolution Through Adaptive Target Generation

1 code implementation • CVPR 2021 • Younghyun Jo, Seoung Wug Oh, Peter Vajda, Seon Joo Kim

By the one-to-many nature of the super-resolution (SR) problem, a single low-resolution (LR) image can be mapped to many high-resolution (HR) images.

Ranked #3 on Blind Super-Resolution on DIV2KRK - 4x upscaling

Blind Super-Resolution Super-Resolution +1

Efficient Segmentation: Learning Downsampling Near Semantic Boundaries

1 code implementation • ICCV 2019 • Dmitrii Marin, Zijian He, Peter Vajda, Priyam Chatterjee, Sam Tsai, Fei Yang, Yuri Boykov

Many automated processes such as auto-piloting rely on a good semantic segmentation as a critical component.

Computational Efficiency Segmentation +1

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

1 code implementation • CVPR 2023 • Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs). One reason is that ViTs' attention measures global similarities and thus has quadratic complexity in the number of input tokens.

Efficient ViTs

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations • 15 Jul 2016 • Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3

Value-aware Quantization for Training and Inference of Neural Networks

no code implementations • ECCV 2018 • Eunhyeok Park, Sungjoo Yoo, Peter Vajda

We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which reduces total quantization errors under very low precision.

Quantization

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search

no code implementations • ICLR 2019 • Bichen Wu, Yanghan Wang, Peizhao Zhang, Yuandong Tian, Peter Vajda, Kurt Keutzer

Recent work in network quantization has substantially reduced the time and space complexity of neural network inference, enabling their deployment on embedded and mobile devices with limited computational and memory resources.

Neural Architecture Search Quantization

Precision Highway for Ultra Low-Precision Quantization

no code implementations • ICLR 2019 • Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo, Peter Vajda

We also report that the proposed method significantly outperforms the existing method in the 2-bit quantization of an LSTM for language modeling.

Language Modelling Quantization

Learning the Loss Functions in a Discriminative Space for Video Restoration

no code implementations • 20 Mar 2020 • Younghyun Jo, Jaeyeon Kang, Seoung Wug Oh, Seonghyeon Nam, Peter Vajda, Seon Joo Kim

Our framework is similar to GANs in that we iteratively train two networks - a generator and a loss network.

Deblurring Video Restoration

Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild

no code implementations • ECCV 2020 • Alexander Grabner, Yaming Wang, Peizhao Zhang, Peihong Guo, Tong Xiao, Peter Vajda, Peter M. Roth, Vincent Lepetit

We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.

FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

no code implementations • 25 Nov 2020 • Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda

More efficient variants of FBWave can achieve up to 109x fewer MACs while still delivering acceptable audio quality.

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation

no code implementations • 18 Apr 2021 • Ruizhe Cheng, Bichen Wu, Peizhao Zhang, Peter Vajda, Joseph E. Gonzalez

Our model transfers knowledge from pretrained image and sentence encoders and achieves strong performance with only 3M image text pairs, 133x smaller than CLIP.

Sentence Zero-Shot Learning

Visual Transformers: Where Do Transformers Really Belong in Vision Models?

no code implementations • ICCV 2021 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda

A recent trend in computer vision is to replace convolutions with transformers.

Semantic Segmentation

Adaptive Unbiased Teacher for Cross-Domain Object Detection

no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda

This enables the student model to capture domain-invariant features.

Data Augmentation Domain Adaptation +3

An Investigation on Hardware-Aware Vision Transformer Scaling

no code implementations • 29 Sep 2021 • Chaojian Li, KyungMin Kim, Bichen Wu, Peizhao Zhang, Hang Zhang, Xiaoliang Dai, Peter Vajda, Yingyan Lin

In particular, when transferred to PiT, our scaling strategies boost ImageNet top-1 accuracy from $74.6\%$ to $76.7\%$ ($\uparrow 2.1\%$) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by $\uparrow 0.7\%$ under a similar throughput on a V100 GPU.

Image Classification object-detection +2

FBNetV5: Neural Architecture Search for Multiple Tasks in One Run

no code implementations • 19 Nov 2021 • Bichen Wu, Chaojian Li, Hang Zhang, Xiaoliang Dai, Peizhao Zhang, Matthew Yu, Jialiang Wang, Yingyan Lin, Peter Vajda

To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort.

Ranked #7 on Neural Architecture Search on ImageNet

Classification Image Classification +4

Open-Set Semi-Supervised Object Detection

no code implementations • 29 Aug 2022 • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object object-detection +3

3D-Aware Encoding for Style-based Neural Radiance Fields

no code implementations • 12 Nov 2022 • Yu-Jhe Li, Tao Xu, Bichen Wu, Ningyuan Zheng, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

In the first stage, we introduce a base encoder that converts the input image to a latent code.

Contrastive Learning Image Reconstruction

XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse

no code implementations • 16 Nov 2022 • Hyoukjun Kwon, Krishnakumar Nair, Jamin Seo, Jason Yik, Debabrata Mohapatra, Dongyuan Zhan, Jinook Song, Peter Capak, Peizhao Zhang, Peter Vajda, Colby Banbury, Mark Mazumder, Liangzhen Lai, Ashish Sirasao, Tushar Krishna, Harshit Khaitan, Vikas Chandra, Vijay Janapa Reddi

We hope that our work will stimulate research and lead to the development of a new generation of ML systems for XR use cases.

A Practical Stereo Depth System for Smart Glasses

no code implementations • CVPR 2023 • Jialiang Wang, Daniel Scharstein, Akash Bapat, Kevin Blackburn-Matzen, Matthew Yu, Jonathan Lehman, Suhib Alsisan, Yanghan Wang, Sam Tsai, Jan-Michael Frahm, Zijian He, Peter Vajda, Michael F. Cohen, Matt Uyttendaele

We present the design of a productionized end-to-end stereo depth sensing system that does pre-processing, online stereo rectification, and stereo depth estimation with a fallback to monocular depth estimation when rectification is unreliable.

Monocular Depth Estimation Stereo Depth Estimation

INGeo: Accelerating Instant Neural Scene Reconstruction with Noisy Geometry Priors

no code implementations • 5 Dec 2022 • Chaojian Li, Bichen Wu, Albert Pumarola, Peizhao Zhang, Yingyan Lin, Peter Vajda

We present a method that accelerates reconstruction of 3D scenes and objects, aiming to enable instant reconstruction on edge devices such as mobile phones and AR/VR headsets.

Novel View Synthesis

Pruning Compact ConvNets for Efficient Inference

no code implementations • 11 Jan 2023 • Sayan Ghosh, Karthik Prasad, Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Graham Cormode, Peter Vajda

The resulting family of pruned models can consistently obtain better performance than existing FBNetV3 models at the same level of computation, and thus provide state-of-the-art results when trading off between computational complexity and generalization performance on the ImageNet benchmark.

Network Pruning Neural Architecture Search

3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

no code implementations • 21 Mar 2023 • Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

We note that the novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code.

Contrastive Learning Text to 3D

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.

Image Generation

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

no code implementations • 6 Dec 2023 • Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

However, one of the major drawbacks of diffusion models is that the image generation process is costly.

Denoising Image Generation

ControlRoom3D: Room Generation using Semantic Proxy Rooms

no code implementations • 8 Dec 2023 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou

Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.

MixRT: Mixed Neural Representations For Real-Time NeRF Rendering

no code implementations • 19 Dec 2023 • Chaojian Li, Bichen Wu, Peter Vajda, Yingyan Lin

Neural Radiance Field (NeRF) has emerged as a leading technique for novel view synthesis, owing to its impressive photorealistic reconstruction and rendering capability.

Novel View Synthesis

Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

no code implementations • 20 Dec 2023 • Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda

In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models, enhancing them for video editing applications.

Data Augmentation Video Editing +1

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis

no code implementations • 29 Dec 2023 • Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu

This enables our model for video synthesis by editing the first frame with any prevalent I2I models and then propagating edits to successive frames.

Optical Flow Estimation Video-to-Video Synthesis

Animated Stickers: Bringing Stickers to Life with Video Diffusion

no code implementations • 8 Feb 2024 • David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, Ankit Ramchandani, Miao Liu, Albert Pumarola, Edgar Schoenfeld, Elliot Blanchard, Krishna Narni, Yaqiao Luo, Lawrence Chen, Guan Pang, Ali Thabet, Peter Vajda, Amy Bearman, Licheng Yu

Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of temporal layers to model motion.
