Search Results for author: Xiaoliang Dai

Found 35 papers, 12 papers with code

Efficient Quantization Strategies for Latent Diffusion Models

no code implementations • 9 Dec 2023 • Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang

By treating the quantization discrepancy as relative noise and identifying sensitive part(s) of a model, we propose an efficient quantization approach encompassing both global and local strategies.

Quantization Text-to-Image Generation

Paper
Add Code

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

no code implementations • 6 Dec 2023 • Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

Additionally, existing diffusion-based image manipulation models are sub-optimal in controlling the state transition of an action in egocentric image pixel space because of the domain gap.

Image Manipulation Language Modelling +1

Paper
Add Code

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

no code implementations • 6 Dec 2023 • Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang

However, one of the major drawbacks of diffusion models is that the image generation process is costly.

Denoising Image Generation

Paper
Add Code

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation • 1 Dec 2023 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything task such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e. g., ~4 AP on COCO/LVIS) over other fast SAM models.

Ranked #3 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Image Classification Instance Segmentation +5

1,730

Paper
Code

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.

Image Generation

Paper
Add Code

Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations

no code implementations • 21 Sep 2023 • Zhihang Yu, Shu Wang, Yunqiang Zhu, Wen Yuan, Xiaoliang Dai, Zhiqiang Zou

However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns.

Paper
Add Code

Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

no code implementations • CVPR 2023 • Yonggan Fu, Yuecheng Li, Chenghui Li, Jason Saragih, Peizhao Zhang, Xiaoliang Dai, Yingyan Lin

Real-time and robust photorealistic avatars for telepresence in AR/VR have been highly desired for enabling immersive photorealistic telepresence.

Neural Architecture Search

Paper
Add Code

3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

no code implementations • 21 Mar 2023 • Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

We note that the novelty of our model lies in that we introduce contrastive learning during training the diffusion prior which enables the generation of the valid view-invariant latent code.

Contrastive Learning Text to 3D

Paper
Add Code

Trainable Projected Gradient Method for Robust Fine-tuning

2 code implementations • CVPR 2023 • Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.

Transfer Learning

Paper
Code

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

no code implementations • CVPR 2023 • Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner

Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets are trained to perceive the world from 2D images.

Contrastive Learning Instance Segmentation +6

Paper
Add Code

Pruning Compact ConvNets for Efficient Inference

no code implementations • 11 Jan 2023 • Sayan Ghosh, Karthik Prasad, Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Graham Cormode, Peter Vajda

The resulting family of pruned models can consistently obtain better performance than existing FBNetV3 models at the same level of computation, and thus provide state-of-the-art results when trading off between computational complexity and generalization performance on the ImageNet benchmark.

Network Pruning Neural Architecture Search

Paper
Add Code

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

1 code implementation • CVPR 2023 • Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost as compared to convolutional neural networks (CNNs), one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens.

Efficient ViTs

Paper
Code

3D-Aware Encoding for Style-based Neural Radiance Fields

no code implementations • 12 Nov 2022 • Yu-Jhe Li, Tao Xu, Bichen Wu, Ningyuan Zheng, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

In the first stage, we introduce a base encoder that converts the input image to a latent code.

Contrastive Learning Image Reconstruction

Paper
Add Code

Token Merging: Your ViT But Faster

3 code implementations • 17 Oct 2022 • Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

Off-the-shelf, ToMe can 2x the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and 2. 2x the throughput of ViT-L on video with only a 0. 2-0. 3% accuracy drop in each case.

Ranked #13 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs

1,205

Paper
Code

Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

1 code implementation • CVPR 2023 • Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu

To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.

Ranked #6 on Open Vocabulary Semantic Segmentation on PascalVOC-20

Image Captioning Open Vocabulary Semantic Segmentation +1

625

Paper
Code

Hydra Attention: Efficient Attention with Many Heads

no code implementations • 15 Sep 2022 • Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman

While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult.

Paper
Add Code

Open-Set Semi-Supervised Object Detection

no code implementations • 29 Aug 2022 • Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object object-detection +3

Paper
Add Code

NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

2 code implementations • 14 Jul 2022 • Tunhou Zhang, Dehua Cheng, Yuchen He, Zhengxing Chen, Xiaoliang Dai, Liang Xiong, Feng Yan, Hai Li, Yiran Chen, Wei Wen

To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i. e., search space) to search the full architectures.

Click-Through Rate Prediction Neural Architecture Search +1

Paper
Code

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations • CVPR 2022 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation Domain Adaptation +3

170

Paper
Code

FBNetV5: Neural Architecture Search for Multiple Tasks in One Run

no code implementations • 19 Nov 2021 • Bichen Wu, Chaojian Li, Hang Zhang, Xiaoliang Dai, Peizhao Zhang, Matthew Yu, Jialiang Wang, Yingyan Lin, Peter Vajda

To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort.

Ranked #7 on Neural Architecture Search on ImageNet

Classification Image Classification +4

Paper
Add Code

Adaptive Unbiased Teacher for Cross-Domain Object Detection

no code implementations • 29 Sep 2021 • Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris M. Kitani, Peter Vajda

This enables the student model to capture domain-invariant features.

Data Augmentation Domain Adaptation +3

Paper
Add Code

An Investigation on Hardware-Aware Vision Transformer Scaling

no code implementations • 29 Sep 2021 • Chaojian Li, KyungMin Kim, Bichen Wu, Peizhao Zhang, Hang Zhang, Xiaoliang Dai, Peter Vajda, Yingyan Lin

In particular, when transferred to PiT, our scaling strategies lead to a boosted ImageNet top-1 accuracy of from $74. 6\%$ to $76. 7\%$ ($\uparrow2. 1\%$) under the same 0. 7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by $\uparrow0. 7\%$ under a similar throughput on a V100 GPU.

Image Classification object-detection +2

Paper
Add Code

Visual Transformers: Where Do Transformers Really Belong in Vision Models?

no code implementations • ICCV 2021 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda

A recent trend in computer vision is to replace convolutions with transformers.

Semantic Segmentation

Paper
Add Code

FP-NAS: Fast Probabilistic Neural Architecture Search

no code implementations • CVPR 2021 • Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli

Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.

Neural Architecture Search

Paper
Add Code

Fully Dynamic Inference with Deep Neural Networks

no code implementations • 29 Jul 2020 • Wenhan Xia, Hongxu Yin, Xiaoliang Dai, Niraj K. Jha

Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction.

Computational Efficiency Self-Driving Cars

Paper
Add Code

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

8 code implementations • 5 Jun 2020 • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.

General Classification Image Classification +1

178

Paper
Code

FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining

2 code implementations • CVPR 2021 • Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Bichen Wu, Zijian He, Zhen Wei, Kan Chen, Yuandong Tian, Matthew Yu, Peter Vajda, Joseph E. Gonzalez

To address this, we present Neural Architecture-Recipe Search (NARS) to search both (a) architectures and (b) their corresponding training recipes, simultaneously.

Ranked #5 on Neural Architecture Search on ImageNet

Neural Architecture Search object-detection +1

29,680

Paper
Code

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions

1 code implementation • CVPR 2020 • Alvin Wan, Xiaoliang Dai, Peizhao Zhang, Zijian He, Yuandong Tian, Saining Xie, Bichen Wu, Matthew Yu, Tao Xu, Kan Chen, Peter Vajda, Joseph E. Gonzalez

We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands.

Ranked #68 on Neural Architecture Search on ImageNet

Neural Architecture Search

889

Paper
Code

STEERAGE: Synthesis of Neural Networks Using Architecture Search and Grow-and-Prune Methods

no code implementations • 12 Dec 2019 • Shayan Hassantabar, Xiaoliang Dai, Niraj K. Jha

On MNIST dataset, our CNN architecture achieves an error rate of 0. 66%, with 8. 6x fewer parameters compared to the LeNet-5 baseline.

Navigate

Paper
Add Code

DiabDeep: Pervasive Diabetes Diagnosis based on Wearable Medical Sensors and Efficient Neural Networks

no code implementations • 11 Oct 2019 • Hongxu Yin, Bilal Mukadam, Xiaoliang Dai, Niraj K. Jha

For server (edge) side inference, we achieve a 96. 3% (95. 3%) accuracy in classifying diabetics against healthy individuals, and a 95. 7% (94. 6%) accuracy in distinguishing among type-1/type-2 diabetic, and healthy individuals.

Paper
Add Code

Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks

no code implementations • 27 May 2019 • Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

Deep neural networks (DNNs) have become a widely deployed model for numerous machine learning applications.

Incremental Learning

Paper
Add Code

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

1 code implementation • CVPR 2019 • Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors.

Bayesian Optimization Efficient Neural Network +1

889

Paper
Code

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

5 code implementations • CVPR 2019 • Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer

Due to this, previous neural architecture search (NAS) methods are computationally expensive.

Ranked #890 on Image Classification on ImageNet

Image Classification Neural Architecture Search

889

Paper
Code

Grow and Prune Compact, Fast, and Accurate LSTMs

no code implementations • 30 May 2018 • Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one level non-linear control gates.

Image Captioning speech-recognition +1

Paper
Add Code

NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm

no code implementations • 6 Nov 2017 • Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

To address these problems, we introduce a network growth algorithm that complements network pruning to learn both weights and compact DNN architectures during training.

Network Pruning

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.