Search Results for author: Xiaoliang Dai

Found 40 papers, 14 papers with code

Towards Automated Model Design on Recommender Systems

1 code implementation 12 Nov 2024 Tunhou Zhang, Dehua Cheng, Yuchen He, Zhengxing Chen, Xiaoliang Dai, Liang Xiong, Yudong Liu, Feng Cheng, Yufan Cao, Feng Yan, Hai Li, Yiran Chen, Wei Wen

Designing recommender systems using deep neural networks requires careful architecture design, and further optimization demands extensive co-design efforts on jointly optimizing model architecture and hardware.

AutoML Click-Through Rate Prediction +1

Movie Gen: A Cast of Media Foundation Models

1 code implementation 17 Oct 2024 Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dimitry Vengertsev, Edgar Schonfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du

Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation.

Audio Generation Video Editing +1

An Analysis on Quantizing Diffusion Transformers

no code implementations 16 Jun 2024 Yuewei Yang, Jialiang Wang, Xiaoliang Dai, Peizhao Zhang, Hongbo Zhang

Prior works on PTQ of DMs with UNet structures have addressed the challenges in calibrating parameters for both activations and weights via moderate optimization.

Conditional Image Generation Denoising +1

Layout Agnostic Scene Text Image Synthesis with Diffusion Models

no code implementations CVPR 2024 Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge.

Diversity Image Generation +3

Efficient Quantization Strategies for Latent Diffusion Models

no code implementations 9 Dec 2023 Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang

By treating the quantization discrepancy as relative noise and identifying sensitive part(s) of a model, we propose an efficient quantization approach encompassing both global and local strategies.

Quantization Text-to-Image Generation

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

no code implementations 6 Dec 2023 Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

Additionally, existing diffusion-based image manipulation models are sub-optimal in controlling the state transition of an action in egocentric image pixel space because of the domain gap.

Image Manipulation Language Modelling +1

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation CVPR 2024 Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.

Decoder Image Classification +6

Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations

no code implementations 21 Sep 2023 Zhihang Yu, Shu Wang, Yunqiang Zhu, Wen Yuan, Xiaoliang Dai, Zhiqiang Zou

However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns.

Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

no code implementations CVPR 2023 Yonggan Fu, Yuecheng Li, Chenghui Li, Jason Saragih, Peizhao Zhang, Xiaoliang Dai, Yingyan Lin

Real-time and robust photorealistic avatars for telepresence in AR/VR have been highly desired for enabling immersive photorealistic telepresence.

Neural Architecture Search

3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

no code implementations 21 Mar 2023 Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

We note that the novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code.

Contrastive Learning Text to 3D

Trainable Projected Gradient Method for Robust Fine-tuning

2 code implementations CVPR 2023 Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.

Transfer Learning

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

no code implementations CVPR 2023 Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner

Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets, are trained to perceive the world from 2D images.

Contrastive Learning Instance Segmentation +6

Pruning Compact ConvNets for Efficient Inference

no code implementations 11 Jan 2023 Sayan Ghosh, Karthik Prasad, Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Graham Cormode, Peter Vajda

The resulting family of pruned models can consistently obtain better performance than existing FBNetV3 models at the same level of computation, and thus provide state-of-the-art results when trading off between computational complexity and generalization performance on the ImageNet benchmark.

Network Pruning Neural Architecture Search

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

2 code implementations CVPR 2023 Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Celine Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs). One reason is that ViTs' attention measures global similarities and thus has quadratic complexity in the number of input tokens.

Efficient ViTs

Token Merging: Your ViT But Faster

4 code implementations 17 Oct 2022 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman

Off-the-shelf, ToMe can 2x the throughput of state-of-the-art ViT-L @ 512 and ViT-H @ 518 models on images and 2.2x the throughput of ViT-L on video with only a 0.2-0.3% accuracy drop in each case.

Efficient ViTs

Hydra Attention: Efficient Attention with Many Heads

no code implementations 15 Sep 2022 Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Judy Hoffman

While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult.

Open-Set Semi-Supervised Object Detection

no code implementations 29 Aug 2022 Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object object-detection +3

NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

2 code implementations 14 Jul 2022 Tunhou Zhang, Dehua Cheng, Yuchen He, Zhengxing Chen, Xiaoliang Dai, Liang Xiong, Feng Yan, Hai Li, Yiran Chen, Wei Wen

To overcome the data multi-modality and architecture heterogeneity challenges in the recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures.

Click-Through Rate Prediction Neural Architecture Search +1

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations CVPR 2022 Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation Domain Adaptation +3

FBNetV5: Neural Architecture Search for Multiple Tasks in One Run

no code implementations 19 Nov 2021 Bichen Wu, Chaojian Li, Hang Zhang, Xiaoliang Dai, Peizhao Zhang, Matthew Yu, Jialiang Wang, Yingyan Lin, Peter Vajda

To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much reduced computational cost and human effort.

Classification Image Classification +4

An Investigation on Hardware-Aware Vision Transformer Scaling

no code implementations 29 Sep 2021 Chaojian Li, KyungMin Kim, Bichen Wu, Peizhao Zhang, Hang Zhang, Xiaoliang Dai, Peter Vajda, Yingyan Lin

In particular, when transferred to PiT, our scaling strategies boost ImageNet top-1 accuracy from $74.6\%$ to $76.7\%$ ($\uparrow2.1\%$) under the same 0.7G FLOPs; and when transferred to the COCO object detection task, the average precision is boosted by $\uparrow0.7\%$ under a similar throughput on a V100 GPU.

Image Classification object-detection +2

FP-NAS: Fast Probabilistic Neural Architecture Search

no code implementations CVPR 2021 Zhicheng Yan, Xiaoliang Dai, Peizhao Zhang, Yuandong Tian, Bichen Wu, Matt Feiszli

Furthermore, to search fast in the multi-variate space, we propose a coarse-to-fine strategy by using a factorized distribution at the beginning which can reduce the number of architecture parameters by over an order of magnitude.

Neural Architecture Search

Fully Dynamic Inference with Deep Neural Networks

no code implementations 29 Jul 2020 Wenhan Xia, Hongxu Yin, Xiaoliang Dai, Niraj K. Jha

Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction.

Computational Efficiency Self-Driving Cars

Visual Transformers: Token-based Image Representation and Processing for Computer Vision

8 code implementations 5 Jun 2020 Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda

In this work, we challenge this paradigm by (a) representing images as semantic visual tokens and (b) running transformers to densely model token relationships.

General Classification Image Classification +1

STEERAGE: Synthesis of Neural Networks Using Architecture Search and Grow-and-Prune Methods

no code implementations 12 Dec 2019 Shayan Hassantabar, Xiaoliang Dai, Niraj K. Jha

On the MNIST dataset, our CNN architecture achieves an error rate of 0.66%, with 8.6x fewer parameters compared to the LeNet-5 baseline.

Navigate

DiabDeep: Pervasive Diabetes Diagnosis based on Wearable Medical Sensors and Efficient Neural Networks

no code implementations 11 Oct 2019 Hongxu Yin, Bilal Mukadam, Xiaoliang Dai, Niraj K. Jha

For server (edge) side inference, we achieve a 96.3% (95.3%) accuracy in classifying diabetics against healthy individuals, and a 95.7% (94.6%) accuracy in distinguishing among type-1 diabetic, type-2 diabetic, and healthy individuals.

Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks

no code implementations 27 May 2019 Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

Deep neural networks (DNNs) have become a widely deployed model for numerous machine learning applications.

Incremental Learning

ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

1 code implementation CVPR 2019 Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors.

Bayesian Optimization Efficient Neural Network +2

Grow and Prune Compact, Fast, and Accurate LSTMs

no code implementations 30 May 2018 Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM's original one-level non-linear control gates.

Image Captioning speech-recognition +1

NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm

no code implementations 6 Nov 2017 Xiaoliang Dai, Hongxu Yin, Niraj K. Jha

To address these problems, we introduce a network growth algorithm that complements network pruning to learn both weights and compact DNN architectures during training.

Network Pruning
