Search Results for author: Dilin Wang

Found 32 papers, 15 papers with code

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

no code implementations 20 Feb 2024 Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan

MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time.

3D Object Reconstruction 3D Reconstruction +2
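The view dropout idea described in the abstract can be sketched in a few lines. This is a hypothetical minimal illustration (not the authors' code): each training step, only a random subset of the output views is kept, so memory scales with the kept fraction rather than with the full view count.

```python
import random

def view_dropout(output_views, keep_fraction=0.25, rng=random):
    """Randomly keep a subset of output views for one training step.

    Dropping most output views shrinks the training-time memory footprint;
    at test time all views can still be generated densely.
    """
    n_keep = max(1, int(len(output_views) * keep_fraction))
    kept_idx = rng.sample(range(len(output_views)), n_keep)
    return [output_views[i] for i in sorted(kept_idx)]

views = [f"view_{i}" for i in range(32)]
kept = view_dropout(views, keep_fraction=0.25)
print(len(kept))  # 8 of the 32 views survive this step
```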

Taming Mode Collapse in Score Distillation for Text-to-3D Generation

no code implementations 31 Dec 2023 Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

In this paper, we reveal that the existing score distillation-based text-to-3D generation frameworks degenerate to maximal likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice.

3D Generation Prompt Engineering +1

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation 1 Dec 2023 Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably, with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.

Image Classification Instance Segmentation +5

Pose-Free Generalizable Rendering Transformer

no code implementations 5 Oct 2023 Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang

To address this challenge, we introduce PF-GRT, a new Pose-Free framework for Generalizable Rendering Transformer, eliminating the need for pre-computed camera poses and instead leveraging feature-matching learned directly from data.

Generalizable Novel View Synthesis Novel View Synthesis

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

no code implementations 5 Sep 2023 Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra

Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models, with up to a 3% relative improvement in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.

Automatic Speech Recognition (ASR) +2

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion

no code implementations 12 Dec 2022 Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra

Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.

Fast Point Cloud Generation with Straight Flows

1 code implementation CVPR 2023 Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu

We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.

Point Cloud Completion

Temporally Consistent Online Depth Estimation in Dynamic Scenes

no code implementations 17 Nov 2021 Zhaoshuo Li, Wei Ye, Dilin Wang, Francis X. Creighton, Russell H. Taylor, Ganesh Venkatesh, Mathias Unberath

We present a framework named Consistent Online Dynamic Depth (CODD) to produce temporally consistent depth estimates in dynamic scenes in an online setting.

Stereo Depth Estimation

Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation

1 code implementation CVPR 2022 Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan

Therefore, we propose HRViT, which enhances ViTs to learn semantically-rich and spatially-precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs.

Image Classification Representation Learning +3

NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training

1 code implementation ICLR 2022 Chengyue Gong, Dilin Wang, Meng Li, Xinlei Chen, Zhicheng Yan, Yuandong Tian, Qiang Liu, Vikas Chandra

In this work, we observe that the poor performance is due to a gradient conflict issue: the gradients of different sub-networks conflict with those of the supernet more severely in ViTs than in CNNs, which leads to early saturation in training and inferior convergence.

Data Augmentation Image Classification +2

Vision Transformers with Patch Diversification

1 code implementation 26 Apr 2021 Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, Qiang Liu

To alleviate this problem, in this work, we introduce novel loss functions in vision transformer training to explicitly encourage diversity across patch representations for more discriminative feature extraction.

Image Classification Semantic Segmentation
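A diversity-encouraging regularizer of the kind this abstract describes can be sketched as an average pairwise cosine-similarity penalty over patch representations. This is a hypothetical illustration of the general idea, not the paper's exact loss functions:

```python
import numpy as np

def patch_diversity_loss(patches):
    """Average off-diagonal cosine similarity between patch representations.

    patches: (n_patches, dim) array. A lower value means the patch
    representations point in more diverse directions; minimizing this
    alongside the task loss discourages patches from collapsing together.
    """
    z = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    sim = z @ z.T                                   # pairwise cosine similarities
    n = len(z)
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

identical = np.tile([[1.0, 0.0]], (4, 1))
spread = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
print(patch_diversity_loss(identical))  # 1.0: fully collapsed patches
print(patch_diversity_loss(spread))     # negative: well-spread patches
```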

AlphaNet: Improved Training of Supernets with Alpha-Divergence

2 code implementations 16 Feb 2021 Dilin Wang, Chengyue Gong, Meng Li, Qiang Liu, Vikas Chandra

Weight-sharing NAS builds a supernet that assembles all the architectures as its sub-networks and jointly trains the supernet with the sub-networks.

Image Classification Neural Architecture Search

AlphaMatch: Improving Consistency for Semi-supervised Learning with Alpha-divergence

no code implementations CVPR 2021 Chengyue Gong, Dilin Wang, Qiang Liu

Semi-supervised learning (SSL) is a key approach toward more data-efficient machine learning by jointly leveraging both labeled and unlabeled data.

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

2 code implementations CVPR 2021 Dilin Wang, Meng Li, Chengyue Gong, Vikas Chandra

Our discovered model family, AttentiveNAS models, achieves top-1 accuracy from 77.3% to 80.7% on ImageNet, and outperforms SOTA models, including BigNAS and Once-for-All networks.

Neural Architecture Search

Stein Variational Gradient Descent With Matrix-Valued Kernels

1 code implementation NeurIPS 2019 Dilin Wang, Ziyang Tang, Chandrajit Bajaj, Qiang Liu

Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference.

Bayesian Inference

Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent

1 code implementation ICLR 2020 Dilin Wang, Meng Li, Lemeng Wu, Vikas Chandra, Qiang Liu

Designing energy-efficient networks is of critical importance for enabling state-of-the-art deep learning in mobile and edge settings where the computation and energy budgets are highly limited.

Splitting Steepest Descent for Growing Neural Architectures

1 code implementation NeurIPS 2019 Qiang Liu, Lemeng Wu, Dilin Wang

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs.
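The splitting idea rests on a function-preserving initialization: a neuron is duplicated with its incoming weights copied and its outgoing weights halved, so the grown network starts out computing exactly the same function. The sketch below illustrates that initialization only (the paper additionally perturbs offspring along learned splitting directions); all names here are hypothetical:

```python
import numpy as np

def split_neuron(W_in, W_out, j):
    """Split hidden neuron j of a one-hidden-layer net into two offspring.

    Incoming weights are copied and outgoing weights halved, so the network
    output is unchanged before any further training.
    """
    W_in2 = np.vstack([W_in, W_in[j:j + 1]])        # duplicate incoming row
    W_out2 = np.hstack([W_out, W_out[:, j:j + 1]])  # duplicate outgoing column
    W_out2[:, j] *= 0.5                             # halve both offspring...
    W_out2[:, -1] *= 0.5                            # ...so contributions sum back
    return W_in2, W_out2

rng = np.random.default_rng(0)
W_in, W_out = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
before = W_out @ np.maximum(W_in @ x, 0)            # ReLU hidden layer
W_in2, W_out2 = split_neuron(W_in, W_out, j=1)
after = W_out2 @ np.maximum(W_in2 @ x, 0)
print(np.allclose(before, after))  # True: the split preserves the function
```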

Improving Neural Language Modeling via Adversarial Training

1 code implementation 10 Jun 2019 Dilin Wang, Chengyue Gong, Qiang Liu

Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models.

Language Modelling Machine Translation +1

Variational Inference with Tail-adaptive f-Divergence

1 code implementation NeurIPS 2018 Dilin Wang, Hao Liu, Qiang Liu

Variational inference with α-divergences has been widely used in modern probabilistic machine learning.

Variational Inference

Stein Variational Gradient Descent as Moment Matching

no code implementations NeurIPS 2018 Qiang Liu, Dilin Wang

Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest.

Stein Variational Message Passing for Continuous Graphical Models

no code implementations ICML 2018 Dilin Wang, Zhe Zeng, Qiang Liu

We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest.

Learning to Draw Samples with Amortized Stein Variational Gradient Descent

no code implementations 20 Jul 2017 Yihao Feng, Dilin Wang, Qiang Liu

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference.

Bayesian Inference

Learning Deep Energy Models: Contrastive Divergence vs. Amortized MLE

no code implementations 4 Jul 2017 Qiang Liu, Dilin Wang

We propose a number of new algorithms for learning deep energy models and demonstrate their properties.

Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning

1 code implementation 6 Nov 2016 Dilin Wang, Qiang Liu

We propose a simple algorithm to train stochastic neural networks to draw samples from given target distributions for probabilistic inference.

Ranked #19 on Conditional Image Generation on CIFAR-10 (Inception score metric)

Conditional Image Generation

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

13 code implementations NeurIPS 2016 Qiang Liu, Dilin Wang

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization.

Bayesian Inference Variational Inference
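The SVGD update is simple enough to sketch in full: each particle moves along a kernel-smoothed gradient of the log target plus a repulsive term that keeps particles spread out. Below is a minimal 1-D sketch of the published update rule; the bandwidth, step size, and iteration count are illustrative choices, not the paper's settings:

```python
import numpy as np

def svgd_step(x, grad_logp, h=0.5, eps=0.1):
    """One SVGD update on 1-D particles x with an RBF kernel.

    phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad_logp(x_j) + d/dx_j k(x_j, x_i) ]
    """
    diff = x[:, None] - x[None, :]        # diff[i, j] = x_i - x_j
    K = np.exp(-diff**2 / (2 * h))        # RBF kernel k(x_j, x_i)
    grad_K = K * diff / h                 # repulsive term d/dx_j k(x_j, x_i)
    phi = (K @ grad_logp(x) + grad_K.sum(axis=1)) / len(x)
    return x + eps * phi

# Fit particles to a standard normal target, whose score is grad log p(x) = -x.
x = np.linspace(-1.0, 5.0, 50)
for _ in range(1000):
    x = svgd_step(x, lambda z: -z)
print(x.mean(), x.std())  # mean drifts toward 0, std toward roughly 1
```

Note the two forces at play: the first term transports particles toward high-density regions, while the kernel-gradient term prevents them from collapsing onto the mode.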
