Search Results for author: Yunyang Xiong

Found 18 papers, 12 papers with code

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

no code implementations • 22 Feb 2024 • Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra

The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0. 7%/0. 8% than MobileLLM 125M/350M.

Paper
Add Code

SqueezeSAM: User friendly mobile interactive segmentation

no code implementations • 11 Dec 2023 • Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra

Finally, when a user clicks on an object, they typically expect all related pieces of the object to be segmented.

Data Augmentation Interactive Segmentation +4

Paper
Add Code

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation • 1 Dec 2023 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything task such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e. g., ~4 AP on COCO/LVIS) over other fast SAM models.

Ranked #3 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Image Classification Instance Segmentation +5

1,725

Paper
Code

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.

Ranked #10 on Visual Question Answering on BenchLMM

Language Modelling Large Language Model +4

24,847

Paper
Code

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

no code implementations • 8 Jun 2023 • Ganesh Jawahar, Haichuan Yang, Yunyang Xiong, Zechun Liu, Dilin Wang, Fei Sun, Meng Li, Aasish Pappu, Barlas Oguz, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Raghuraman Krishnamoorthi, Vikas Chandra

In addition, the proposed method achieves the SOTA performance in NAS for building fast machine translation models, yielding better latency-BLEU tradeoff compared to HAT, state-of-the-art NAS for MT.

Language Modelling Machine Translation +2

Paper
Add Code

Self-positioning Point-based Transformer for Point Cloud Understanding

1 code implementation • CVPR 2023 • Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim

In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.

Ranked #2 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +1

Paper
Code

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion

no code implementations • 12 Dec 2022 • Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra

Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.

Paper
Add Code

Fast Point Cloud Generation with Straight Flows

1 code implementation • CVPR 2023 • Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu

We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.

Point Cloud Completion

Paper
Code

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

1 code implementation • CVPR 2023 • Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin

Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost as compared to convolutional neural networks (CNNs), one reason is that ViTs' attention measures global similarities and thus has a quadratic complexity with the number of input tokens.

Efficient ViTs

Paper
Code

SageMix: Saliency-Guided Mixup for Point Clouds

1 code implementation • 13 Oct 2022 • Sanghyeok Lee, Minkyu Jeon, Injae Kim, Yunyang Xiong, Hyunwoo J. Kim

Mixup is a simple and widely-used data augmentation technique that has proven effective in alleviating the problems of overfitting and data scarcity.

Ranked #40 on 3D Part Segmentation on ShapeNet-Part

3D Part Segmentation 3D Point Cloud Classification +3

Paper
Code

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

1 code implementation • 18 Nov 2021 • Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH), decreases the quadratic complexity of such models to linear.

Paper
Code

Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

6 code implementations • 7 Feb 2021 • Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh

The scalability of Nystr\"{o}mformer enables application to longer sequences with thousands of tokens.

Ranked #13 on Semantic Textual Similarity on MRPC (F1 metric)

Natural Language Inference Question Answering +2

7,508

Paper
Code

MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

4 code implementations • CVPR 2021 • Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen

By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.

Neural Architecture Search Object +2

76,579

Paper
Code

Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation

1 code implementation • CVPR 2019 • Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh

nature of this data suggests better estimation may be possible if the model explicitly made use of such "repeated measurements" from each user as is commonly done in classical statistical analysis using so-called mixed effects models.

Gaze Estimation

Paper
Code

Resource Constrained Neural Network Architecture Search: Will a Submodularity Assumption Help?

1 code implementation • ICCV 2019 • Yunyang Xiong, Ronak Mehta, Vikas Singh

In the latter case, the optimization is often non-differentiable and also not very amenable to derivative-free optimization methods.

Neural Architecture Search

Paper
Code

ANTNets: Mobile Convolutional Neural Networks for Resource Efficient Image Classification

no code implementations • 7 Apr 2019 • Yunyang Xiong, Hyunwoo J. Kim, Varsha Hedau

It boosts the representational power by modeling, in a high dimensional space, interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks.

Classification General Classification +1