Search Results for author: Yunyang Xiong

Found 24 papers, 17 papers with code

EdgeTAM: On-Device Track Anything Model

no code implementations13 Jan 2025 Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran

Given that video segmentation is a dense prediction task, we find that preserving the spatial structure of the memories is essential, so the queries are split into global-level and patch-level groups.

Video Segmentation Video Semantic Segmentation

MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

no code implementations18 Dec 2024 Shengbang Tong, David Fan, Jiachen Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann Lecun, Saining Xie, Zhuang Liu

Our results suggest that LLMs may have strong "prior" vision capabilities that can be efficiently adapted to both visual understanding and generation with a relatively simple instruction tuning process.

Instruction Following MORPH +1

Efficient Track Anything

1 code implementation28 Nov 2024 Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra

The high computational complexity of the multistage image encoder and memory module has limited its application to real-world tasks, e.g., video object segmentation on mobile devices.

Object Segmentation +4

EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything

1 code implementation CVPR 2024 Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra

On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably, with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.

Decoder Image Classification +6

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning

2 code implementations14 Oct 2023 Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny

Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others.

Image Classification Language Modeling +9

Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts

1 code implementation8 Jun 2023 Ganesh Jawahar, Haichuan Yang, Yunyang Xiong, Zechun Liu, Dilin Wang, Fei Sun, Meng Li, Aasish Pappu, Barlas Oguz, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Raghuraman Krishnamoorthi, Vikas Chandra

In NLP tasks such as machine translation and pre-trained language modeling, there is a significant performance gap between the supernet and training from scratch for the same model architecture, necessitating retraining after the optimal architecture has been identified.

Language Modeling Language Modelling +3

Self-positioning Point-based Transformer for Point Cloud Understanding

1 code implementation CVPR 2023 Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim

In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.

3D Part Segmentation Scene Segmentation +1

PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion

no code implementations12 Dec 2022 Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra

Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.

Fast Point Cloud Generation with Straight Flows

1 code implementation CVPR 2023 Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu

We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.

Point Cloud Completion

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

3 code implementations CVPR 2023 Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Celine Lin

Vision Transformers (ViTs) have shown impressive performance but still incur a high computation cost compared to convolutional neural networks (CNNs); one reason is that ViTs' attention measures global similarities and thus has quadratic complexity in the number of input tokens.

Efficient ViTs

SageMix: Saliency-Guided Mixup for Point Clouds

1 code implementation13 Oct 2022 Sanghyeok Lee, Minkyu Jeon, Injae Kim, Yunyang Xiong, Hyunwoo J. Kim

Mixup is a simple and widely-used data augmentation technique that has proven effective in alleviating the problems of overfitting and data scarcity.

3D Part Segmentation 3D Point Cloud Classification +3
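The SageMix entry builds on standard mixup; as a point of reference, here is a minimal sketch of vanilla input mixup (Zhang et al.), not SageMix's saliency-guided variant, with the function name and Beta parameter chosen for illustration:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla input mixup: convex-combine two samples and their
    one-hot labels with a single Beta(alpha, alpha)-distributed weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # blended input
    y = lam * y1 + (1.0 - lam) * y2       # soft label
    return x, y
```

SageMix departs from this by weighting the combination with per-point saliency so that the salient regions of both point clouds are preserved in the mixed sample.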

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

1 code implementation18 Nov 2021 Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH) decreases the quadratic complexity of such models to linear.
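To illustrate the idea behind LSH-based sampled attention (this is a toy sketch, not the paper's YOSO kernel, and the dense collision comparison here is still quadratic; a real implementation buckets hashes to reach linear cost):

```python
import numpy as np

def lsh_collision_attention(Q, K, V, n_hashes=8, n_bits=4, rng=None):
    """Approximate attention weights by LSH collision frequency.
    For unit-normalized Q, K of shape (n, d), the collision probability
    of random-hyperplane hashes grows with cosine similarity, so
    averaging collisions over rounds gives a similarity-driven,
    sampling-based attention matrix."""
    rng = rng or np.random.default_rng(0)
    n, d = Q.shape
    weights = np.zeros((n, n))
    bits = 1 << np.arange(n_bits)                 # bucket-id encoding
    for _ in range(n_hashes):
        planes = rng.standard_normal((d, n_bits)) # random hyperplanes
        hq = (Q @ planes > 0) @ bits              # query bucket ids
        hk = (K @ planes > 0) @ bits              # key bucket ids
        weights += (hq[:, None] == hk[None, :]).astype(float)
    weights /= weights.sum(axis=1, keepdims=True) + 1e-9
    return weights @ V
```

Each output row is a convex combination of value rows, mirroring how softmax attention aggregates values, but the weights come from hash collisions rather than an exact exponentiated dot product.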

MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

4 code implementations CVPR 2021 Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen

By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.

Neural Architecture Search Object +2

Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation

1 code implementation CVPR 2019 Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh

The nature of this data suggests that better estimation may be possible if the model explicitly made use of such "repeated measurements" from each user, as is commonly done in classical statistical analysis using so-called mixed effects models.

Gaze Estimation
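The classical mixed-effects intuition referenced above can be sketched with a random-intercept model: each user's intercept is the mean of that user's residuals, shrunk toward the population mean in proportion to how much data the user contributes. This is a minimal empirical-Bayes sketch (function name and known-variance assumption are ours, not the paper's MeNets formulation):

```python
import numpy as np

def random_intercepts(user_ids, residuals, sigma2_e=1.0, sigma2_u=1.0):
    """Empirical-Bayes estimate of per-user random intercepts under a
    random-intercept model with known noise variance sigma2_e and
    between-user variance sigma2_u: the user mean residual is shrunk
    toward 0 by n_i*sigma2_u / (n_i*sigma2_u + sigma2_e)."""
    out = {}
    for u in np.unique(user_ids):
        r = residuals[user_ids == u]
        n = len(r)
        shrink = n * sigma2_u / (n * sigma2_u + sigma2_e)
        out[u] = shrink * r.mean()   # more data => less shrinkage
    return out
```

Users with many repeated measurements get intercepts close to their own mean; users with few measurements are pulled toward the population, which is the behavior the paper folds into a neural network.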

ANTNets: Mobile Convolutional Neural Networks for Resource Efficient Image Classification

no code implementations7 Apr 2019 Yunyang Xiong, Hyunwoo J. Kim, Varsha Hedau

It boosts representational power by modeling, in a high-dimensional space, the interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks.

Classification General Classification +1

Building Bayesian Neural Networks with Blocks: On Structure, Interpretability and Uncertainty

no code implementations10 Jun 2018 Hao Henry Zhou, Yunyang Xiong, Vikas Singh

We provide simple schemes to build Bayesian Neural Networks (BNNs), block by block, inspired by a recent idea of computation skeletons.

Gaussian Processes Variational Inference

Filter Flow Made Practical: Massively Parallel and Lock-Free

1 code implementation CVPR 2017 Sathya N. Ravi, Yunyang Xiong, Lopamudra Mukherjee, Vikas Singh

This paper is inspired by a relatively recent work of Seitz and Baker which introduced the so-called Filter Flow model.

Optical Flow Estimation
