no code implementations • 13 Jan 2025 • Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran
Given that video segmentation is a dense prediction task, we find that preserving the spatial structure of the memories is essential; accordingly, the queries are split into global-level and patch-level groups.
no code implementations • 18 Dec 2024 • Shengbang Tong, David Fan, Jiachen Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann Lecun, Saining Xie, Zhuang Liu
Our results suggest that LLMs may have strong "prior" vision capabilities that can be efficiently adapted to both visual understanding and generation with a relatively simple instruction tuning process.
1 code implementation • 28 Nov 2024 • Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra
The high computation complexity of the multistage image encoder and memory module has limited its application to real-world tasks, e.g., video object segmentation on mobile devices.
1 code implementation • 22 Oct 2024 • Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra
Given a lightweight LLM, our LongVU also scales effectively to a smaller size while maintaining state-of-the-art video understanding performance.
1 code implementation • 14 Oct 2024 • Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber
To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
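As a loose illustration of this idea, the sketch below has one agent check another agent's trajectory against a list of task requirements; the `call_llm` placeholder and the prompt format are hypothetical and not taken from the paper's implementation.

```python
# Minimal Agent-as-a-Judge-style sketch (hypothetical API; not the paper's code).
# `call_llm` stands in for whatever chat-completion backend the judge agent would use.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real client."""
    raise NotImplementedError

def judge_trajectory(task: str, requirements: list[str], trajectory: list[str]) -> dict:
    """Ask a judge agent to check each requirement against another agent's trajectory."""
    verdicts = {}
    for req in requirements:
        prompt = (
            f"Task: {task}\n"
            f"Requirement: {req}\n"
            "Agent trajectory:\n" + "\n".join(trajectory) +
            "\nDoes the trajectory satisfy the requirement? Answer 'yes' or 'no' with a reason."
        )
        verdicts[req] = call_llm(prompt)
    return verdicts
```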
no code implementations • 27 May 2024 • Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, Pietro Astolfi, Reyhane Askari Hemmat, Jun Chen, Kushal Tirumala, Rim Assouel, Mazda Moayeri, Arjang Talattof, Kamalika Chaudhuri, Zechun Liu, Xilun Chen, Quentin Garrido, Karen Ullrich, Aishwarya Agrawal, Kate Saenko, Asli Celikyilmaz, Vikas Chandra
Then, we present and discuss approaches to evaluate VLMs.
3 code implementations • 22 Feb 2024 • Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra
The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy improvement of 0.7%/0.8% over MobileLLM 125M/350M.
no code implementations • 11 Dec 2023 • Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra
A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object.
1 code implementation • CVPR 2024 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably, with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.
Ranked #3 on Zero-Shot Instance Segmentation on LVIS v1.0 val
2 code implementations • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Motivated by this, we target to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.
1 code implementation • 8 Jun 2023 • Ganesh Jawahar, Haichuan Yang, Yunyang Xiong, Zechun Liu, Dilin Wang, Fei Sun, Meng Li, Aasish Pappu, Barlas Oguz, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Raghuraman Krishnamoorthi, Vikas Chandra
In NLP tasks such as machine translation and pre-trained language modeling, there is a significant performance gap between the supernet and training the same model architecture from scratch, which necessitates retraining once the optimal architecture has been identified.
1 code implementation • CVPR 2023 • Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim
In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.
Ranked #4 on 3D Part Segmentation on ShapeNet-Part
no code implementations • 12 Dec 2022 • Lemeng Wu, Dilin Wang, Meng Li, Yunyang Xiong, Raghuraman Krishnamoorthi, Qiang Liu, Vikas Chandra
Fusing 3D LiDAR features with 2D camera features is a promising technique for enhancing the accuracy of 3D detection, thanks to their complementary physical properties.
1 code implementation • CVPR 2023 • Lemeng Wu, Dilin Wang, Chengyue Gong, Xingchao Liu, Yunyang Xiong, Rakesh Ranjan, Raghuraman Krishnamoorthi, Vikas Chandra, Qiang Liu
We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model, outperforming other efficient 3D point cloud generation methods.
3 code implementations • CVPR 2023 • Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Celine Lin
Vision Transformers (ViTs) have shown impressive performance but still require a high computation cost compared to convolutional neural networks (CNNs). One reason is that ViTs' attention measures global similarities and thus has quadratic complexity in the number of input tokens.
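To make the quadratic cost concrete, plain softmax attention materializes an N x N score matrix; the minimal NumPy sketch below shows that baseline (vanilla attention only, not the more efficient attention proposed in the entry above).

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla attention: the (N, N) score matrix is what makes the cost quadratic in tokens."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (N, N): memory and compute grow as N^2
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                      # (N, d)

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = softmax_attention(Q, K, V)            # doubling N quadruples the score-matrix work
```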
1 code implementation • 13 Oct 2022 • Sanghyeok Lee, Minkyu Jeon, Injae Kim, Yunyang Xiong, Hyunwoo J. Kim
Mixup is a simple and widely-used data augmentation technique that has proven effective in alleviating the problems of overfitting and data scarcity; a minimal sketch of standard Mixup follows this entry.
Ranked #42 on 3D Part Segmentation on ShapeNet-Part
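For reference, standard Mixup forms convex combinations of pairs of inputs and their labels using a Beta-sampled coefficient; the sketch below shows only that baseline, not the variant proposed in the entry above.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard Mixup: convex combination of two examples and their one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# Toy usage with one-hot labels for 3 classes.
x_a, x_b = np.random.randn(32, 32), np.random.randn(32, 32)
y_a, y_b = np.eye(3)[0], np.eye(3)[2]
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b)
```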
1 code implementation • 18 Nov 2021 • Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh
In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH) decreases the quadratic complexity of such models to linear.
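As rough intuition for how LSH relates to attention-style similarity, the sketch below estimates the angular similarity between a query and a key from random-hyperplane hash collisions; it illustrates only the collision-probability idea and is not the paper's Bernoulli sampling attention mechanism.

```python
import numpy as np

def simhash(x, planes):
    """Random-hyperplane LSH: sign pattern of projections onto random hyperplanes."""
    return x @ planes.T > 0

def collision_rate(q, k, num_hashes=512, rng=np.random.default_rng(0)):
    """Fraction of matching hash bits; its expectation is 1 - angle(q, k) / pi."""
    planes = rng.standard_normal((num_hashes, q.shape[-1]))
    return np.mean(simhash(q, planes) == simhash(k, planes))

q = np.random.randn(64)
k = q + 0.1 * np.random.randn(64)   # nearly aligned vectors
print(collision_rate(q, k))         # close to 1; drops toward 0.5 for orthogonal vectors
```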
8 code implementations • 7 Feb 2021 • Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh
The scalability of Nyströmformer enables application to longer sequences with thousands of tokens; a minimal sketch of the landmark-based approximation follows this entry.
Ranked #13 on Semantic Textual Similarity on MRPC (F1 metric)
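A hedged sketch of the landmark (Nyström) approximation behind this line of work: attention is routed through a small set of landmark queries and keys so the full N x N softmax is never formed. The version below uses segment-mean landmarks and an exact pseudo-inverse for clarity; the actual model relies on an iterative pseudo-inverse approximation and other details not reproduced here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def nystrom_attention(Q, K, V, num_landmarks=32):
    """Landmark-based approximation of softmax(QK^T/sqrt(d)) @ V using segment means."""
    N, d = Q.shape
    Q_l = Q.reshape(num_landmarks, N // num_landmarks, d).mean(axis=1)  # landmark queries
    K_l = K.reshape(num_landmarks, N // num_landmarks, d).mean(axis=1)  # landmark keys
    F = softmax(Q @ K_l.T / np.sqrt(d))        # (N, m)
    A = softmax(Q_l @ K_l.T / np.sqrt(d))      # (m, m)
    B = softmax(Q_l @ K.T / np.sqrt(d))        # (m, N)
    return F @ np.linalg.pinv(A) @ (B @ V)     # never materializes the (N, N) matrix

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = nystrom_attention(Q, K, V)
```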
4 code implementations • CVPR 2021 • Yunyang Xiong, Hanxiao Liu, Suyog Gupta, Berkin Akin, Gabriel Bender, Yongzhe Wang, Pieter-Jan Kindermans, Mingxing Tan, Vikas Singh, Bo Chen
By incorporating regular convolutions in the search space and directly optimizing the network architectures for object detection, we obtain a family of object detection models, MobileDets, that achieve state-of-the-art results across mobile accelerators.
1 code implementation • CVPR 2019 • Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh
The nature of this data suggests that better estimation may be possible if the model explicitly made use of such "repeated measurements" from each user, as is commonly done in classical statistical analysis using so-called mixed effects models.
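For the classical baseline mentioned here, a mixed effects model combines shared fixed effects with per-user random effects over repeated measurements; the statsmodels sketch below fits such a model on synthetic data and only illustrates the classical technique, not the paper's method.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
users = np.repeat(np.arange(50), 10)                 # 50 users, 10 repeated measurements each
x = rng.normal(size=users.size)
user_offset = rng.normal(scale=2.0, size=50)[users]  # per-user random intercept
y = 1.5 * x + user_offset + rng.normal(size=users.size)

df = pd.DataFrame({"y": y, "x": x, "user": users})
# Fixed effect for x, random intercept grouped by user.
model = smf.mixedlm("y ~ x", df, groups=df["user"]).fit()
print(model.summary())
```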
1 code implementation • ICCV 2019 • Yunyang Xiong, Ronak Mehta, Vikas Singh
In the latter case, the optimization is often non-differentiable and also not very amenable to derivative-free optimization methods.
no code implementations • 7 Apr 2019 • Yunyang Xiong, Hyunwoo J. Kim, Varsha Hedau
It boosts representational power by modeling, in a high-dimensional space, the interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks.
no code implementations • 10 Jun 2018 • Hao Henry Zhou, Yunyang Xiong, Vikas Singh
We provide simple schemes to build Bayesian Neural Networks (BNNs), block by block, inspired by a recent idea of computation skeletons.
1 code implementation • CVPR 2017 • Sathya N. Ravi, Yunyang Xiong, Lopamudra Mukherjee, Vikas Singh
This paper is inspired by a relatively recent work of Seitz and Baker which introduced the so-called Filter Flow model.