no code implementations • 27 May 2024 • Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee
Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning.
no code implementations • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
In response, we propose PruMerge, a novel adaptive visual token reduction strategy that significantly reduces the number of visual tokens without compromising the performance of LMMs.
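The snippet only names the idea; as a toy illustration (not PruMerge's actual algorithm), adaptive token reduction can be sketched as keeping the highest-scoring visual tokens and merging each pruned token into its nearest kept token. The function and scoring below are hypothetical:

```python
import numpy as np

def prune_and_merge_tokens(tokens, scores, keep_ratio=0.25):
    """Toy sketch of adaptive visual token reduction: keep the top-scoring
    tokens, then fold each pruned token into its nearest kept token by
    running-average merging. Scores could come from e.g. attention weights."""
    n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(scores)[-k:]                 # tokens to keep
    pruned_idx = np.setdiff1d(np.arange(n), keep_idx)  # tokens to merge away
    merged = tokens[keep_idx].copy()
    counts = np.ones(k)
    for i in pruned_idx:
        # assign the pruned token to the nearest kept token (L2 distance)
        j = int(np.argmin(np.linalg.norm(tokens[keep_idx] - tokens[i], axis=1)))
        merged[j] = (merged[j] * counts[j] + tokens[i]) / (counts[j] + 1)
        counts[j] += 1
    return merged  # shape (k, d): far fewer tokens enter the LLM
```

With `keep_ratio=0.25`, a 16-token input is compressed to 4 merged tokens, which is the kind of reduction that shrinks the LMM's visual context length.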
1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
no code implementations • CVPR 2024 • Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.
1 code implementation • ICCV 2023 • Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.
Ranked #17 on Domain Generalization on PACS
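The domain generalization setup described above is commonly evaluated with a leave-one-domain-out protocol (PACS, for instance, has four domains). A minimal sketch of that split, with hypothetical dataset names:

```python
def leave_one_domain_out(datasets, test_domain):
    """Domain generalization protocol: train on samples from all domains
    except one, then evaluate on the held-out, unseen domain."""
    train = [d for name, d in datasets.items() if name != test_domain]
    test = datasets[test_domain]
    return train, test

# e.g. for PACS-style domains:
pacs = {"photo": ["p1"], "art": ["a1"], "cartoon": ["c1"], "sketch": ["s1"]}
train_sets, test_set = leave_one_domain_out(pacs, "sketch")
```

The model never sees any `sketch` sample during training, so test accuracy measures generalization to a genuinely new distribution.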
no code implementations • 19 Sep 2023 • Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
However, catastrophic forgetting, the notorious phenomenon in which a fine-tuned model fails to retain the performance of its pre-trained counterpart, remains an inherent problem in multimodal LLMs (MLLMs).
no code implementations • 9 Jun 2023 • Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee
By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.
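To make the idea concrete: because an SVG image is plain XML text, an LLM can "see" and even edit it as a string. A minimal illustrative example (the image and edit are hypothetical, not from the paper):

```python
# An image as XML text: a red circle on a white background.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    '<rect width="64" height="64" fill="white"/>'
    '<circle cx="32" cy="32" r="16" fill="red"/>'
    '</svg>'
)

# A text-level "image edit": recolor the circle without any raster pipeline.
edited = svg.replace('fill="red"', 'fill="blue"')
```

Both the original and edited images remain ordinary strings, so they fit directly into an LLM's token stream alongside natural language.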
1 code implementation • 18 Aug 2022 • Mu Cai, Yixuan Li
In particular, generative models are shown to overly rely on the background information to estimate the likelihood.
1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Ranked #13 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
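The core mechanism of masked autoencoding, which the entry above extends to point clouds, can be sketched as randomly hiding a large fraction of input patches; the encoder sees only the visible ones, and the decoder is trained to reconstruct the rest. A toy version of the masking step (function name and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask_patches(patches, mask_ratio=0.6):
    """Randomly split patches into visible and masked subsets. For point
    clouds, a 'patch' would be a local group of points; for images, a
    square crop of pixels."""
    n = len(patches)
    n_mask = int(n * mask_ratio)
    perm = rng.permutation(n)
    masked_idx, visible_idx = perm[:n_mask], perm[n_mask:]
    return patches[visible_idx], visible_idx, masked_idx
```

The reconstruction loss is then computed only on the masked patches, forcing the encoder to infer hidden geometry from visible context.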
1 code implementation • ICLR 2022 • 2 Feb 2022 • Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li
In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training.
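A simplified sketch of the virtual-outlier idea (per one class, with a single Gaussian; the real VOS operates in the penultimate feature space of a trained classifier): fit a Gaussian to in-distribution features, sample candidates from it, and keep only the lowest-likelihood candidates as virtual outliers near the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_virtual_outliers(feats, n_candidates=1000, eps_quantile=0.02):
    """Toy VOS-style synthesis: fit a Gaussian to features, sample many
    candidates, and retain the low-likelihood tail as virtual outliers."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    cand = rng.multivariate_normal(mu, cov, size=n_candidates)
    inv = np.linalg.inv(cov)
    # negative Mahalanobis distance as an (unnormalized) log-likelihood proxy
    ll = -np.einsum("nd,dk,nk->n", cand - mu, inv, cand - mu)
    return cand[ll <= np.quantile(ll, eps_quantile)]
```

Training then encourages low OOD scores on real features and high OOD scores on these synthesized points, shaping the decision boundary without any real outlier data.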
1 code implementation • ICCV 2021 • Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang
Image-to-image translation has been revolutionized by GAN-based methods.