Search Results for author: Mu Cai

Found 11 papers, 6 papers with code

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

no code implementations 22 Mar 2024 Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

Based on this, we propose PruMerge, a novel adaptive visual token reduction approach, which largely reduces the number of visual tokens while maintaining comparable model performance.

Language Modelling Large Language Model +2

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

1 code implementation 20 Feb 2024 Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.

counterfactual Data Augmentation +2

Making Large Multimodal Models Understand Arbitrary Visual Prompts

no code implementations 1 Dec 2023 Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee

Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.

Visual Commonsense Reasoning Visual Prompting

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

1 code implementation ICCV 2023 Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee

Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.

Domain Generalization Knowledge Distillation +2

Investigating the Catastrophic Forgetting in Multimodal Large Language Models

no code implementations 19 Sep 2023 Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still remains an inherent problem in multimodal LLMs (MLLM).

Image Classification Language Modelling +1

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

no code implementations 9 Jun 2023 Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee

By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.

Image Classification In-Context Learning +2

Out-of-distribution Detection via Frequency-regularized Generative Models

1 code implementation 18 Aug 2022 Mu Cai, Yixuan Li

In particular, generative models are shown to overly rely on the background information to estimate the likelihood.

Image Generation Out-of-Distribution Detection +1

Masked Discrimination for Self-Supervised Learning on Point Clouds

1 code implementation 21 Mar 2022 Haotian Liu, Mu Cai, Yong Jae Lee

Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.

3D Shape Classification Binary Classification +4

VOS: Learning What You Don't Know by Virtual Outlier Synthesis

1 code implementation 2 Feb 2022 Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li

In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training.

object-detection Object Detection +1

Towards Unknown-aware Learning with Virtual Outlier Synthesis

no code implementations ICLR 2022 Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li

In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training.

object-detection Object Detection +1
