no code implementations • 27 May 2024 • Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee
Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning.
no code implementations • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan
In response, we propose PruMerge, a novel adaptive visual token reduction strategy that significantly reduces the number of visual tokens without compromising the performance of LMMs.
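The snippet only names the idea; as a toy illustration (not PruMerge's actual algorithm), adaptive token reduction can be sketched as keeping the highest-scoring visual tokens and merging each pruned token into its nearest kept token. The function and scoring below are hypothetical:

```python
import numpy as np

def prune_and_merge_tokens(tokens, scores, keep_ratio=0.25):
    """Toy sketch of adaptive visual token reduction: keep the top-scoring
    tokens, then fold each pruned token into its nearest kept token by
    running-average merging. Scores could come from e.g. attention weights."""
    n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    keep_idx = np.argsort(scores)[-k:]                 # tokens to keep
    pruned_idx = np.setdiff1d(np.arange(n), keep_idx)  # tokens to merge away
    merged = tokens[keep_idx].copy()
    counts = np.ones(k)
    for i in pruned_idx:
        # assign the pruned token to the nearest kept token (L2 distance)
        j = int(np.argmin(np.linalg.norm(tokens[keep_idx] - tokens[i], axis=1)))
        merged[j] = (merged[j] * counts[j] + tokens[i]) / (counts[j] + 1)
        counts[j] += 1
    return merged  # shape (k, d): far fewer tokens enter the LLM
```

With `keep_ratio=0.25`, a 16-token input is compressed to 4 merged tokens, which is the kind of reduction that shrinks the LMM's visual context length.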
1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
no code implementations • CVPR 2024 • Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.
1 code implementation • ICCV 2023 • Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee
Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.
Ranked #17 on Domain Generalization on PACS
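The domain generalization setup described above is commonly evaluated with a leave-one-domain-out protocol (PACS, for instance, has four domains). A minimal sketch of that split, with hypothetical dataset names:

```python
def leave_one_domain_out(datasets, test_domain):
    """Domain generalization protocol: train on samples from all domains
    except one, then evaluate on the held-out, unseen domain."""
    train = [d for name, d in datasets.items() if name != test_domain]
    test = datasets[test_domain]
    return train, test

# e.g. for PACS-style domains:
pacs = {"photo": ["p1"], "art": ["a1"], "cartoon": ["c1"], "sketch": ["s1"]}
train_sets, test_set = leave_one_domain_out(pacs, "sketch")
```

The model never sees any `sketch` sample during training, so test accuracy measures generalization to a genuinely new distribution.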
no code implementations • 19 Sep 2023 • Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
However, catastrophic forgetting, the notorious phenomenon in which a fine-tuned model fails to retain the performance of its pre-trained counterpart, remains an inherent problem in multimodal LLMs (MLLMs).
no code implementations • 9 Jun 2023 • Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee
By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.
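To make the idea concrete: because an SVG image is plain XML text, an LLM can "see" and even edit it as a string. A minimal illustrative example (the image and edit are hypothetical, not from the paper):

```python
# An image as XML text: a red circle on a white background.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    '<rect width="64" height="64" fill="white"/>'
    '<circle cx="32" cy="32" r="16" fill="red"/>'
    '</svg>'
)

# A text-level "image edit": recolor the circle without any raster pipeline.
edited = svg.replace('fill="red"', 'fill="blue"')
```

Both the original and edited images remain ordinary strings, so they fit directly into an LLM's token stream alongside natural language.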
1 code implementation • 18 Aug 2022 • Mu Cai, Yixuan Li
In particular, generative models are shown to overly rely on the background information to estimate the likelihood.
1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Ranked #13 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
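The core mechanism of masked autoencoding, which the entry above extends to point clouds, can be sketched as randomly hiding a large fraction of input patches; the encoder sees only the visible ones, and the decoder is trained to reconstruct the rest. A toy version of the masking step (function name and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask_patches(patches, mask_ratio=0.6):
    """Randomly split patches into visible and masked subsets. For point
    clouds, a 'patch' would be a local group of points; for images, a
    square crop of pixels."""
    n = len(patches)
    n_mask = int(n * mask_ratio)
    perm = rng.permutation(n)
    masked_idx, visible_idx = perm[:n_mask], perm[n_mask:]
    return patches[visible_idx], visible_idx, masked_idx
```

The reconstruction loss is then computed only on the masked patches, forcing the encoder to infer hidden geometry from visible context.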
1 code implementation • ICLR 2022 • 2 Feb 2022 • Xuefeng Du, Zhaoning Wang, Mu Cai, Yixuan Li
In this paper, we present VOS, a novel framework for OOD detection by adaptively synthesizing virtual outliers that can meaningfully regularize the model's decision boundary during training.
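A simplified sketch of the virtual-outlier idea (per one class, with a single Gaussian; the real VOS operates in the penultimate feature space of a trained classifier): fit a Gaussian to in-distribution features, sample candidates from it, and keep only the lowest-likelihood candidates as virtual outliers near the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_virtual_outliers(feats, n_candidates=1000, eps_quantile=0.02):
    """Toy VOS-style synthesis: fit a Gaussian to features, sample many
    candidates, and retain the low-likelihood tail as virtual outliers."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats.T) + 1e-6 * np.eye(feats.shape[1])
    cand = rng.multivariate_normal(mu, cov, size=n_candidates)
    inv = np.linalg.inv(cov)
    # negative Mahalanobis distance as an (unnormalized) log-likelihood proxy
    ll = -np.einsum("nd,dk,nk->n", cand - mu, inv, cand - mu)
    return cand[ll <= np.quantile(ll, eps_quantile)]
```

Training then encourages low OOD scores on real features and high OOD scores on these synthesized points, shaping the decision boundary without any real outlier data.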
1 code implementation • ICCV 2021 • Mu Cai, Hong Zhang, Huijuan Huang, Qichuan Geng, Yixuan Li, Gao Huang
Image-to-image translation has been revolutionized by GAN-based methods.