no code implementations • 13 Dec 2024 • Yu-Jhe Li, Xinyang Zhang, Kun Wan, Lantao Yu, Ajinkya Kale, Xin Lu
To overcome this challenge, existing methods often use multi-modal models like CLIP, which combine image and text features in a shared embedding space to bridge the gap between limited and extensive vocabulary recognition, resulting in a two-stage approach: in the first stage, a mask generator takes an input image and produces mask proposals; in the second stage, the target mask is selected based on the text query.
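The second stage of the pipeline described above can be sketched as follows — a minimal NumPy illustration, assuming precomputed CLIP-style embeddings for each mask proposal and for the text query (the function name, dimensions, and toy vectors are hypothetical, not from the paper):

```python
import numpy as np

def pick_mask(mask_embeddings, text_embedding):
    """Select the mask proposal whose embedding is closest
    to the text query under cosine similarity."""
    # Normalize so the dot product equals cosine similarity
    m = mask_embeddings / np.linalg.norm(mask_embeddings, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = m @ t
    return int(np.argmax(scores)), scores

# Toy example: three mask proposals with 4-dim embeddings.
masks = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.7, 0.7, 0.0, 0.1]])
query = np.array([0.0, 1.0, 0.0, 0.0])
best, scores = pick_mask(masks, query)
print(best)  # index of the proposal most similar to the query
```

In practice, each proposal's embedding would come from encoding the masked image region with the shared image encoder, and the query embedding from the paired text encoder, so that both stages operate in the same space.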
no code implementations • 26 Nov 2024 • Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian
Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness.
1 code implementation • 8 Oct 2024 • Zeliang Zhang, Phu Pham, Wentian Zhao, Kun Wan, Yu-Jhe Li, Jianing Zhou, Daniel Miranda, Ajinkya Kale, Chenliang Xu
In this study, we investigate the redundancy in visual computation at both the parameter and computational pattern levels within LLaVA, a representative MLLM, and introduce a suite of streamlined strategies to enhance efficiency.
1 code implementation • CVPR 2024 • Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera
We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).
no code implementations • 7 Nov 2023 • Xingzhe He, Zhiwen Cao, Nicholas Kolkin, Lantao Yu, Kun Wan, Helge Rhodin, Ratheesh Kalarot
This strategy enables the model to preserve fine details of the desired subjects, such as text and logos.
no code implementations • ICLR 2019 • Boyuan Feng, Kun Wan, Shu Yang, Yufei Ding
Convolutional Neural Networks (CNNs) have achieved tremendous success on many computer vision tasks, which suggests promising prospects for deploying CNNs on mobile platforms.
no code implementations • 26 Jan 2019 • Xiaolei Liu, Xiaosong Zhang, Kun Wan, Qingxin Zhu, Yufei Ding
In this paper, we propose weighted-sampling audio adversarial examples, focusing on the numbers and the weights of distortion to reinforce the attack.
no code implementations • ICLR 2019 • Kun Wan, Boyuan Feng, Shu Yang, Yufei Ding
In this paper, we are the first in the field to consider how to craft an effective sparse kernel design by eliminating the large design space.
1 code implementation • 28 Sep 2018 • Lingwei Xie, Song He, Shu Yang, Boyuan Feng, Kun Wan, Zhongnan Zhang, Xiaochen Bo, Yufei Ding
In this paper, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains.
no code implementations • ICLR 2019 • Kun Wan, Boyuan Feng, Lingwei Xie, Yufei Ding
The insights attained here could potentially be applied as a general approach for boosting the accuracy of other CNN models with similar nonlinear connections.