Search Results for author: Kun Wan

Found 10 papers, 3 papers with code

Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation

no code implementations13 Dec 2024 Yu-Jhe Li, Xinyang Zhang, Kun Wan, Lantao Yu, Ajinkya Kale, Xin Lu

To overcome this challenge, existing methods often use multi-modal models such as CLIP, which combine image and text features in a shared embedding space to bridge the gap between limited and extensive vocabulary recognition, resulting in a two-stage approach: in the first stage, a mask generator takes an input image and produces mask proposals, and in the second stage the target mask is selected based on the query.
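For orientation, a minimal sketch of this generic two-stage pipeline (not the paper's specific prompt-guided method) is shown below; `mask_generator` and `clip_model` are hypothetical stand-ins for a class-agnostic proposal network and a CLIP-style image/text encoder.

```python
# Sketch of a generic two-stage open-vocabulary segmentation pipeline.
# `mask_generator` and `clip_model` are hypothetical components, assumed to
# provide mask proposals and CLIP-style embeddings respectively.
import numpy as np

def pick_mask(image, text_query, mask_generator, clip_model):
    # Stage 1: class-agnostic mask proposals for the input image.
    proposals = mask_generator(image)               # list of binary (H, W) masks

    # Stage 2: embed the query and each masked region in a shared space,
    # then select the proposal that best matches the query.
    text_emb = clip_model.encode_text(text_query)   # (D,) text embedding
    scores = []
    for mask in proposals:
        region = image * mask[..., None]            # keep only masked pixels
        img_emb = clip_model.encode_image(region)   # (D,) image embedding
        cos = np.dot(img_emb, text_emb) / (
            np.linalg.norm(img_emb) * np.linalg.norm(text_emb) + 1e-8)
        scores.append(float(cos))
    return proposals[int(np.argmax(scores))]
```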

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

no code implementations26 Nov 2024 Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian

Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness.

Hallucination

Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See

1 code implementation8 Oct 2024 Zeliang Zhang, Phu Pham, Wentian Zhao, Kun Wan, Yu-Jhe Li, Jianing Zhou, Daniel Miranda, Ajinkya Kale, Chenliang Xu

In this study, we investigate the redundancy in visual computation at both the parameter and computational pattern levels within LLaVA, a representative MLLM, and introduce a suite of streamlined strategies to enhance efficiency.

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

1 code implementation CVPR 2024 Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).

Deep Learning NeRF +2

PCNN: Environment Adaptive Model Without Finetuning

no code implementations ICLR 2019 Boyuan Feng, Kun Wan, Shu Yang, Yufei Ding

Convolutional Neural Networks (CNNs) have achieved tremendous success on many computer vision tasks, which suggests promising prospects for deploying CNNs on mobile platforms.

Transfer Learning

Weighted-Sampling Audio Adversarial Example Attack

no code implementations26 Jan 2019 Xiaolei Liu, Xiaosong Zhang, Kun Wan, Qingxin Zhu, Yufei Ding

In this paper, we propose weighted-sampling audio adversarial examples, focusing on the number and weight of distortions to reinforce the attack.

Adversarial Attack Automatic Speech Recognition +3

Penetrating the Fog: the Path to Efficient CNN Models

no code implementations ICLR 2019 Kun Wan, Boyuan Feng, Shu Yang, Yufei Ding

In this paper, we are the first in the field to consider how to craft an effective sparse kernel design by eliminating the large design space.

Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds

1 code implementation28 Sep 2018 Lingwei Xie, Song He, Shu Yang, Boyuan Feng, Kun Wan, Zhongnan Zhang, Xiaochen Bo, Yufei Ding

In this paper, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains.

Property Prediction

Reconciling Feature-Reuse and Overfitting in DenseNet with Specialized Dropout

no code implementations ICLR 2019 Kun Wan, Boyuan Feng, Lingwei Xie, Yufei Ding

The insights attained here could potentially be applied as a general approach for boosting the accuracy of other CNN models with similar nonlinear connections.
