Search Results for author: Chu-Song Chen

Found 33 papers, 21 papers with code

Relation-Rich Visual Document Generator for Visual Information Extraction

1 code implementation CVPR 2025 Zi-Han Jiang, Chien-Wei Lin, Wei-Hua Li, Hsuan-Tung Liu, Yi-Ren Yeh, Chu-Song Chen

Despite advances in Large Language Models (LLMs) and Multimodal LLMs (MLLMs) for visual document understanding (VDU), visual information extraction (VIE) from relation-rich documents remains challenging due to layout diversity and limited training data.

Diversity document understanding +4

ACCEPT: Adaptive Codebook for Composite and Efficient Prompt Tuning

1 code implementation 10 Oct 2024 Yu-Chen Lin, Wei-Hua Li, Jun-Cheng Chen, Chu-Song Chen

We achieve superior performance on 17 diverse natural language tasks, including natural language understanding (NLU) and question answering (QA) tasks, by tuning only 0.3% of the parameters of the PLMs.

Natural Language Understanding parameter-efficient fine-tuning +2

The Great Contradiction Showdown: How Jailbreak and Stealth Wrestle in Vision-Language Models?

no code implementations 2 Oct 2024 Ching-Chia Kao, Chia-Mu Yu, Chun-Shien Lu, Chu-Song Chen

Vision-Language Models (VLMs) have achieved remarkable performance on a variety of tasks, yet they remain vulnerable to jailbreak attacks that compromise safety and reliability.

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

1 code implementation 1 Sep 2024 Yi-Chia Chen, Wei-Hua Li, Cheng Sun, Yu-Chiang Frank Wang, Chu-Song Chen

We introduce SAM4MLLM, an innovative approach which integrates the Segment Anything Model (SAM) with Multi-Modal Large Language Models (MLLMs) for pixel-aware tasks.

Language Modeling +3

Defending Against Repetitive Backdoor Attacks on Semi-supervised Learning through Lens of Rate-Distortion-Perception Trade-off

1 code implementation 14 Jul 2024 Cheng-Yi Lee, Ching-Chia Kao, Cheng-Han Yeh, Chun-Shien Lu, Chia-Mu Yu, Chu-Song Chen

Semi-supervised learning (SSL) has achieved remarkable performance with a small fraction of labeled data by leveraging vast amounts of unlabeled data from the Internet.

Data Poisoning

RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images

1 code implementation 14 May 2024 Zong-Wei Hong, Yen-Yang Hung, Chu-Song Chen

In this work, we introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.

6D Pose Estimation Object +2

D4AM: A General Denoising Framework for Downstream Acoustic Models

1 code implementation 28 Nov 2023 Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen

To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems.

Automatic Speech Recognition (ASR) +4

Domain-Generalized Face Anti-Spoofing with Unknown Attacks

3 code implementations 18 Oct 2023 Zong-Wei Hong, Yu-Chen Lin, Hsuan-Tung Liu, Yi-Ren Yeh, Chu-Song Chen

Although face anti-spoofing (FAS) methods have achieved remarkable performance on specific domains or attack types, few studies have focused on the simultaneous presence of domain changes and unknown attacks, which is closer to real application scenarios.

Domain Generalization Face Anti-Spoofing

Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training

1 code implementation 4 Aug 2022 Yao-Chih Lee, Kuan-Wei Tseng, Guan-Sheng Chen, Chu-Song Chen

It can improve the robustness of learning-based methods with flow-guided keyframes and a well-established depth prior.

Optical Flow Estimation Pose Estimation

STR-GQN: Scene Representation and Rendering for Unknown Cameras Based on Spatial Transformation Routing

no code implementations ICCV 2021 Wen-Cheng Chen, Min-Chun Hu, Chu-Song Chen

The STR mechanism treats the spatial transformation as the message passing process, and the relation between the view poses and the routing weights is modeled by an end-to-end trainable neural network.

Video-based Person Re-identification without Bells and Whistles

1 code implementation 22 May 2021 Chih-Ting Liu, Jun-Cheng Chen, Chu-Song Chen, Shao-Yi Chien

In addition, we discover errors not only in the identity labels of tracklets but also in the evaluation protocol for the MARS test data.

Video-Based Person Re-Identification

360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales

1 code implementation 15 Sep 2020 Ashesh, Chu-Song Chen, Hsuan-Tien Lin

Technically, the gaze information can be inferred from two different magnification levels: face orientation and eye orientation.

Gaze Estimation

Data-specific Adaptive Threshold for Face Recognition and Authentication

2 code implementations 26 Oct 2018 Hsin-Rung Chou, Jia-Hong Lee, Yi-Ming Chan, Chu-Song Chen

Many face recognition systems boost performance using deep learning models, but few studies examine the mechanisms for dealing with online registration.

Ranked #1 on Face Recognition on LFW (Online Open Set) (using extra training data)

Face Recognition

Unifying and Merging Well-trained Deep Neural Networks for Inference Stage

1 code implementation 14 May 2018 Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen

We propose a novel method to merge convolutional neural-nets for the inference stage.

Aesthetic Critiques Generation for Photos

no code implementations ICCV 2017 Kuang-Yu Chang, Kung-Hung Lu, Chu-Song Chen

Although aesthetic quality assessment has generated a great deal of interest in the last decade, most studies focus on providing a quality rating of good or bad for an image.

Image Captioning

Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks

no code implementations CVPR 2016 Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou

In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptors for efficient visual object matching.

Image Retrieval Object +3

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

1 code implementation 1 Jul 2015 Huei-Fang Yang, Kevin Lin, Chu-Song Chen

SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets.

Attribute Classification +3

To Know Where We Are: Vision-Based Positioning in Outdoor Environments

no code implementations 19 Jun 2015 Kuan-Wen Chen, Chun-Hsin Wang, Xiao Wei, Qiao Liang, Ming-Hsuan Yang, Chu-Song Chen, Yi-Ping Hung

Augmented reality (AR) displays have recently become increasingly popular because they are highly intuitive for humans and high-quality head-mounted displays have developed rapidly.

Image Registration Model Compression

Bayesian Fisher's Discriminant for Functional Data

no code implementations 9 Dec 2014 Yao-Hsiang Yang, Lu-Hung Chen, Chieh-Chih Wang, Chu-Song Chen

We propose a Bayesian Gaussian process framework that extends Fisher's discriminant to classify functional data such as spectra and images.
