no code implementations • 15 Jan 2025 • Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, HongXia Xie, Hong-Han Shuai, Wen-Huang Cheng
Knowledge distillation (KD) involves transferring knowledge from a pre-trained heavy teacher model to a lighter student model, thereby reducing the inference cost while maintaining comparable effectiveness.
1 code implementation • 16 Dec 2024 • Yu-Hsuan Huang, Ling Lo, HongXia Xie, Hong-Han Shuai, Wen-Huang Cheng
Sequential recommendation (SR) systems predict user preferences by analyzing time-ordered interaction sequences.
1 code implementation • 2 Sep 2024 • Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro
To address this challenge, speaker adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality.
1 code implementation • 25 Jul 2024 • Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng
Our model couples Latent Diffusion Models with Visual Language Models to refine the generation process, ensuring precise depictions of HOIs.
no code implementations • 17 Jul 2024 • Yi Yao, Chan-Feng Hsu, Jhe-Hao Lin, HongXia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai, Wen-Huang Cheng
In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data.
2 code implementations • 9 Jun 2024 • Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng
Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention.
Ranked #1 on Object Detection on AI-TOD
no code implementations • 17 May 2024 • Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo
The research progress analysis provides an overall analysis of the solutions and trends in recent years.
1 code implementation • 10 May 2024 • Rong Chao, Wen-Huang Cheng, Moreno La Quatra, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Szu-Wei Fu, Yu Tsao
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task.
Ranked #4 on Speech Enhancement on VoiceBank + DEMAND
1 code implementation • CVPR 2024 • HongXia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng
Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions.
no code implementations • 8 Apr 2024 • Hou-I Liu, Marco Galindo, HongXia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng
Over the past decade, deep learning has dominated various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing.
no code implementations • 7 Apr 2024 • Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng
Subsequently, we introduce the cross-modal residual distillation to transfer the 3D spatial cues.
2 code implementations • 4 Apr 2024 • Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng
DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and improve the positional information of queries.
no code implementations • CVPR 2024 • Ling Lo, Cheng Yu Yeo, Hong-Han Shuai, Wen-Huang Cheng
To address these concerns, we propose an image immunization approach named semantic attack to protect our images from being manipulated by malicious agents using diffusion models.
no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo
In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.
no code implementations • 12 Jun 2023 • Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu
To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.
1 code implementation • ICCV 2023 • Chieh-Yun Chen, Yi-Chung Chen, Hong-Han Shuai, Wen-Huang Cheng
COTTON leverages clothing structure with landmarks and segmentation to design a novel landmark-guided transformation for precisely deforming clothes, allowing for size adjustment during try-on.
no code implementations • ICCV 2023 • HongXia Xie, Ming-Xian Lee, Tzu-Jui Chen, Hung-Jen Chen, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng
Then, the Cross-Patch Attention module is proposed to fuse the features of MIP and global context together to complement each other.
no code implementations • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, Xiao Sun, HaoDong Wu, Xuncheng Liu, Weizhan Zhang, Caixia Yan, Haipeng Du, Qinghua Zheng, Qi Wang, Wangdu Chen, Ran Duan, Mengdi Sun, Dan Zhu, Guannan Chen, Hojin Cho, Steve Kim, Shijie Yue, Chenghua Li, Zhengyang Zhuge, Wei Chen, Wenxu Wang, Yufeng Zhou, Xiaochen Cai, Hengxing Cai, Kele Xu, Li Liu, Zehua Cheng, Wenyi Lian, Wenjing Lian
While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices.
no code implementations • 7 Jul 2022 • Bo-Kai Ruan, Hong-Han Shuai, Wen-Huang Cheng
Transformers have achieved great success in natural language processing.
no code implementations • 4 Jul 2022 • Sandy Ardianto, Hsueh-Ming Hang, Wen-Huang Cheng
We design a fast car detection and tracking algorithm for traffic-monitoring fisheye video from cameras mounted at crossroads.
no code implementations • 1 Oct 2021 • Chun-Wei Yang, Thanh-Hai Phung, Hong-Han Shuai, Wen-Huang Cheng
To automate the monitoring process, one of the promising solutions is to leverage existing object detection models to detect the faces with or without masks.
no code implementations • 8 Jul 2021 • Hong-Xia Xie, I-Hsuan Li, Ling Lo, Hong-Han Shuai, Wen-Huang Cheng
In this work, we describe our method for tackling the valence-arousal estimation challenge from ABAW2 ICCV-2021 Competition.
no code implementations • 18 May 2021 • Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, Yu Tsao
In this study, motivated by multimodal learning, which uses information from various modalities, and the current success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model.
no code implementations • 8 Mar 2021 • Lei Chen, Shao-En Weng, Chu-Jun Peng, Hong-Han Shuai, Wen-Huang Cheng
Network security has long been an active research topic.
no code implementations • 6 Feb 2021 • Chien-Lung Chou, Chieh-Yun Chen, Chia-Wei Hsieh, Hong-Han Shuai, Jiaying Liu, Wen-Huang Cheng
Afterward, given an in-shop clothing image, a user image, and a synthesized pose, we propose a novel model for synthesizing a human try-on image with the target clothing in the best fitting pose.
1 code implementation • 29 Jan 2021 • Yu-Jen Ma, Hong-Han Shuai, Wen-Huang Cheng
In this paper, we propose a novel SpatioTemporal convolutional Dense Network (STDNet) to address the video-based crowd counting problem, which contains the decomposition of 3D convolution and the 3D spatiotemporal dilated dense convolution to alleviate the rapid growth of the model size caused by the Conv3D layer.
2 code implementations • 21 Jan 2021 • Edwin Arkel Rios, Wen-Huang Cheng, Bo-Cheng Lai
In this work we tackle the challenging problem of anime character recognition.
1 code implementation • ICCV 2021 • Yu-Chih-Tuan Hu, Bo-Han Kung, Daniel Stanley Tan, Jun-Cheng Chen, Kai-Lung Hua, Wen-Huang Cheng
Most prior works on physical adversarial attacks mainly focus on the attack performance but seldom enforce any restrictions over the appearance of the generated adversarial patches.
1 code implementation • ICCV 2021 • Chieh-Yun Chen, Ling Lo, Pin-Jui Huang, Hong-Han Shuai, Wen-Huang Cheng
In the second stage, we first remove the clothes on the source human via the removed mask and warp the clothing features, conditioned on the try-on clothing mask, to fit the human in the next frame.
no code implementations • 21 Dec 2020 • Hong-Xia Xie, Ling Lo, Hong-Han Shuai, Wen-Huang Cheng
Facial micro-expressions indicate brief and subtle facial movements that appear during emotional communication.
no code implementations • 19 Apr 2020 • Ling Lo, Hong-Xia Xie, Hong-Han Shuai, Wen-Huang Cheng
A micro-expression (ME) is a spontaneous, involuntary facial movement that can reveal a person's true feelings.
no code implementations • 31 Mar 2020 • Wen-Huang Cheng, Sijie Song, Chieh-Yun Chen, Shintami Chusnul Hidayati, Jiaying Liu
Fashion is the way we present ourselves to the world and has become one of the world's largest industries.
no code implementations • 4 Oct 2019 • Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, Jiebo Luo
In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting future interaction or attractiveness (in terms of clicks, views, likes, etc.)
1 code implementation • 23 Apr 2018 • Xutong Ren, Mading Li, Wen-Huang Cheng, Jiaying Liu
Many low-light enhancement methods ignore intensive noise in original images.
1 code implementation • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei
With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space.
no code implementations • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Tao Mei
We evaluate our approach on two large-scale Flickr image datasets with over 1.8 million photos in total, for the task of popularity prediction.