Search Results for author: Wen-Huang Cheng

Found 36 papers, 14 papers with code

Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation

no code implementations • 15 Jan 2025 • Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, HongXia Xie, Hong-Han Shuai, Wen-Huang Cheng

Knowledge distillation (KD) involves transferring knowledge from a pre-trained heavy teacher model to a lighter student model, thereby reducing the inference cost while maintaining comparable effectiveness.

Knowledge Distillation

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

1 code implementation • 2 Sep 2024 • Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Yong Man Ro

To address this challenge, speaker adaptive lip reading technologies have advanced by focusing on effectively adapting a lip reading model to target speakers in the visual modality.

Lip Reading, Sentence

ReCorD: Reasoning and Correcting Diffusion for HOI Generation

1 code implementation • 25 Jul 2024 • Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng

Our model couples Latent Diffusion Models with Visual Language Models to refine the generation process, ensuring precise depictions of HOIs.

Object, Text-to-Image Generation

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

no code implementations • 17 Jul 2024 • Yi Yao, Chan-Feng Hsu, Jhe-Hao Lin, HongXia Xie, Terence Lin, Yi-Ning Huang, Hong-Han Shuai, Wen-Huang Cheng

In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data.

Diversity, Scene Generation, +1

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

2 code implementations • 9 Jun 2024 • Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention.

Contrastive Learning, Denoising, +2

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

1 code implementation • CVPR 2024 • HongXia Xie, Chu-Jun Peng, Yu-Wen Tseng, Hung-Jen Chen, Chan-Feng Hsu, Hong-Han Shuai, Wen-Huang Cheng

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions.

Emotion Classification, Emotion Recognition

Lightweight Deep Learning for Resource-Constrained Environments: A Survey

no code implementations • 8 Apr 2024 • Hou-I Liu, Marco Galindo, HongXia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng

Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing.

Deep Learning, Survey

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

2 code implementations • 4 Apr 2024 • Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai, Wen-Huang Cheng

DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and improve the positional information of queries.

Object, object-detection, +1

Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing

no code implementations • CVPR 2024 • Ling Lo, Cheng Yu Yeo, Hong-Han Shuai, Wen-Huang Cheng

To address the concerns, we propose an image immunization approach named semantic attack to protect our images from being manipulated by malicious agents using diffusion models.

Denoising, Image Inpainting, +1

MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text

no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo

In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.

Video Generation

Size Does Matter: Size-aware Virtual Try-on via Clothing-oriented Transformation Try-on Network

1 code implementation • ICCV 2023 • Chieh-Yun Chen, Yi-Chung Chen, Hong-Han Shuai, Wen-Huang Cheng

COTTON leverages clothing structure with landmarks and segmentation to design a novel landmark-guided transformation for precisely deforming clothes, allowing for size adjustment during try-on.

Virtual Try-on

Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning

no code implementations • 1 Oct 2021 • Chun-Wei Yang, Thanh-Hai Phung, Hong-Han Shuai, Wen-Huang Cheng

To automate the monitoring process, one of the promising solutions is to leverage existing object detection models to detect the faces with or without masks.

object-detection, Object Detection, +2

Technical Report for Valence-Arousal Estimation in ABAW2 Challenge

no code implementations • 8 Jul 2021 • Hong-Xia Xie, I-Hsuan Li, Ling Lo, Hong-Han Shuai, Wen-Huang Cheng

In this work, we describe our method for tackling the valence-arousal estimation challenge from the ABAW2 ICCV-2021 Competition.

Arousal Estimation

Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media

no code implementations • 18 May 2021 • Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, Yu Tsao

In this study, motivated by multimodal learning, which uses information from various modalities, and the current success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model.

Image popularity prediction, Multimodal Deep Learning

Template-Free Try-on Image Synthesis via Semantic-guided Optimization

no code implementations • 6 Feb 2021 • Chien-Lung Chou, Chieh-Yun Chen, Chia-Wei Hsieh, Hong-Han Shuai, Jiaying Liu, Wen-Huang Cheng

Afterward, given an in-shop clothing image, a user image, and a synthesized pose, we propose a novel model for synthesizing a human try-on image with the target clothing in the best fitting pose.

Image Generation, Virtual Try-on

Spatiotemporal Dilated Convolution with Uncertain Matching for Video-based Crowd Estimation

1 code implementation • 29 Jan 2021 • Yu-Jen Ma, Hong-Han Shuai, Wen-Huang Cheng

In this paper, we propose a novel SpatioTemporal convolutional Dense Network (STDNet) to address the video-based crowd counting problem, which contains the decomposition of 3D convolution and the 3D spatiotemporal dilated dense convolution to alleviate the rapid growth of the model size caused by the Conv3D layer.

Crowd Counting

Naturalistic Physical Adversarial Patch for Object Detectors

1 code implementation • ICCV 2021 • Yu-Chih-Tuan Hu, Bo-Han Kung, Daniel Stanley Tan, Jun-Cheng Chen, Kai-Lung Hua, Wen-Huang Cheng

Most prior works on physical adversarial attacks mainly focus on the attack performance but seldom enforce any restrictions over the appearance of the generated adversarial patches.

Generative Adversarial Network, Object

FashionMirror: Co-Attention Feature-Remapping Virtual Try-On With Sequential Template Poses

1 code implementation • ICCV 2021 • Chieh-Yun Chen, Ling Lo, Pin-Jui Huang, Hong-Han Shuai, Wen-Huang Cheng

In the second stage, we first remove the clothes on the source human via the removed mask and warp the clothing features conditioning on the try-on clothing mask to fit the next frame human.

Segmentation, Semantic Segmentation, +1

Fashion Meets Computer Vision: A Survey

no code implementations • 31 Mar 2020 • Wen-Huang Cheng, Sijie Song, Chieh-Yun Chen, Shintami Chusnul Hidayati, Jiaying Liu

Fashion is the way we present ourselves to the world and has become one of the world's largest industries.

Attribute, Fashion Synthesis, +3

SMP Challenge: An Overview of Social Media Prediction Challenge 2019

no code implementations • 4 Oct 2019 • Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, Jiebo Luo

In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting future interaction or attractiveness (in terms of clicks, views, likes, etc.)

Multimedia recommendation

Joint Enhancement and Denoising Method via Sequential Decomposition

1 code implementation • 23 Apr 2018 • Xutong Ren, Mading Li, Wen-Huang Cheng, Jiaying Liu

Many low-light enhancement methods ignore intensive noise in original images.

Denoising

Sequential Prediction of Social Media Popularity with Deep Temporal Context Networks

1 code implementation • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Qiushi Huang, Jintao Li, Tao Mei

With a joint embedding network, we obtain a unified deep representation of multi-modal user-post data in a common embedding space.

Social Media Popularity Prediction

Time Matters: Multi-scale Temporalization of Social Media Popularity

no code implementations • 12 Dec 2017 • Bo Wu, Wen-Huang Cheng, Yongdong Zhang, Tao Mei

We evaluate our approach on two large-scale Flickr image datasets with over 1.8 million photos in total, for the task of popularity prediction.

Social Media Popularity Prediction
