Search Results for author: Yunsheng Wu

Found 20 papers, 11 papers with code

Enabling Deep Residual Networks for Weakly Supervised Object Detection

no code implementations ECCV 2020 Yunhang Shen, Rongrong Ji, Yan Wang, Zhiwei Chen, Feng Zheng, Feiyue Huang, Yunsheng Wu

Weakly supervised object detection (WSOD) has attracted extensive research attention due to its great flexibility of exploiting large-scale image-level annotation for detector training.

Object object-detection +1

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

no code implementations24 Nov 2024 Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang

Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents.

MobileMamba: Lightweight Multi-Receptive Visual Mamba Network

1 code implementation24 Nov 2024 Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie

CNNs, with their local receptive fields, struggle to capture long-range dependencies, while Transformers, despite their global modeling capabilities, are limited by quadratic computational complexity in high-resolution scenarios.

Mamba State Space Models

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

1 code implementation15 Nov 2024 Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Chengming Xu, Jinlong Peng, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Yanwei Fu

Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios.

Virtual Try-on

SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model

no code implementations5 Sep 2024 Weipeng Tan, Chuming Lin, Chengming Xu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yunsheng Wu, Yanwei Fu

Specifically, we first introduce the novel probabilistic style prior learning to model the intrinsic style as a Gaussian distribution using facial expressions and audio embedding.

Diversity Talking Head Generation

VITA: Towards Open-Source Interactive Omni Multimodal LLM

1 code implementation9 Aug 2024 Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas.

Language Modeling Language Modelling +3

Oracle Bone Inscriptions Multi-modal Dataset

no code implementations4 Jul 2024 Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Oracle bone inscriptions(OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.

Decipherment Denoising

SlerpFace: Face Template Protection via Spherical Linear Interpolation

no code implementations3 Jul 2024 Zhizhou Zhong, Yuxi Mi, Yuge Huang, Jianqing Xu, Guodong Mu, Shouhong Ding, Jingyun Zhang, rizen guo, Yunsheng Wu, Shuigeng Zhou

Based on studies of the diffusion model's generative capability, this paper proposes a defense by rotating templates to a noise-like distribution.

Face Recognition

DF40: Toward Next-Generation Deepfake Detection

1 code implementation19 Jun 2024 Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Chengjie Wang, Shouhong Ding, Yunsheng Wu, Li Yuan

In this work, we found the dataset (both train and test) can be the "primary culprit" due to: (1) forgery diversity: Deepfake techniques are commonly referred to as both face forgery and entire image synthesis.

DeepFake Detection Face Reenactment +2

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation26 Mar 2024 Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.

Attribute Face Reenactment +3

Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

no code implementations19 Feb 2024 Xuelin Qian, Yu Wang, Simian Luo, yinda zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, xiangyang xue, Bo Zhao, Tiejun Huang, Yunsheng Wu, Yanwei Fu

In this paper, we extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.

3D Generation 3D Shape Generation +1

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations CVPR 2024 Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

1 code implementation16 Aug 2023 Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu

We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations.

Memorization Retrieval

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations23 Jun 2023 Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modeling +6

DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning

1 code implementation28 May 2019 Xiawu Zheng, Chenyi Yang, Shaokun Zhang, Yan Wang, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao, Rongrong Ji

With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints, which is practical for on-device models across diverse search spaces and constraints.

Neural Architecture Search

Towards Optimal Discrete Online Hashing with Balanced Similarity

1 code implementation29 Jan 2019 Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Yongjian Wu, Yunsheng Wu

In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework.

Retrieval

Facial Action Unit Detection Using Attention and Relation Learning

no code implementations10 Aug 2018 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.

Action Unit Detection Facial Action Unit Detection +1

Cannot find the paper you are looking for? You can Submit a new open access paper.