Search Results for author: Chaoyou Fu

Found 24 papers, 12 papers with code

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations • 24 Apr 2024 • Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

4 code implementations • 5 Apr 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.

Few-Shot Learning Scene Segmentation +1

444

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

9,392

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations • 4 Dec 2023 • Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

430

Woodpecker: Hallucination Correction for Multimodal Large Language Models

1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.

Hallucination

552

Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis

no code implementations • 31 Aug 2023 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Change Loy, Ran He

Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping.

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations • 23 Jun 2023 • Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modelling +3

9,392

A Survey on Multimodal Large Language Models

1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

Recently, the Multimodal Large Language Model (MLLM), represented by GPT-4V, has become a rising research hotspot; it uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.

Hallucination In-Context Learning +5

9,392

Multi-modal Queried Object Detection in the Wild

1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu

To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed.

Ranked #1 on Few-Shot Object Detection on ODinW-35

Few-Shot Object Detection Object +2

234

Heterogeneous Face Recognition via Face Synthesis with Identity-Attribute Disentanglement

no code implementations • 10 Jun 2022 • Ziming Yang, Jian Liang, Chaoyou Fu, Mandi Luo, Xiao-Yu Zhang

Secondly, we devise a face synthesis module (FSM) to generate a large number of images with stochastic combinations of disentangled identities and attributes for enriching the attribute diversity of synthetic images.

Attribute Data Augmentation +4

Rethinking Image Cropping: Exploring Diverse Compositions From Global Views

no code implementations • CVPR 2022 • Gengyun Jia, Huaibo Huang, Chaoyou Fu, Ran He

In this paper, we regard image cropping as a set prediction problem.

Image Cropping regression +1

Causal Representation Learning for Context-Aware Face Transfer

no code implementations • 4 Oct 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He

Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e.g., facial expressions, head poses, and other background factors) may change dramatically.

counterfactual Counterfactual Inference +4

Pareidolia Face Reenactment

no code implementations • CVPR 2021 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He

We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.

Face Reenactment Texture Synthesis

Information Bottleneck Disentanglement for Identity Swapping

1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He

In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.

Disentanglement Face Recognition +1

Everything's Talkin': Pareidolia Face Reenactment

1 code implementation • 7 Apr 2021 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He

We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.

Face Reenactment Texture Synthesis

CM-NAS: Cross-Modality Neural Architecture Search for Visible-Infrared Person Re-Identification

1 code implementation • ICCV 2021 • Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, Ran He

Visible-Infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment.

Neural Architecture Search Person Re-Identification

AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection

no code implementations • NeurIPS 2020 • Hao Zhu, Chaoyou Fu, Qianyi Wu, Wayne Wu, Chen Qian, Ran He

However, due to the lack of Deepfakes datasets with large variance in appearance, which can hardly be produced by recent identity swapping methods, the detection algorithm may fail in this situation.

DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition

1 code implementation • 20 Sep 2020 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises.

Contrastive Learning Face Recognition +1

117

Deep Momentum Uncertainty Hashing

no code implementations • 17 Sep 2020 • Chaoyou Fu, Guoli Wang, Xiang Wu, Qian Zhang, Ran He

It embodies the uncertainty of the hashing network to the corresponding input image.

Combinatorial Optimization Deep Hashing

Dual Variational Generation for Low Shot Heterogeneous Face Recognition

no code implementations • NeurIPS 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.

Face Recognition Heterogeneous Face Recognition

Cross-Spectral Face Hallucination via Disentangling Independent Factors

no code implementations • CVPR 2020 • Boyan Duan, Chaoyou Fu, Yi Li, Xingguang Song, Ran He

The cross-sensor gap is one of the challenges that have attracted much research interest in Heterogeneous Face Recognition (HFR).

Face Alignment Face Hallucination +3

High Fidelity Face Manipulation with Extreme Poses and Expressions

no code implementations • 28 Mar 2019 • Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, Ran He

Furthermore, due to the lack of high-resolution face manipulation databases to verify the effectiveness of our method, we collect a new high-quality Multi-View Face (MVF-HQ) database.

Face Generation Face Recognition +1

Dual Variational Generation for Low-Shot Heterogeneous Face Recognition

1 code implementation • 25 Mar 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He

Then, in order to ensure the identity consistency of the generated paired heterogeneous images, we impose a distribution alignment in the latent space and a pairwise identity preserving in the image space.

Ranked #1 on Face Verification on CASIA NIR-VIS 2.0

Face Recognition Heterogeneous Face Recognition

117

Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing

no code implementations • 7 Sep 2018 • Chaoyou Fu, Liangchen Song, Xiang Wu, Guoli Wang, Ran He

It generates hashing bits by the output neurons of a deep hashing network.

Deep Hashing Information Retrieval
