no code implementations • 24 Apr 2024 • Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji
This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models(MLLMs) and their cognitive capability.
4 code implementations • 5 Apr 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.
2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun
They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.
2 code implementations • 4 Dec 2023 • Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji
However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.
1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.
no code implementations • 31 Aug 2023 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Change Loy, Ran He
Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production, which requires massive training data and training time to learn a person-specific audio-video mapping.
3 code implementations • 23 Jun 2023 • Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji
Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.
1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.
1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu
To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed.
Ranked #1 on Few-Shot Object Detection on ODinW-35
no code implementations • 10 Jun 2022 • Ziming Yang, Jian Liang, Chaoyou Fu, Mandi Luo, Xiao-Yu Zhang
Secondly, we devise a face synthesis module (FSM) to generate a large number of images with stochastic combinations of disentangled identities and attributes for enriching the attribute diversity of synthetic images.
no code implementations • CVPR 2022 • Gengyun Jia, Huaibo Huang, Chaoyou Fu, Ran He
In this paper, we regard image cropping as a set prediction problem.
no code implementations • 4 Oct 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Ran He
Human face synthesis involves transferring knowledge about the identity and identity-dependent face shape (IDFS) of a human face to target face images where the context (e. g., facial expressions, head poses, and other background factors) may change dramatically.
no code implementations • CVPR 2021 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He
We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.
1 code implementation • CVPR 2021 • Gege Gao, Huaibo Huang, Chaoyou Fu, Zhaoyang Li, Ran He
In this work, we propose a novel information disentangling and swapping network, called InfoSwap, to extract the most expressive information for identity representation from a pre-trained face recognition model.
1 code implementation • 7 Apr 2021 • Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He
We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video.
1 code implementation • ICCV 2021 • Chaoyou Fu, Yibo Hu, Xiang Wu, Hailin Shi, Tao Mei, Ran He
Visible-Infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images, breaking through the limitation of single-modality person ReID in dark environment.
no code implementations • NeurIPS 2020 • Hao Zhu, Chaoyou Fu, Qianyi Wu, Wayne Wu, Chen Qian, Ran He
However, due to the lack of Deepfakes datasets with large variance in appearance, which can be hardly produced by recent identity swapping methods, the detection algorithm may fail in this situation.
1 code implementation • 20 Sep 2020 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
As a consequence, massive new diverse paired heterogeneous images with the same identity can be generated from noises.
no code implementations • 17 Sep 2020 • Chaoyou Fu, Guoli Wang, Xiang Wu, Qian Zhang, Ran He
It embodies the uncertainty of the hashing network to the corresponding input image.
no code implementations • NeurIPS 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Specifically, we first introduce a dual variational autoencoder to represent a joint distribution of paired heterogeneous images.
no code implementations • CVPR 2020 • Boyan Duan, Chaoyou Fu, Yi Li, Xingguang Song, Ran He
The cross-sensor gap is one of the challenges that have aroused much research interests in Heterogeneous Face Recognition (HFR).
no code implementations • 28 Mar 2019 • Chaoyou Fu, Yibo Hu, Xiang Wu, Guoli Wang, Qian Zhang, Ran He
Furthermore, due to the lack of high-resolution face manipulation databases to verify the effectiveness of our method, we collect a new high-quality Multi-View Face (MVF-HQ) database.
1 code implementation • 25 Mar 2019 • Chaoyou Fu, Xiang Wu, Yibo Hu, Huaibo Huang, Ran He
Then, in order to ensure the identity consistency of the generated paired heterogeneous images, we impose a distribution alignment in the latent space and a pairwise identity preserving in the image space.
Ranked #1 on Face Verification on CASIA NIR-VIS 2.0
no code implementations • 7 Sep 2018 • Chaoyou Fu, Liangchen Song, Xiang Wu, Guoli Wang, Ran He
It generates hashing bits by the output neurons of a deep hashing network.