no code implementations • 17 Aug 2024 • Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister
MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities.
1 code implementation • 28 May 2024 • Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li
Traditional image-to-3D models often struggle with scenes containing multiple objects due to biases and occlusion complexities.
1 code implementation • CVPR 2024 • Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister
Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets.
1 code implementation • 2 Apr 2024 • Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.
1 code implementation • 31 Mar 2024 • Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.
Ranked #2 on Highlight Detection on QVHighlights
no code implementations • 25 Jan 2024 • Jia Wan, Wanhua Li, Jason Ken Adhinarta, Atmadeep Banerjee, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei
While imaging techniques at macro and mesoscales have garnered substantial attention and resources, microscale Volume Electron Microscopy (vEM) imaging, capable of revealing intricate vascular details, has lacked the necessary benchmarking infrastructure.
1 code implementation • CVPR 2024 • Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister
Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
1 code implementation • ICCV 2023 • Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, Donglai Wei
Simultaneously, there has been an influx of multilingual pre-trained models for NMT and multimodal pre-trained models for vision-language tasks, primarily in English, which have shown exceptional generalisation ability.
1 code implementation • CVPR 2023 • Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu
In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.
no code implementations • ICCV 2023 • Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, Jiwen Lu
Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a proxy feature to build a bridge among different sub-clusters and reduce the intra-class variance.
1 code implementation • 24 Jul 2022 • Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, Jiwen Lu
Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images.
1 code implementation • 18 Jul 2022 • Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, Jiwen Lu
As each sample is annotated with multiple attribute labels, these "words" will naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample.
1 code implementation • 12 Jul 2022 • Wanhua Li, Jiwen Lu, Abudukelimu Wuerkaixi, Jianjiang Feng, Jie Zhou
Unlike most existing personalized methods that learn the parameters of a personalized estimator for each person in the training set, our method learns the mapping from identity information to age estimator parameters.
Ranked #1 on Age Estimation on ChaLearn 2015
1 code implementation • 6 Jun 2022 • Wanhua Li, Xiaoke Huang, Zheng Zhu, Yansong Tang, Xiu Li, Jie Zhou, Jiwen Lu
In this paper, we propose to learn the rank concepts from the rich semantic CLIP latent space.
Ranked #1 on Few-shot Age Estimation on MORPH Album2
no code implementations • 1 Oct 2021 • PengYu Chen, Wanhua Li
In the end, our team achieved the result of 0.366 for AP@0.50:0.95 on the test set, which is competitive with other top-ranking methods while only one GPU is used.
no code implementations • 6 Sep 2021 • Wanhua Li, Jiwen Lu, Abudukelimu Wuerkaixi, Jianjiang Feng, Jie Zhou
To address this, we propose a Star-shaped Reasoning Graph Network (S-RGN).
Ranked #1 on Kinship Verification on KinFaceW-I
1 code implementation • CVPR 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou
To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.
no code implementations • CVPR 2021 • Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou
In the end, the samples in the unbalanced train batch are re-weighted by the learned meta-miner to optimize the kinship models.
Ranked #1 on Kinship Verification on KinFaceW-II
1 code implementation • CVPR 2021 • Wanhua Li, Xiaoke Huang, Jiwen Lu, Jianjiang Feng, Jie Zhou
An ordinal distribution constraint is proposed to exploit the ordinal nature of regression.
Ranked #2 on Age Estimation on Adience
1 code implementation • 24 Mar 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou
To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.
no code implementations • ICCV 2021 • Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, Jie Zhou
In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to simultaneously mine the traces of video inpainting from spatial, temporal, and frequency domains.
1 code implementation • ECCV 2020 • Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, Jie Zhou
Human beings are fundamentally sociable: we generally organize our social lives in terms of relations with other people.
Ranked #1 on Visual Social Relationship Recognition on PIPA
no code implementations • 22 Apr 2020 • Wanhua Li, Yingqiang Zhang, Kangchen Lv, Jiwen Lu, Jianjiang Feng, Jie Zhou
In this paper, we propose a graph-based kinship reasoning (GKR) network for kinship verification, which aims to effectively perform relational reasoning on the extracted features of an image pair.
Ranked #3 on Kinship Verification on KinFaceW-II
no code implementations • CVPR 2019 • Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian
Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process.
Ranked #2 on Age Estimation on FGNET