Search Results for author: Wanhua Li

Found 22 papers, 14 papers with code

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

1 code implementation 2 Apr 2024 Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.

Highlight Detection Moment Retrieval +4

Joint-Task Regularization for Partially Labeled Multi-Task Learning

1 code implementation 2 Apr 2024 Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister

Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets.

Multi-Task Learning

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

1 code implementation 31 Mar 2024 Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.

Highlight Detection Moment Retrieval +4

TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images

no code implementations 25 Jan 2024 Jia Wan, Wanhua Li, Jason Ken Adhinarta, Atmadeep Banerjee, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei

Furthermore, we developed a zero-shot cortical blood vessel segmentation method named TriSAM, which leverages the powerful segmentation model SAM for 3D segmentation.

Benchmarking Segmentation

LangSplat: 3D Language Gaussian Splatting

1 code implementation 26 Dec 2023 Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister

Humans live in a 3D world and commonly use natural language to interact with a 3D scene.

Object Localization Semantic Segmentation

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

1 code implementation ICCV 2023 Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, Donglai Wei

Simultaneously, there has been an influx of multilingual pre-trained models for NMT and multimodal pre-trained models for vision-language tasks, primarily in English, which have shown exceptional generalisation ability.

Image Captioning Multimodal Machine Translation +2

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

1 code implementation CVPR 2023 Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu

In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.

Denoising Talking Head Generation

CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

no code implementations ICCV 2023 Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, Jiwen Lu

Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a proxy feature to build a bridge among different sub-clusters and reduce the intra-class variance.

Attribute Clustering +2

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

1 code implementation 18 Jul 2022 Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, Jiwen Lu

As each sample is annotated with multiple attribute labels, these "words" will naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample.

Attribute Clothing Attribute Recognition +4

MetaAge: Meta-Learning Personalized Age Estimators

1 code implementation 12 Jul 2022 Wanhua Li, Jiwen Lu, Abudukelimu Wuerkaixi, Jianjiang Feng, Jie Zhou

Unlike most existing personalized methods that learn the parameters of a personalized estimator for each person in the training set, our method learns the mapping from identity information to age estimator parameters.

Age Estimation Meta-Learning +1

3rd Place Scheme on Instance Segmentation Track of ICCV 2021 VIPriors Challenges

no code implementations 1 Oct 2021 PengYu Chen, Wanhua Li

In the end, our team achieved a result of 0.366 for AP@0.50:0.95 on the test set, which is competitive with other top-ranking methods while using only one GPU.

Data Augmentation Instance Segmentation +2

Structure-Aware Face Clustering on a Large-Scale Graph With $10^{7}$ Nodes

1 code implementation CVPR 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Meta-Mining Discriminative Samples for Kinship Verification

no code implementations CVPR 2021 Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou

In the end, the samples in the unbalanced train batch are re-weighted by the learned meta-miner to optimize the kinship models.

Kinship Verification

Structure-Aware Face Clustering on a Large-Scale Graph with $\bf{10^{7}}$ Nodes

1 code implementation 24 Mar 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

no code implementations ICCV 2021 Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, Jie Zhou

In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to simultaneously mine the traces of video inpainting from spatial, temporal, and frequency domains.

Video Inpainting

Graph-Based Social Relation Reasoning

1 code implementation ECCV 2020 Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, Jie Zhou

Human beings are fundamentally sociable -- that we generally organize our social lives in terms of relations with other people.

Relation Relational Reasoning +1

Graph-based Kinship Reasoning Network

no code implementations 22 Apr 2020 Wanhua Li, Yingqiang Zhang, Kangchen Lv, Jiwen Lu, Jianjiang Feng, Jie Zhou

In this paper, we propose a graph-based kinship reasoning (GKR) network for kinship verification, which aims to effectively perform relational reasoning on the extracted features of an image pair.

Kinship Verification Relational Reasoning

BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation

no code implementations CVPR 2019 Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian

Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process.

Age Estimation MORPH
