Search Results for author: Yongfei Liu

Found 9 papers, 7 papers with code

Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

no code implementations • 2 Mar 2023 • Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He

Human-object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building block for many vision tasks.

Tasks: Human-Object Interaction Detection, Knowledge Distillation, +2

Intention-aware Feature Propagation Network for Interactive Segmentation

no code implementations • 10 Mar 2022 • Chuyu Zhang, Chuanyang Hu, Yongfei Liu, Xuming He

We aim to tackle the problem of point-based interactive segmentation, in which the two key challenges are to infer the user's intention correctly and to propagate the user-provided annotations to unlabeled regions efficiently.

Tasks: Interactive Segmentation

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan

Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after finetuning.

Tasks: Knowledge Distillation, Representation Learning

GEM: A General Evaluation Benchmark for Multimodal Tasks

1 code implementation • Findings (ACL) 2021 • Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering both image-language and video-language tasks, but is also labeled in multiple languages.

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

1 code implementation • CVPR 2021 • Yongfei Liu, Bo Wan, Lin Ma, Xuming He

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.

Tasks: Scene Understanding, Visual Grounding, +1

Learning Cross-modal Context Graph for Visual Grounding

2 code implementations • 20 Nov 2019 • Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He

To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task (a toy sketch of the matching idea follows this entry).

Tasks: Graph Matching, Visual Grounding
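The snippet below is a minimal, hypothetical sketch of what cross-modal graph matching for phrase grounding can look like in general: phrases and candidate regions each form a small graph, node features are refined by averaging over graph neighbors, and phrases are assigned to regions by maximizing total cosine similarity. The feature dimensions, the random placeholder features, the mean-aggregation step, and the Hungarian assignment via SciPy's linear_sum_assignment are illustrative assumptions, not the method of the paper above.

```python
# Hypothetical toy example of phrase-to-region graph matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
d = 64                                   # shared embedding dimension (assumed)
phrase_feat = rng.normal(size=(3, d))    # placeholder phrase embeddings
region_feat = rng.normal(size=(5, d))    # placeholder region features from a detector

# Adjacency matrices encode relations among phrases and among regions.
phrase_adj = np.array([[0, 1, 0],
                       [1, 0, 0],
                       [0, 0, 0]], dtype=float)
region_adj = (rng.random((5, 5)) > 0.6).astype(float)
np.fill_diagonal(region_adj, 0)

def propagate(feat, adj):
    """One round of mean aggregation over graph neighbors (plus self)."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    return (feat + adj @ feat) / deg

phrase_ctx = propagate(phrase_feat, phrase_adj)
region_ctx = propagate(region_feat, region_adj)

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarity between context-aware phrase and region nodes.
sim = normalize(phrase_ctx) @ normalize(region_ctx).T    # shape (3, 5)

# Hungarian matching: each phrase is grounded to a distinct region.
rows, cols = linear_sum_assignment(-sim)
for p, r in zip(rows, cols):
    print(f"phrase {p} -> region {r} (similarity {sim[p, r]:.3f})")
```

The assignment step treats grounding as one-to-one matching; in practice a learned similarity and soft or relaxed matching are more common, but the structure (graph context, then cross-modal scoring) is the same.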

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

1 code implementation • ICCV 2019 • Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He

Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories.

Tasks: Human-Object Interaction Detection, Scene Understanding
