no code implementations • 2 Mar 2023 • Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He
Human-object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building block for many vision tasks.
Human-Object Interaction Detection, Knowledge Distillation, +2
1 code implementation • CVPR 2022 • Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems.
no code implementations • 10 Mar 2022 • Chuyu Zhang, Chuanyang Hu, Yongfei Liu, Xuming He
We aim to tackle the problem of point-based interactive segmentation, in which two key challenges are to infer the user's intention correctly and to propagate the user-provided annotations to unlabeled regions efficiently.
Ranked #4 on Interactive Segmentation on SBD
1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after finetuning.
1 code implementation • Findings (ACL) 2021 • Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti
Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering image-language and video-language tasks at the same time, but is also labeled in multiple languages.
1 code implementation • CVPR 2021 • Yongfei Liu, Bo Wan, Lin Ma, Xuming He
Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.
2 code implementations • ECCV 2020 • Yongfei Liu, Xiangyi Zhang, Songyang Zhang, Xuming He
In this paper, we propose a novel few-shot semantic segmentation framework based on the prototype representation; a generic sketch of prototype matching follows this entry.
Ranked #3 on Few-Shot Semantic Segmentation on Pascal5i
Few-Shot Semantic Segmentation, Semi-Supervised Semantic Segmentation
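The prototype representation mentioned above is commonly realized by masked average pooling over support features followed by nearest-prototype matching on the query. The sketch below illustrates only that generic idea under assumed tensor shapes; the function names and the background-prototype handling are illustrative assumptions, not the paper's exact framework.

```python
# Generic prototype-matching sketch for few-shot segmentation (illustrative only).
import torch
import torch.nn.functional as F

def masked_average_pooling(feat, mask):
    """feat: (C, H, W) support features; mask: (H, W) binary foreground mask."""
    mask = mask.unsqueeze(0).float()                                  # (1, H, W)
    return (feat * mask).sum(dim=(1, 2)) / (mask.sum() + 1e-6)       # (C,) class prototype

def predict(query_feat, fg_prototype, bg_prototype):
    """query_feat: (C, H, W); prototypes: (C,). Returns (H, W) binary prediction."""
    q = F.normalize(query_feat, dim=0)                                # unit-norm channels for cosine similarity
    fg = (q * F.normalize(fg_prototype, dim=0).view(-1, 1, 1)).sum(0)
    bg = (q * F.normalize(bg_prototype, dim=0).view(-1, 1, 1)).sum(0)
    return (fg > bg).long()                                           # per-pixel nearest-prototype label
```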
2 code implementations • 20 Nov 2019 • Yongfei Liu, Bo Wan, Xiaodan Zhu, Xuming He
To address their limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task.
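As a point of reference for the matching step, a minimal generic baseline is a soft assignment of phrase nodes to region nodes via cosine affinities. The sketch below is only that baseline under assumed feature shapes; it is not the paper's language-guided graph matching strategy, and the function name and temperature are illustrative assumptions.

```python
# Generic cross-modal soft-matching sketch between phrases and regions (illustrative only).
import torch
import torch.nn.functional as F

def match_phrases_to_regions(phrase_feats, region_feats, temperature=0.1):
    """phrase_feats: (P, D); region_feats: (R, D). Returns (P, R) assignment probabilities."""
    p = F.normalize(phrase_feats, dim=-1)
    r = F.normalize(region_feats, dim=-1)
    affinity = p @ r.t() / temperature     # cosine affinities between every phrase-region pair
    return affinity.softmax(dim=-1)        # soft assignment of each phrase over candidate regions
```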
1 code implementation • ICCV 2019 • Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He
Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories.