1 code implementation • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li
Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects the text features most similar to the image.
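The feature filter described above can be sketched as a similarity-based selection step. This is a minimal illustration, not the paper's exact formulation: the function name, the `top_k` parameter, and the use of cosine similarity are all assumptions.

```python
import numpy as np

def filter_text_features(image_feat, text_feats, top_k=4):
    """Keep the text features most similar to the image feature.

    Illustrative sketch of a similarity-based feature filter:
    normalize both sides, score each text feature by cosine
    similarity to the image, and keep the top-k.
    """
    # Normalize so that dot products become cosine similarities.
    img = image_feat / np.linalg.norm(image_feat)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = txt @ img                   # similarity of each text feature to the image
    keep = np.argsort(-sims)[:top_k]   # indices of the top-k most similar features
    return text_feats[keep], keep

rng = np.random.default_rng(0)
feats, idx = filter_text_features(
    rng.standard_normal(16), rng.standard_normal((10, 16)), top_k=3
)
print(feats.shape)  # (3, 16)
```

In practice the image and text features would come from the VLM's encoders rather than random arrays.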
no code implementations • 19 Mar 2024 • Xiaoyu Qiu, Yuechen Wang, Jiaxin Shi, Wengang Zhou, Houqiang Li
To efficiently transfer soft prompts, we propose a novel framework, Multilingual Prompt Translator (MPT), in which a multilingual prompt translator processes the crucial knowledge embedded in a prompt by converting its language knowledge while retaining its task knowledge.
no code implementations • 17 Aug 2023 • Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li
Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation.
no code implementations • Findings (EMNLP) 2021 • Yuechen Wang, Wengang Zhou, Houqiang Li
In this work, we propose a novel candidate-free framework: Fine-grained Semantic Alignment Network (FSAN), for weakly supervised TLG.
2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li
In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.
2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li
Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.
no code implementations • ICCV 2021 • Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li
To validate the effectiveness of our method on SLR, we perform extensive experiments on four public benchmark datasets, i.e., NMFs-CSL, SLR500, MSASL, and WLASL.
Ranked #1 on Sign Language Recognition on WLASL100 (using extra training data)
1 code implementation • 30 Jun 2021 • Yuechen Wang, Jiajun Deng, Wengang Zhou, Houqiang Li
To this end, we introduce a novel weakly supervised temporal adjacent network (WSTAN) for temporal language grounding.
no code implementations • 12 Apr 2020 • Shangwen Lv, Yuechen Wang, Daya Guo, Duyu Tang, Nan Duan, Fuqing Zhu, Ming Gong, Linjun Shou, Ryan Ma, Daxin Jiang, Guihong Cao, Ming Zhou, Songlin Hu
In this work, we introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
no code implementations • IJCNLP 2019 • Jingjing Xu, Yuechen Wang, Duyu Tang, Nan Duan, Pengcheng Yang, Qi Zeng, Ming Zhou, Xu Sun
We provide representative baselines for these tasks and further introduce a coarse-to-fine model for clarification question generation.