no code implementations • IJCNLP 2019 • Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang
Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history.
no code implementations • 8 Dec 2021 • Aoxiong Yin, Zhou Zhao, Jinglin Liu, Weike Jin, Meng Zhang, Xingshan Zeng, Xiaofei He
Sign language translation as a kind of technology with profound social significance has attracted growing researchers' interest in recent years.
no code implementations • CVPR 2022 • Aoxiong Yin, Zhou Zhao, Weike Jin, Meng Zhang, Xingshan Zeng, Xiaofei He
In addition, we also explore zero-shot translation in sign language and find that our model can achieve comparable performance to the supervised BSLT model on some language pairs.
no code implementations • 8 Sep 2022 • Jiong Wang, Zhou Zhao, Weike Jin
Multi-modal video question answering aims to predict correct answer and localize the temporal boundary relevant to the question.
1 code implementation • 21 Feb 2022 • Jiong Wang, Zhou Zhao, Weike Jin, Xinyu Duan, Zhen Lei, Baoxing Huai, Yiling Wu, Xiaofei He
In this paper, the VLAD aggregation method is adopted to quantize local features with visual vocabulary locally partitioning the feature space, and hence preserve the local discriminability.
1 code implementation • CVPR 2023 • Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao
We find that it can provide two aspects of information for the model, 1) it can help the model implicitly learn the location of semantic boundaries in continuous sign language videos, 2) it can help the model understand the sign language video globally.
Ranked #3 on Gloss-free Sign Language Translation on PHOENIX14T