1 code implementation • 30 Apr 2021 • Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan
Generating videos from text is a challenging task due to the high computational cost of training and the unbounded space of plausible outputs, which complicates evaluation.
Ranked #16 on Text-to-Video Generation on MSR-VTT (CLIPSIM metric)
1 code implementation • NeurIPS 2019 • Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning.
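A minimal sketch of the idea behind adaptive attention time: at each decoding step the model takes a *variable* number of attention steps, halting once accumulated confidence crosses a threshold. The halting rule, weight names, and dimensions below are illustrative assumptions in the style of adaptive-computation-time models, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # hidden size (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(h, K, V, w_halt, max_steps=4, eps=0.01):
    """Attend repeatedly, weighting each step's result by a halting probability."""
    ctx = np.zeros(d)
    remainder = 1.0  # probability mass not yet spent on earlier steps
    steps = 0
    for t in range(max_steps):
        steps += 1
        att = softmax(K @ h / np.sqrt(d)) @ V  # one attention step
        p = sigmoid(w_halt @ att)              # confidence to stop now (assumed form)
        if remainder - p < eps or t == max_steps - 1:
            ctx += remainder * att             # spend remaining mass and halt
            break
        ctx += p * att
        remainder -= p
        h = h + att                            # refine the query for the next step
    return ctx, steps

h = rng.standard_normal(d)
K = rng.standard_normal((6, d))
V = rng.standard_normal((6, d))
w_halt = rng.standard_normal(d)
ctx, n_steps = adaptive_attention(h, K, V, w_halt)
```

The number of attention steps `n_steps` thus varies per decoding step, which is what lets the model align source and target adaptively rather than attending a fixed number of times.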
5 code implementations • ICCV 2019 • Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries.
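The AoA idea can be sketched as follows: the attention result and the query are combined twice, once into an "information" vector and once into a sigmoid "gate", and the module outputs their elementwise product, so the gate measures how relevant the attention result is to the query. The single-head formulation, weight names, and sizes here are simplifying assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (assumed)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention(q, K, V):
    # Standard scaled dot-product attention producing the attention result.
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

def aoa(q, K, V, Wq_i, Wv_i, Wq_g, Wv_g):
    # Attention on Attention: gate the information vector by how
    # relevant the attention result is to the query.
    v_hat = attention(q, K, V)
    info = Wq_i @ q + Wv_i @ v_hat           # "information" vector
    gate = sigmoid(Wq_g @ q + Wv_g @ v_hat)  # relevance gate in (0, 1)
    return info * gate

q = rng.standard_normal(d)
K = rng.standard_normal((5, d))
V = rng.standard_normal((5, d))
Ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(4)]
out = aoa(q, K, V, *Ws)
```

When the gate saturates near zero, the module can suppress an attention result that is irrelevant to the query, which plain attention cannot do.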
no code implementations • 17 Jun 2019 • Yaxian Xia, Lun Huang, Xiao-Yong Wei, Wenmin Wang
The first step, which we call the intra-modal relation mechanism, computes responses between different objects in an image, or between different words in a sentence, separately within each modality. The second step, the inter-modal relation mechanism, uses the query as textual context to refine the relationships among object proposals in an image.
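The two steps above can be sketched as self-attention within each modality followed by cross-attention from the textual query over the object proposals. The shapes and the single-head, weight-free formulation are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8  # feature size (assumed)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Intra-modal step: responses between elements of one modality
    # (objects with objects, or words with words), computed separately.
    return softmax(X @ X.T / np.sqrt(d)) @ X

def cross_attention(Q, X):
    # Inter-modal step: textual queries attend over the refined object proposals.
    return softmax(Q @ X.T / np.sqrt(d)) @ X

objects = rng.standard_normal((5, d))  # object proposal features
words = rng.standard_normal((7, d))    # word features of the query

objects_r = self_attention(objects)    # step 1, visual side
words_r = self_attention(words)        # step 1, textual side
fused = cross_attention(words_r, objects_r)  # step 2: text refines objects
```

Keeping the two steps separate means each modality's internal structure is modeled before the query is used to weight the object proposals.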