1 code implementation • 9 Dec 2024 • Xuesong Zhang, Yunbo Xu, Jia Li, Zhenzhen Hu, Richang Hong
SUSA includes a Textual Semantic Understanding (TSU) module, which narrows the modality gap between instructions and environments by generating and associating the descriptions of environmental landmarks in the agent's immediate surroundings.
Ranked #1 on Visual Navigation on R2R
1 code implementation • 9 Oct 2024 • Jian Xiao, Zhenzhen Hu, Jia Li, Richang Hong
By replacing a single text query with a series of text proxies, TV-ProxyNet not only broadens the query scope but also achieves a more precise expansion.
1 code implementation • 10 Sep 2024 • Yin Chen, Jia Li, Yu Zhang, Zhenzhen Hu, Shiguang Shan, Meng Wang, Richang Hong
Dynamic facial expression recognition (DFER) infers emotions from the temporal evolution of expressions, unlike static facial expression recognition (SFER), which relies solely on a single snapshot.
Ranked #2 on Dynamic Facial Expression Recognition on DFEW
no code implementations • 9 Sep 2024 • Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong
Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN).
no code implementations • 27 Oct 2023 • Zijie Song, Zhenzhen Hu, Richang Hong
Unsupervised representation learning for image clustering is essential in computer vision.
no code implementations • 19 Jul 2023 • Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang
The crucial issue in this task is to model the global and the local matching between the image and different languages.
1 code implementation • 6 Jan 2022 • Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang
In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context both implicitly and explicitly while the decoder can be executed in parallel.
1 code implementation • 17 Jun 2021 • Yuanen Zhou, Yong Zhang, Zhenzhen Hu, Meng Wang
To tackle this issue, non-autoregressive image captioning models have recently been proposed to significantly speed up inference by generating all words in parallel.
1 code implementation • CVPR 2020 • Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
Collecting word-region alignments as strong supervision, in order to improve grounding accuracy while retaining captioning quality, is expensive.
no code implementations • 15 Mar 2019 • Lei Chen, Le Wu, Zhenzhen Hu, Meng Wang
To tackle the above two challenges, in this paper, we propose a unified quality-aware GAN-based framework for unpaired image-to-image translation, where a quality-aware loss is explicitly incorporated by comparing each source image and the reconstructed image at the domain level.