Search Results for author: Zhenzhen Hu

Found 10 papers, 6 papers with code

Agent Journey Beyond RGB: Unveiling Hybrid Semantic-Spatial Environmental Representations for Vision-and-Language Navigation

1 code implementation • 9 Dec 2024 • Xuesong Zhang, Yunbo Xu, Jia Li, Zhenzhen Hu, Richang Hong

SUSA includes a Textual Semantic Understanding (TSU) module, which narrows the modality gap between instructions and environments by generating and associating the descriptions of environmental landmarks in the agent's immediate surroundings.

Object Localization, Vision and Language Navigation, +2

Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval

1 code implementation • 9 Oct 2024 • Jian Xiao, Zhenzhen Hu, Jia Li, Richang Hong

By replacing a single text query with a series of text proxies, TV-ProxyNet not only broadens the query scope but also achieves a more precise expansion.

Text Retrieval, Video-Text Retrieval

Static for Dynamic: Towards a Deeper Understanding of Dynamic Facial Expressions Using Static Expression Data

1 code implementation • 10 Sep 2024 • Yin Chen, Jia Li, Yu Zhang, Zhenzhen Hu, Shiguang Shan, Meng Wang, Richang Hong

Dynamic facial expression recognition (DFER) infers emotions from the temporal evolution of expressions, unlike static facial expression recognition (SFER), which relies solely on a single snapshot.

Dynamic Facial Expression Recognition, Facial Expression Recognition, +1

Seeing is Believing? Enhancing Vision-Language Navigation using Visual Perturbations

no code implementations • 9 Sep 2024 • Xuesong Zhang, Jia Li, Yunbo Xu, Zhenzhen Hu, Richang Hong

Autonomous navigation for an embodied agent guided by natural language instructions remains a formidable challenge in vision-and-language navigation (VLN).

Autonomous Navigation, Diversity, +2

Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning

no code implementations • 19 Jul 2023 • Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang

The crucial issue in this task is to model the global and the local matching between the image and different languages.

Image Captioning

Compact Bidirectional Transformer for Image Captioning

1 code implementation • 6 Jan 2022 • Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang

In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly while the decoder can be executed in parallel.

Decoder, Image Captioning, +1

Semi-Autoregressive Transformer for Image Captioning

1 code implementation • 17 Jun 2021 • Yuanen Zhou, Yong Zhang, Zhenzhen Hu, Meng Wang

To tackle this issue, non-autoregressive image captioning models have recently been proposed to significantly accelerate the speed of inference by generating all words in parallel.

Image Captioning

More Grounded Image Captioning by Distilling Image-Text Matching Model

1 code implementation • CVPR 2020 • Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang

To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision.

Image Captioning, Image-text matching, +4

Quality-aware Unpaired Image-to-Image Translation

no code implementations • 15 Mar 2019 • Lei Chen, Le Wu, Zhenzhen Hu, Meng Wang

To tackle the above two challenges, in this paper, we propose a unified quality-aware GAN-based framework for unpaired image-to-image translation, where a quality-aware loss is explicitly incorporated by comparing each source image and the reconstructed image at the domain level.

Image Quality Assessment, Image-to-Image Translation, +1
