Search Results for author: Xiangxi Shi

Found 8 papers, 2 papers with code

Efficient Reasoning with Hidden Thinking

1 code implementation • 31 Jan 2025 • Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu

Meanwhile, we design the corresponding Heima Decoder, built on traditional Large Language Models (LLMs), to adaptively interpret the hidden representations into variable-length textual sequences, reconstructing reasoning processes that closely resemble the original CoTs.

Decoder • Multimodal Reasoning

Hijacking Vision-and-Language Navigation Agents with Adversarial Environmental Attacks

no code implementations • 3 Dec 2024 • Zijiao Yang, Xiangxi Shi, Eric Slyman, Stefan Lee

Assistive embodied agents that can be instructed in natural language to perform tasks in open-world environments have the potential to significantly impact labor tasks like manufacturing or in-home care -- benefiting the lives of those who come to depend on them.

Adversarial Attack • Vision and Language Navigation

Viewpoint-Aware Visual Grounding in 3D Scenes

no code implementations • CVPR 2024 • Xiangxi Shi, Zhonghua Wu, Stefan Lee

In this paper, we investigate the significance of viewpoint information in 3D visual grounding -- introducing a model that explicitly predicts the speaker's viewpoint based on the referring expression and the scene.

3D Visual Grounding • Referring Expression

Learning Meta-class Memory for Few-Shot Semantic Segmentation

1 code implementation • ICCV 2021 • Zhonghua Wu, Xiangxi Shi, Guosheng Lin, Jianfei Cai

To explicitly learn meta-class representations in the few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), which introduces a set of learnable memory embeddings to memorize meta-class information during base-class training and transfer it to novel classes at inference.

Few-Shot Semantic Segmentation • Segmentation • +1
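The memory read described in the MM-Net excerpt can be pictured as softmax attention from a query feature over a small set of learnable memory slots. Below is a toy illustration of that read-out only (my own construction, not the authors' code; the two-slot memory, vector sizes, and function names are all assumptions):

```python
# Toy sketch of a memory read: a query feature attends over K memory
# embeddings via softmax-normalized dot-product similarity. This is an
# illustration of the general mechanism, not MM-Net itself.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def memory_read(query, memory):
    """query: list[float]; memory: list of K embeddings (list[float] each).
    Returns the attention-weighted sum of the memory embeddings."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]

# Two hypothetical memory slots; the query is closer to the first slot,
# so the read-out is dominated by it.
memory = [[1.0, 0.0], [0.0, 1.0]]
print(memory_read([2.0, 0.0], memory))
```

In the paper's setting the memory embeddings would be trained on base classes and then reused at inference on novel classes; here they are fixed constants purely for demonstration.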

Remember What You have drawn: Semantic Image Manipulation with Memory

no code implementations • 27 Jul 2021 • Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty

Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize texture information under the guidance of the textual description.

Image Manipulation

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations • ECCV 2020 • Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Reinforcement Learning (RL)

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations • 21 Jul 2019 • Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.

Video Captioning

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations • 8 Jul 2018 • Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU-based language decoder working as a global (caption-level) language model and a low-level GRU-based language decoder working as a local (phrase-level) language model.

Decoder • Language Modeling • +5
