Search Results for author: Jiuxiang Gu

Found 18 papers, 4 papers with code

Open-World Entity Segmentation

2 code implementations, 29 Jul 2021 Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

We introduce a new image segmentation task, termed Entity Segmentation (ES), with the aim of segmenting all visual entities in an image without considering semantic category labels.

Image Manipulation Semantic Segmentation

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection

no code implementations CVPR 2021 Huiyuan Yang, Lijun Yin, Yi Zhou, Jiuxiang Gu

The learned AU semantic embeddings are then used as guidance for the generation of attention maps through a cross-modality attention network.

Action Unit Detection Facial Action Unit Detection

SelfDoc: Self-Supervised Document Representation Learning

no code implementations CVPR 2021 Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

For downstream usage, we propose a novel modality-adaptive attention mechanism for multimodal feature fusion by adaptively emphasizing language and vision signals.

Representation Learning

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models

no code implementations NAACL 2021 Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu

These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample.

Self-Supervised Relationship Probing

no code implementations NeurIPS 2020 Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modelling

Unsupervised Cross-lingual Image Captioning

no code implementations, 3 Oct 2020 Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu

Research in image captioning has mostly focused on English because of the availability of image-caption paired datasets in this language.

Image Captioning Machine Translation +1

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations ECCV 2020 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations, 21 Jul 2019 Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance for visually impaired people, the video captioning task has received a lot of attention recently in the computer vision and natural language processing fields.

Video Captioning

Scene Graph Generation with External Knowledge and Image Reconstruction

no code implementations CVPR 2019 Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc.

Graph Generation Image Reconstruction +3

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations, 8 Jul 2018 Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.

Language Modelling Text Generation +2

Unpaired Image Captioning by Language Pivoting

no code implementations ECCV 2018 Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.

Image Captioning

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

no code implementations CVPR 2018 Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities.

Cross-Modal Retrieval

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

1 code implementation, 11 Sep 2017 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

On the other hand, a multi-stage image captioning model is hard to train due to the vanishing gradient problem.

Image Captioning

Recent Advances in Convolutional Neural Networks

no code implementations, 22 Dec 2015 Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.

Speech Recognition
