Search Results for author: Jiuxiang Gu

Found 46 papers, 16 papers with code

DocTime: A Document-level Temporal Dependency Graph Parser

no code implementations • NAACL 2022 • Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain

We introduce DocTime - a novel temporal dependency graph (TDG) parser that takes as input a text document and produces a temporal dependency graph.

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns

no code implementations • Findings (ACL) 2022 • Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, Jingbo Shang

We present a comprehensive study of sparse attention patterns in Transformer models.

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

no code implementations • 18 Apr 2024 • Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.

Segmentation

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

2 code implementations • 15 Feb 2024 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou

Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities but its success heavily relies on the training data quality.

Data Augmentation Instruction Following

Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

no code implementations • 12 Feb 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou

Our research presents a thorough analytical characterization of the features learned by stylized one-hidden layer neural networks and one-layer Transformers in addressing this task.

2k Mathematical Reasoning

Customization Assistant for Text-to-image Generation

1 code implementation • 5 Dec 2023 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun

Some existing methods do not require fine-tuning, but their performance is unsatisfactory.

Descriptive Language Modelling +2

460

LRM: Large Reconstruction Model for Single Image to 3D

1 code implementation • 8 Nov 2023 • Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan

We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.

Image to 3D

736

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

no code implementations • 25 Oct 2023 • Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova

To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate.

NER

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

2 code implementations • 18 Oct 2023 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou

Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation.

Natural Language Understanding

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

1 code implementation • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Instruction tuning unlocks the superior capability of Large Language Models (LLM) to interact with humans.

16k Image Captioning +3

234

AIMS: All-Inclusive Multi-Level Segmentation

1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

Despite the progress of image segmentation for accurate visual entity segmentation, completing the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved.

Image Segmentation Segmentation +1

667

Learning the Visualness of Text Using Large Vision-Language Models

no code implementations • 11 May 2023 • Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova

Visual text evokes an image in a person's mind, while non-visual text fails to do so.

Contrastive Learning Image Retrieval +2

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents

no code implementations • IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 • Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu

Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.

Reading Order Detection

High Quality Entity Segmentation

no code implementations • ICCV 2023 • Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

Given the high-quality and -resolution nature of the dataset, we propose CropFormer which is designed to tackle the intractability of instance-level segmentation on high-resolution images.

Image Segmentation Segmentation +1

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.

Delving into Out-of-Distribution Detection with Vision-Language Representations

2 code implementations • 24 Nov 2022 • Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, Yixuan Li

Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world.

Ranked #5 on Out-of-Distribution Detection on ImageNet-1k vs Places

Out-of-Distribution Detection

High-Quality Entity Segmentation

1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.

Image Segmentation Segmentation +2

667

User-Entity Differential Privacy in Learning Natural Language Models

1 code implementation • 1 Nov 2022 • Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios

In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs).

Improving the Reliability for Confidence Estimation

no code implementations • 13 Oct 2022 • Haoxuan Qu, Yanchao Li, Lin Geng Foo, Jason Kuen, Jiuxiang Gu, Jun Liu

Confidence estimation, a task that aims to evaluate the trustworthiness of the model's prediction output during deployment, has received lots of research attention recently, due to its importance for the safe deployment of deep models.

Image Classification Meta-Learning +1

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

no code implementations • 23 Jul 2022 • Li Xu, Haoxuan Qu, Jason Kuen, Jiuxiang Gu, Jun Liu

Video scene graph generation (VidSGG) aims to parse the video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video.

Graph Generation Meta-Learning +2

Unified Pretraining Framework for Document Understanding

no code implementations • 22 Apr 2022 • Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

Document intelligence automates the extraction of information from documents and supports many business applications.

Ranked #7 on Document Layout Analysis on PubLayNet val

Document Layout Analysis document understanding +1

Towards Language-Free Training for Text-to-Image Generation

no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.

Zero-Shot Text-to-Image Generation

EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval

no code implementations • CVPR 2022 • Haoyu Ma, Handong Zhao, Zhe Lin, Ajinkya Kale, Zhangyang Wang, Tong Yu, Jiuxiang Gu, Sunav Choudhary, Xiaohui Xie

recommendation, and marketing services.

Causal Inference Contrastive Learning +3

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

667

UniDoc: Unified Pretraining Framework for Document Understanding

no code implementations • NeurIPS 2021 • Jiuxiang Gu, Jason Kuen, Vlad Morariu, Handong Zhao, Rajiv Jain, Nikolaos Barmpalios, Ani Nenkova, Tong Sun

Document intelligence automates the extraction of information from documents and supports many business applications.

document understanding Self-Supervised Learning

LAFITE: Towards Language-Free Training for Text-to-Image Generation

2 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.

Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ

Zero-Shot Text-to-Image Generation

176

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling

1 code implementation • CVPR 2022 • Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar

To address this, we propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.

Instance Segmentation Semantic Segmentation

Bit-aware Randomized Response for Local Differential Privacy in Federated Learning

no code implementations • 29 Sep 2021 • Phung Lai, Hai Phan, Li Xiong, Khang Phuc Tran, My Thai, Tong Sun, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios, Rajiv Jain

In this paper, we develop BitRand, a bit-aware randomized response algorithm, to preserve local differential privacy (LDP) in federated learning (FL).

Federated Learning Image Classification

Multi-Scale Aligned Distillation for Low-Resolution Detection

2 code implementations • CVPR 2021 • Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia

However, this option traditionally hurts detection performance considerably.

Knowledge Distillation object-detection +1

128

Open-World Entity Segmentation

2 code implementations • 29 Jul 2021 • Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

By removing the need for class label prediction, models trained for such a task can focus more on improving segmentation quality.

Image Manipulation Image Segmentation +2

667

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection

no code implementations • CVPR 2021 • Huiyuan Yang, Lijun Yin, Yi Zhou, Jiuxiang Gu

The learned AU semantic embeddings are then used as guidance for the generation of attention maps through a cross-modality attention network.

Action Unit Detection Facial Action Unit Detection +1

SelfDoc: Self-Supervised Document Representation Learning

no code implementations • CVPR 2021 • Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

For downstream usage, we propose a novel modality-adaptive attention mechanism for multimodal feature fusion by adaptively emphasizing language and vision signals.

Representation Learning

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models

no code implementations • NAACL 2021 • Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu

These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample.

Self-Supervised Relationship Probing

no code implementations • NeurIPS 2020 • Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modelling +1

UNISON: Unpaired Cross-lingual Image Captioning

no code implementations • 3 Oct 2020 • Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu

In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language.

Caption Generation Image Captioning +3

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations • ECCV 2020 • Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Reinforcement Learning (RL)

Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach

no code implementations • 6 Nov 2019 • Shuhan Yao, Jiuxiang Gu, Peng Wang, Tianyang Zhao, Huajun Zhang, Xiaochuan Liu

Mobile energy storage systems (MESSs) provide mobility and flexibility to enhance distribution system resilience.

Scheduling

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations • 21 Jul 2019 • Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.

Video Captioning

Scene Graph Generation with External Knowledge and Image Reconstruction

no code implementations • CVPR 2019 • Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc.

Graph Generation Image Reconstruction +6

Unpaired Image Captioning via Scene Graph Alignments

no code implementations • ICCV 2019 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang

Most current image captioning models rely heavily on paired image-caption datasets.

Image Captioning Sentence

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations • 8 Jul 2018 • Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.

Language Modelling Sentence +3

Unpaired Image Captioning by Language Pivoting

no code implementations • ECCV 2018 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.

Image Captioning Sentence

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

no code implementations • CVPR 2018 • Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities.

Cross-Modal Retrieval Retrieval +1

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

1 code implementation • 11 Sep 2017 • Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

On the other hand, a multi-stage image caption model is hard to train due to the vanishing gradient problem.

Image Captioning Sentence

An Empirical Study of Language CNN for Image Captioning

2 code implementations • ICCV 2017 • Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen

Language Models based on recurrent neural networks have dominated recent image caption generation tasks.

Caption Generation Image Captioning +1

Recent Advances in Convolutional Neural Networks

no code implementations • 22 Dec 2015 • Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.

speech-recognition Speech Recognition
