no code implementations • NAACL 2022 • Puneet Mathur, Vlad Morariu, Verena Kaynig-Fittkau, Jiuxiang Gu, Franck Dernoncourt, Quan Tran, Ani Nenkova, Dinesh Manocha, Rajiv Jain
We introduce DocTime, a novel temporal dependency graph (TDG) parser that takes as input a text document and produces a temporal dependency graph.
no code implementations • Findings (ACL) 2022 • Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, Jingbo Shang
We present a comprehensive study of sparse attention patterns in Transformer models.
no code implementations • 18 Apr 2024 • Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.
2 code implementations • 15 Feb 2024 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou
Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities, but its success relies heavily on the training data quality.
no code implementations • 12 Feb 2024 • Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou
Our research presents a thorough analytical characterization of the features learned by stylized one-hidden-layer neural networks and one-layer Transformers in addressing this task.
1 code implementation • 5 Dec 2023 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun
Some existing methods do not require fine-tuning, but their performance is unsatisfactory.
1 code implementation • 8 Nov 2023 • Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds.
no code implementations • 25 Oct 2023 • Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova
To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate.
2 code implementations • 18 Oct 2023 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou
Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation.
1 code implementation • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.
1 code implementation • 28 May 2023 • Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang
Despite the progress of image segmentation for accurate visual entity segmentation, meeting the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved.
no code implementations • 11 May 2023 • Gaurav Verma, Ryan A. Rossi, Christopher Tensmeyer, Jiuxiang Gu, Ani Nenkova
Visual text evokes an image in a person's mind, while non-visual text fails to do so.
no code implementations • IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023 • Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu
Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.
no code implementations • ICCV 2023 • Lu Qi, Jason Kuen, Tiancheng Shen, Jiuxiang Gu, Wenbo Li, Weidong Guo, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang
Given the high-quality, high-resolution nature of the dataset, we propose CropFormer, which is designed to tackle the intractability of instance-level segmentation on high-resolution images.
no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.
2 code implementations • 24 Nov 2022 • Yifei Ming, Ziyang Cai, Jiuxiang Gu, Yiyou Sun, Wei Li, Yixuan Li
Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world.
1 code implementation • 10 Nov 2022 • Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang
It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.
1 code implementation • 1 Nov 2022 • Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs).
no code implementations • 13 Oct 2022 • Haoxuan Qu, Yanchao Li, Lin Geng Foo, Jason Kuen, Jiuxiang Gu, Jun Liu
Confidence estimation, a task that aims to evaluate the trustworthiness of the model's prediction output during deployment, has received lots of research attention recently, due to its importance for the safe deployment of deep models.
no code implementations • 23 Jul 2022 • Li Xu, Haoxuan Qu, Jason Kuen, Jiuxiang Gu, Jun Liu
Video scene graph generation (VidSGG) aims to parse the video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video.
no code implementations • 22 Apr 2022 • Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun
Document intelligence automates the extraction of information from documents and supports many business applications.
Ranked #7 on Document Layout Analysis on PubLayNet val
no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.
no code implementations • CVPR 2022 • Haoyu Ma, Handong Zhao, Zhe Lin, Ajinkya Kale, Zhangyang Wang, Tong Yu, Jiuxiang Gu, Sunav Choudhary, Xiaohui Xie
recommendation, and marketing services.
1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia
To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.
no code implementations • NeurIPS 2021 • Jiuxiang Gu, Jason Kuen, Vlad Morariu, Handong Zhao, Rajiv Jain, Nikolaos Barmpalios, Ani Nenkova, Tong Sun
Document intelligence automates the extraction of information from documents and supports many business applications.
2 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
1 code implementation • CVPR 2022 • Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar
To address this, we propose a cross-modal pseudo-labeling framework, which generates training pseudo masks by aligning word semantics in captions with visual features of object masks in images.
no code implementations • 29 Sep 2021 • Phung Lai, Hai Phan, Li Xiong, Khang Phuc Tran, My Thai, Tong Sun, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios, Rajiv Jain
In this paper, we develop BitRand, a bit-aware randomized response algorithm, to preserve local differential privacy (LDP) in federated learning (FL).
2 code implementations • CVPR 2021 • Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia
However, this option has traditionally hurt detection performance considerably.
2 code implementations • 29 Jul 2021 • Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia
By removing the need for class label prediction, models trained for such a task can focus more on improving segmentation quality.
no code implementations • CVPR 2021 • Huiyuan Yang, Lijun Yin, Yi Zhou, Jiuxiang Gu
The learned AU semantic embeddings are then used as guidance for the generation of attention maps through a cross-modality attention network.
no code implementations • CVPR 2021 • Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu
For downstream usage, we propose a novel modality-adaptive attention mechanism for multimodal feature fusion by adaptively emphasizing language and vision signals.
no code implementations • NAACL 2021 • Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu
These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample.
no code implementations • NeurIPS 2020 • Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun
Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.
no code implementations • 3 Oct 2020 • Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu
In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language.
no code implementations • ECCV 2020 • Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai
In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.
no code implementations • 6 Nov 2019 • Shuhan Yao, Jiuxiang Gu, Peng Wang, Tianyang Zhao, Huajun Zhang, Xiaochuan Liu
Mobile energy storage systems (MESSs) provide mobility and flexibility to enhance distribution system resilience.
no code implementations • 21 Jul 2019 • Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu
With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.
no code implementations • CVPR 2019 • Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling
Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc.
no code implementations • ICCV 2019 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang
Most current image captioning models rely heavily on paired image-caption datasets.
no code implementations • 8 Jul 2018 • Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty
In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.
no code implementations • ECCV 2018 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.
no code implementations • CVPR 2018 • Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang
Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities.
1 code implementation • 11 Sep 2017 • Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen
On the other hand, a multi-stage image caption model is hard to train due to the vanishing gradient problem.
2 code implementations • ICCV 2017 • Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
Language Models based on recurrent neural networks have dominated recent image caption generation tasks.
no code implementations • 22 Dec 2015 • Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen
In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.