no code implementations • Findings (ACL) 2022 • Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, Jingbo Shang
We present a comprehensive study of sparse attention patterns in Transformer models.
1 code implementation • ICML 2020 • Hai Phan, My T. Thai, Han Hu, Ruoming Jin, Tong Sun, Dejing Dou
In this paper, we aim to develop a scalable algorithm to preserve differential privacy (DP) in adversarial learning for deep neural networks (DNNs), with certified robustness to adversarial examples.
no code implementations • 17 Dec 2024 • Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu
Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.
1 code implementation • 17 Dec 2024 • Zihao Lin, Zichao Wang, Yuanting Pan, Varun Manjunatha, Ryan Rossi, Angela Lau, Lifu Huang, Tong Sun
Suggested questions (SQs) provide an effective initial interface for users to engage with their documents in AI-powered reading applications.
no code implementations • 13 Dec 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun
Compared to previous methods, SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating the effectiveness of our proposed method.
no code implementations • 2 Nov 2024 • Jian Chen, Ruiyi Zhang, Yufan Zhou, Tong Yu, Franck Dernoncourt, Jiuxiang Gu, Ryan A. Rossi, Changyou Chen, Tong Sun
In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding.
no code implementations • 27 Jul 2024 • Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun
Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images.
no code implementations • 17 Jun 2024 • Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang
Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image.
no code implementations • 13 Jun 2024 • Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
no code implementations • 12 Jun 2024 • Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós
While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge.
1 code implementation • CVPR 2024 • Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the multimodal large language model.
no code implementations • 5 May 2024 • Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun
Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.
no code implementations • 23 Apr 2024 • Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun
Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability.
no code implementations • 18 Apr 2024 • Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.
no code implementations • 29 Jan 2024 • Nahyun Kwon, Tong Sun, Yuyang Gao, Liang Zhao, Xu Wang, Jeeeun Kim, Sungsoo Ray Hong
While troubleshooting is an essential part of 3D printing, the process remains challenging for many remote novices even with the help of well-developed online sources, such as online troubleshooting archives and online community help.
1 code implementation • CVPR 2024 • Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun
Some existing methods do not require fine-tuning, but their performance is unsatisfactory.
no code implementations • 20 Nov 2023 • Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan
Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at a token level, leveraging the LLM's capability to predict the next token's probability.
1 code implementation • 23 Oct 2023 • Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun
Safety alignment of Large Language Models (LLMs) can be compromised with manual jailbreak attacks and (automatic) adversarial attacks.
2 code implementations • 29 Jun 2023 • Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun
Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.
1 code implementation • NeurIPS 2023 • Jian Chen, Ruiyi Zhang, Tong Yu, Rohan Sharma, Zhiqiang Xu, Tong Sun, Changyou Chen
Remarkably, by incorporating conditional information from the powerful CLIP model, our method can boost the current SOTA accuracy by 10-20 absolute points in many cases.
Ranked #1 on Image Classification on Food-101N (using extra training data)
1 code implementation • 23 May 2023 • Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu
However, generating images of a novel concept provided by the user's input image is still a challenging task.
no code implementations • 11 Mar 2023 • Yulong Wang, Tong Sun, Shenghong Li, Xin Yuan, Wei Ni, Ekram Hossain, H. Vincent Poor
This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques, with a focus on deep neural network-based classification models.
no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.
1 code implementation • 1 Nov 2022 • Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios
In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs).
no code implementations • 22 Apr 2022 • Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun
Document intelligence automates the extraction of information from documents and supports many business applications.
Ranked #8 on Document Layout Analysis on PubLayNet val
1 code implementation • 6 Feb 2022 • Yuyang Gao, Tong Sun, Liang Zhao, Sungsoo Hong
We propose a novel framework of Interactive Attention Alignment (IAA) that aims at realizing human-steerable Deep Neural Networks (DNNs).
no code implementations • CVPR 2022 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.
no code implementations • NeurIPS 2021 • Jiuxiang Gu, Jason Kuen, Vlad Morariu, Handong Zhao, Rajiv Jain, Nikolaos Barmpalios, Ani Nenkova, Tong Sun
Document intelligence automates the extraction of information from documents and supports many business applications.
3 code implementations • 27 Nov 2021 • Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality image-text pairs.
Ranked #2 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
no code implementations • 29 Sep 2021 • Phung Lai, Hai Phan, Li Xiong, Khang Phuc Tran, My Thai, Tong Sun, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios, Rajiv Jain
In this paper, we develop BitRand, a bit-aware randomized response algorithm, to preserve local differential privacy (LDP) in federated learning (FL).
no code implementations • 8 Apr 2021 • Jia Wang, Tong Sun, Benyuan Liu, Yu Cao, Hongwei Zhu
Financial markets are a complex dynamical system.
no code implementations • 5 Apr 2021 • Jia Wang, Tong Sun, Benyuan Liu, Yu Cao, Degang Wang
Financial markets are difficult to predict due to their complex system dynamics.
no code implementations • NAACL 2021 • Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu
These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample.
no code implementations • NeurIPS 2020 • Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun
Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.
no code implementations • NAACL 2021 • Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han
Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.
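The two-stage flow described in this excerpt (an offline pass that builds a question space, then an online lookup against it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the hand-written `QUESTION_SPACE` stands in for the output of the offline reader, and the bag-of-words cosine similarity in `retrieve` is a simple stand-in for whatever retriever the paper actually uses.

```python
import math
from collections import Counter

# Offline stage (hypothetical): question-answer pairs that a reader would
# generate from the corpus ahead of time. Hand-written here for illustration.
QUESTION_SPACE = [
    ("who wrote hamlet", "William Shakespeare"),
    ("what is the capital of france", "Paris"),
    ("when did world war ii end", "1945"),
]


def bag_of_words(text):
    """Lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, question_space=QUESTION_SPACE):
    """Online stage: return the answer attached to the stored question
    that is most similar to the user query."""
    query_vec = bag_of_words(query)
    _, best_answer = max(
        question_space,
        key=lambda qa: cosine(query_vec, bag_of_words(qa[0])),
    )
    return best_answer
```

For example, `retrieve("capital of france")` matches the second stored question and returns its answer. The point of the design is that the expensive reading happens once offline, so answering a query reduces to a similarity search over pre-built questions.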
1 code implementation • CVPR 2020 • Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu
We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.
no code implementations • 6 Jul 2017 • Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao
Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task.
no code implementations • 19 Jun 2017 • Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao
Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and utilizing a simple attention mechanism to interpret the results.