Search Results for author: Tong Sun

Found 38 papers, 12 papers with code

Scalable Differential Privacy with Certified Robustness in Adversarial Learning

1 code implementation ICML 2020 Hai Phan, My T. Thai, Han Hu, Ruoming Jin, Tong Sun, Dejing Dou

In this paper, we aim to develop a scalable algorithm to preserve differential privacy (DP) in adversarial learning for deep neural networks (DNNs), with certified robustness to adversarial examples.

Numerical Pruning for Efficient Autoregressive Models

no code implementations 17 Dec 2024 Xuan Shen, Zhao Song, Yufa Zhou, Bo Chen, Jing Liu, Ruiyi Zhang, Ryan A. Rossi, Hao Tan, Tong Yu, Xiang Chen, Yufan Zhou, Tong Sun, Pu Zhao, Yanzhi Wang, Jiuxiang Gu

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing.

Decoder Image Generation

Persona-SQ: A Personalized Suggested Question Generation Framework For Real-world Documents

1 code implementation 17 Dec 2024 Zihao Lin, Zichao Wang, Yuanting Pan, Varun Manjunatha, Ryan Rossi, Angela Lau, Lifu Huang, Tong Sun

Suggested questions (SQs) provide an effective initial interface for users to engage with their documents in AI-powered reading applications.

Question Generation

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

no code implementations 13 Dec 2024 Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Nanxuan Zhao, Jing Shi, Tong Sun

Compared to previous methods, SUGAR achieves state-of-the-art results in identity preservation, video dynamics, and video-text alignment for subject-driven video customization, demonstrating the effectiveness of our proposed method.

LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

no code implementations 2 Nov 2024 Jian Chen, Ruiyi Zhang, Yufan Zhou, Tong Yu, Franck Dernoncourt, Jiuxiang Gu, Ryan A. Rossi, Changyou Chen, Tong Sun

In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding.

document understanding Question Answering +1

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

no code implementations 27 Jul 2024 Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun

Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images.

Language Modeling +2

ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models

no code implementations 17 Jun 2024 Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image.

Disentanglement Image Generation

DocSynthv2: A Practical Autoregressive Modeling for Document Generation

no code implementations 12 Jun 2024 Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge.

TRINS: Towards Multimodal Language Models that Can Read

1 code implementation CVPR 2024 Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun

In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the multimodal large language model.

Language Modeling +2

Improve Temporal Awareness of LLMs for Sequential Recommendation

no code implementations 5 May 2024 Zhendong Chu, Zichao Wang, Ruiyi Zhang, Yangfeng Ji, Hongning Wang, Tong Sun

Large language models (LLMs) have demonstrated impressive zero-shot abilities in solving a wide range of general-purpose tasks.

Sequential Recommendation

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

no code implementations 23 Apr 2024 Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability.

Instruction Following

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

no code implementations 18 Apr 2024 Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.

Segmentation

3DPFIX: Improving Remote Novices' 3D Printing Troubleshooting through Human-AI Collaboration

no code implementations 29 Jan 2024 Nahyun Kwon, Tong Sun, Yuyang Gao, Liang Zhao, Xu Wang, Jeeeun Kim, Sungsoo Ray Hong

While troubleshooting plays an essential part of 3D printing, the process remains challenging for many remote novices even with the help of well-developed online sources, such as online troubleshooting archives and online community help.

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

no code implementations 20 Nov 2023 Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan

Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at a token level, leveraging the LLM's capability to predict the next token's probability.

AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models

1 code implementation 23 Oct 2023 Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun

Safety alignment of Large Language Models (LLMs) can be compromised with manual jailbreak attacks and (automatic) adversarial attacks.

Adversarial Attack Blocking +1

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

2 code implementations 29 Jun 2023 Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Instruction tuning unlocks the superior capability of Large Language Models (LLM) to interact with humans.

16k Image Captioning +3

Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels

1 code implementation NeurIPS 2023 Jian Chen, Ruiyi Zhang, Tong Yu, Rohan Sharma, Zhiqiang Xu, Tong Sun, Changyou Chen

Remarkably, by incorporating conditional information from the powerful CLIP model, our method can boost the current SOTA accuracy by 10-20 absolute points in many cases.

Ranked #1 on Image Classification on Food-101N (using extra training data)

Image Classification Retrieval

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

1 code implementation 23 May 2023 Yufan Zhou, Ruiyi Zhang, Tong Sun, Jinhui Xu

However, generating images of novel concept provided by the user input image is still a challenging task.

Text-to-Image Generation

Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey

no code implementations 11 Mar 2023 Yulong Wang, Tong Sun, Shenghong Li, Xin Yuan, Wei Ni, Ekram Hossain, H. Vincent Poor

This survey provides a comprehensive overview of the recent advancements in the field of adversarial attack and defense techniques, with a focus on deep neural network-based classification models.

Adversarial Attack Adversarial Defense

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

no code implementations 27 Nov 2022 Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu

In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.

User-Entity Differential Privacy in Learning Natural Language Models

1 code implementation 1 Nov 2022 Phung Lai, NhatHai Phan, Tong Sun, Rajiv Jain, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios

In this paper, we introduce a novel concept of user-entity differential privacy (UeDP) to provide formal privacy protection simultaneously to both sensitive entities in textual data and data owners in learning natural language models (NLMs).

Aligning Eyes between Humans and Deep Neural Network through Interactive Attention Alignment

1 code implementation 6 Feb 2022 Yuyang Gao, Tong Sun, Liang Zhao, Sungsoo Hong

We propose a novel framework of Interactive Attention Alignment (IAA) that aims at realizing human-steerable Deep Neural Networks (DNNs).

Gender Classification

Towards Language-Free Training for Text-to-Image Generation

no code implementations CVPR 2022 Yufan Zhou, Ruiyi Zhang, Changyou Chen, Chunyuan Li, Chris Tensmeyer, Tong Yu, Jiuxiang Gu, Jinhui Xu, Tong Sun

One of the major challenges in training text-to-image generation models is the need of a large number of high-quality text-image pairs.

Zero-Shot Text-to-Image Generation

Bit-aware Randomized Response for Local Differential Privacy in Federated Learning

no code implementations 29 Sep 2021 Phung Lai, Hai Phan, Li Xiong, Khang Phuc Tran, My Thai, Tong Sun, Franck Dernoncourt, Jiuxiang Gu, Nikolaos Barmpalios, Rajiv Jain

In this paper, we develop BitRand, a bit-aware randomized response algorithm, to preserve local differential privacy (LDP) in federated learning (FL).

Federated Learning Image Classification

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models

no code implementations NAACL 2021 Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu

These two observations are further employed to formulate a measurement which can quantify the shortcut degree of each training sample.

Self-Supervised Relationship Probing

no code implementations NeurIPS 2020 Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modeling +2

Open-Domain Question Answering with Pre-Constructed Question Spaces

no code implementations NAACL 2021 Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han

Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.

Information Retrieval Knowledge Graphs +2

Cross-Domain Document Object Detection: Benchmark Suite and Method

1 code implementation CVPR 2020 Kai Li, Curtis Wigington, Chris Tensmeyer, Handong Zhao, Nikolaos Barmpalios, Vlad I. Morariu, Varun Manjunatha, Tong Sun, Yun Fu

We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.

Object Detection

Long-Term Memory Networks for Question Answering

no code implementations 6 Jul 2017 Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao

Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task.

Question Answering

Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks

no code implementations 19 Jun 2017 Fenglong Ma, Radha Chitta, Jing Zhou, Quanzeng You, Tong Sun, Jing Gao

Existing work solves this problem by employing recurrent neural networks (RNNs) to model EHR data and utilizing simple attention mechanism to interpret the results.
