no code implementations • 1 Apr 2024 • Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time.
no code implementations • 11 Mar 2024 • Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM).
1 code implementation • 20 Feb 2024 • Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao
Large language models (LLMs) excel at generating human-like text, but this capability also raises concerns about misuse in fake news and academic dishonesty.
no code implementations • 18 Feb 2024 • Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei LI, William Yang Wang
Recent studies show that self-feedback improves large language models (LLMs) on certain tasks while worsening performance on others.
1 code implementation • 15 Feb 2024 • André V. Duarte, Xuandong Zhao, Arlindo L. Oliveira, Lei LI
We are motivated by the premise that a language model is likely to identify verbatim excerpts from its training text.
1 code implementation • 8 Feb 2024 • Xuandong Zhao, Lei LI, Yu-Xiang Wang
In this paper, we propose a new decoding method called Permute-and-Flip (PF) decoder.
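The underlying Permute-and-Flip mechanism (from the differential-privacy literature) can be sketched as follows; the function name and the temperature parameterization are illustrative, not the paper's actual API. The mechanism visits candidate tokens in a uniformly random order and accepts candidate i with probability exp((logit_i − max logit) / T); since the highest-logit token is accepted with probability 1 when reached, a single pass always returns a token.

```python
import math
import random

def permute_and_flip_sample(logits, temperature=1.0):
    """Sketch of Permute-and-Flip sampling over token logits.

    Visit candidates in a uniformly random order; accept candidate i
    with probability exp((logits[i] - max_logit) / temperature).
    The argmax token is accepted with probability 1, so one pass
    through the permutation always terminates.
    """
    max_logit = max(logits)
    order = random.sample(range(len(logits)), len(logits))
    for i in order:
        if random.random() < math.exp((logits[i] - max_logit) / temperature):
            return i

# At low temperature the sampler concentrates on the argmax token,
# mirroring how PF interpolates toward greedy decoding.
```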
1 code implementation • 30 Jan 2024 • Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei LI, Yu-Xiang Wang, William Yang Wang
In this paper, we propose the weak-to-strong jailbreaking attack, an efficient method to attack aligned LLMs to produce harmful text.
1 code implementation • 24 Oct 2023 • Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education.
4 code implementations • 30 Jun 2023 • Xuandong Zhao, Prabhanjan Ananth, Lei LI, Yu-Xiang Wang
We propose a robust, high-quality watermarking method, Unigram-Watermark, which extends an existing approach with a simplified fixed grouping strategy.
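The fixed grouping strategy can be sketched as a single key-derived "green list" used at every decoding step: green-token logits are boosted during generation, and detection computes a z-score on the fraction of green tokens in a suspect text. The helper names, the shuffle-based list construction, and the parameter defaults below are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def green_list(vocab_size, key, gamma=0.5):
    """One global ('unigram') green list derived from a secret key,
    reused at every generation step (the fixed grouping)."""
    rng = random.Random(key)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermark_logits(logits, green, delta=2.0):
    """Boost green-token logits by delta before sampling."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]

def detect_z_score(token_ids, green, gamma=0.5):
    """z-score of the observed green-token fraction against the
    null hypothesis that a gamma fraction is green by chance."""
    n = len(token_ids)
    g = sum(1 for t in token_ids if t in green)
    return (g - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
```

Because the partition is fixed rather than re-seeded per step, paraphrasing or editing individual tokens leaves most green tokens green, which is the intuition behind the method's robustness.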
1 code implementation • 12 Jun 2023 • Yuqing Zhu, Xuandong Zhao, Chuan Guo, Yu-Xiang Wang
Most existing approaches of differentially private (DP) machine learning focus on private training.
1 code implementation • 2 Jun 2023 • Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, Lei LI
However, if we do not require the watermarked image to look the same as the original one, watermarks that keep the image semantically similar can be an alternative defense against our attack.
2 code implementations • 6 Feb 2023 • Xuandong Zhao, Yu-Xiang Wang, Lei LI
We can then detect the secret message by probing a suspect model to tell if it is distilled from the protected one.
2 code implementations • 14 Dec 2022 • Xuandong Zhao, Siqi Ouyang, Zhiguo Yu, Ming Wu, Lei LI
How can we extend a pre-trained model to many language understanding tasks, without labeled or additional unlabeled data?
1 code implementation • 7 Oct 2022 • Xuandong Zhao, Lei LI, Yu-Xiang Wang
We prove that a protected model still retains the original accuracy within a certain bound.
1 code implementation • NAACL 2022 • Xuandong Zhao, Lei LI, Yu-Xiang Wang
Large language models have been shown to memorize private information, such as social security numbers, from their training data.
1 code implementation • Findings (ACL) 2022 • Xuandong Zhao, Zhiguo Yu, Ming Wu, Lei LI
How can we learn highly compact yet effective sentence representations?
no code implementations • 23 Jan 2021 • Dheeraj Baby, Xuandong Zhao, Yu-Xiang Wang
We consider the problem of estimating a function from $n$ noisy samples whose discrete Total Variation (TV) is bounded by $C_n$.
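For a sequence of function values $\theta \in \mathbb{R}^n$ sampled on a grid, the discrete Total Variation referred to here is the standard quantity

```latex
\mathrm{TV}(\theta) = \sum_{i=1}^{n-1} \left| \theta_{i+1} - \theta_i \right|,
```

and estimation is carried out over the class $\{\theta : \mathrm{TV}(\theta) \le C_n\}$, i.e., sequences whose cumulative jumps are bounded by $C_n$.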
no code implementations • 19 Jul 2020 • Xuandong Zhao, Jinbao Xue, Jin Yu, Xi Li, Hongxia Yang
In real-world applications, networks usually consist of billions of various types of nodes and edges with abundant attributes.
no code implementations • 1 Oct 2019 • Jiaming Guo, Wei Qiu, Xiang Li, Xuandong Zhao, Ning Guo, Quanzheng Li
Imaging-based early diagnosis of Alzheimer's Disease (AD) has become an effective approach, especially using nuclear medicine imaging techniques such as Positron Emission Tomography (PET).