Search Results for author: Yuji Zhang

Found 13 papers, 5 papers with code

#HowYouTagTweets: Learning User Hashtagging Preferences via Personalized Topic Attention

1 code implementation EMNLP 2021 Yuji Zhang, Yubo Zhang, Chunpu Xu, Jing Li, Ziyan Jiang, Baolin Peng

It is hypothesized that one's interest in a hashtag is related to what they said before (user history) and to the existing posts that contain the hashtag (hashtag contexts).

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

no code implementations 22 Feb 2025 Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi R. Fung, Kathleen McKeown, ChengXiang Zhai, Manling Li, Heng Ji

To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details.

Hallucination Text Generation

Internal Activation as the Polar Star for Steering Unsafe LLM Behavior

no code implementations 3 Feb 2025 Peixuan Han, Cheng Qian, Xiusi Chen, Yuji Zhang, Denghui Zhang, Heng Ji

Large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks but also pose significant risks due to their potential to generate harmful content.

Safety Alignment

EscapeBench: Pushing Language Models to Think Outside the Box

1 code implementation 18 Dec 2024 Cheng Qian, Peixuan Han, Qinyu Luo, Bingxiang He, Xiusi Chen, Yuji Zhang, Hongyi Du, Jiarui Yao, Xiaocheng Yang, Denghui Zhang, Yunzhu Li, Heng Ji

Language model agents excel in long-session planning and reasoning, but existing benchmarks primarily focus on goal-oriented tasks with explicit objectives, neglecting creative adaptation in unfamiliar environments.

Language Modeling

Integrative Decoding: Improve Factuality via Implicit Self-consistency

1 code implementation 2 Oct 2024 Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models.

TruthfulQA

A Survey on the Honesty of Large Language Models

2 code implementations 27 Sep 2024 Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam

Honesty is a fundamental principle for aligning large language models (LLMs) with human values, requiring these models to recognize what they know and don't know and be able to faithfully express their knowledge.

Survey

Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

no code implementations 10 Jul 2024 Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, Jing Li, Manling Li, Heng Ji

This phenomenon partially stems from training data imbalance, which we verify on both pretrained models and fine-tuned models, over a wide range of LM model families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns).

Hallucination Language Modeling

EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries

no code implementations 17 Feb 2024 Jiateng Liu, Pengfei Yu, Yuji Zhang, Sha Li, Zixuan Zhang, Heng Ji

The dynamic nature of real-world information necessitates efficient knowledge editing (KE) in large language models (LLMs) for knowledge updating.

knowledge editing

VIBE: Topic-Driven Temporal Adaptation for Twitter Classification

no code implementations 16 Oct 2023 Yuji Zhang, Jing Li, Wenjie Li

Language features evolve in real-world social media, causing the performance of text classification to deteriorate over time.

Text Classification
