Search Results for author: Fan Yin

Found 18 papers, 14 papers with code

Evaluating Human Alignment and Model Faithfulness of LLM Rationale

no code implementations 28 Jun 2024 Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng

We study how well large language models (LLMs) explain their generations with rationales -- a set of tokens extracted from the input texts that reflect the decision process of LLMs.

Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

no code implementations 19 Jun 2024 Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.

Retrieval Uncertainty Quantification

Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

1 code implementation 4 Jun 2024 Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations.

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

no code implementations 30 May 2024 Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data and append its self-generated image descriptions to the prompts.

Image Comprehension Visual Question Answering

Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

no code implementations 28 Feb 2024 Fan Yin, Jayanth Srinivasa, Kai-Wei Chang

We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs), which serves as a crucial step in building trust between humans and LLMs.

Language Modelling Large Language Model +1

Contrastive Instruction Tuning

1 code implementation 17 Feb 2024 Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks.

Sentence

On Prompt-Driven Safeguarding for Large Language Models

2 code implementations 31 Jan 2024 Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

In this work, we investigate how LLMs' behavior (i.e., complying with or refusing user queries) is affected by safety prompts from the perspective of model representation.

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

1 code implementation 1 Nov 2023 Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.

Informativeness Out-of-Distribution Generalization +1

Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

1 code implementation 1 Jun 2023 Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks.

Red Teaming Language Model Detectors with Language Models

2 code implementations 31 May 2023 Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, Cho-Jui Hsieh

The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.

Adversarial Robustness Language Modelling +2

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

1 code implementation 23 May 2023 Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.

Continual Learning

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

1 code implementation ICCV 2023 Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang

Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data.

Backdoor Attack Contrastive Learning +1

ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation

1 code implementation 22 Oct 2022 Fan Yin, Yao Li, Cho-Jui Hsieh, Kai-Wei Chang

Finally, our analysis shows that the two types of uncertainty provided by ADDMU can be leveraged to characterize adversarial examples and identify the ones that contribute most to the model's robustness in adversarial training.

On the Sensitivity and Stability of Model Interpretations in NLP

1 code implementation ACL 2022 Fan Yin, Zhouxing Shi, Cho-Jui Hsieh, Kai-Wei Chang

We propose two new criteria, sensitivity and stability, that provide complementary notions of faithfulness to the existing removal-based criteria.

Adversarial Robustness Dependency Parsing +2

On the Robustness of Language Encoders against Grammatical Errors

1 code implementation ACL 2020 Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang

We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors.

Cloze Test Linguistic Acceptability +1

Glyce: Glyph-vectors for Chinese Character Representations

2 code implementations NeurIPS 2019 Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.

Chinese Dependency Parsing Chinese Named Entity Recognition +21
