Search Results for author: Fan Yin

Found 14 papers, 13 with code

Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

no code implementations • 28 Feb 2024 • Fan Yin, Jayanth Srinivasa, Kai-Wei Chang

We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs), which serves as a crucial step in building trust between humans and LLMs.

Tasks: Language Modelling, Large Language Model, +1
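
To make the core quantity concrete, here is a minimal sketch of a maximum-likelihood local intrinsic dimension (LID) estimator in the style of Levina and Bickel, applied to a pool of stand-in hidden states; the paper's exact estimator, layer choice, and feature pooling may differ.

```python
# Minimal sketch: MLE estimate of local intrinsic dimension (LID) for one
# point against a pool of reference vectors (e.g., LLM hidden states).
import numpy as np

def lid_mle(x, reference, k=20):
    """Levina-Bickel MLE: LID = -1 / mean(log(r_i / r_k)), i = 1..k-1."""
    dists = np.linalg.norm(reference - x, axis=1)
    dists = np.sort(dists)[1:k + 1]      # k nearest neighbours, skip self
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 768))    # stand-in for LLM hidden states
print(lid_mle(hidden[0], hidden, k=20))
```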

Contrastive Instruction Tuning

1 code implementation • 17 Feb 2024 • Tianyi Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks.

Tasks: Sentence
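
The titled method contrasts representations of semantically equivalent but textually varied instruction-instance pairs. Below is a minimal InfoNCE-style sketch of such an objective; the encoder, pooling, and temperature are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: pull together hidden states of paraphrased instruction
# variants (row i of h_orig matches row i of h_para), push apart the rest.
import torch
import torch.nn.functional as F

def contrastive_loss(h_orig, h_para, temperature=0.05):
    h_orig = F.normalize(h_orig, dim=-1)
    h_para = F.normalize(h_para, dim=-1)
    logits = h_orig @ h_para.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(h_orig.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```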

On Prompt-Driven Safeguarding for Large Language Models

1 code implementation • 31 Jan 2024 • Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) from complying with queries that contain harmful intents.
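
For concreteness, a minimal sketch of the practice under study, prepending a safety prompt to the user query; the prompt text and chat format are illustrative, not the prompts evaluated in the paper.

```python
# Minimal sketch: wrap a user query with a (hypothetical) safety prompt
# before it reaches the model.
SAFETY_PROMPT = (
    "You are a helpful assistant. Refuse requests that are harmful, "
    "unethical, or illegal, and explain the refusal briefly."
)

def build_input(user_query: str) -> list[dict]:
    return [
        {"role": "system", "content": SAFETY_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = build_input("How do I make a dangerous substance?")
```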

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

1 code implementation • 1 Nov 2023 • Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.

Tasks: Informativeness, Out-of-Distribution Generalization, +1
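
A minimal sketch of the selection idea: score each task by how much the model's behavior varies across perturbed prompts and train on the most sensitive tasks first. `model_loss`, `task.examples`, and `variants_for` are hypothetical stand-ins; the paper's exact sensitivity measure may differ.

```python
# Minimal sketch: rank tasks by prompt sensitivity and keep the top ones.
import statistics

def prompt_sensitivity(task, prompt_variants, model_loss):
    losses = [model_loss(p, task.examples) for p in prompt_variants]
    return statistics.pstdev(losses)      # spread of loss across prompts

def select_tasks(tasks, variants_for, model_loss, budget=100):
    scored = [(prompt_sensitivity(t, variants_for(t), model_loss), t)
              for t in tasks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:budget]]
```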

Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

1 code implementation • 1 Jun 2023 • Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks.
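
A minimal sketch of the ablation style such a study uses: drop pieces of a task definition and measure the resulting performance change. The sentence-splitting heuristic and the `evaluate` callable are assumptions.

```python
# Minimal sketch: remove one sentence of the task definition at a time and
# score how much each removal hurts (positive = that sentence mattered).
def ablate_definition(definition: str):
    sentences = [s.strip() for s in definition.split(".") if s.strip()]
    for i in range(len(sentences)):
        kept = sentences[:i] + sentences[i + 1:]
        yield ". ".join(kept) + "."

def importance(definition, examples, evaluate):
    base = evaluate(definition, examples)
    return [base - evaluate(v, examples)
            for v in ablate_definition(definition)]
```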

Red Teaming Language Model Detectors with Language Models

2 code implementations • 31 May 2023 • Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, Cho-Jui Hsieh

The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.

Tasks: Adversarial Robustness, Language Modelling, +2
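
One attack family in this line of work paraphrases generated text to evade AI-text detectors. A minimal sketch, where `paraphrase` and `detector_score` are hypothetical callables standing in for an LLM paraphraser and a detector:

```python
# Minimal sketch: iteratively paraphrase and keep the variant the detector
# is least confident about (high score = "AI-written").
def evade_detector(text, paraphrase, detector_score, rounds=5):
    best, best_score = text, detector_score(text)
    for _ in range(rounds):
        candidate = paraphrase(best)
        score = detector_score(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score
```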

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Instruction tuning has emerged as a way to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.

Tasks: Continual Learning
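
A minimal sketch of the curation idea: map the fields of an existing labelled dataset into (instruction, input, output) triples. The template below is hand-written for illustration; Dynosaur derives such templates from dataset metadata with an LLM.

```python
# Minimal sketch: convert one record of a labelled dataset into an
# instruction-tuning example. The instruction text is an assumption.
def to_instruction_example(record: dict) -> dict:
    return {
        "instruction": "Classify the sentiment of the given review "
                       "as positive or negative.",
        "input": record["review"],
        "output": record["label"],
    }

example = to_instruction_example(
    {"review": "The film was a delight.", "label": "positive"}
)
```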

CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

1 code implementation • ICCV 2023 • Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang

Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data.

Tasks: Backdoor Attack, Contrastive Learning, +1
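
A minimal sketch of the CleanCLIP-style objective: finetune on clean pairs with the usual image-text contrastive loss plus independent self-supervised losses within each modality, weakening spurious image-text correlations planted by poisoned pairs. Encoders, augmentations, and the weight `lam` are illustrative.

```python
# Minimal sketch: multimodal CLIP loss + unimodal self-supervised losses.
import torch
import torch.nn.functional as F

def info_nce(a, b, t=0.07):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / t
    targets = torch.arange(a.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

def cleanclip_loss(img, txt, img_aug, txt_aug, lam=1.0):
    multimodal = info_nce(img, txt)                  # standard CLIP term
    unimodal = info_nce(img, img_aug) + info_nce(txt, txt_aug)
    return multimodal + lam * unimodal
```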

ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation

1 code implementation • 22 Oct 2022 • Fan Yin, Yao Li, Cho-Jui Hsieh, Kai-Wei Chang

Our analysis shows that the two types of uncertainty provided by ADDMU can be leveraged to characterize adversarial examples and identify the ones that contribute most to the model's robustness in adversarial training.
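
A minimal sketch of the two signals, assuming model uncertainty comes from stochastic forward passes (MC dropout) and data uncertainty from perturbed copies of the input; the paper's perturbation scheme and score combination are more involved.

```python
# Minimal sketch: two uncertainty scores for adversarial-example detection.
import torch

def model_uncertainty(model, x, passes=10):
    model.train()                        # keep dropout active at inference
    probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])
    return probs.var(dim=0).sum().item()

def data_uncertainty(model, perturbations_of_x):
    model.eval()
    with torch.no_grad():
        probs = torch.stack([model(p).softmax(-1)
                             for p in perturbations_of_x])
    return probs.var(dim=0).sum().item()
```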

On the Sensitivity and Stability of Model Interpretations in NLP

1 code implementation • ACL 2022 • Fan Yin, Zhouxing Shi, Cho-Jui Hsieh, Kai-Wei Chang

We propose two new criteria, sensitivity and stability, that provide complementary notions of faithfulness to the existing removal-based criteria.

Tasks: Adversarial Robustness, Dependency Parsing, +2
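
A minimal sketch of a stability-style check: compare token attributions before and after a small, meaning-preserving perturbation of the input. Spearman rank agreement is an illustrative choice; the paper defines both criteria more carefully.

```python
# Minimal sketch: rank agreement of attributions under perturbation
# (close to 1.0 = stable interpretation).
from scipy.stats import spearmanr

def interpretation_stability(attr_orig, attr_perturbed):
    """Both inputs: one attribution score per shared token position."""
    rho, _ = spearmanr(attr_orig, attr_perturbed)
    return rho

print(interpretation_stability([0.9, 0.1, 0.4], [0.8, 0.2, 0.35]))
```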

On the Robustness of Language Encoders against Grammatical Errors

1 code implementation • ACL 2020 • Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang

We conduct a thorough study to diagnose the behaviors of pre-trained language encoders (ELMo, BERT, and RoBERTa) when confronted with natural grammatical errors.

Tasks: Cloze Test, Linguistic Acceptability, +1
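
A minimal sketch of the probing setup: synthesize a common error type (here, subject-verb agreement) and compare encoder behaviour on clean versus corrupted text. The toy swap rule below is an assumption; the paper draws error types and rates from learner corpora.

```python
# Minimal sketch: inject a subject-verb agreement error into a sentence.
SWAPS = {"is": "are", "are": "is", "was": "were", "were": "was"}

def inject_agreement_error(sentence: str) -> str:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok in SWAPS:
            tokens[i] = SWAPS[tok]
            break                        # corrupt only the first match
    return " ".join(tokens)

print(inject_agreement_error("The results are promising ."))
```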

Glyce: Glyph-vectors for Chinese Character Representations

2 code implementations • NeurIPS 2019 • Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li

Due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.

Tasks: Chinese Dependency Parsing, Chinese Named Entity Recognition, +21
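
A minimal sketch of the glyph-vector idea: encode a rendered character image with a small CNN and use the output as that character's feature. Layer sizes are illustrative; Glyce's actual glyph CNN and auxiliary image-classification objective are more involved.

```python
# Minimal sketch: a small CNN that turns a glyph bitmap into a vector.
import torch
import torch.nn as nn

glyph_encoder = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),  # 1-channel glyph bitmap
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                                # -> 64-d glyph vector
)

glyph = torch.rand(1, 1, 24, 24)                 # rendered character image
vector = glyph_encoder(glyph)                    # shape: (1, 64)
```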
