Search Results for author: Zhengxuan Wu

Found 32 papers, 28 papers with code

Structured Self-Attention Weights Encode Semantics in Sentiment Analysis

1 code implementation EMNLP (BlackboxNLP) 2020 Zhengxuan Wu, Thanh-Son Nguyen, Desmond Ong

Very recent work suggests that the self-attention in the Transformer encodes syntactic information. Here, we show that self-attention scores encode semantics by considering sentiment analysis tasks.

Sentiment Analysis Time Series +1

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

1 code implementation 31 Jul 2024 Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

Surprisingly, we find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results.

Instruction Following Multi-Task Learning

ReFT: Representation Finetuning for Language Models

2 code implementations 4 Apr 2024 Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency.

Arithmetic Reasoning
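
The LoReFT intervention named above has a simple closed form: it edits a hidden state h only inside a learned low-rank subspace, h + R^T(Wh + b - Rh), where the rows of R are kept orthonormal. Below is a minimal PyTorch sketch of that update; the class and variable names are illustrative, and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    def __init__(self, hidden_size: int, rank: int):
        super().__init__()
        # Low-rank subspace R; the paper keeps its rows orthonormal during training
        # (e.g., via an orthogonal parametrization), which this sketch only sets at init.
        self.R = nn.Parameter(torch.empty(rank, hidden_size))
        nn.init.orthogonal_(self.R)
        # Learned projection Wh + b giving the target values inside the subspace.
        self.proj = nn.Linear(hidden_size, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h + R^T (W h + b - R h): only the subspace spanned by R's rows is edited.
        return h + (self.proj(h) - h @ self.R.T) @ self.R

h = torch.randn(2, 5, 768)                        # (batch, seq, hidden)
print(LoReFTIntervention(768, rank=4)(h).shape)   # torch.Size([2, 5, 768])
```

Because only rank-many directions are edited at the chosen positions, the intervention trains far fewer parameters than full finetuning.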

Mapping the Increasing Use of LLMs in Scientific Papers

1 code implementation 1 Apr 2024 Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D. Manning, James Y. Zou

To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time.
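
The framework operates at the population level: rather than classifying individual papers, it estimates what fraction of a corpus shows LLM-modified content. As a rough illustration only (a toy mixture estimator of my own, not the paper's method), one can fit the mixing fraction of a two-component word-frequency mixture by maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_alpha(counts, p_human, p_llm):
    """MLE of alpha in the mixture (1 - alpha) * p_human + alpha * p_llm."""
    def neg_log_likelihood(alpha):
        mix = (1 - alpha) * p_human + alpha * p_llm
        return -np.sum(counts * np.log(mix + 1e-12))
    return minimize_scalar(neg_log_likelihood, bounds=(0.0, 1.0), method="bounded").x

# Toy three-word vocabulary: LLM-edited text overuses the third word.
p_h = np.array([0.70, 0.25, 0.05])
p_l = np.array([0.40, 0.30, 0.30])
counts = np.array([600, 270, 130])
print(round(estimate_alpha(counts, p_h, p_l), 2))   # roughly one third
```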

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

3 code implementations 12 Mar 2024 Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing
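
The operation pyvene standardizes is activation patching: read an internal state on one input and write it back into the forward pass on another. The snippet below is a hand-rolled PyTorch forward-hook sketch of such an interchange intervention on GPT-2; it only illustrates the underlying operation and is not pyvene's API (the layer choice and prompts are arbitrary).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
cache = {}

def save_hook(module, args, output):
    cache["act"] = output[0].detach()        # record the block's hidden states

def patch_hook(module, args, output):
    return (cache["act"],) + output[1:]      # swap them in on another input

# Record the layer-5 activation produced by a "source" prompt ...
handle = model.transformer.h[5].register_forward_hook(save_hook)
with torch.no_grad():
    model(**tok("The capital of France is", return_tensors="pt"))
handle.remove()

# ... then run a same-length "base" prompt with that activation patched in.
handle = model.transformer.h[5].register_forward_hook(patch_hook)
with torch.no_grad():
    out = model(**tok("The capital of Italy is", return_tensors="pt"))
handle.remove()
```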

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

1 code implementation 3 Mar 2024 Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited.

Hallucination TruthfulQA

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

1 code implementation 23 Jan 2024 Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".

Rigorously Assessing Natural Language Explanations of Neurons

no code implementations 19 Sep 2023 Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging.

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

2 code implementations 24 May 2023 Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen

The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option.

knowledge editing Language Modelling +2

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

1 code implementation NeurIPS 2023 Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman

With Boundless DAS, we discover that Alpaca solves a simple numerical reasoning task by implementing a causal model with two interpretable boolean variables.

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

1 code implementation 5 Mar 2023 Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, i.e., distributed representations.

Explainable artificial intelligence
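
Concretely, the non-standard basis is given by a learned rotation: activations are rotated, a few coordinates of the base input's rotated activation are swapped with the source input's, and the result is rotated back, with the rotation trained by gradient descent so the patched model matches a high-level causal model. A minimal sketch of that rotate-swap-rotate-back step (illustrative names, not the authors' released code):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class RotatedSubspaceSwap(nn.Module):
    def __init__(self, hidden_size: int, k: int):
        super().__init__()
        self.k = k
        # Learned orthogonal change of basis (the "non-standard basis").
        self.rotate = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))

    def forward(self, base: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        rb, rs = self.rotate(base), self.rotate(source)               # rotate both
        mixed = torch.cat([rs[..., :self.k], rb[..., self.k:]], -1)   # swap k coords
        return mixed @ self.rotate.weight                             # rotate back

swap = RotatedSubspaceSwap(hidden_size=64, k=4)
# The patched activation is fed back into the network; the rotation is trained by
# gradient descent so the patched run matches the high-level causal model.
patched = swap(torch.randn(2, 64), torch.randn(2, 64))
```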

Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

no code implementations 11 Jan 2023 Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard

Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known, but opaque low-level details of black box AI models.

Explainable Artificial Intelligence (XAI)

Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training

1 code implementation 19 Dec 2022 Jing Huang, Zhengxuan Wu, Kyle Mahowald, Christopher Potts

Language tasks involving character-level manipulations (e.g., spelling corrections, arithmetic operations, word games) are challenging for models operating on subword units.

Spelling Correction

ZeroC: A Neuro-Symbolic Model for Zero-shot Concept Recognition and Acquisition at Inference Time

1 code implementation 30 Jun 2022 Tailin Wu, Megan Tjandrasuwita, Zhengxuan Wu, Xuelin Yang, Kevin Liu, Rok Sosič, Jure Leskovec

In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way.

Novel Concepts

Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

1 code implementation 24 Feb 2022 Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou

When we transfer a pretrained language model to a new language, there are many axes of variation that change at once.

Cross-Lingual Transfer Language Modelling +1

Inducing Causal Structure for Interpretable Neural Networks

2 code implementations 1 Dec 2021 Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts

In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to be the value they would be for a source input.

counterfactual Data Augmentation +1
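
A runnable toy rendering of that two-step recipe, under assumptions of my own choosing rather than the paper's setup: the high-level causal model computes v = a + b and y = v + c, and the variable v is aligned with the first half of a small MLP's hidden layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MLP(nn.Module):
    def __init__(self, hidden: int = 16, k: int = 8):
        super().__init__()
        self.k = k
        self.enc = nn.Sequential(nn.Linear(3, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 1)

    def forward(self, x, patch=None):
        h = self.enc(x)
        if patch is not None:                             # interchange intervention:
            h = torch.cat([patch, h[:, self.k:]], dim=1)  # overwrite the aligned slice
        return self.out(h), h

model = MLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    base, source = torch.randn(64, 3), torch.randn(64, 3)
    # Counterfactual target from the causal model: v taken from `source`, c from `base`.
    target = (source[:, 0] + source[:, 1] + base[:, 2]).unsqueeze(1)
    _, h_src = model(source)                          # source-run activations
    pred, _ = model(base, patch=h_src[:, :model.k])   # patched base run
    task_pred, _ = model(base)
    # Standard task loss plus the IIT counterfactual-matching loss.
    loss = nn.functional.mse_loss(task_pred, base.sum(1, keepdim=True)) \
         + nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
```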

ReaSCAN: Compositional Reasoning in Language Grounding

3 code implementations 18 Sep 2021 Zhengxuan Wu, Elisa Kreiss, Desmond C. Ong, Christopher Potts

The ability to compositionally map language to referents, relations, and actions is an essential component of language understanding.

Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models

1 code implementation RepL4NLP (ACL) 2022 Zhengxuan Wu, Nelson F. Liu, Christopher Potts

There is growing evidence that pretrained language models improve task-specific fine-tuning not just for the languages seen in pretraining, but also for new languages and even non-linguistic data.

Transfer Learning

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification

2 code implementations 1 Jan 2021 Zhengxuan Wu, Desmond C. Ong

In this paper, we adapt existing attribution methods to explain the decisions of BERT on sequence classification tasks.

General Classification Sentiment Analysis
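
As a concrete example of one such attribution method, here is a minimal gradient-times-input sketch for a BERT sequence classifier; it is my own illustration, not the paper's implementation, and since the classification head is untrained here the scores are only illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

inputs = tok("the movie was surprisingly good", return_tensors="pt")
emb = model.bert.embeddings.word_embeddings(inputs["input_ids"])
emb.retain_grad()                                   # keep gradients on the embeddings

logits = model(inputs_embeds=emb, attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()               # gradient of the predicted class

scores = (emb.grad * emb).sum(-1).squeeze(0)        # gradient x input, summed over dims
for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), scores):
    print(f"{token:>12s} {score.item():+.3f}")
```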

DynaSent: A Dynamic Benchmark for Sentiment Analysis

1 code implementation ACL 2021 Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.

Sentiment Analysis

Structured Self-Attention Weights Encode Semantics in Sentiment Analysis

1 code implementation 10 Oct 2020 Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong

Very recent work suggests that the self-attention in the Transformer encodes syntactic information. Here, we show that self-attention scores encode semantics by considering sentiment analysis tasks.

Sentiment Analysis Time Series +1

Modeling emotion in complex stories: the Stanford Emotional Narratives Dataset

2 code implementations 22 Nov 2019 Desmond C. Ong, Zhengxuan Wu, Tan Zhi-Xuan, Marianne Reddan, Isabella Kahhale, Alison Mattek, Jamil Zaki

We begin by assessing the state-of-the-art in time-series emotion recognition, and we review contemporary time-series approaches in affective computing, including discriminative and generative models.

Emotion Recognition Time Series +1

Disentangling Latent Emotions of Word Embeddings on Complex Emotional Narratives

no code implementations 15 Aug 2019 Zhengxuan Wu, Yueyi Jiang

We show that the proposed emotion space disentangles emotions better than raw GloVe vectors alone.

Word Embeddings

Attending to Emotional Narratives

1 code implementation 8 Jul 2019 Zhengxuan Wu, Xiyu Zhang, Tan Zhi-Xuan, Jamil Zaki, Desmond C. Ong

Attention mechanisms in deep neural networks have achieved excellent performance on sequence-prediction tasks.

Emotion Recognition Time Series +1
