1 code implementation • EMNLP (BlackboxNLP) 2020 • Zhengxuan Wu, Thanh-Son Nguyen, Desmond Ong
Very recent work suggests that the self-attention in the Transformer encodes syntactic information; here, we show that self-attention scores encode semantics as well, using sentiment analysis tasks as a case study.
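For a concrete sense of what "self-attention scores" means here, the sketch below pulls per-token attention out of a pretrained BERT with HuggingFace transformers; it illustrates the general setup, not the paper's exact pipeline.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The film was surprisingly wonderful.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1][0]            # (heads, seq, seq)
token_scores = last_layer.mean(dim=0).sum(dim=0)  # attention each token receives

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, score in zip(tokens, token_scores):
    print(f"{tok:>12s}  {score:.3f}")
```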
1 code implementation • 31 Jul 2024 • Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang
Surprisingly, we find that less is more: training ReSet on high-quality yet substantially smaller data (three-fold less) yields superior results.
2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT), and we identify an ablation of this method that trades some performance for increased efficiency.
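As a rough illustration of the LoReFT edit (my reading of the paper's equation h' = h + R^T(Wh + b - Rh), where R has orthonormal rows; a sketch, not the authors' pyreft implementation):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class LoReFTIntervention(nn.Module):
    """Sketch of h' = h + R^T (W h + b - R h) on one layer's hidden states."""
    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: (rank x hidden_dim) projection kept row-orthonormal.
        self.R = orthogonal(nn.Linear(hidden_dim, rank, bias=False))
        self.W = nn.Linear(hidden_dim, rank)  # the learned source W h + b

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta = self.W(h) - self.R(h)     # edit only within the subspace
        return h + delta @ self.R.weight  # map the edit back: R^T delta

intervention = LoReFTIntervention(hidden_dim=768, rank=4)
h = torch.randn(2, 10, 768)   # (batch, seq, hidden)
h_edited = intervention(h)    # same shape, low-rank edit applied
```

Keeping the edit inside a low-rank subspace is what buys the parameter efficiency relative to full finetuning.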
1 code implementation • 1 Apr 2024 • Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time.
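A toy sketch of the population-level idea as I read it: treat observed word frequencies as a mixture of a human-written and an LLM-generated distribution, and estimate the mixture weight by maximum likelihood. The reference distributions below are invented for illustration; this is not the authors' exact estimator.

```python
import numpy as np
from scipy.optimize import minimize_scalar

p_human = np.array([0.40, 0.30, 0.20, 0.10])  # hypothetical word probs, human text
p_llm   = np.array([0.10, 0.20, 0.30, 0.40])  # hypothetical word probs, LLM text
counts  = np.array([3200, 2900, 2300, 1600])  # observed word counts in the corpus

def neg_log_likelihood(alpha: float) -> float:
    mixture = (1 - alpha) * p_human + alpha * p_llm
    return -np.sum(counts * np.log(mixture))

result = minimize_scalar(neg_log_likelihood,
                         bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"estimated fraction of LLM-modified content: {result.x:.3f}")
```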
3 code implementations • 12 Mar 2024 • Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts
Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.
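Interventions of this kind can be prototyped with a plain PyTorch forward hook; the sketch below is a generic pattern, not pyvene's actual configuration API.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

def zero_out_units(module, inputs, output):
    # Intervene on the hidden representation: clamp units 0-7 to zero.
    output = output.clone()
    output[..., :8] = 0.0
    return output

handle = model[1].register_forward_hook(zero_out_units)
x = torch.randn(5, 16)
intervened_logits = model(x)   # forward pass with the intervention applied
handle.remove()
original_logits = model(x)     # clean forward pass for comparison
print(torch.allclose(intervened_logits, original_logits))  # typically False
```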
1 code implementation • 3 Mar 2024 • Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited.
1 code implementation • 27 Feb 2024 • Jing Huang, Zhengxuan Wu, Christopher Potts, Mor Geva, Atticus Geiger
Individual neurons participate in the representation of multiple high-level concepts.
1 code implementation • 23 Jan 2024 • Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman
We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".
no code implementations • 19 Sep 2023 • Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts
Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging.
2 code implementations • 24 May 2023 • Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen
The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option.
1 code implementation • NeurIPS 2023 • Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman
With Boundless DAS, we discover that Alpaca solves a simple numerical reasoning problem by implementing a causal model with two interpretable boolean variables.
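My reconstruction of the kind of high-level causal model meant here, for a task such as "is X between Y and Z?" (a hypothetical simplification, not the paper's exact model):

```python
def causal_model(x: float, lo: float, hi: float) -> bool:
    above_lo = x >= lo   # interpretable boolean variable 1
    below_hi = x <= hi   # interpretable boolean variable 2
    return above_lo and below_hi  # the output is their conjunction

print(causal_model(5.0, 3.0, 7.0))  # True
```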
1 code implementation • 24 Mar 2023 • Zhengxuan Wu, Christopher D. Manning, Christopher Potts
We argue that this concern is realized for the COGS benchmark.
1 code implementation • 5 Mar 2023 • Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman
In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases: distributed representations.
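The mechanics can be sketched in a few lines: learn an orthogonal change of basis R, perform the interchange intervention in the rotated coordinates, and rotate back. An illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

dim, k = 64, 8                                   # hidden size; subspace size
R = orthogonal(nn.Linear(dim, dim, bias=False))  # learned orthogonal basis

def interchange(h_base: torch.Tensor, h_source: torch.Tensor) -> torch.Tensor:
    rot = R.weight                        # (dim, dim)
    z_base = h_base @ rot.T               # rotate into the learned basis
    z_source = h_source @ rot.T
    z_base[..., :k] = z_source[..., :k]   # swap the aligned subspace
    return z_base @ rot                   # rotate back to neuron coordinates

h_base, h_source = torch.randn(2, dim), torch.randn(2, dim)
h_intervened = interchange(h_base, h_source)
# Training R by gradient descent so this swap reproduces the counterfactual
# behavior of the high-level model is what replaces the brute-force search.
```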
no code implementations • 11 Jan 2023 • Atticus Geiger, Duligur Ibeling, Amir Zur, Maheep Chaudhary, Sonakshi Chauhan, Jing Huang, Aryaman Arora, Zhengxuan Wu, Noah Goodman, Christopher Potts, Thomas Icard
Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that are faithful simplifications of the known but opaque low-level details of black-box AI models.
1 code implementation • 19 Dec 2022 • Jing Huang, Zhengxuan Wu, Kyle Mahowald, Christopher Potts
Language tasks involving character-level manipulations (e.g., spelling corrections, arithmetic operations, word games) are challenging for models operating on subword units.
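A quick illustration of the obstacle (assuming HuggingFace transformers): subword tokenizers hand the model multi-character chunks, so individual characters are never directly observed.

```python
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
# The word arrives as a few opaque subword pieces, so tasks like counting
# or reordering its letters have no direct handle on characters.
print(tok.tokenize("misspelling"))  # e.g., ['miss', 'pel', 'ling']
```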
1 code implementation • 28 Sep 2022 • Zhengxuan Wu, Karel D'Oosterlinck, Atticus Geiger, Amir Zur, Christopher Potts
The core of our proposal is the Causal Proxy Model (CPM).
1 code implementation • 30 Jun 2022 • Tailin Wu, Megan Tjandrasuwita, Zhengxuan Wu, Xuelin Yang, Kevin Liu, Rok Sosič, Jure Leskovec
In this work, we introduce Zero-shot Concept Recognition and Acquisition (ZeroC), a neuro-symbolic architecture that can recognize and acquire novel concepts in a zero-shot way.
1 code implementation • 27 May 2022 • Eldar David Abraham, Karel D'Oosterlinck, Amir Feder, Yair Ori Gat, Atticus Geiger, Christopher Potts, Roi Reichart, Zhengxuan Wu
We introduce CEBaB, a new benchmark dataset for assessing concept-based explanation methods in Natural Language Processing (NLP).
1 code implementation • 24 Feb 2022 • Zhengxuan Wu, Alex Tamkin, Isabel Papadimitriou
When we transfer a pretrained language model to a new language, there are many axes of variation that change at once.
1 code implementation • NAACL 2022 • Zhengxuan Wu, Atticus Geiger, Josh Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman
Distillation efforts have led to language models that are more compact and efficient without serious drops in performance.
2 code implementations • 1 Dec 2021 • Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts
In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model, and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when the aligned representations in both models are set to the values they would take for a source input.
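Schematically, one IIT training step looks like the sketch below (my paraphrase of the recipe in the abstract; `causal_model`, the chosen `layer`, and the alignment are hypothetical placeholders):

```python
import torch

def iit_step(model, layer, causal_model, base, source, optimizer, loss_fn):
    # 1. Run the source input and cache the aligned activations.
    cache = {}
    handle = layer.register_forward_hook(lambda m, i, out: cache.update(act=out))
    model(source)
    handle.remove()

    # 2. Run the base input with the source activations patched in.
    handle = layer.register_forward_hook(lambda m, i, out: cache["act"])
    logits = model(base)
    handle.remove()

    # 3. Supervise with the counterfactual output of the causal model.
    target = causal_model(base, source)  # what the output *should* become
    loss = loss_fn(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```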
3 code implementations • 18 Sep 2021 • Zhengxuan Wu, Elisa Kreiss, Desmond C. Ong, Christopher Potts
The ability to compositionally map language to referents, relations, and actions is an essential component of language understanding.
1 code implementation • RepL4NLP (ACL) 2022 • Zhengxuan Wu, Nelson F. Liu, Christopher Potts
There is growing evidence that pretrained language models improve task-specific fine-tuning not just for the languages seen in pretraining, but also for new languages and even non-linguistic data.
no code implementations • NAACL 2021 • Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.
2 code implementations • 1 Jan 2021 • Zhengxuan Wu, Desmond C. Ong
In this paper, we adapt existing attribution methods to explain the decisions made by BERT in sequence classification tasks.
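For instance, one simple member of this family, gradient x input, can be applied to a BERT classifier as below (an illustrative sketch with HuggingFace transformers, not the paper's exact variants):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("A thoroughly enjoyable film.", return_tensors="pt")
embeddings = model.bert.embeddings.word_embeddings(inputs["input_ids"])
embeddings.retain_grad()
logits = model(inputs_embeds=embeddings,
               attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()  # gradient of the predicted class

# Token-level relevance: gradient x input, summed over hidden dimensions.
scores = (embeddings.grad * embeddings).sum(-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, s in zip(tokens, scores):
    print(f"{tok:>12s}  {s:+.4f}")
```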
1 code implementation • ACL 2021 • Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela
We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.
1 code implementation • 15 Oct 2020 • Zhengxuan Wu, Desmond C. Ong
We train both models with pretrained BERT on two (T)ABSA datasets: SentiHood and SemEval-2014 (Task 4).
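The usual reduction that this line of work builds on (Sun et al. 2019) turns each (target, aspect) pair into an auxiliary sentence and lets BERT classify the sentence pair; a minimal sketch, with a made-up auxiliary sentence:

```python
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # positive / negative / none

review = "LOCATION1 is central London so extremely expensive."
aux = "what do you think of the price of LOCATION1"  # auxiliary sentence
inputs = tokenizer(review, aux, return_tensors="pt")  # sentence-pair input
logits = model(**inputs).logits  # fine-tune with cross-entropy on these
```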
1 code implementation • 10 Oct 2020 • Zhengxuan Wu, Thanh-Son Nguyen, Desmond C. Ong
Very recent work suggests that the self-attention in the Transformer encodes syntactic information; here, we show that self-attention scores encode semantics as well, using sentiment analysis tasks as a case study.
1 code implementation • SCiL 2021 • Zhengxuan Wu, Desmond C. Ong
One important task that involves grounding contextual modifiers is color generation.
2 code implementations • 22 Nov 2019 • Desmond C. Ong, Zhengxuan Wu, Tan Zhi-Xuan, Marianne Reddan, Isabella Kahhale, Alison Mattek, Jamil Zaki
We begin by assessing the state-of-the-art in time-series emotion recognition, and we review contemporary time-series approaches in affective computing, including discriminative and generative models.
no code implementations • 15 Aug 2019 • Zhengxuan Wu, Yueyi Jiang
We showed that the proposed emotion space disentangles emotions better than raw GloVe vectors alone.
1 code implementation • 8 Jul 2019 • Zhengxuan Wu, Xiyu Zhang, Tan Zhi-Xuan, Jamil Zaki, Desmond C. Ong
Attention mechanisms in deep neural networks have achieved excellent performance on sequence-prediction tasks.
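A minimal example of such a mechanism, dot-product attention pooling over an encoder's hidden states (a generic illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(hidden_dim))

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, time, hidden), e.g. from an LSTM encoder
        weights = torch.softmax(states @ self.query, dim=1)  # (batch, time)
        return (weights.unsqueeze(-1) * states).sum(dim=1)   # weighted summary

encoder = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
attend = TemporalAttention(32)
x = torch.randn(4, 100, 8)       # e.g., 100 timesteps of features
summary = attend(encoder(x)[0])  # (4, 32) context vector per sequence
```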