Search Results for author: Chunting Zhou

Found 31 papers, 21 papers with code

Mega: Moving Average Equipped Gated Attention

5 code implementations 21 Sep 2022 Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.

Image Classification · Inductive Bias · +3
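
Mega, as described in the paper, equips single-head gated attention with a (damped) exponential moving average to inject position-aware local inductive bias before attention is applied. Below is a minimal sequential sketch of such a damped EMA, assuming per-dimension weights alpha and delta; it is illustrative only, not the paper's exact parameterization, which expands dimensions and computes the recurrence efficiently.

```python
import torch

def damped_ema(x, alpha, delta):
    """Sequential reference for a damped exponential moving average:
        y_t = alpha * x_t + (1 - alpha * delta) * y_{t-1}
    x: (seq_len, dim); alpha, delta: (dim,) values in (0, 1).
    """
    y = torch.zeros(x.shape[-1])
    out = []
    for x_t in x:
        y = alpha * x_t + (1 - alpha * delta) * y
        out.append(y)
    return torch.stack(out)  # (seq_len, dim) smoothed sequence

# usage: smooth a toy sequence before it enters an attention block
smoothed = damped_ema(torch.randn(16, 8),
                      alpha=torch.full((8,), 0.6),
                      delta=torch.full((8,), 0.9))
```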

LIMA: Less Is More for Alignment

5 code implementations NeurIPS 2023 Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.

Language Modelling · reinforcement-learning

FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

3 code implementations 25 Jul 2023 I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, PengFei Liu

With the above challenges in mind, in this paper we propose FacTool, a task- and domain-agnostic framework for detecting factual errors in texts generated by large language models (e.g., ChatGPT).

Code Generation · Fact Checking · +1
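
FacTool pairs an LLM with external tools (e.g., web search) to verify the claims in generated text. The sketch below is a hedged, schematic version of such a claim-extraction-and-verification loop; the four callables are placeholders supplied by the caller, not FacTool's actual API.

```python
def check_factuality(text, extract_claims, generate_queries, search, judge):
    """Schematic claim-level verification over generated text.

    Caller-supplied callables (all hypothetical):
      extract_claims(text) -> list[str]       atomic claims found in the text
      generate_queries(claim) -> list[str]    search queries for a claim
      search(query) -> list[str]              evidence snippets from a tool
      judge(claim, evidence) -> bool          True if the evidence supports the claim
    """
    report = []
    for claim in extract_claims(text):
        evidence = [snippet for q in generate_queries(claim) for snippet in search(q)]
        report.append({"claim": claim, "supported": judge(claim, evidence)})
    return report
```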

Towards a Unified View of Parameter-Efficient Transfer Learning

1 code implementation ICLR 2022 Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving results comparable to fine-tuning all parameters on all four tasks.

Machine Translation · text-classification · +3
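
The unified view frames adapters, prefix tuning, and LoRA as different ways of learning a modification of a frozen sublayer's output of the form delta_h = s * f(x W_down) W_up. The sketch below shows one point in that design space, a scaled parallel adapter; the bottleneck size and scaling factor are illustrative.

```python
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """Scaled parallel adapter: h <- sublayer(x) + s * f(x W_down) W_up.
    Only these few parameters would be trained; the base model stays frozen."""
    def __init__(self, dim, bottleneck=16, scale=4.0):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # W_down
        self.up = nn.Linear(bottleneck, dim)     # W_up
        self.scale = scale                       # s

    def forward(self, x, sublayer_out):
        return sublayer_out + self.scale * self.up(torch.relu(self.down(x)))

# usage with a stand-in for a frozen attention/FFN output
x = torch.randn(2, 10, 512)                      # (batch, seq, dim)
h = ParallelAdapter(512)(x, sublayer_out=x)
```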

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

7 code implementations ACL 2018 Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig

Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures.

Code Generation · Semantic Parsing

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation 12 Apr 2024 Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

A C-LSTM Neural Network for Text Classification

10 code implementations 27 Nov 2015 Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

In this work, we combine the strengths of both architectures and propose a novel and unified model called C-LSTM for sentence representation and text classification.

General Classification · Sentence · +3
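
Concretely, C-LSTM applies a convolution over the word-embedding sequence to extract n-gram (phrase-level) features and feeds the resulting feature sequence into an LSTM whose final state serves as the sentence representation. A minimal sketch with illustrative hyperparameters:

```python
import torch
import torch.nn as nn

class CLSTM(nn.Module):
    """CNN-then-LSTM sentence classifier in the spirit of C-LSTM."""
    def __init__(self, vocab_size, emb_dim=128, n_filters=100, kernel=3,
                 hidden=100, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel)
        self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        e = self.emb(tokens).transpose(1, 2)          # (batch, emb_dim, seq_len)
        f = torch.relu(self.conv(e)).transpose(1, 2)  # (batch, seq_len', n_filters)
        _, (h, _) = self.lstm(f)                      # h: (1, batch, hidden)
        return self.out(h[-1])                        # class logits

logits = CLSTM(vocab_size=10000)(torch.randint(0, 10000, (4, 20)))
```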

Self-Alignment with Instruction Backtranslation

2 code implementations 11 Aug 2023 Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions.

Instruction Following · Language Modelling
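
The core loop treats each human-written document as a response, asks a seed model to "backtranslate" it into a plausible instruction, and keeps only the pairs that a scoring model rates highly for fine-tuning. A hedged skeleton of that idea, with caller-supplied callables and an illustrative quality threshold:

```python
def build_instruction_data(documents, propose_instruction, score_pair, threshold=4.0):
    """Self-label web documents with instructions, keeping high-quality pairs.

    propose_instruction(doc) -> str          instruction the doc could be answering
    score_pair(instruction, doc) -> float    quality rating (e.g., on a 1-5 scale)
    """
    pairs = []
    for doc in documents:
        instruction = propose_instruction(doc)          # "backtranslate" doc -> instruction
        if score_pair(instruction, doc) >= threshold:   # self-curation step
            pairs.append({"instruction": instruction, "response": doc})
    return pairs
```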

Density Matching for Bilingual Word Embedding

1 code implementation NAACL 2019 Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.

Bilingual Lexicon Induction · Word Embeddings · +1
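
For contrast with the density-matching approach, the linear-transformation baseline mentioned in the snippet can be made concrete with the classic orthogonal Procrustes solution over a seed dictionary of aligned word pairs. The sketch below shows that baseline, not the paper's method.

```python
import numpy as np

def procrustes_map(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F for aligned embedding pairs.
    src, tgt: (n_pairs, dim) arrays of source/target word embeddings."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt                               # (dim, dim) orthogonal map

# usage: map a source-language vector into the target space
src, tgt = np.random.randn(500, 300), np.random.randn(500, 300)
mapped = src[0] @ procrustes_map(src, tgt)
```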

Detecting Hallucinated Content in Conditional Neural Sequence Generation

2 code implementations Findings (ACL) 2021 Chunting Zhou, Graham Neubig, Jiatao Gu, Mona Diab, Paco Guzman, Luke Zettlemoyer, Marjan Ghazvininejad

Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinate additional content not supported by the input.

Abstractive Text Summarization · Hallucination · +1

Prompt Consistency for Zero-Shot Task Generalization

1 code implementation 29 Apr 2022 Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting.
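
The title points to regularizing a model so that its predictions agree across different prompts for the same input. The term below is a generic symmetric-KL consistency loss in that spirit, computed over label logits obtained under two prompt paraphrases; it is a sketch of the general idea, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def prompt_consistency_loss(logits_a, logits_b):
    """Symmetric KL between label distributions predicted for the same input
    rendered with two different prompt templates."""
    pa = F.log_softmax(logits_a, dim=-1)
    pb = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(pa, pb, log_target=True, reduction="batchmean")
                  + F.kl_div(pb, pa, log_target=True, reduction="batchmean"))

# usage: logits for the same unlabeled batch under two prompt paraphrases
loss = prompt_consistency_loss(torch.randn(4, 3), torch.randn(4, 3))
```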

Examining and Combating Spurious Features under Distribution Shift

1 code implementation 14 Jun 2021 Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig

Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups.
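
The group DRO objective referenced here minimizes a (softly) worst-case average of per-group losses, typically by up-weighting the groups with the highest current loss through an exponentiated-gradient update. A minimal sketch of one such step, with an illustrative step size:

```python
import torch

def group_dro_step(per_example_loss, group_ids, q, num_groups, eta=0.1):
    """Return the group-weighted loss to backpropagate and the updated weights q."""
    group_loss = torch.stack([
        per_example_loss[group_ids == g].mean()
        if (group_ids == g).any() else per_example_loss.sum() * 0.0
        for g in range(num_groups)
    ])
    q = q * torch.exp(eta * group_loss.detach())    # up-weight high-loss groups
    q = q / q.sum()
    return (q * group_loss).sum(), q

# usage with toy per-example losses and three pre-defined groups
q = torch.ones(3) / 3
robust_loss, q = group_dro_step(torch.rand(8), torch.randint(0, 3, (8,)), q, num_groups=3)
```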

Distributionally Robust Multilingual Machine Translation

1 code implementation EMNLP 2021 Chunting Zhou, Daniel Levy, Xian Li, Marjan Ghazvininejad, Graham Neubig

Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models.

Machine Translation · Translation

Handling Syntactic Divergence in Low-resource Machine Translation

1 code implementation IJCNLP 2019 Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig

Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.

Data Augmentation · Machine Translation · +2

In-context Examples Selection for Machine Translation

1 code implementation 5 Dec 2022 Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad

Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model.

In-Context Learning · Language Modelling · +2
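
For machine translation this amounts to retrieving a handful of example translation pairs that resemble the test source sentence and formatting them as demonstrations in the prompt. The sketch below uses simple word-overlap similarity as a stand-in for the paper's selection criteria, and an illustrative prompt format.

```python
def build_prompt(test_src, pool, k=4):
    """pool: list of (source, target) example translation pairs."""
    def overlap(a, b):                      # crude stand-in for a retrieval score
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, len(wa | wb))

    selected = sorted(pool, key=lambda ex: overlap(ex[0], test_src), reverse=True)[:k]
    demos = "\n".join(f"Source: {s}\nTarget: {t}" for s, t in selected)
    return f"{demos}\nSource: {test_src}\nTarget:"

prompt = build_prompt("the cat sat on the mat",
                      [("the dog sat", "le chien s'est assis"),
                       ("hello world", "bonjour le monde")])
```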

Look-back Decoding for Open-Ended Text Generation

1 code implementation 22 May 2023 Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma

Given a prefix (context), open-ended generation aims to decode text that is coherent (it does not abruptly drift from previous topics) and informative (it does not suffer from undesired repetitions).

Story Generation

Multi-space Variational Encoder-Decoders for Semi-supervised Labeled Sequence Transduction

no code implementations ACL 2017 Chunting Zhou, Graham Neubig

Labeled sequence transduction is a task of transforming one sequence into another sequence that satisfies desiderata specified by a set of labels.

Morphological Inflection

Category Enhanced Word Embedding

no code implementations 27 Nov 2015 Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

In this paper, we incorporate the category information of documents into the learning of word representations and learn the proposed models in a document-wise manner.

General Classification · Representation Learning · +4

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

no code implementations ICLR 2019 Xuezhe Ma, Chunting Zhou, Eduard Hovy

Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Density Estimation · Image Generation · +1

The ARIEL-CMU Systems for LoReHLT18

no code implementations 24 Feb 2019 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation · Translation

Understanding Knowledge Distillation in Non-autoregressive Machine Translation

no code implementations ICLR 2020 Chunting Zhou, Graham Neubig, Jiatao Gu

We find that knowledge distillation can reduce the complexity of data sets and help NAT to model the variations in the output data.

Knowledge Distillation · Machine Translation · +1

Learning Structures for Deep Neural Networks

no code implementations 27 May 2021 Jinhui Yuan, Fei Pan, Chunting Zhou, Tao Qin, Tie-Yan Liu

We further establish connections between this principle and the theory of Bayesian optimal classification, and empirically verify that larger entropy of the outputs of a deep neural network indeed corresponds to a better classification accuracy.

Classification · Image Classification

In-Context Pretraining: Language Modeling Beyond Document Boundaries

no code implementations 16 Oct 2023 Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion.

In-Context Learning · Language Modelling · +1
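
As the title suggests, the idea is to go beyond filling a context window with unrelated documents and instead place related documents in the same window. The toy sketch below greedily chains each document to its most similar unused neighbour before packing; the similarity function and the greedy ordering heuristic are illustrative, not the paper's pipeline.

```python
def order_related_documents(docs, similarity):
    """docs: list of strings; similarity(a, b) -> float (higher = more related)."""
    remaining = list(range(len(docs)))
    order = [remaining.pop(0)]
    while remaining:
        last = docs[order[-1]]
        nxt = max(remaining, key=lambda i: similarity(last, docs[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return [docs[i] for i in order]   # concatenate in this order when packing windows
```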

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

no code implementations 13 Nov 2023 Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao

Specifically, an adversarial LLM and a target LLM interplay with each other in an iterative manner, where the adversarial LLM aims to generate challenging prompts that elicit unsafe responses from the target LLM, while the target LLM is fine-tuned with safety aligned data on these adversarial prompts.

Instruction Following · Response Generation
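
The description above maps onto a simple outer loop: in each round the adversarial model proposes prompts, the prompts that elicit unsafe responses are collected, and the target model is fine-tuned on safety-aligned data for those prompts. A hedged skeleton with caller-supplied callables (not the paper's implementation); round and sample counts are illustrative.

```python
def multi_round_red_teaming(adversary, target, is_unsafe, safe_response,
                            finetune, rounds=4, prompts_per_round=64):
    for _ in range(rounds):
        prompts = [adversary() for _ in range(prompts_per_round)]  # adversarial prompt generation
        failures = [p for p in prompts if is_unsafe(target(p))]    # prompts eliciting unsafe output
        safety_data = [(p, safe_response(p)) for p in failures]    # safety-aligned training pairs
        target = finetune(target, safety_data)                     # align target on its failure cases
    return target
```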

Instruction-tuned Language Models are Better Knowledge Learners

no code implementations 20 Feb 2024 Zhengbao Jiang, Zhiqing Sun, Weijia Shi, Pedro Rodriguez, Chunting Zhou, Graham Neubig, Xi Victoria Lin, Wen-tau Yih, Srinivasan Iyer

The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs.

Language Modelling · Large Language Model
