Search Results for author: Xuezhe Ma

Found 53 papers, 31 papers with code

AESOP: Paraphrase Generation with Adaptive Syntactic Control

1 code implementation EMNLP 2021 Jiao Sun, Xuezhe Ma, Nanyun Peng

We propose to control paraphrase generation through carefully chosen target syntactic structures to generate more appropriate and higher-quality paraphrases.

Data Augmentation Language Modelling +2

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation12 Apr 2024 Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences; while sub-quadratic solutions such as linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

Evaluating Large Language Models on Controlled Generation Tasks

1 code implementation23 Oct 2023 Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

While recent studies have looked into the abilities of large language models on various benchmark tasks, including question generation, reading comprehension, and multilingual understanding, few studies have looked into the controllability of large language models on generation tasks.

Question Generation Question-Generation +2

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation5 Oct 2023 Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.
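
A minimal sketch (NumPy, single head, no masking) of the tiled online-softmax trick that FlashAttention-style kernels rely on: keys and values are processed block by block, so the full n-by-n score matrix is never materialized and peak memory stays linear in sequence length. The block size and the pure-NumPy setting are illustrative; the real kernels are fused GPU implementations.

import numpy as np

def tiled_attention(q, k, v, block=128):
    n, d = q.shape
    out = np.zeros_like(v)
    row_max = np.full(n, -np.inf)      # running max of scores per query row
    row_sum = np.zeros(n)              # running softmax denominator
    for start in range(0, n, block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)      # scores for this block only: (n, block)
        new_max = np.maximum(row_max, s.max(axis=1))
        scale = np.exp(row_max - new_max)        # rescale earlier partial sums
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((512, 64)) for _ in range(3))
scores = q @ k.T / np.sqrt(64)
p = np.exp(scores - scores.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
assert np.allclose(tiled_attention(q, k, v), ref)   # matches dense attention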

MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

no code implementations4 Oct 2023 Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths triggered on social media by COVID-19-related news articles, accompanied by comprehensive insights, including user/community susceptibility levels as well as events and popular opinions raised by the crowd while propagating the information.

RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation

1 code implementation12 Jun 2023 Shuai Liu, Hyundong J. Cho, Marjorie Freedman, Xuezhe Ma, Jonathan May

Endowing chatbots with a consistent persona is essential to an engaging conversation, yet it remains an unresolved challenge.

Response Generation Retrieval

Challenges in Context-Aware Neural Machine Translation

1 code implementation23 May 2023 Linghao Jin, Jacqueline He, Jonathan May, Xuezhe Ma

Context-aware neural machine translation involves leveraging information beyond sentence-level context to resolve inter-sentential discourse dependencies and improve document-level translation quality, and has given rise to a number of recent techniques.

Machine Translation Sentence +1

Look-back Decoding for Open-Ended Text Generation

1 code implementation22 May 2023 Nan Xu, Chunting Zhou, Asli Celikyilmaz, Xuezhe Ma

Given a prefix (context), open-ended generation aims to decode texts that are coherent (do not abruptly drift from previous topics) and informative (do not suffer from undesired repetitions).

Story Generation

LIMA: Less Is More for Alignment

5 code implementations NeurIPS 2023 Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.

Language Modelling reinforcement-learning

On Human Visual Contrast Sensitivity and Machine Vision Robustness: A Comparative Study

no code implementations16 Dec 2022 Ming-Chang Chiu, Yingfei Wang, Derrick Eui Gyu Kim, Pin-Yu Chen, Xuezhe Ma

It is well established in neuroscience that color vision plays an essential part in the human visual perception system.

Data Augmentation

Better May Not Be Fairer: A Study on Subgroup Discrepancy in Image Classification

1 code implementation ICCV 2023 Ming-Chang Chiu, Pin-Yu Chen, Xuezhe Ma

In this paper, we provide 20,000 non-trivial human annotations on popular datasets as a first step toward bridging the gap in studying how natural semantic spurious features affect image classification, as prior works often study datasets mixing low-level features due to limitations in accessing realistic datasets.

Data Augmentation Image Classification

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping

1 code implementation19 Oct 2022 Chenghao Yang, Xuezhe Ma

Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks for practical applications.
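
A minimal sketch (PyTorch) of the idea in the title: rather than one global clip over all gradients, each component gets its own norm clip. Treating each top-level submodule as a "component" is an assumption for illustration; the paper's exact parameter grouping may differ.

import torch

def clip_grad_norms_componentwise(model, max_norm=1.0):
    # one clip per top-level submodule ("component")
    for _, module in model.named_children():
        params = [p for p in module.parameters() if p.grad is not None]
        if params:
            torch.nn.utils.clip_grad_norm_(params, max_norm)

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Linear(10, 2))
loss = model(torch.randn(4, 10)).sum()
loss.backward()
clip_grad_norms_componentwise(model, max_norm=1.0)   # then optimizer.step()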

Mega: Moving Average Equipped Gated Attention

5 code implementations21 Sep 2022 Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.

Image Classification Inductive Bias +3
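
A minimal sketch (NumPy) of a damped exponential moving average, the kind of strong local inductive bias Mega equips its gated attention with; this scalar-per-dimension recurrence is a simplification, not the paper's full multi-dimensional EMA parameterization.

import numpy as np

def damped_ema(x, alpha, delta):
    # x: (seq_len, dim); alpha, delta in (0, 1) per dimension
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = alpha * x[t] + (1.0 - alpha * delta) * h   # damped EMA recurrence
        out[t] = h
    return out

x = np.random.randn(16, 8)
y = damped_ema(x, alpha=np.full(8, 0.9), delta=np.full(8, 0.8))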

Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning

no code implementations25 May 2022 Mozhdeh Gheini, Xuezhe Ma, Jonathan May

A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen.

Cross-Lingual NER Language Modelling +3
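
A minimal sketch (PyTorch) of the lightweight fine-tuning setup described above: the pretrained model is frozen and only a small set of additional parameters is trained. The single additive "adapter" vector is a hypothetical stand-in for the methods in this family.

import torch

model = torch.nn.TransformerEncoderLayer(d_model=256, nhead=4)  # stand-in for a pretrained LM
for p in model.parameters():
    p.requires_grad = False                      # pretrained weights stay frozen

adapter = torch.nn.Parameter(torch.zeros(256))   # the only trainable parameters
optimizer = torch.optim.AdamW([adapter], lr=1e-3)

x = torch.randn(8, 10, 256)
out = model(x) + adapter                         # inject the learned offset
out.sum().backward()
optimizer.step()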

Investigating the Benefits of Free-Form Rationales

no code implementations25 May 2022 Jiao Sun, Swabha Swayamdipta, Jonathan May, Xuezhe Ma

After controlling for instances where rationales leak the correct answer while not providing additional background knowledge, we find that incorporating only 5% of rationales during training can boost model performance by 47.22% for CoS-E and 57.14% for ECQA during inference.

Prompt Consistency for Zero-Shot Task Generalization

1 code implementation29 Apr 2022 Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

One of the most impressive results of recent NLP history is the ability of pre-trained language models to solve new tasks in a zero-shot setting.

Learning Representations Robust to Group Shifts and Adversarial Examples

no code implementations18 Feb 2022 Ming-Chang Chiu, Xuezhe Ma

Despite the high performance achieved by deep neural networks on various tasks, extensive studies have demonstrated that small tweaks in the input can cause model predictions to fail.

Representation Learning

Towards a Unified View of Parameter-Efficient Transfer Learning

1 code implementation ICLR 2022 Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig

Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.

Machine Translation text-classification +3
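
A minimal sketch (PyTorch) of the bottleneck update the unified view frames these methods around: a small down-projection, a nonlinearity, an up-projection, and an additive, scaled modification of a frozen layer's hidden states. This particular parallel-adapter instantiation and its sizes are illustrative.

import torch

class ParallelAdapter(torch.nn.Module):
    def __init__(self, dim, bottleneck=16, scale=1.0):
        super().__init__()
        self.down = torch.nn.Linear(dim, bottleneck)   # W_down
        self.up = torch.nn.Linear(bottleneck, dim)     # W_up
        self.scale = scale

    def forward(self, h):
        # delta_h = s * f(h @ W_down) @ W_up, added back to the hidden states
        return h + self.scale * self.up(torch.relu(self.down(h)))

h = torch.randn(4, 32, 768)           # (batch, seq, hidden) from a frozen layer
print(ParallelAdapter(768)(h).shape)  # torch.Size([4, 32, 768])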

Examining and Combating Spurious Features under Distribution Shift

1 code implementation14 Jun 2021 Chunting Zhou, Xuezhe Ma, Paul Michel, Graham Neubig

Group distributionally robust optimization (DRO) provides an effective tool to alleviate covariate shift by minimizing the worst-case training loss over a set of pre-defined groups.
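
A minimal sketch (PyTorch) of the group DRO objective as described: average the loss within each pre-defined group and minimize the worst group's loss. This is the hard-max form; practical implementations typically use a smoother reweighting over groups.

import torch

def group_dro_loss(per_example_loss, group_ids, num_groups):
    group_losses = [per_example_loss[group_ids == g].mean()
                    for g in range(num_groups)
                    if (group_ids == g).any()]
    return torch.stack(group_losses).max()    # worst-case group loss

per_example = torch.nn.functional.cross_entropy(
    torch.randn(32, 5, requires_grad=True), torch.randint(0, 5, (32,)),
    reduction="none")
groups = torch.randint(0, 4, (32,))           # pre-defined group of each example
loss = group_dro_loss(per_example, groups, num_groups=4)
loss.backward()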

COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

1 code implementation Findings (ACL) 2021 Shikhar Singh, Nuan Wen, Yu Hou, Pegah Alipoormolabashi, Te-Lin Wu, Xuezhe Ma, Nanyun Peng

To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs.

4k Sentence

Personalized Response Generation via Generative Split Memory Network

1 code implementation NAACL 2021 Yuwei Wu, Xuezhe Ma, Diyi Yang

Despite the impressive successes of generation and dialogue systems, how to endow a text generation system with particular personality traits to deliver more personalized responses remains under-investigated.

Response Generation Text Generation

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

no code implementations1 Jan 2021 Xuezhe Ma

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

Stochastic Optimization

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

3 code implementations28 Sep 2020 Xuezhe Ma

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

Stochastic Optimization
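
A minimal sketch (NumPy) of the general recipe: a quasi-Newton step whose Hessian approximation is a diagonal matrix estimated from successive gradient differences (an elementwise secant condition), rectified and used to precondition the gradient. This illustrates the family of methods, not Apollo's exact update rule.

import numpy as np

def diag_quasi_newton_step(x, grad, prev_x, prev_grad, lr=0.1, eps=1e-4):
    s = x - prev_x                              # parameter displacement
    y = grad - prev_grad                        # gradient displacement
    d = y / np.where(np.abs(s) > eps, s, eps)   # elementwise secant curvature
    d = np.maximum(np.abs(d), eps)              # rectify so the step is well-posed
    return x - lr * grad / d                    # preconditioned (Newton-like) step

# toy ill-conditioned quadratic f(x) = 0.5 * sum(A * x**2)
A = np.array([100.0, 1.0])
x, prev_x = np.array([1.0, 1.0]), np.array([1.1, 1.1])
for _ in range(20):
    g, pg = A * x, A * prev_x
    x, prev_x = diag_quasi_newton_step(x, g, prev_x, pg, lr=0.5), x
print(x)   # converges toward the minimum at the origin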

A Two-Step Approach for Implicit Event Argument Detection

no code implementations ACL 2020 Zhisong Zhang, Xiang Kong, Zhengzhong Liu, Xuezhe Ma, Eduard Hovy

It remains a challenge to detect implicit arguments, calling for more future work of document-level modeling for this task.

Sentence Vocal Bursts Valence Prediction

Decoupling Global and Local Representations via Invertible Generative Flows

1 code implementation ICLR 2021 Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

In this work, we propose a new generative model that is capable of automatically decoupling global and local representations of images in an entirely unsupervised setting, by embedding a generative flow in the VAE framework to model the decoder.

Density Estimation Image Generation +2

Cross-lingual Dependency Parsing with Unlabeled Auxiliary Languages

1 code implementation CONLL 2019 Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng

We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.

Cross-Lingual Transfer Dependency Parsing +2

Handling Syntactic Divergence in Low-resource Machine Translation

1 code implementation IJCNLP 2019 Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig

Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs.

Data Augmentation Machine Translation +2

An Empirical Investigation of Structured Output Modeling for Graph-based Neural Dependency Parsing

1 code implementation ACL 2019 Zhisong Zhang, Xuezhe Ma, Eduard Hovy

In this paper, we investigate the aspect of structured output modeling for the state-of-the-art graph-based neural dependency parser (Dozat and Manning, 2017).

Dependency Parsing Sentence

Choosing Transfer Languages for Cross-Lingual Learning

1 code implementation ACL 2019 Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig

Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages.

Cross-Lingual Transfer

Density Matching for Bilingual Word Embedding

1 code implementation NAACL 2019 Chunting Zhou, Xuezhe Ma, Di Wang, Graham Neubig

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.

Bilingual Lexicon Induction Word Embeddings +1
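
A minimal sketch (NumPy) of the prior linear-transformation approaches the snippet refers to: fit an orthogonal map between the two embedding spaces from a seed dictionary via the classic Procrustes/SVD solution. The paper itself moves beyond point-pair fitting to matching densities, which this sketch does not cover.

import numpy as np

def procrustes(src, tgt):
    # src, tgt: (n_pairs, dim) embeddings of seed translation pairs
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt          # orthogonal W minimizing ||src @ W.T - tgt||_F

src = np.random.randn(1000, 300)              # e.g. source-language word vectors
true_w, _ = np.linalg.qr(np.random.randn(300, 300))
tgt = src @ true_w.T                          # synthetic "target" embeddings
w = procrustes(src, tgt)
print(np.allclose(w, true_w, atol=1e-6))      # recovers the rotation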

MaCow: Masked Convolutional Generative Flow

2 code implementations NeurIPS 2019 Xuezhe Ma, Xiang Kong, Shanghang Zhang, Eduard Hovy

Flow-based generative models, conceptually attractive due to tractability of both the exact log-likelihood computation and latent-variable inference, and efficiency of both training and sampling, have led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Computational Efficiency Density Estimation +1

MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders

no code implementations ICLR 2019 Xuezhe Ma, Chunting Zhou, Eduard Hovy

Variational Autoencoder (VAE), a simple and effective deep generative model, has led to a number of impressive empirical successes and spawned many advanced variants and theoretical investigations.

Density Estimation Image Generation +1

Stack-Pointer Networks for Dependency Parsing

3 code implementations ACL 2018 Xuezhe Ma, Zecong Hu, Jingzhou Liu, Nanyun Peng, Graham Neubig, Eduard Hovy

Combining pointer networks (Vinyals et al., 2015) with an internal stack, the proposed model first reads and encodes the whole sentence, then builds the dependency tree top-down (from root to leaf) in a depth-first fashion.

Dependency Parsing Sentence
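
A minimal sketch (plain Python) of the top-down, depth-first decoding order described above: a stack holds the word whose children are currently being generated, a pointer selects the next child to attach and push, and pointing back at the stack top closes that subtree. The neural scorer is stubbed out as point_to_child.

def stack_pointer_decode(n_words, point_to_child):
    heads = [None] * (n_words + 1)      # heads[i] = parent of word i; 0 is root
    stack = [0]                         # start from the artificial root
    while stack:
        top = stack[-1]
        child = point_to_child(top, heads)   # model would score all candidates
        if child == top:                # pointing at itself: subtree finished
            stack.pop()
        else:
            heads[child] = top          # attach and descend depth-first
            stack.append(child)
    return heads[1:]

# toy pointer: attach each unattached word left-to-right under the current top
def toy_pointer(top, heads):
    for i in range(1, len(heads)):
        if heads[i] is None and i != top:
            return i
    return top

print(stack_pointer_decode(3, toy_pointer))   # [0, 1, 2] (a chain)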

STCP: Simplified-Traditional Chinese Conversion and Proofreading

no code implementations IJCNLP 2017 Jiarui Xu, Xuezhe Ma, Chen-Tse Tsai, Eduard Hovy

This paper aims to provide an effective tool for conversion between Simplified Chinese and Traditional Chinese.

Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

no code implementations ICLR 2018 Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy

Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes.

Dependency Parsing Image Captioning +6
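
A minimal sketch (PyTorch) of the RAML objective on a toy output space small enough to enumerate: the target distribution q(y | y*) is proportional to exp(r(y, y*) / tau), and training minimizes the expected negative log-likelihood under q instead of the one-hot NLL of maximum likelihood. The two-token space, the overlap reward, and tau = 0.5 are illustrative.

import torch

# toy output space: all length-2 sequences over {a, b}; reward = overlap with y*
vocab = ["a", "b"]
space = [(u, w) for u in vocab for w in vocab]
y_star, tau = ("a", "b"), 0.5
reward = torch.tensor([float(sum(t == s for t, s in zip(y, y_star)))
                       for y in space])
q = torch.softmax(reward / tau, dim=0)        # q(y | y*) ∝ exp(r(y, y*) / tau)

logits = torch.zeros(len(space), requires_grad=True)   # stand-in model over outputs
loss = -(q * torch.log_softmax(logits, dim=0)).sum()   # expected NLL under q
loss.backward()                                        # vs. one-hot NLL in plain MLE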

An Interpretable Knowledge Transfer Model for Knowledge Base Completion

no code implementations ACL 2017 Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy

Knowledge bases are important resources for a variety of natural language processing tasks but suffer from incompleteness.

Knowledge Base Completion Transfer Learning

Neural Probabilistic Model for Non-projective MST Parsing

no code implementations IJCNLP 2017 Xuezhe Ma, Eduard Hovy

In this paper, we propose a probabilistic parsing model, which defines a proper conditional probability distribution over non-projective dependency trees for a given sentence, using neural representations as inputs.

Sentence
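
A worked sketch (NumPy) of how such a distribution over non-projective trees can be normalized: by the Matrix-Tree theorem, the partition function (the total weight of all dependency trees rooted at the artificial root 0) is the determinant of a Laplacian minor. Random edge weights stand in for exponentiated neural scores, and the usual single-root-child constraint of dependency parsing is ignored here for simplicity.

import numpy as np

n = 5                                     # words 1..4 plus artificial root 0
w = np.exp(np.random.randn(n, n))         # w[h, c]: weight of edge h -> c
np.fill_diagonal(w, 0.0)

L = np.diag(w.sum(axis=0)) - w            # Laplacian of the weighted digraph
Z = np.linalg.det(L[1:, 1:])              # delete root row/col: partition function
print(Z)                                  # normalizer in p(tree) = prod(w_edges) / Z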

Dropout with Expectation-linear Regularization

no code implementations26 Sep 2016 Xuezhe Ma, Yingkai Gao, Zhiting Hu, Yao-Liang Yu, Yuntian Deng, Eduard Hovy

Algorithmically, we show that our proposed measure of the inference gap can be used to regularize the standard dropout training objective, resulting in an explicit control of the gap.

Image Classification
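
A minimal sketch (PyTorch) of regularizing the dropout inference gap in the spirit described above: alongside the task loss on the stochastic (dropout) forward pass, penalize the distance to the deterministic (expectation-style) forward pass. The squared-distance gap measure and the coefficient are illustrative, not the paper's exact formulation.

import torch

model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                            torch.nn.Dropout(0.5), torch.nn.Linear(64, 2))
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

model.train()
out_dropout = model(x)                      # stochastic pass with dropout masks
model.eval()
out_expected = model(x)                     # deterministic "expected" pass
model.train()

task_loss = torch.nn.functional.cross_entropy(out_dropout, y)
gap = (out_dropout - out_expected).pow(2).sum(dim=1).mean()
loss = task_loss + 1.0 * gap                # illustrative coefficient lam = 1.0
loss.backward()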

Harnessing Deep Neural Networks with Logic Rules

2 code implementations ACL 2016 Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric Xing

Combining deep neural networks with structured logic rules is desirable to harness flexibility and reduce uninterpretability of the neural models.

named-entity-recognition Named Entity Recognition +2

Unsupervised Ranking Model for Entity Coreference Resolution

no code implementations NAACL 2016 Xuezhe Ma, Zhengzhong Liu, Eduard Hovy

Coreference resolution is one of the first stages in deep language understanding and its importance has been well recognized in the natural language processing community.

coreference-resolution

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

25 code implementations ACL 2016 Xuezhe Ma, Eduard Hovy

State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing.

Feature Engineering Named Entity Recognition +3
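
A minimal sketch (PyTorch) of the architecture the title names: a character-level CNN produces per-word character features, which are concatenated with word embeddings and fed through a bidirectional LSTM, whose states are projected to per-tag emission scores. The CRF layer that decodes over tag sequences is omitted for brevity, and all sizes are illustrative.

import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    def __init__(self, n_words, n_chars, n_tags, wdim=100, cdim=30, hid=200):
        super().__init__()
        self.wemb = nn.Embedding(n_words, wdim)
        self.cemb = nn.Embedding(n_chars, cdim)
        self.char_cnn = nn.Conv1d(cdim, cdim, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(wdim + cdim, hid, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hid, n_tags)   # emission scores; CRF goes on top

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        b, s, l = chars.shape
        c = self.cemb(chars.view(b * s, l)).transpose(1, 2)   # (b*s, cdim, len)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values    # max-pool over chars
        feats = torch.cat([self.wemb(words), c.view(b, s, -1)], dim=-1)
        h, _ = self.lstm(feats)
        return self.emit(h)                                   # (batch, seq, n_tags)

model = BiLSTMCNN(n_words=5000, n_chars=80, n_tags=9)
scores = model(torch.randint(0, 5000, (2, 7)), torch.randint(0, 80, (2, 7, 12)))
print(scores.shape)   # torch.Size([2, 7, 9])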

Probabilistic Models for High-Order Projective Dependency Parsing

no code implementations14 Feb 2015 Xuezhe Ma, Hai Zhao

This paper presents generalized probabilistic models for high-order projective dependency parsing and an algorithmic framework for learning these statistical models involving dependency trees.

Dependency Parsing Vocal Bursts Intensity Prediction
