Search Results for author: Mingda Chen

Found 22 papers, 15 papers with code

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

no code implementations • 2 Oct 2023 • Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih

Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build.

Few-Shot Learning • Open-Domain Question Answering • +1

xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

1 code implementation • 22 Jun 2023 • Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

In comparison to xSIM, we show that xSIM++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines.

NMT

Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering

no code implementations • 23 May 2023 • Mingda Chen, Xilun Chen, Wen-tau Yih

Few-shot learning for open domain multi-hop question answering typically relies on the in-context learning capability of large language models (LLMs).

Fact Verification • Few-Shot Learning • +2

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

1 code implementation • 16 Dec 2022 • Mingda Chen, Paul-Ambroise Duquenne, Pierre Andrews, Justine Kao, Alexandre Mourachko, Holger Schwenk, Marta R. Costa-jussà

In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems.

Automatic Speech Recognition (ASR) • +3

Leveraging Natural Supervision for Language Representation Learning and Generation

1 code implementation • 21 Jul 2022 • Mingda Chen

In this thesis, we describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision.

Data-to-Text Generation • Language Modelling • +4

Improving In-Context Few-Shot Learning via Self-Supervised Training

no code implementations • NAACL 2022 • Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov, Zornitsa Kozareva

Self-supervised pretraining has made few-shot learning possible for many NLP tasks.

Few-Shot Learning

TVStoryGen: A Dataset for Generating Stories with Character Descriptions

1 code implementation • 18 Sep 2021 • Mingda Chen, Kevin Gimpel

We introduce TVStoryGen, a story generation dataset that requires generating detailed TV show episode recaps from a brief summary and a set of documents describing the characters involved.

Ranked #1 on Story Generation on Fandom dev

Abstractive Text Summarization • Story Generation

SummScreen: A Dataset for Abstractive Screenplay Summarization

1 code implementation • ACL 2022 • Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel

Since characters are fundamental to TV series, we also propose two entity-centric evaluation metrics.

Abstractive Text Summarization

WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections

1 code implementation • Findings (ACL) 2021 • Mingda Chen, Sam Wiseman, Kevin Gimpel

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation.

Data-to-Text Generation • Sentence

Exemplar-Controllable Paraphrasing and Translation using Bitext

1 code implementation • 12 Oct 2020 • Mingda Chen, Sam Wiseman, Kevin Gimpel

Our experimental results show that our models achieve competitive results on controlled paraphrase generation and strong performance on controlled machine translation.

Machine Translation • Paraphrase Generation • +1

Mining Knowledge for Natural Language Inference from Wikipedia Categories

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel

Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations.

Lexical Entailment • Natural Language Inference

A novel random access scheme for M2M communication in crowded asynchronous massive MIMO systems

no code implementations • 13 Jul 2020 • Huimei Han, Wenchao Zhai, Zhefu Wu, Ying Li, Jun Zhao, Mingda Chen

Simulation results show that, compared to the existing random access scheme for crowded asynchronous massive MIMO systems, the proposed scheme can improve the uplink throughput and estimate the effective timing offsets accurately at the same time.

Learning Probabilistic Sentence Representations from Paraphrases

no code implementations • WS 2020 • Mingda Chen, Kevin Gimpel

Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences.

Sentence • Specificity • +1

How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

1 code implementation • 21 Nov 2019 • Zewei Chu, Mingda Chen, Jing Chen, Miaosen Wang, Kevin Gimpel, Manaal Faruqui, Xiance Si

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one.

Question Rewriting

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

48 code implementations • ICLR 2020 • Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

Ranked #1 on Natural Language Inference on QNLI

Common Sense Reasoning • Linguistic Acceptability • +7

EntEval: A Holistic Evaluation Benchmark for Entity Representations

2 code implementations • IJCNLP 2019 • Mingda Chen, Zewei Chu, Yang Chen, Karl Stratos, Kevin Gimpel

Rich entity representations are useful for a wide class of problems involving entities.

Entity Disambiguation • Entity Typing

Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

2 code implementations • IJCNLP 2019 • Mingda Chen, Zewei Chu, Kevin Gimpel

Prior work on pretrained sentence embeddings and benchmarks focuses on the capabilities of stand-alone sentences.

Sentence • Sentence Embeddings

Variational Sequential Labelers for Semi-Supervised Learning

1 code implementation • EMNLP 2018 • Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel

Our model family consists of a latent-variable generative model and a discriminative labeler.

Ranked #72 on Named Entity Recognition (NER) on CoNLL 2003 (English)

Learning Word Embeddings

Smaller Text Classifiers with Discriminative Cluster Embeddings

1 code implementation • NAACL 2018 • Mingda Chen, Kevin Gimpel

Word embedding parameters often dominate overall model sizes in neural methods for natural language processing.

Clustering

Controllable Paraphrase Generation with a Syntactic Exemplar

no code implementations • ACL 2019 • Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori.

Attribute • Paraphrase Generation • +2

Variational recurrent models for representation learning

no code implementations • ICLR 2019 • Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu

Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.

Multi-View Learning • Representation Learning

A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations

1 code implementation • NAACL 2019 • Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics.

Disentanglement • Semantic Similarity • +2
