Search Results for author: Mingda Chen

Found 22 papers, 15 papers with code

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

no code implementations 2 Oct 2023 Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih

Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build.

Few-Shot Learning Open-Domain Question Answering +1

xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

1 code implementation 22 Jun 2023 Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

In comparison to xSIM, we show that xSIM++ is better correlated with the downstream BLEU scores of translation systems trained on mined bitexts, providing a reliable proxy of bitext mining performance without needing to run expensive bitext mining pipelines.

NMT

Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering

no code implementations 23 May 2023 Mingda Chen, Xilun Chen, Wen-tau Yih

Few-shot learning for open domain multi-hop question answering typically relies on the in-context learning capability of large language models (LLMs).

Fact Verification Few-Shot Learning +2

Leveraging Natural Supervision for Language Representation Learning and Generation

1 code implementation 21 Jul 2022 Mingda Chen

In this thesis, we describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision.

Data-to-Text Generation Language Modelling +4

TVStoryGen: A Dataset for Generating Stories with Character Descriptions

1 code implementation 18 Sep 2021 Mingda Chen, Kevin Gimpel

We introduce TVStoryGen, a story generation dataset that requires generating detailed TV show episode recaps from a brief summary and a set of documents describing the characters involved.

Abstractive Text Summarization Story Generation

WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections

1 code implementation Findings (ACL) 2021 Mingda Chen, Sam Wiseman, Kevin Gimpel

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation.

Data-to-Text Generation Sentence

Exemplar-Controllable Paraphrasing and Translation using Bitext

1 code implementation 12 Oct 2020 Mingda Chen, Sam Wiseman, Kevin Gimpel

Our experimental results show that our models achieve competitive results on controlled paraphrase generation and strong performance on controlled machine translation.

Machine Translation Paraphrase Generation +1

A novel random access scheme for M2M communication in crowded asynchronous massive MIMO systems

no code implementations 13 Jul 2020 Huimei Han, Wenchao Zhai, Zhefu Wu, Ying Li, Jun Zhao, Mingda Chen

Simulation results show that, compared to the existing random access scheme for crowded asynchronous massive MIMO systems, the proposed scheme can improve the uplink throughput and estimate the effective timing offsets accurately at the same time.

Learning Probabilistic Sentence Representations from Paraphrases

no code implementations WS 2020 Mingda Chen, Kevin Gimpel

Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences.

Sentence Specificity +1

How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

1 code implementation 21 Nov 2019 Zewei Chu, Mingda Chen, Jing Chen, Miaosen Wang, Kevin Gimpel, Manaal Faruqui, Xiance Si

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one.

Question Rewriting

Smaller Text Classifiers with Discriminative Cluster Embeddings

1 code implementation NAACL 2018 Mingda Chen, Kevin Gimpel

Word embedding parameters often dominate overall model sizes in neural methods for natural language processing.

Clustering

Controllable Paraphrase Generation with a Syntactic Exemplar

no code implementations ACL 2019 Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori.

Attribute Paraphrase Generation +2

Variational recurrent models for representation learning

no code implementations ICLR 2019 Qingming Tang, Mingda Chen, Weiran Wang, Karen Livescu

Existing variational recurrent models typically use stochastic recurrent connections to model the dependence among neighboring latent variables, while generation assumes independence of generated data per time step given the latent sequence.

Multi-View Learning Representation Learning

A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations

1 code implementation NAACL 2019 Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics.

Disentanglement Semantic Similarity +2
