Search Results for author: Alex Wang

Found 29 papers, 14 papers with code

Overview of the SustaiNLP 2020 Shared Task

no code implementations • EMNLP (sustainlp) 2020 • Alex Wang, Thomas Wolf

We describe the SustaiNLP 2020 shared task: efficient inference on the SuperGLUE benchmark (Wang et al., 2019).

OpenChemIE: An Information Extraction Toolkit For Chemistry Literature

no code implementations • 1 Apr 2024 • Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, Regina Barzilay

Our machine learning models attain state-of-the-art performance when evaluated individually, and we meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole, achieving an F1 score of 69.5%.

Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning

1 code implementation • 19 Jan 2024 • Adib Hasan, Ileana Rugina, Alex Wang

Large Language Models (LLMs) are vulnerable to "jailbreaking" prompts, a type of attack that can coax these models into generating harmful and illegal content.

JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation

1 code implementation • 29 Oct 2023 • Yao Yao, Peike Li, BoYu Chen, Alex Wang

With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch.

Music Generation

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

no code implementations • 8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.

Memorization

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

2 code implementations • 9 Aug 2023 • Peike Li, BoYu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

Despite the task's significance, prevailing generative models exhibit limitations in music quality, computational efficiency, and generalization.

Ranked #1 on Text-to-Music Generation on MusicCaps

Computational Efficiency, In-Context Learning

GLARE: A Dataset for Traffic Sign Detection in Sun Glare

1 code implementation • 19 Sep 2022 • Nicholas Gray, Megan Moraes, Jiang Bian, Alex Wang, Allen Tian, Kurt Wilson, Yan Huang, Haoyi Xiong, Zhishan Guo

It provides an essential enrichment to the widely used LISA Traffic Sign dataset.

Object Detection

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

no code implementations • 26 Aug 2022 • Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

We present the results of the NLP Community Metasurvey.

Ethics, Inductive Bias

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking, Text Generation

SQuALITY: Building a Long-Document Summarization Dataset the Hard Way

1 code implementation • 23 May 2022 • Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman

Summarization datasets are often assembled either by scraping naturally occurring public-domain summaries -- which are nearly always in difficult-to-work-with technical domains -- or by using approximate heuristics to extract them from everyday text -- which frequently yields unfaithful summaries.

Document Summarization, Multiple-choice

Benchmarking Active Learning Strategies for Materials Optimization and Discovery

no code implementations • 12 Apr 2022 • Alex Wang, Haotong Liang, Austin McDannald, Ichiro Takeuchi, A. Gilad Kusne

In these systems, machine learning controls experiment design, execution, and analysis in a closed loop.

Active Learning, Benchmarking

A Low-Cost Robot Science Kit for Education with Symbolic Regression for Hypothesis Discovery and Validation

no code implementations • 8 Apr 2022 • Logan Saar, Haotong Liang, Alex Wang, Austin McDannald, Efrain Rodriguez, Ichiro Takeuchi, A. Gilad Kusne

We present the next generation in science education, a kit for building a low-cost autonomous scientist.

Experimental Design, Symbolic Regression

QuestEval: Summarization Asks for Fact-based Evaluation

1 code implementation • EMNLP 2021 • Thomas Scialom, Paul-Alexis Dray, Patrick Gallinari, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano, Alex Wang

Summarization evaluation remains an open research problem: current metrics such as ROUGE are known to be limited and to correlate poorly with human judgments.

Question Answering

Application of Quantum Machine Learning using the Quantum Variational Classifier Method to High Energy Physics Analysis at the LHC on IBM Quantum Computer Simulator and Hardware with 10 qubits

no code implementations • 21 Dec 2020 • Sau Lan Wu, Jay Chan, Wen Guan, Shaojun Sun, Alex Wang, Chen Zhou, Miron Livny, Federico Carminati, Alberto Di Meglio, Andy C. Y. Li, Joseph Lykken, Panagiotis Spentzouris, Samuel Yen-Chi Chen, Shinjae Yoo, Tzu-Chieh Wei

On the quantum hardware, the quantum variational classifier method has shown promising discrimination power, comparable to that on the quantum simulator.

Quantum Physics, High Energy Physics - Experiment

Label Representations in Modeling Classification as Text Generation

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Xinyi Chen, Jingxian Xu, Alex Wang

Several recent state-of-the-art transfer learning methods model classification tasks as text generation, where labels are represented as strings for the model to generate.

Text Classification

Asking and Answering Questions to Evaluate the Factual Consistency of Summaries

2 code implementations • ACL 2020 • Alex Wang, Kyunghyun Cho, Mike Lewis

QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source.

Abstractive Text Summarization

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models

6 code implementations • ACL 2020 • Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman

We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks.

Transfer Learning

A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models

1 code implementation • 29 May 2019 • Elman Mansimov, Alex Wang, Sean Welleck, Kyunghyun Cho

We investigate this problem by proposing a generalized model of sequence generation that unifies decoding in directed and undirected models.

Machine Translation, Natural Language Inference

What do you learn from context? Probing for sentence structure in contextualized word representations

2 code implementations • ICLR 2019 • Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, Ellie Pavlick

The jiant toolkit for general-purpose text understanding models

Language Modelling, Sentence

SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

6 code implementations • NeurIPS 2019 • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks.

Transfer Learning

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

no code implementations • ICLR 2019 • Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen

Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).

Language Modelling, Sentence

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

no code implementations • SEMEVAL 2019 • Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

Our results show that pretraining on language modeling performs the best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models, and CCG supertagging and NLI pretraining perform comparably.

CCG Supertagging, Language Modelling

On Measuring Social Biases in Sentence Encoders

1 code implementation • NAACL 2019 • Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger

The Word Embedding Association Test shows that GloVe and word2vec word embeddings exhibit human-like implicit biases based on gender, race, and other social constructs (Caliskan et al., 2017).

Sentence, Word Embeddings

BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model

11 code implementations • WS 2019 • Alex Wang, Kyunghyun Cho

We show that BERT (Devlin et al., 2018) is a Markov random field language model.

Language Modelling

Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling

no code implementations • ACL 2019 • Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling.

Language Modelling, Natural Language Understanding

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

11 code implementations • WS 2018 • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset.

Ranked #46 on Natural Language Inference on MultiNLI

Natural Language Inference, Natural Language Understanding

Clustering Stable Instances of Euclidean k-means

no code implementations • 4 Dec 2017 • Abhratanu Dutta, Aravindan Vijayaraghavan, Alex Wang

We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable.

Clustering

Clustering Stable Instances of Euclidean k-means.

no code implementations • NeurIPS 2017 • Aravindan Vijayaraghavan, Abhratanu Dutta, Alex Wang

To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering?

Clustering

Learning Linguistic Descriptors of User Roles in Online Communities

no code implementations • WS 2016 • Alex Wang, William L. Hamilton, Jure Leskovec
