Search Results for author: Hannaneh Hajishirzi

Found 165 papers, 109 papers with code

Reframing Instructional Prompts to GPTk’s Language

no code implementations Findings (ACL) 2022 Daniel Khashabi, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi

Our experiments compare the zero-shot and few-shot performance of LMs prompted with reframed instructions on 12 NLP tasks across 6 categories.

OLMES: A Standard for Language Model Evaluations

no code implementations12 Jun 2024 Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi

Evaluating language models in particular is challenging, as small changes to how a model is evaluated on a task can lead to large changes in measured performance.

Language Modelling Multiple-choice

Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning

1 code implementation10 Jun 2024 Joongwon Kim, Bhargavi Paranjape, Tushar Khot, Hannaneh Hajishirzi

Despite using 7B models, Husky matches or even exceeds frontier LMs such as GPT-4 on these tasks, showcasing the efficacy of our holistic approach in addressing complex reasoning problems.

Multi-hop Question Answering Question Answering

SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

no code implementations10 Jun 2024 David Wadden, Kejian Shi, Jacob Morrison, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan

We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification.

Claim Verification Instruction Following +3

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation1 Apr 2024 Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

Reliable, Adaptable, and Attributable Language Models with Retrieval

no code implementations5 Mar 2024 Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih

Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability.

Question Answering Retrieval

Set the Clock: Temporal Alignment of Pretrained Language Models

1 code implementation26 Feb 2024 Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith

We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment.

Data Engineering for Scaling Language Models to 128K Context

2 code implementations15 Feb 2024 Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.

4k Continual Pretraining

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens

1 code implementation30 Jan 2024 Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi

Second, existing $n$-gram LMs use small $n$ which hinders their performance; we instead allow $n$ to be arbitrarily large, by introducing a new $\infty$-gram LM with backoff.

Language Modelling

APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

1 code implementation22 Jan 2024 Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

Compared to baselines, our experiments show that APT maintains up to 98% task performance when pruning RoBERTa and T5 models with 40% parameters left while keeping 86. 4% LLaMA models' performance with 70% parameters remained.

Fine-grained Hallucination Detection and Editing for Language Models

no code implementations12 Jan 2024 Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi

On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and edits suggested by FAVA improve the factuality of LM-generated text.

Hallucination Retrieval

Paloma: A Benchmark for Evaluating Language Model Fit

no code implementations16 Dec 2023 Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining.

Language Modelling

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

2 code implementations17 Nov 2023 Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Since the release of T\"ULU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.

SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks

no code implementations18 Oct 2023 Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi

We introduce SHARCS for adaptive inference that takes into account the hardness of input samples.

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

1 code implementation17 Oct 2023 Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

In this work, we study Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.

Language Modelling Large Language Model +2

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

2 code implementations17 Oct 2023 Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi

Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens.

Fact Verification Response Generation +1

MatFormer: Nested Transformer for Elastic Inference

2 code implementations11 Oct 2023 Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, KaiFeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain

Furthermore, we observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.

Decoder Language Modelling

Crystal: Introspective Reasoners Reinforced with Self-Feedback

1 code implementation7 Oct 2023 Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz

Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized.

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

1 code implementation2 Oct 2023 Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.

Hallucination Retrieval

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding

no code implementations26 Sep 2023 Jiacheng Liu, Andrew Cohen, Ramakanth Pasunuru, Yejin Choi, Hannaneh Hajishirzi, Asli Celikyilmaz

The key idea is not to throw out the value network, a byproduct of PPO training for evaluating partial output sequences, when decoding text out of the policy network.

Text Generation

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore

1 code implementation8 Aug 2023 Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer

SILO is built by (1) training a parametric LM on Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text and (2) augmenting it with a more general and easily modifiable nonparametric datastore (e. g., containing copyrighted books or news) that is only queried during inference.

Language Modelling Sentence

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

2 code implementations NeurIPS 2023 Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.

Instruction Following

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

no code implementations NeurIPS 2023 Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e. g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e. g., factual incorrectness, irrelevance, and information incompleteness).

Language Modelling Long Form Question Answering +2

PuMer: Pruning and Merging Tokens for Efficient Vision Language Models

1 code implementation27 May 2023 Qingqing Cao, Bhargavi Paranjape, Hannaneh Hajishirzi

Large-scale vision language (VL) models use Transformers to perform cross-modal interactions between the input text and image.

Token Reduction

Machine Reading Comprehension using Case-based Reasoning

no code implementations24 May 2023 Dung Thai, Dhruv Agarwal, Mudit Chaudhary, Wenlong Zhao, Rajarshi Das, Manzil Zaheer, Jay-Yoon Lee, Hannaneh Hajishirzi, Andrew McCallum

Given a test question, CBR-MRC first retrieves a set of similar cases from a nonparametric memory and then predicts an answer by selecting the span in the test context that is most similar to the contextualized representations of answers in the retrieved cases.

Attribute Machine Reading Comprehension

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

6 code implementations23 May 2023 Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly.

Language Modelling Retrieval +1

TaskWeb: Selecting Better Source Tasks for Multi-task NLP

1 code implementation22 May 2023 Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi

TaskShop uses TaskWeb to estimate the benefit of using a source task for learning a new target task, and to choose a subset of helpful training tasks for multi-task training.

Multi-Task Learning

ReFIT: Relevance Feedback from a Reranker during Inference

no code implementations19 May 2023 Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi

Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e. g., K=100), which are then reranked by a more powerful cross-encoder model.

Information Retrieval Retrieval

Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements

1 code implementation5 May 2023 Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures.

ART: Automatic multi-step reasoning and tool-use for large language models

2 code implementations16 Mar 2023 Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, Marco Tulio Ribeiro

We introduce Automatic Reasoning and Tool-use (ART), a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program.

When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories

1 code implementation20 Dec 2022 Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi

Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge.

Knowledge Probing Memorization +2

Self-Instruct: Aligning Language Models with Self-Generated Instructions

17 code implementations20 Dec 2022 Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.

Instruction Following Language Modelling

HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation

no code implementations20 Dec 2022 Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew Peters

By converting instructions into modules, HINT models can effectively disregard the length of instructions and few-shot example inputs in terms of compute usage.

In-Context Learning

Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations

2 code implementations19 Dec 2022 Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi

Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available.

Few-Shot Learning In-Context Learning

Editing Models with Task Arithmetic

3 code implementations8 Dec 2022 Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi

Changing how pre-trained models behave -- e. g., improving their performance on a downstream task or mitigating biases learned during pre-training -- is a common practice when developing machine learning systems.


AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization

1 code implementation2 Dec 2022 Bhargavi Paranjape, Pradeep Dasigi, Vivek Srikumar, Luke Zettlemoyer, Hannaneh Hajishirzi

We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization -- an end-to-end approach that jointly identifies error-prone groups and improves accuracy on them.


Nonparametric Masked Language Modeling

1 code implementation2 Dec 2022 Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases.

Language Modelling Masked Language Modeling +2

Data-Efficient Finetuning Using Cross-Task Nearest Neighbors

1 code implementation1 Dec 2022 Hamish Ivison, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi

Obtaining labeled data to train a model for a task of interest is often expensive.

CREPE: Open-Domain Question Answering with False Presuppositions

1 code implementation30 Nov 2022 Xinyan Velocity Yu, Sewon Min, Luke Zettlemoyer, Hannaneh Hajishirzi

We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.

Open-Domain Question Answering

Task-aware Retrieval with Instructions

1 code implementation16 Nov 2022 Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih

We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries.


SciFact-Open: Towards open-domain scientific claim verification

1 code implementation25 Oct 2022 David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, Hannaneh Hajishirzi

While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature.

Claim Verification Information Retrieval +1

CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation

1 code implementation10 Oct 2022 Tanay Dixit, Bhargavi Paranjape, Hannaneh Hajishirzi, Luke Zettlemoyer

We present COunterfactual Generation via Retrieval and Editing (CORE), a retrieval-augmented generation framework for creating diverse counterfactual perturbations for CDA.

counterfactual Data Augmentation +6

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

1 code implementation6 Oct 2022 Jiacheng Liu, Skyler Hallinan, Ximing Lu, Pengfei He, Sean Welleck, Hannaneh Hajishirzi, Yejin Choi

Our work is the first to report that knowledge generated by models that are orders of magnitude smaller than GPT-3, even without direct supervision on the knowledge itself, can exceed the quality of commonsense knowledge elicited from GPT-3.

Question Answering Reinforcement Learning (RL)

Patching open-vocabulary models by interpolating weights

1 code implementation10 Aug 2022 Gabriel Ilharco, Mitchell Wortsman, Samir Yitzhak Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt

We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate.

Image Classification

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

4 code implementations9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Math +1

NaturalProver: Grounded Mathematical Proof Generation with Language Models

1 code implementation25 May 2022 Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi

Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence.

Automated Theorem Proving Language Modelling

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts

1 code implementation24 May 2022 Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi

Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of parameters and trains an attention module to interpolate the source prompts and a newly initialized target prompt for every instance in the target task.

Few-Shot Learning Language Modelling +1

Aligning to Social Norms and Values in Interactive Narratives

no code implementations NAACL 2022 Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi

We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games -- environments wherein an agent perceives and interacts with a world through natural language.

text-based games

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

2 code implementations25 Feb 2022 Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.

In-Context Learning

UnifiedQA-v2: Stronger Generalization via Broader Cross-Format Training

1 code implementation23 Feb 2022 Daniel Khashabi, Yeganeh Kordi, Hannaneh Hajishirzi

We present UnifiedQA-v2, a QA model built with the same process as UnifiedQA, except that it utilizes more supervision -- roughly 3x the number of datasets used for UnifiedQA.

Question Answering

Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

1 code implementation22 Feb 2022 Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, Andrew McCallum

Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed.

Knowledge Base Question Answering

CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

no code implementations16 Dec 2021 Zeqiu Wu, Yi Luan, Hannah Rashkin, David Reitter, Hannaneh Hajishirzi, Mari Ostendorf, Gaurav Singh Tomar

Compared to standard retrieval tasks, passage retrieval for conversational question answering (CQA) poses new challenges in understanding the current user question, as each question needs to be interpreted within the dialogue context.

Conversational Question Answering Passage Retrieval +3

Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks

1 code implementation NAACL 2022 Akari Asai, Matt Gardner, Hannaneh Hajishirzi

We introduce a multi-task learning framework to jointly generate the final output and predict the evidentiality of each passage, leveraging a new task-agnostic method to obtain silver evidentiality labels for supervision.

Attribute Fact Verification +4

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

3 code implementations Findings (NAACL) 2022 David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, Hannaneh Hajishirzi

Our approach outperforms two competitive baselines on three scientific claim verification datasets, with particularly strong performance in zero / few-shot domain adaptation experiments.

Claim Verification Domain Adaptation +2

MetaICL: Learning to Learn In Context

2 code implementations NAACL 2022 Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi

We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.

Few-Shot Learning In-Context Learning +4

Generated Knowledge Prompting for Commonsense Reasoning

1 code implementation ACL 2022 Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi

It remains an open question whether incorporating external knowledge benefits commonsense reasoning while maintaining the flexibility of pretrained sequence models.

Language Modelling Open-Ended Question Answering

Reframing Instructional Prompts to GPTk's Language

no code implementations16 Sep 2021 Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi

Our experiments compare the zero-shot and few-shot performance of LMs prompted with reframed instructions on 12 NLP tasks across 6 categories.

Few-Shot Learning Question Generation +1

DIALKI: Knowledge Identification in Conversational Systems through Dialogue-Document Contextualization

1 code implementation EMNLP 2021 Zeqiu Wu, Bo-Ru Lu, Hannaneh Hajishirzi, Mari Ostendorf

Identifying relevant knowledge to be used in conversational systems that are grounded in long documents is critical to effective response generation.

Response Generation

Robust fine-tuning of zero-shot models

3 code implementations CVPR 2022 Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo-Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.

Ranked #12 on Image Classification on ObjectNet (using extra training data)

Image Classification Transfer Learning

One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval

1 code implementation NeurIPS 2021 Akari Asai, Xinyan Yu, Jungo Kasai, Hannaneh Hajishirzi

We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific annotated data or knowledge sources.

Answer Generation Passage Retrieval +3

FaVIQ: FAct Verification from Information-seeking Questions

2 code implementations ACL 2022 Jungsoo Park, Sewon Min, Jaewoo Kang, Luke Zettlemoyer, Hannaneh Hajishirzi

Claims in FAVIQ are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.

Fact Checking Fact Verification +1

Prompting Contrastive Explanations for Commonsense Reasoning Tasks

no code implementations Findings (ACL) 2021 Bhargavi Paranjape, Julian Michael, Marjan Ghazvininejad, Luke Zettlemoyer, Hannaneh Hajishirzi

Many commonsense reasoning NLP tasks involve choosing between one or more possible answers to a question or prompt based on knowledge that is often implicit.


Efficient Passage Retrieval with Hashing for Open-domain Question Answering

1 code implementation ACL 2021 Ikuya Yamada, Akari Asai, Hannaneh Hajishirzi

Most state-of-the-art open-domain question answering systems use a neural retrieval model to encode passages into continuous vectors and extract them from a knowledge source.

Natural Questions Open-Domain Question Answering +3

Beyond Paragraphs: NLP for Long Sequences

1 code implementation NAACL 2021 Iz Beltagy, Arman Cohan, Hannaneh Hajishirzi, Sewon Min, Matthew E. Peters

In this tutorial, we aim at bringing interested NLP researchers up to speed about the recent and ongoing techniques for document-level representation learning.

Representation Learning

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

3 code implementations ACL 2022 Swaroop Mishra, Daniel Khashabi, Chitta Baral, Hannaneh Hajishirzi

Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and measuring generalization to the remaining unseen ones.

Question Answering

Joint Passage Ranking for Diverse Multi-Answer Retrieval

no code implementations EMNLP 2021 Sewon Min, Kenton Lee, Ming-Wei Chang, Kristina Toutanova, Hannaneh Hajishirzi

We study multi-answer retrieval, an under-explored problem that requires retrieving passages to cover multiple distinct answers for a given question.

Answer Generation Passage Ranking +4

NaturalProofs: Mathematical Theorem Proving in Natural Language

1 code implementation24 Mar 2021 Sean Welleck, Jiacheng Liu, Ronan Le Bras, Hannaneh Hajishirzi, Yejin Choi, Kyunghyun Cho

Understanding and creating mathematics using natural mathematical language - the mixture of symbolic and natural language used by humans - is a challenging and important problem for driving progress in machine learning.

Automated Theorem Proving Domain Generalization +3

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

no code implementations EMNLP 2020 James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi

However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information.

Reading Comprehension

Extracting a Knowledge Base of Mechanisms from COVID-19 Papers

3 code implementations NAACL 2021 Tom Hope, Aida Amini, David Wadden, Madeleine van Zuylen, Sravanthi Parasa, Eric Horvitz, Daniel Weld, Roy Schwartz, Hannaneh Hajishirzi

The COVID-19 pandemic has spawned a diverse body of scientific literature that is challenging to navigate, stimulating interest in automated tools to help find useful knowledge.


X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

1 code implementation EMNLP 2020 Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi

X-LXMERT's image generation capabilities rival state of the art generative models while its question answering and captioning abilities remains comparable to LXMERT.

Image Captioning Image Generation +3

Extracting Summary Knowledge Graphs from Long Documents

1 code implementation19 Sep 2020 Zeqiu Wu, Rik Koncel-Kedziorski, Mari Ostendorf, Hannaneh Hajishirzi

Knowledge graphs capture entities and relations from long documents and can facilitate reasoning in many downstream applications.

Graph Learning Knowledge Graphs +1

DeLighT: Deep and Light-weight Transformer

2 code implementations ICLR 2021 Sachin Mehta, Marjan Ghazvininejad, Srinivasan Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi

We introduce a deep and light-weight transformer, DeLighT, that delivers similar or better performance than standard transformer-based models with significantly fewer parameters.

Language Modelling Machine Translation +1

HATNet: An End-to-End Holistic Attention Network for Diagnosis of Breast Biopsy Images

1 code implementation25 Jul 2020 Sachin Mehta, Ximing Lu, Donald Weaver, Joann G. Elmore, Hannaneh Hajishirzi, Linda Shapiro

HATNet extends the bag-of-words approach and uses self-attention to encode global information, allowing it to learn representations from clinically relevant tissue structures without any explicit supervision.

Histopathological Image Classification Image Classification

Multi-modal Information Extraction from Text, Semi-structured, and Tabular Data on the Web

no code implementations ACL 2020 Xin Luna Dong, Hannaneh Hajishirzi, Colin Lockard, Prashant Shiralkar

In this tutorial we take a holistic view toward information extraction, exploring the commonalities in the challenges and solutions developed to address these different forms of text.

document understanding Entity Linking

ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages

no code implementations14 May 2020 Colin Lockard, Prashant Shiralkar, Xin Luna Dong, Hannaneh Hajishirzi

In this work, we propose a solution for "zero-shot" open-domain relation extraction from webpages with a previously unseen template, including from websites with little overlap with existing sources of knowledge for distant supervision and websites in entirely new subject verticals.

Graph Neural Network Relation +1

A Controllable Model of Grounded Response Generation

1 code implementation1 May 2020 Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan

Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process, often resulting in uninteresting responses.

Informativeness Response Generation

An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction

2 code implementations EMNLP 2020 Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi, Luke Zettlemoyer

Decisions of complex language understanding models can be rationalized by limiting their inputs to a relevant subsequence of the original text.

SciREX: A Challenge Dataset for Document-Level Information Extraction

1 code implementation ACL 2020 Sarthak Jain, Madeleine van Zuylen, Hannaneh Hajishirzi, Iz Beltagy

It is challenging to create a large-scale information extraction (IE) dataset at the document level since it requires an understanding of the whole document to annotate entities and their document-level relationships that usually span beyond sentences or even sections.


Fact or Fiction: Verifying Scientific Claims

2 code implementations EMNLP 2020 David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi

We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to identify rationales justifying each decision.

Claim Verification Domain Adaptation +1

AmbigQA: Answering Ambiguous Open-domain Questions

2 code implementations EMNLP 2020 Sewon Min, Julian Michael, Hannaneh Hajishirzi, Luke Zettlemoyer

Ambiguity is inherent to open-domain question answering; especially when exploring new topics, it can be difficult to ask questions that have a single, unambiguous answer.

Open-Domain Question Answering Weakly-supervised Learning

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

4 code implementations15 Feb 2020 Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith

We publicly release all of our experimental data, including training and validation scores for 2, 100 trials, to encourage further analysis of training dynamics during fine-tuning.

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

2 code implementations ICLR 2020 Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong

Answering questions that require multi-hop reasoning at web-scale necessitates retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question.

Question Answering Retrieval

Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering

7 code implementations10 Nov 2019 Sewon Min, Danqi Chen, Luke Zettlemoyer, Hannaneh Hajishirzi

We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships that are derived from an external knowledge base or co-occurrence in the same article.

Natural Questions Open-Domain Question Answering +5

Contextualized Sparse Representations for Real-Time Open-Domain Question Answering

3 code implementations ACL 2020 Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang

Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect huge scalability and speed benefit but often suffer from low accuracy due to the limitation of existing phrase representation models.

Information Retrieval Open-Domain Question Answering +1

On Making Reading Comprehension More Comprehensive

no code implementations WS 2019 Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min

In this work, we justify a question answering approach to reading comprehension and describe the various kinds of questions one might use to more fully test a system{'}s comprehension of a passage, moving beyond questions that only probe local predicate-argument structures.

Machine Reading Comprehension Question Answering

Question Answering is a Format; When is it Useful?

no code implementations25 Sep 2019 Matt Gardner, Jonathan Berant, Hannaneh Hajishirzi, Alon Talmor, Sewon Min

In this opinion piece, we argue that question answering should be considered a format which is sometimes useful for studying particular phenomena, not a phenomenon or task in itself.

Machine Translation Question Answering +4

Entity, Relation, and Event Extraction with Contextualized Span Representations

3 code implementations IJCNLP 2019 David Wadden, Ulme Wennberg, Yi Luan, Hannaneh Hajishirzi

We examine the capabilities of a unified, multi-task framework for three information extraction tasks: named entity recognition, relation extraction, and event extraction.

Event Extraction Joint Entity and Relation Extraction +5

Mixture Content Selection for Diverse Sequence Generation

1 code implementation IJCNLP 2019 Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi

The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection.

Abstractive Text Summarization Decoder +3

Potential-Based Advice for Stochastic Policy Learning

no code implementations20 Jul 2019 Baicen Xiao, Bhaskar Ramasubramanian, Andrew Clark, Hannaneh Hajishirzi, Linda Bushnell, Radha Poovendran

This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies.


Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

1 code implementation ACL 2019 Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query.

Open-Domain Question Answering

DiCENet: Dimension-wise Convolutions for Efficient Networks

2 code implementations8 Jun 2019 Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari

When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks including image classification, object detection, and semantic segmentation.

Image Classification Neural Architecture Search +3

MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms

no code implementations NAACL 2019 Aida Amini, Saadia Gabriel, Peter Lin, Rik Koncel-Kedziorski, Yejin Choi, Hannaneh Hajishirzi

We introduce a new representation language to model precise operation programs corresponding to each math problem that aim to improve both the performance and the interpretability of the learned models.

Math Math Word Problem Solving

A General Framework for Information Extraction using Dynamic Span Graphs

3 code implementations NAACL 2019 Yi Luan, Dave Wadden, Luheng He, Amy Shah, Mari Ostendorf, Hannaneh Hajishirzi

We introduce a general framework for several information extraction tasks that share span representations using dynamically constructed span graphs.

 Ranked #1 on Relation Extraction on ACE 2004 (Cross Sentence metric)

Joint Entity and Relation Extraction Named Entity Recognition (NER) +1