Search Results for author: Martin Potthast

Found 48 papers, 23 papers with code

Task Proposal: Abstractive Snippet Generation for Web Pages

no code implementations INLG (ACL) 2020 Shahbaz Syed, Wei-Fan Chen, Matthias Hagen, Benno Stein, Henning Wachsmuth, Martin Potthast

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page.

Abstractive Text Summarization

Language Models as Context-sensitive Word Search Engines

1 code implementation In2Writing (ACL) 2022 Matti Wiegmann, Michael Völske, Benno Stein, Martin Potthast

Context-sensitive word search engines are writing assistants that support word choice, phrasing, and idiomatic language use by indexing large-scale n-gram collections and implementing a wildcard search.

Language Modelling

Casting the Same Sentiment Classification Problem

1 code implementation Findings (EMNLP) 2021 Erik Körner, Ahmad Dawar Hakimi, Gerhard Heyer, Martin Potthast

We introduce and study a problem variant of sentiment analysis, namely the “same sentiment classification problem”, where, given a pair of texts, the task is to determine if they have the same sentiment, disregarding the actual sentiment polarity.

Classification Language Modelling +1

Image Retrieval for Arguments Using Stance-Aware Query Expansion

no code implementations EMNLP (ArgMining) 2021 Johannes Kiesel, Nico Reichenbach, Benno Stein, Martin Potthast

Many forms of argumentation employ images as persuasive means, but research in argument mining has been focused on verbal argumentation so far.

Argument Mining Argument Retrieval +1

Sparse Pairwise Re-ranking with Pre-trained Transformers

1 code implementation 10 Jul 2022 Lukas Gienapp, Maik Fröbe, Matthias Hagen, Martin Potthast

Pairwise re-ranking models predict which of two documents is more relevant to a query and then aggregate a final ranking from such preferences.

Passage Ranking Re-Ranking
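
The aggregation step mentioned in the abstract above can be illustrated with a minimal sketch that ranks documents by how many pairwise comparisons they win. The `toy_prefer` function is a hypothetical stand-in for a trained pairwise model, and win counting is only one simple aggregation choice assumed here for illustration, not necessarily the method used in the paper.

```python
from itertools import permutations
from typing import Callable, Dict, List


def aggregate_by_wins(docs: List[str], prefer: Callable[[str, str], float]) -> List[str]:
    """Rank documents by their summed pairwise preference scores."""
    scores: Dict[str, float] = {d: 0.0 for d in docs}
    # Compare every ordered pair (a, b) and credit a with the preference score.
    for a, b in permutations(docs, 2):
        scores[a] += prefer(a, b)
    # Highest total first; sorted() is stable, so ties keep their input order.
    return sorted(docs, key=lambda d: scores[d], reverse=True)


def toy_prefer(a: str, b: str) -> float:
    """Hypothetical stand-in for a pairwise model: prefers the longer document."""
    return 1.0 if len(a) > len(b) else 0.0


ranking = aggregate_by_wins(["bb", "a", "cccc", "ddd"], toy_prefer)
print(ranking)
```

Comparing all ordered pairs is quadratic in the number of documents, which is the cost that sampling only a subset of comparisons would reduce.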

How Train-Test Leakage Affects Zero-shot Retrieval

no code implementations 29 Jun 2022 Maik Fröbe, Christopher Akiki, Martin Potthast, Matthias Hagen

We investigate the impact of this unintended train-test leakage by training neural retrieval models on combinations of a fixed number of MS MARCO / ORCAS queries that are highly similar to the actual test queries and an increasing number of other queries.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

1 code implementation 9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramón Risco Delgado, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Timothy Telleen-Lawton, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning

Clickbait Spoiling via Question Answering and Passage Retrieval

1 code implementation ACL 2022 Matthias Hagen, Maik Fröbe, Artur Jurk, Martin Potthast

We introduce and study the task of clickbait spoiling: generating a short text that satisfies the curiosity induced by a clickbait post.

Passage Retrieval Question Answering

Tracking Discourse Influence in Darknet Forums

1 code implementation 4 Feb 2022 Christopher Akiki, Lukas Gienapp, Martin Potthast

This technical report documents our efforts in addressing the tasks set forth by the 2021 AMoC (Advanced Modelling of Cyber Criminal Careers) Hackathon.

STEREO: Scientific Text Reuse in Open Access Publications

1 code implementation 22 Dec 2021 Lukas Gienapp, Wolfgang Kircheis, Bjarne Sievers, Benno Stein, Martin Potthast

We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications.

FastWARC: Optimizing Large-Scale Web Archive Analytics

1 code implementation 22 Nov 2021 Janek Bevendorff, Martin Potthast, Benno Stein

Web search and other large-scale web data analytics rely on processing archives of web pages stored in a standardized and efficient format.

The Impact of Main Content Extraction on Near-Duplicate Detection

no code implementations 21 Nov 2021 Maik Fröbe, Matthias Hagen, Janek Bevendorff, Michael Völske, Benno Stein, Christopher Schröder, Robby Wagner, Lukas Gienapp, Martin Potthast

Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, even though the underlying web crawls typically include (near-)duplicates of many web pages.

Information Retrieval

BERTian Poetics: Constrained Composition with Masked LMs

1 code implementation 28 Oct 2021 Christopher Akiki, Martin Potthast

Masked language models have recently been interpreted as energy-based sequence models from which text can be generated using a Metropolis-Hastings sampler.

Modeling Proficiency with Implicit User Representations

no code implementations 15 Oct 2021 Kim Breitwieser, Allison Lahnala, Charles Welch, Lucie Flek, Martin Potthast

We introduce the problem of proficiency modeling: Given a user's posts on a social media platform, the task is to identify the subset of posts or topics for which the user has some level of proficiency.

Summary Explorer: Visualizing the State of the Art in Text Summarization

1 code implementation EMNLP (ACL) 2021 Shahbaz Syed, Tariq Yousef, Khalid Al-Khatib, Stefan Jänicke, Martin Potthast

This paper introduces Summary Explorer, a new tool to support the manual inspection of text summarization systems by compiling the outputs of 55 state-of-the-art single-document summarization approaches on three benchmark datasets and visually exploring them during a qualitative assessment.

Document Summarization

Small-Text: Active Learning for Text Classification in Python

1 code implementation 21 Jul 2021 Christopher Schröder, Lydia Müller, Andreas Niekler, Martin Potthast

We present small-text, a simple and modular active learning library, which offers pool-based active learning for single- and multi-label text classification in Python.

Active Learning Classification +3

Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers

1 code implementation Findings (ACL) 2022 Christopher Schröder, Andreas Niekler, Martin Potthast

Active learning is the iterative construction of a classification model through targeted labeling, enabling significant labeling cost savings.

Active Learning Classification +1
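
To make the notion of an uncertainty-based query strategy concrete, the following minimal sketch implements least-confidence sampling, one common strategy of this kind. The function name, the probability inputs, and the choice of least confidence are assumptions made for illustration; the sketch does not reproduce the paper's experimental setup.

```python
from typing import List, Sequence


def least_confidence_query(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k pool instances with the lowest top-class probability."""
    # The model's confidence in an instance is its highest predicted class probability.
    confidence = [max(p) for p in probs]
    # Least confident instances come first; ties keep their pool order.
    order = sorted(range(len(probs)), key=lambda i: confidence[i])
    return order[:k]


# Hypothetical predicted class probabilities for four unlabeled pool instances.
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
to_label = least_confidence_query(pool_probs, k=2)
print(to_label)
```

In an active learning loop, the selected instances would be labeled, added to the training set, and the model retrained before the next query.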

Generating Informative Conclusions for Argumentative Texts

1 code implementation Findings (ACL) 2021 Shahbaz Syed, Khalid Al-Khatib, Milad Alshomary, Henning Wachsmuth, Martin Potthast

Third, insights are provided into the suitability of our corpus for the task, the differences between the two generation paradigms, the trade-off between informativeness and conciseness, and the impact of encoding argumentative knowledge.

Informativeness

Target Inference in Argument Conclusion Generation

no code implementations ACL 2020 Milad Alshomary, Shahbaz Syed, Martin Potthast, Henning Wachsmuth

In particular, we argue here that a decisive step is to infer a conclusion's target, and we hypothesize that this target is related to the premises' targets.

Efficient Pairwise Annotation of Argument Quality

no code implementations ACL 2020 Lukas Gienapp, Benno Stein, Matthias Hagen, Martin Potthast

We present an efficient annotation framework for argument quality, a property that previous work has found difficult to measure reliably.

Crawling and Preprocessing Mailing Lists At Scale for Dialog Analysis

no code implementations ACL 2020 Janek Bevendorff, Khalid Al Khatib, Martin Potthast, Benno Stein

This paper introduces the Webis Gmane Email Corpus 2019, the largest publicly available and fully preprocessed email corpus to date.

Abstractive Snippet Generation

1 code implementation 25 Feb 2020 Wei-Fan Chen, Shahbaz Syed, Benno Stein, Matthias Hagen, Martin Potthast

An abstractive snippet is a newly written piece of text that summarizes a web page on a search engine results page.

Text Summarization

Common Conversational Community Prototype: Scholarly Conversational Assistant

no code implementations 19 Jan 2020 Krisztian Balog, Lucie Flekova, Matthias Hagen, Rosie Jones, Martin Potthast, Filip Radlinski, Mark Sanderson, Svitlana Vakulenko, Hamed Zamani

This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions.

Conversational Search

Towards Summarization for Social Media - Results of the TL;DR Challenge

no code implementations WS 2019 Shahbaz Syed, Michael Völske, Nedim Lipka, Benno Stein, Hinrich Schütze, Martin Potthast

In this paper, we report on the results of the TL;DR challenge, discussing an extensive manual evaluation of the expected properties of a good summary based on analyzing the comments provided by human annotators.

Bias Analysis and Mitigation in the Evaluation of Authorship Verification

1 code implementation ACL 2019 Janek Bevendorff, Matthias Hagen, Benno Stein, Martin Potthast

The PAN series of shared tasks is well known for its continuous and high-quality research in the field of digital text forensics.

Authorship Verification

Celebrity Profiling

1 code implementation ACL 2019 Matti Wiegmann, Benno Stein, Martin Potthast

Celebrities are among the most prolific users of social media, promoting their personas and rallying followers.

Gender Prediction Occupation prediction +1

Heuristic Authorship Obfuscation

1 code implementation ACL 2019 Janek Bevendorff, Martin Potthast, Matthias Hagen, Benno Stein

Authorship verification is the task of determining whether two texts were written by the same author.

Authorship Verification

Task Proposal: The TL;DR Challenge

no code implementations WS 2018 Shahbaz Syed, Michael Völske, Martin Potthast, Nedim Lipka, Benno Stein, Hinrich Schütze

The TL;DR challenge fosters research in abstractive summarization of informal text, the largest and fastest-growing source of textual data on the web, which has been overlooked by summarization research so far.

Abstractive Text Summarization Information Retrieval +1

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations CONLL 2018 Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing Morphological Analysis +1

Crowdsourcing a Large Corpus of Clickbait on Twitter

no code implementations COLING 2018 Martin Potthast, Tim Gollub, Kristof Komlossy, Sebastian Schuster, Matti Wiegmann, Erika Patricia Garces Fernández, Matthias Hagen, Benno Stein

To address the urgent task of clickbait detection, we constructed a new corpus of 38,517 annotated tweets, the Webis Clickbait Corpus 2017.

Clickbait Detection

Heuristic Feature Selection for Clickbait Detection

no code implementations 4 Feb 2018 Matti Wiegmann, Michael Völske, Benno Stein, Matthias Hagen, Martin Potthast

We study feature selection as a means to optimize the baseline clickbait detector employed at the Clickbait Challenge 2017.

Clickbait Detection Feature Engineering +2

TL;DR: Mining Reddit to Learn Automatic Summarization

no code implementations WS 2017 Michael Völske, Martin Potthast, Shahbaz Syed, Benno Stein

Recent advances in automatic text summarization have used deep neural networks to generate high-quality abstractive summaries, but the performance of these models strongly depends on large amounts of suitable training data.

Abstractive Text Summarization Document Summarization

A Stylometric Inquiry into Hyperpartisan and Fake News

1 code implementation ACL 2018 Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, Benno Stein

The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing.

Authorship Verification Fake News Detection +1
