Search Results for author: Matthias Hagen

Found 45 papers, 20 papers with code

Mining Health-related Cause-Effect Statements with High Precision at Large Scale

1 code implementation • COLING 2022 • Ferdinand Schlatt, Dieter Bettin, Matthias Hagen, Benno Stein, Martin Potthast

An efficient assessment of the health relatedness of text passages is important to mine the web at scale to conduct health sociological analyses or to develop a health search engine.

CausalQA: A Benchmark for Causal Question Answering

1 code implementation • COLING 2022 • Alexander Bondarenko, Magdalena Wolska, Stefan Heindorf, Lukas Blübaum, Axel-Cyrille Ngonga Ngomo, Benno Stein, Pavel Braslavski, Matthias Hagen, Martin Potthast

At least 5% of questions submitted to search engines ask about cause-effect relationships in some way.

Question Answering

Task Proposal: Abstractive Snippet Generation for Web Pages

no code implementations • INLG (ACL) 2020 • Shahbaz Syed, Wei-Fan Chen, Matthias Hagen, Benno Stein, Henning Wachsmuth, Martin Potthast

We propose a shared task on abstractive snippet generation for web pages, a novel task of generating query-biased abstractive summaries for documents that are to be shown on a search results page.

Abstractive Text Summarization

Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders

2 code implementations • 10 Apr 2024 • Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen

Cross-encoders are effective passage re-rankers.

Passage Re-Ranking, Re-Ranking

Task-Oriented Paraphrase Analytics

1 code implementation • 26 Mar 2024 • Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein

Since paraphrasing is an ill-defined task, the term "paraphrasing" covers text transformation tasks with different characteristics.

Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models

1 code implementation • 12 Mar 2024 • Andrew Parry, Maik Fröbe, Sean MacAvaney, Martin Potthast, Matthias Hagen

Modern sequence-to-sequence relevance models like monoT5 can effectively capture complex textual interactions between queries and documents through cross-encoding.

Retrieval

Detecting Generated Native Ads in Conversational Search

no code implementations • 7 Feb 2024 • Sebastian Schmidt, Ines Zelch, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

It is only a small step to also use this technology to generate and integrate advertising within these answers - instead of placing ads separately from the organic search results.

Conversational Search, Sentence

Investigating the Effects of Sparse Attention on Cross-Encoders

1 code implementation • 29 Dec 2023 • Ferdinand Schlatt, Maik Fröbe, Matthias Hagen

Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classic retrieval models.

Re-Ranking, Retrieval

Evaluating Generative Ad Hoc Information Retrieval

no code implementations • 8 Nov 2023 • Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen, Martin Potthast

Recent advances in large language models have enabled the development of viable generative information retrieval systems.

Document Ranking, Information Retrieval, +1

Commercialized Generative AI: A Critical Study of the Feasibility and Ethics of Generating Native Advertising Using Large Language Models in Conversational Web Search

no code implementations • 7 Oct 2023 • Ines Zelch, Matthias Hagen, Martin Potthast

How will generative AI pay for itself?

Ethics

The Information Retrieval Experiment Platform

1 code implementation • 30 May 2023 • Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures.

Information Retrieval, Retrieval

Perspectives on Large Language Models for Relevance Judgment

no code implementations • 13 Apr 2023 • Guglielmo Faggioli, Laura Dietz, Charles Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, Henning Wachsmuth

When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.

Retrieval

The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

2 code implementations • 2 Apr 2023 • Jan Heinrich Reimer, Sebastian Schmidt, Maik Fröbe, Lukas Gienapp, Harrisen Scells, Benno Stein, Matthias Hagen, Martin Potthast

The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years.

Privacy Preserving, Retrieval

Paraphrase Acquisition from Image Captions

1 code implementation • 26 Jan 2023 • Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein

We propose to use image captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset.

Image Captioning

Sparse Pairwise Re-ranking with Pre-trained Transformers

1 code implementation • 10 Jul 2022 • Lukas Gienapp, Maik Fröbe, Matthias Hagen, Martin Potthast

Pairwise re-ranking models predict which of two documents is more relevant to a query and then aggregate a final ranking from such preferences.

Passage Ranking, Re-Ranking, +1

How Train-Test Leakage Affects Zero-shot Retrieval

1 code implementation • 29 Jun 2022 • Maik Fröbe, Christopher Akiki, Martin Potthast, Matthias Hagen

We investigate the impact of this unintended train-test leakage by training neural retrieval models on combinations of a fixed number of MS MARCO / ORCAS queries that are highly similar to the actual test queries and an increasing number of other queries.

Retrieval

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning, Math, +1

Clickbait Spoiling via Question Answering and Passage Retrieval

1 code implementation • ACL 2022 • Matthias Hagen, Maik Fröbe, Artur Jurk, Martin Potthast

We introduce and study the task of clickbait spoiling: generating a short text that satisfies the curiosity induced by a clickbait post.

Passage Retrieval, Question Answering, +1

The Impact of Main Content Extraction on Near-Duplicate Detection

no code implementations • 21 Nov 2021 • Maik Fröbe, Matthias Hagen, Janek Bevendorff, Michael Völske, Benno Stein, Christopher Schröder, Robby Wagner, Lukas Gienapp, Martin Potthast

Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, albeit the underlying web crawls typically include (near-)duplicates of many web pages.

Information Retrieval, Retrieval

Towards Axiomatic Explanations for Neural Ranking Models

no code implementations • 15 Jun 2021 • Michael Völske, Alexander Bondarenko, Maik Fröbe, Matthias Hagen, Benno Stein, Jaspreet Singh, Avishek Anand

We investigate whether one can explain the behavior of neural ranking models in terms of their congruence with well understood principles of document ranking by using established theories from axiomatic IR.

Document Ranking, Information Retrieval, +1

Query Interpretations from Entity-Linked Segmentations

1 code implementation • 18 May 2021 • Vaibhav Kasturia, Marcel Gohsen, Matthias Hagen

Web search queries can be ambiguous: is "source of the nile" meant to find information on the actual river or on a board game of that name?

Entity Linking

Efficient Pairwise Annotation of Argument Quality

no code implementations • ACL 2020 • Lukas Gienapp, Benno Stein, Matthias Hagen, Martin Potthast

We present an efficient annotation framework for argument quality, a feature difficult to be measured reliably as per previous work.

The Importance of Suppressing Domain Style in Authorship Analysis

no code implementations • 29 May 2020 • Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, Ben Thies, Matthias Hagen, Efstathios Stamatatos, Benno Stein, Martin Potthast

The prerequisite of many approaches to authorship analysis is a representation of writing style.

Conversational Search -- A Report from Dagstuhl Seminar 19461

no code implementations • 18 May 2020 • Avishek Anand, Lawrence Cavedon, Matthias Hagen, Hideo Joho, Mark Sanderson, Benno Stein

Dagstuhl Seminar 19461 "Conversational Search" was held on 10-15 November 2019.

Conversational Search, Information Retrieval, +1

Abstractive Snippet Generation

1 code implementation • 25 Feb 2020 • Wei-Fan Chen, Shahbaz Syed, Benno Stein, Matthias Hagen, Martin Potthast

An abstractive snippet is an originally created piece of text to summarize a web page on a search engine results page.

Ranked #1 on Text Summarization on Webis-Snippet-20 Corpus

Text Summarization

Common Conversational Community Prototype: Scholarly Conversational Assistant

no code implementations • 19 Jan 2020 • Krisztian Balog, Lucie Flekova, Matthias Hagen, Rosie Jones, Martin Potthast, Filip Radlinski, Mark Sanderson, Svitlana Vakulenko, Hamed Zamani

This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions.

Conversational Search

Unraveling the Search Space of Abusive Language in Wikipedia with Dynamic Lexicon Acquisition

no code implementations • WS 2019 • Wei-Fan Chen, Khalid Al Khatib, Matthias Hagen, Henning Wachsmuth, Benno Stein

Many discussions on online platforms suffer from users offending others by using abusive terminology, threatening each other, or being sarcastic.

Abusive Language

Bias Analysis and Mitigation in the Evaluation of Authorship Verification

1 code implementation • ACL 2019 • Janek Bevendorff, Matthias Hagen, Benno Stein, Martin Potthast

The PAN series of shared tasks is well known for its continuous and high quality research in the field of digital text forensics.

Authorship Verification, Benchmarking

Heuristic Authorship Obfuscation

1 code implementation • ACL 2019 • Janek Bevendorff, Martin Potthast, Matthias Hagen, Benno Stein

Authorship verification is the task of determining whether two texts were written by the same author.

Authorship Verification

TARGER: Neural Argument Mining at Your Fingertips

1 code implementation • ACL 2019 • Artem Chernodub, Oleksiy Oliynyk, Philipp Heidenreich, Alexander Bondarenko, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present TARGER, an open source neural argument mining framework for tagging arguments in free input texts and for keyword-based retrieval of arguments from an argument-tagged web-scale corpus.

Argument Mining, Retrieval

Generalizing Unmasking for Short Texts

1 code implementation • NAACL 2019 • Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

Authorship verification is the problem of inferring whether two texts were written by the same author.

Authorship Verification, General Classification

Answering Comparative Questions: Better than Ten-Blue-Links?

no code implementations • 15 Jan 2019 • Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl.

The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength

no code implementations • 27 Dec 2018 • Martin Potthast, Tim Gollub, Matthias Hagen, Benno Stein

Clickbait has grown to become a nuisance to social media users and social media operators alike.

Clickbait Detection, Regression

Categorizing Comparative Sentences

3 code implementations • WS 2019 • Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)).

Argument Mining, Sentence, +1