Search Results for author: Preslav Nakov

Found 261 papers, 97 papers with code

COVID-19 in Bulgarian Social Media: Factuality, Harmfulness, Propaganda, and Framing

1 code implementation RANLP 2021 Preslav Nakov, Firoj Alam, Shaden Shaar, Giovanni Da San Martino, Yifan Zhang

With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged as the problem got elevated to a whole new level to become the first global infodemic.

MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation

no code implementations 26 Nov 2024 Harsh Singh, Rocktim Jyoti Das, Mingfei Han, Preslav Nakov, Ivan Laptev

Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.

Code Generation In-Context Learning +2

Arabic Dataset for LLM Safeguard Evaluation

no code implementations 22 Oct 2024 Yasser Ashraf, Yuxia Wang, Bin Gu, Preslav Nakov, Timothy Baldwin

The growing use of large language models (LLMs) has raised concerns regarding their safety.

DocEdit-v2: Document Structure Editing Via Multimodal LLM Grounding

no code implementations 21 Oct 2024 Manan Suri, Puneet Mathur, Franck Dernoncourt, Rajiv Jain, Vlad I Morariu, Ramit Sawhney, Preslav Nakov, Dinesh Manocha

Document structure editing involves manipulating localized textual, visual, and layout components in document images based on the user's requests.

FIRE: Fact-checking with Iterative Retrieval and Verification

1 code implementation 17 Oct 2024 Zhuohan Xie, Rui Xing, Yuxia Wang, Jiahui Geng, Hasan Iqbal, Dhruv Sahnan, Iryna Gurevych, Preslav Nakov

The typical approach to fact-checking these atomic claims involves retrieving a fixed number of pieces of evidence, followed by a verification step.

Claim Verification Fact Checking +3

Exploring Language Model Generalization in Low-Resource Extractive QA

no code implementations 27 Sep 2024 Saptarshi Sengupta, Wenpeng Yin, Preslav Nakov, Shreya Ghosh, Suhang Wang

In this paper, we investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift, i.e., can LLMs generalize well to closed-domains that require specific knowledge such as medicine and law in a zero-shot fashion without additional in-domain training?

Domain Generalization Extractive Question-Answering +2

TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning

1 code implementation 18 Sep 2024 Xinyuan Lu, Liangming Pan, Yubo Ma, Preslav Nakov, Min-Yen Kan

Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning, which is crucial for tasks such as table question answering (TQA) and table-based fact verification (TFV).

Fact Verification Question Answering +1

Post-OCR Text Correction for Bulgarian Historical Documents

1 code implementation 31 Aug 2024 Angel Beshirov, Milena Dobreva, Dimitar Dimitrov, Momchil Hardalov, Ivan Koychev, Preslav Nakov

We further develop a method for automatically generating synthetic data in this orthography, as well as in the subsequent Ivanchev orthography, by leveraging vast amounts of contemporary Bulgarian literary texts.

Optical Character Recognition Optical Character Recognition (OCR)

Grounding Fallacies Misrepresenting Scientific Publications in Evidence

1 code implementation 23 Aug 2024 Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

Health-related misinformation claims often falsely cite a credible biomedical publication as evidence, which superficially appears to support the false claim.

Fact Checking Logical Fallacies +3

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

2 code implementations 6 Aug 2024 Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov

The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate.

Fact Checking

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

1 code implementation 28 Jun 2024 Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

To address this problem, we propose $\texttt{Web2Code}$, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs.

Code Translation

Can Machines Resonate with Humans? Evaluating the Emotional and Empathic Comprehension of LMs

1 code implementation 17 Jun 2024 Muhammad Arslan Manzoor, Yuxia Wang, Minghan Wang, Preslav Nakov

Our systematic exploration of LMs' understanding of empathy reveals substantial opportunities for further investigation in both task formulation and modeling.

Contrastive Learning

Exploring the Limitations of Detecting Machine-Generated Text

no code implementations 16 Jun 2024 Jad Doughman, Osama Mohammed Afzal, Hawau Olamide Toyin, Shady Shehata, Preslav Nakov, Zeerak Talat

We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers.

Text Detection

Corpus Poisoning via Approximate Greedy Gradient Descent

1 code implementation 7 Jun 2024 Jinyan Su, Preslav Nakov, Claire Cardie

Dense retrievers are widely used in information retrieval and have also been successfully extended to other knowledge-intensive areas such as language models, e.g., Retrieval-Augmented Generation (RAG) systems.

Information Retrieval RAG +1

Missci: Reconstructing Fallacies in Misrepresented Science

1 code implementation 5 Jun 2024 Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych

Unlike previous fallacy detection datasets, Missci (i) focuses on implicit fallacies between the relevant content of the cited publication and the inaccurate claim, and (ii) requires models to verbalize the fallacious reasoning in addition to classifying it.

Fact Checking Misinformation

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

4 code implementations 9 May 2024 Yuxia Wang, Minghan Wang, Hasan Iqbal, Georgi Georgiev, Jiahui Geng, Preslav Nakov

The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs.

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

1 code implementation 7 Mar 2024 Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov

Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output.

Fact Checking Hallucination +1

Multimodal Large Language Models to Support Real-World Fact-Checking

no code implementations 6 Mar 2024 Jiahui Geng, Yova Kementchedjhieva, Preslav Nakov, Iryna Gurevych

To the best of our knowledge, we are the first to evaluate MLLMs for real-world fact-checking.

Fact Checking

A Chinese Dataset for Evaluating the Safeguards in Large Language Models

1 code implementation 19 Feb 2024 Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin

Previous studies have proposed comprehensive taxonomies of the risks posed by LLMs, as well as corresponding prompts that can be used to examine the safety mechanisms of LLMs.

Factuality of Large Language Models: A Survey

no code implementations 4 Feb 2024 Yuxia Wang, Minghan Wang, Muhammad Arslan Manzoor, Fei Liu, Georgi Georgiev, Rocktim Jyoti Das, Preslav Nakov

Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place.

Survey Text Generation

Recent Advances in Hate Speech Moderation: Multimodality and the Role of Large Models

no code implementations 30 Jan 2024 Ming Shan Hee, Shivam Sharma, Rui Cao, Palash Nandi, Preslav Nakov, Tanmoy Chakraborty, Roy Ka-Wei Lee

In the evolving landscape of online communication, moderating hate speech (HS) presents an intricate challenge, compounded by the multimodal nature of digital content.

Survey

Generating Zero-shot Abstractive Explanations for Rumour Verification

1 code implementation 23 Jan 2024 Iman Munire Bilal, Preslav Nakov, Rob Procter, Maria Liakata

The task of rumour verification in social media concerns assessing the veracity of a claim on the basis of conversation threads that result from it.

Few-Shot Learning Informativeness +2

Large Language Models are Few-Shot Training Example Generators: A Case Study in Fallacy Recognition

1 code implementation 16 Nov 2023 Tariq Alhindi, Smaranda Muresan, Preslav Nakov

In this study, we aim to enhance existing models for fallacy recognition by incorporating additional context and by leveraging large language models to generate synthetic data, thus increasing the representation of the infrequent classes.

A Survey of Confidence Estimation and Calibration in Large Language Models

no code implementations 14 Nov 2023 Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

Assessing their confidence and calibrating them across different tasks can help mitigate risks and enable LLMs to produce better generations.

Language Modelling

A Template Is All You Meme

1 code implementation 11 Nov 2023 Luke Bates, Peter Ebert Christensen, Preslav Nakov, Iryna Gurevych

Here, to aid understanding of memes, we release a knowledge base of memes and information found on www.knowyourmeme.com, which we call the Know Your Meme Knowledge Base (KYMKB), composed of more than 54,000 images.

ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text

no code implementations 6 Nov 2023 Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat

We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023.

Adapting Fake News Detection to the Era of Large Language Models

1 code implementation 2 Nov 2023 Jinyan Su, Claire Cardie, Preslav Nakov

With the proliferation of both human-written and machine-generated real and fake news, robustly and effectively discerning the veracity of news articles has become an intricate challenge.

Fake News Detection

Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media

1 code implementation 27 Oct 2023 Shubham Mittal, Megha Sundriyal, Preslav Nakov

Claim span identification (CSI) is an important step in fact-checking pipelines, aiming to identify text segments that contain a checkworthy claim or assertion in a social media post.

Cross-Lingual Transfer Fact Checking +1

From Chaos to Clarity: Claim Normalization to Empower Fact-Checking

1 code implementation 22 Oct 2023 Megha Sundriyal, Tanmoy Chakraborty, Preslav Nakov

To evaluate the effectiveness of our proposed model, we meticulously compile a comprehensive real-world dataset, CLAN, comprising more than 6k instances of social media posts alongside their respective normalized claims.

Fact Checking In-Context Learning

QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking

1 code implementation 11 Oct 2023 Liangming Pan, Xinyuan Lu, Min-Yen Kan, Preslav Nakov

Fact-checking real-world claims often requires complex, multi-step reasoning due to the absence of direct evidence to support or refute them.

Decision Making Fact Checking +1

Rethinking STS and NLI in Large Language Models

no code implementations 16 Sep 2023 Yuxia Wang, Minghan Wang, Preslav Nakov

Recent years have seen the rise of large language models (LLMs), where practitioners use task-specific prompts; this was shown to be effective for a variety of tasks.

Natural Language Inference Semantic Textual Similarity +1

Fake News Detectors are Biased against Texts Generated by Large Language Models

no code implementations 15 Sep 2023 Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov

The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society.

Misinformation

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

1 code implementation 25 Aug 2023 Yuxia Wang, Haonan Li, Xudong Han, Preslav Nakov, Timothy Baldwin

With the rapid evolution of large language models (LLMs), new and hard-to-predict harmful capabilities are emerging.

bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark

2 code implementations 4 Jun 2023 Momchil Hardalov, Pepa Atanasova, Todor Mihaylov, Galia Angelova, Kiril Simov, Petya Osenova, Ves Stoyanov, Ivan Koychev, Preslav Nakov, Dragomir Radev

We run the first systematic evaluation of pre-trained language models for Bulgarian, comparing and contrasting results across the nine tasks in the benchmark.

Fact Checking named-entity-recognition +5

Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data

no code implementations 28 May 2023 Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang

The need for more usable and explainable machine learning models in healthcare increases the importance of developing and utilizing causal discovery algorithms, which aim to discover causal relations by analyzing observational data.

Causal Discovery

Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data

1 code implementation 24 May 2023 Petar Ivanov, Ivan Koychev, Momchil Hardalov, Preslav Nakov

Developing tools to automatically detect check-worthy claims in political debates and speeches can greatly help moderators of debates, journalists, and fact-checkers.

Fact Checking Misinformation

DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text

1 code implementation 23 May 2023 Jinyan Su, Terry Yue Zhuo, Di Wang, Preslav Nakov

One is called DetectLLM-LRR, which is fast and efficient, and the other is called DetectLLM-NPR, which is more accurate, but slower due to the need for perturbations.

Misinformation

On the Risk of Misinformation Pollution with Large Language Models

1 code implementation 23 May 2023 Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems.

Misinformation Open-Domain Question Answering

Detecting Propaganda Techniques in Code-Switched Social Media Text

1 code implementation 23 May 2023 Muhammad Umar Salman, Asif Hanif, Shady Shehata, Preslav Nakov

Yet, it is common to find a mix of multiple languages in social media communication, a phenomenon known as code-switching.

Propaganda detection

Fact-Checking Complex Claims with Program-Guided Reasoning

1 code implementation 22 May 2023 Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov

Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning.

Fact Checking In-Context Learning

SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables

1 code implementation 22 May 2023 Xinyuan Lu, Liangming Pan, Qian Liu, Preslav Nakov, Min-Yen Kan

Current scientific fact-checking benchmarks exhibit several shortcomings, such as biases arising from crowd-sourced claims and an over-reliance on text-based evidence.

Claim Verification Fact Checking

Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses

no code implementations 13 Apr 2023 Ashraf Haddad, Najwa Aaraj, Preslav Nakov, Septimiu Fabian Mare

In recent years, cyber-security threats have grown in both number and diversity, culminating in an increase in their reporting and analysis.

Sentence

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

1 code implementation 1 Feb 2023 Muhammad Arslan Manzoor, Sarah Albarri, Ziting Xian, Zaiqiao Meng, Preslav Nakov, Shangsong Liang

This survey presents the comprehensive literature on the evolution and enhancement of deep learning multimodal architectures to deal with textual, visual and audio features for diverse cross-modal and modern multimodal tasks.

Question Answering Representation Learning +4

Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?

no code implementations 26 Jan 2023 Shivam Sharma, Atharva Kulkarni, Tharun Suresh, Himanshi Mathur, Preslav Nakov, Md. Shad Akhtar, Tanmoy Chakraborty

A common problem associated with meme comprehension lies in detecting the entities referenced and characterizing the role of each of these entities.

Semantic Role Labeling

Temporal Dynamics of Coordinated Online Behavior: Stability, Archetypes, and Influence

no code implementations 17 Jan 2023 Serena Tardelli, Leonardo Nizzoli, Maurizio Tesconi, Mauro Conti, Preslav Nakov, Giovanni Da San Martino, Stefano Cresci

Large-scale online campaigns, malicious or otherwise, require a significant degree of coordination among participants, which sparked interest in the study of coordinated online behavior.

Community Detection Dynamic Community Detection

Overview of the WANLP 2022 Shared Task on Propaganda Detection in Arabic

no code implementations 18 Nov 2022 Firoj Alam, Hamdy Mubarak, Wajdi Zaghouani, Giovanni Da San Martino, Preslav Nakov

Thus, there has been a lot of recent research on automatic detection of propaganda techniques in text as well as in memes.

Propaganda detection

GREENER: Graph Neural Networks for News Media Profiling

no code implementations 10 Nov 2022 Panayot Panayotov, Utsav Shukla, Husrev Taha Sencar, Mohamed Nabeel, Preslav Nakov

We study the problem of profiling news media on the Web with respect to their factuality of reporting and bias.

Fake News Detection Graph Neural Network

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training

1 code implementation 5 Nov 2022 Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du

In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).

Fact Checking Fact Verification +5

IITD at the WANLP 2022 Shared Task: Multilingual Multi-Granularity Network for Propaganda Detection

1 code implementation 31 Oct 2022 Shubham Mittal, Preslav Nakov

In addition to finding the techniques, Subtask 2 further asks to identify the textual span for each instance of each technique that is present in the tweet; the task can be modeled as a sequence tagging problem.

Multi-Label Classification Propaganda detection +1

CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

1 code implementation 10 Oct 2022 Momchil Hardalov, Anton Chernyavskiy, Ivan Koychev, Dmitry Ilvovsky, Preslav Nakov

Thus, an interesting approach has emerged: to perform automatic fact-checking by verifying whether an input claim has been previously fact-checked by professional fact-checkers and to return an article that explains their decision.

Fact Checking

Ten Years after ImageNet: A 360° Perspective on AI

no code implementations 1 Oct 2022 Sanjay Chawla, Preslav Nakov, Ahmed Ali, Wendy Hall, Issa Khalil, Xiaosong Ma, Husrev Taha Sencar, Ingmar Weber, Michael Wooldridge, Ting Yu

The rise of attention networks, self-supervised learning, generative modeling, and graph neural networks has widened the application space of AI.

Decision Making Fairness +1

DISARM: Detecting the Victims Targeted by Harmful Memes

1 code implementation Findings (NAACL) 2022 Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

Finally, we show that DISARM is interpretable and comparatively more generalizable and that it can reduce the relative error rate for harmful target identification by up to 9 points absolute over several strong multimodal rivals.

Named Entity Recognition Named Entity Recognition (NER) +1

TeamX@DravidianLangTech-ACL2022: A Comparative Analysis for Troll-Based Meme Classification

no code implementations DravidianLangTech (ACL) 2022 Rabindra Nath Nandi, Firoj Alam, Preslav Nakov

The spread of fake news, propaganda, misinformation, disinformation, and harmful content online has raised concerns among social media platforms, government agencies, policymakers, and society as a whole.

Meme Classification Misinformation

Detecting the Role of an Entity in Harmful Memes: Techniques and Their Limitations

1 code implementation CONSTRAINT (ACL) 2022 Rabindra Nath Nandi, Firoj Alam, Preslav Nakov

The content that is posted and shared online can be textual, visual, or a combination of both, e.g., in a meme.

Detecting and Understanding Harmful Memes: A Survey

1 code implementation 9 May 2022 Shivam Sharma, Firoj Alam, Md. Shad Akhtar, Dimitar Dimitrov, Giovanni Da San Martino, Hamed Firooz, Alon Halevy, Fabrizio Silvestri, Preslav Nakov, Tanmoy Chakraborty

One interesting finding is that many types of harmful memes are not really studied, e.g., those featuring self-harm and extremism, partly due to the lack of suitable datasets.

Survey

Faking Fake News for Real Fake News Detection: Propaganda-loaded Training Data Generation

1 code implementation 10 Mar 2022 Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji

Despite recent advances in detecting fake news generated by neural models, their results are not readily applicable to effective detection of human-written disinformation.

Fake News Detection Natural Language Inference +1

QCRI's COVID-19 Disinformation Detector: A System to Fight the COVID-19 Infodemic in Social Media

no code implementations 8 Mar 2022 Preslav Nakov, Firoj Alam, Yifan Zhang, Animesh Prakash, Fahim Dalvi

Fighting the ongoing COVID-19 infodemic has been declared as one of the most important focus areas by the World Health Organization since the onset of the COVID-19 pandemic.

Leaf: Multiple-Choice Question Generation

1 code implementation 22 Jan 2022 Kristiyan Vachev, Momchil Hardalov, Georgi Karadzhov, Georgi Georgiev, Ivan Koychev, Preslav Nakov

Testing with quiz questions has proven to be an effective way to assess and improve the educational process.

Multiple-choice Question Answering +2

Batch-Softmax Contrastive Loss for Pairwise Sentence Scoring Tasks

no code implementations NAACL 2022 Anton Chernyavskiy, Dmitry Ilvovsky, Pavel Kalinin, Preslav Nakov

The use of contrastive loss for representation learning has become prominent in computer vision, and it is now getting attention in Natural Language Processing (NLP).

Sentence Sentence Embeddings

Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets

no code implementations RANLP 2013 Jörg Tiedemann, Preslav Nakov

This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles.

Machine Translation Translation

The Spread of Propaganda by Coordinated Communities on Social Media

no code implementations 27 Sep 2021 Kristina Hristakieva, Stefano Cresci, Giovanni Da San Martino, Mauro Conti, Preslav Nakov

Large-scale manipulations on social media have two important characteristics: (i) use of propaganda to influence others, and (ii) adoption of coordinated behavior to spread it and to amplify its impact.

Feature-Rich Named Entity Recognition for Bulgarian Using Conditional Random Fields

no code implementations 26 Sep 2021 Georgi Georgiev, Preslav Nakov, Kuzman Ganchev, Petya Osenova, Kiril Ivanov Simov

The paper presents a feature-rich approach to the automatic recognition and categorization of named entities (persons, organizations, locations, and miscellaneous) in news text for Bulgarian.

Miscellaneous named-entity-recognition +2

Improved statistical machine translation using monolingual paraphrases

no code implementations 25 Sep 2021 Preslav Nakov

We propose a novel monolingual sentence paraphrasing method for augmenting the training data for statistical machine translation systems "for free" -- by creating it from data that is already available rather than having to create more aligned data.

Machine Translation Sentence +1

RuleBert: Teaching Soft Rules to Pre-trained Language Models

1 code implementation EMNLP 2021 Mohammed Saeed, Naser Ahmadi, Preslav Nakov, Paolo Papotti

While pre-trained language models (PLMs) are the go-to solution to tackle many natural language processing problems, they are still very limited in their ability to capture and to use common-sense knowledge.

Common Sense Reasoning

A Second Pandemic? Analysis of Fake News About COVID-19 Vaccines in Qatar

no code implementations RANLP 2021 Preslav Nakov, Firoj Alam, Shaden Shaar, Giovanni Da San Martino, Yifan Zhang

While COVID-19 vaccines are finally becoming widely available, a second pandemic that revolves around the circulation of anti-vaxxer fake news may hinder efforts to recover from the first one.

Assisting the Human Fact-Checkers: Detecting All Previously Fact-Checked Claims in a Document

1 code implementation 14 Sep 2021 Shaden Shaar, Nikola Georgiev, Firoj Alam, Giovanni Da San Martino, Aisha Mohamed, Preslav Nakov

The output is a re-ranked list of the document sentences, so that those that can be verified are ranked as high as possible, together with corresponding evidence.

Fact Checking Learning-To-Rank +2

Few-Shot Cross-Lingual Stance Detection with Sentiment-Based Pre-Training

1 code implementation 13 Sep 2021 Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

Most research in stance detection, however, has been limited to working with a single language and on a few limited targets, with little work on cross-lingual stance detection.

Stance Detection

Predicting the Factuality of Reporting of News Media Using Observations About User Attention in Their YouTube Channels

no code implementations RANLP 2021 Krasimira Bozhanova, Yoan Dinkov, Ivan Koychev, Maria Castaldo, Tommaso Venturini, Preslav Nakov

We propose a novel framework for predicting the factuality of reporting of news media outlets by studying the user attention cycles in their YouTube channels.

Detecting Propaganda Techniques in Memes

1 code implementation ACL 2021 Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino

We further create and release a new corpus of 950 memes, carefully annotated with 22 propaganda techniques, which can appear in the text, in the image, or in both.

AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking

1 code implementation NAACL (NLP4IF) 2021 Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed, Preslav Nakov

With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages.

Fact Checking Misinformation +1

SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images

1 code implementation SEMEVAL 2021 Dimitar Dimitrov, Bishr Bin Ali, Shaden Shaar, Firoj Alam, Fabrizio Silvestri, Hamed Firooz, Preslav Nakov, Giovanni Da San Martino

We describe SemEval-2021 task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems.

Cross-Domain Label-Adaptive Stance Detection

2 code implementations EMNLP 2021 Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

In this paper, we perform an in-depth analysis of 16 stance detection datasets, and we explore the possibility for cross-domain learning from them.

Domain Adaptation Stance Detection

Transformers: "The End of History" for NLP?

no code implementations 9 Apr 2021 Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov

Recent advances in neural architectures, such as the Transformer, coupled with the emergence of large-scale pre-trained models such as BERT, have revolutionized the field of Natural Language Processing (NLP), pushing the state of the art for a number of NLP tasks.

A Survey on Predicting the Factuality and the Bias of News Media

no code implementations 16 Mar 2021 Preslav Nakov, Husrev Taha Sencar, Jisun An, Haewoon Kwak

The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically.

Bias Detection Fact Checking +1

Automated Fact-Checking for Assisting Human Fact-Checkers

no code implementations 13 Mar 2021 Preslav Nakov, David Corney, Maram Hasanain, Firoj Alam, Tamer Elsayed, Alberto Barrón-Cedeño, Paolo Papotti, Shaden Shaar, Giovanni Da San Martino

The reporting and the analysis of current events around the globe have expanded from professional, editor-led journalism all the way to citizen journalism.

Fact Checking

Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go

no code implementations 27 Feb 2021 Arnav Arora, Preslav Nakov, Momchil Hardalov, Sheikh Muhammad Sarwar, Vibha Nayak, Yoan Dinkov, Dimitrina Zlatkova, Kyle Dent, Ameya Bhatawdekar, Guillaume Bouchard, Isabelle Augenstein

The proliferation of harmful content on online platforms is a major societal problem, which comes in many different forms including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others.

Abusive Language Misinformation

A Survey on Stance Detection for Mis- and Disinformation Identification

no code implementations Findings (NAACL) 2022 Momchil Hardalov, Arnav Arora, Preslav Nakov, Isabelle Augenstein

Understanding attitudes expressed in texts, also known as stance detection, plays an important role in systems for detecting false information online, be it misinformation (unintentionally false) or disinformation (intentionally false information).

Fact Checking Misinformation +3

EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering

2 code implementations EMNLP 2020 Momchil Hardalov, Todor Mihaylov, Dimitrina Zlatkova, Yoan Dinkov, Ivan Koychev, Preslav Nakov

We perform various experiments with existing top-performing multilingual pre-trained models and we show that EXAMS offers multiple challenges that require multilingual knowledge and reasoning in multiple domains.

Question Answering Transfer Learning

Fact-Checking, Fake News, Propaganda, and Media Bias: Truth Seeking in the Post-Truth Era

no code implementations EMNLP 2020 Preslav Nakov, Giovanni Da San Martino

The rise of social media has democratized content creation and has made it easy for everybody to share and spread information online.

Fact Checking Misinformation

Team Alex at CLEF CheckThat! 2020: Identifying Check-Worthy Tweets With Transformer Models

3 code implementations 7 Sep 2020 Alex Nikolov, Giovanni Da San Martino, Ivan Koychev, Preslav Nakov

While misinformation and disinformation have been thriving in social media for years, with the emergence of the COVID-19 pandemic, the political and the health misinformation merged, thus elevating the problem to a whole new level and giving rise to the first global infodemic.

Fact Checking Misinformation

FANG: Leveraging Social Context for Fake News Detection Using Graph Representation

1 code implementation 18 Aug 2020 Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, Min-Yen Kan

In particular, FANG yields significant improvements for the task of fake news detection, and it is robust in the case of limited training data.

Fake News Detection Representation Learning

Can We Spot the "Fake News" Before It Was Even Written?

no code implementations 10 Aug 2020 Preslav Nakov

Given the recent proliferation of disinformation online, there has also been growing research interest in automatically debunking rumors, false claims, and "fake news."

Fact Checking

On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning

1 code implementation 18 Jul 2020 Guillem Ramírez, Rumen Dangovski, Preslav Nakov, Marin Soljačić

We believe that our rethinking of the Wasserstein-Procrustes problem could enable further research, thus helping to develop better algorithms for aligning word embeddings across languages.

Word Embeddings

Fighting the COVID-19 Infodemic in Social Media: A Holistic Perspective and a Call to Arms

1 code implementation 15 Jul 2020 Firoj Alam, Fahim Dalvi, Shaden Shaar, Nadir Durrani, Hamdy Mubarak, Alex Nikolov, Giovanni Da San Martino, Ahmed Abdelali, Hassan Sajjad, Kareem Darwish, Preslav Nakov

With the outbreak of the COVID-19 pandemic, people turned to social media to read and to share timely information including statistics, warnings, advice, and inspirational stories.

Misinformation

Overview of CheckThat! 2020: Automatic Identification and Verification of Claims in Social Media

3 code implementations 15 Jul 2020 Alberto Barron-Cedeno, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov, Bayan Hamdan, Alex Nikolov, Shaden Shaar, Zien Sheikh Ali

The first four tasks compose the full pipeline of claim verification in social media: Task 1 on check-worthiness estimation, Task 2 on retrieving previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on claim verification.

Claim Verification Retrieval +1

Predicting the Topical Stance and Political Leaning of Media using Tweets

no code implementations ACL 2020 Peter Stefanov, Kareem Darwish, Atanas Atanasov, Preslav Nakov

Discovering the stances of media outlets and influential people on current, debatable topics is important for social statisticians and policy makers.

Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection

no code implementations 30 Apr 2020 Momchil Hardalov, Ivan Koychev, Preslav Nakov

Recently, the advances in pre-trained language models, namely contextualized models such as ELMo and BERT, have revolutionized the field by tapping the potential of training very large models with just a few steps of fine-tuning on a task-specific dataset.

Intent Detection Natural Language Understanding +2

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification

no code implementations Findings (ACL) 2021 Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Marcos Zampieri, Preslav Nakov

The widespread use of offensive content in social media has led to an abundance of research in detecting language such as hate speech, cyberbullying, and cyber-aggression.

Language Identification

On the Effect of Dropping Layers of Pre-trained Transformer Models

4 code implementations 8 Apr 2020 Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov

Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments.

Knowledge Distillation Sentence +1

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

no code implementations 27 Feb 2020 Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.

Model Compression

A Context-Aware Approach for Detecting Check-Worthy Claims in Political Debates

no code implementations 14 Dec 2019 Pepa Gencheva, Ivan Koychev, Lluís Màrquez, Alberto Barrón-Cedeño, Preslav Nakov

In the context of investigative journalism, we address the problem of automatically identifying which claims in a given document are most worthy and should be prioritized for fact-checking.

Fact Checking

Proppy: A System to Unmask Propaganda in Online News

no code implementations 14 Dec 2019 Alberto Barrón-Cedeño, Giovanni Da San Martino, Israa Jaradat, Preslav Nakov

We present proppy, the first publicly available real-world, real-time propaganda detection system for online news, which aims at raising awareness, thus potentially limiting the impact of propaganda and helping fight disinformation.

Propaganda detection

SemEval-2013 Task 2: Sentiment Analysis in Twitter

no code implementations SEMEVAL 2013 Preslav Nakov, Zornitsa Kozareva, Alan Ritter, Sara Rosenthal, Veselin Stoyanov, Theresa Wilson

To address this issue, we have proposed SemEval-2013 Task 2: Sentiment Analysis in Twitter, which included two subtasks: A, an expression-level subtask, and B, a message-level subtask.

Sentiment Analysis Task 2