Search Results for author: Leon Derczynski

Found 78 papers, 25 papers with code

An IDR Framework of Opportunities and Barriers between HCI and NLP

no code implementations EACL (HCINLP) 2021 Nanna Inie, Leon Derczynski

The framework is constructed by following an interdisciplinary research-model (IDR), combining field-specific knowledge with existing work in the two fields.

The Lacunae of Danish Natural Language Processing

no code implementations WS (NoDaLiDa) 2019 Andreas Kirkedal, Barbara Plank, Leon Derczynski, Natalie Schluter

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation.

Bornholmsk Natural Language Processing: Resources and Tools

1 code implementation WS (NoDaLiDa) 2019 Leon Derczynski, Alex Speed Kjeldsen

This paper introduces language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian.

Joint Rumour Stance and Veracity Prediction

2 code implementations WS (NoDaLiDa) 2019 Anders Edelbo Lillie, Emil Refsgaard Middelboe, Leon Derczynski

In our experiments, monolingual scores reach stance-based veracity accuracy of 0.83 (F1 0.68); applying the model across languages predicts veracity of claims with an accuracy of 0.82 (F1 0.67).

Rumour Detection Stance Classification

Political Stance in Danish

1 code implementation WS (NoDaLiDa) 2019 Rasmus Lehmann, Leon Derczynski

Furthermore, three models based on an LSTM architecture are designed, implemented and optimized to perform the task of stance detection for the generated dataset.

Stance Detection Word Embeddings

Nemotron-4 340B Technical Report

1 code implementation 17 Jun 2024 Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick Legresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.

Synthetic Data Generation

garak: A Framework for Security Probing Large Language Models

1 code implementation 16 Jun 2024 Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie

As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly.

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation 18 Apr 2024 Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.

Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

no code implementations 10 Nov 2023 Nanna Inie, Jonathan Stray, Leon Derczynski

As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.

Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

no code implementations 29 Jun 2023 Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters.

Assessing Language Model Deployment with Risk Cards

2 code implementations 31 Mar 2023 Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad

However, there is no risk-centric framework for documenting the complexity of a landscape in which some risks are shared across models and contexts, while others are specific, and where certain conditions may be required for risks to manifest as harms.

Language Modelling Text Generation

Training a T5 Using Lab-sized Resources

no code implementations 25 Aug 2022 Manuel R. Ciosici, Leon Derczynski

Training large neural language models on large datasets is resource- and time-intensive.

Language Modelling Large Language Model

Sparse Probability of Agreement

no code implementations 12 Aug 2022 Jeppe Nørregaard, Leon Derczynski

Measuring inter-annotator agreement is important for annotation tasks, but many metrics require a fully-annotated set of data, where all annotators annotate all samples.

Single Particle Analysis

The ITU Faroese Pairs Dataset

no code implementations 17 Jun 2022 Leon Derczynski, Annika Solveig Hedegaard Isfeldt, Signhild Djurhuus

This article documents a dataset of sentence pairs between Faroese and Danish, produced at ITU Copenhagen.

Machine Translation Sentence +1

Bridging the Domain Gap for Stance Detection for the Zulu language

no code implementations 6 May 2022 Gcinizwe Dlamini, Imad Eddine Ibrahim Bekkouch, Adil Khan, Leon Derczynski

This allows us to rapidly achieve similar results for stance detection for the Zulu language, the target language in this work, as are found for English.

Domain Adaptation Machine Translation +2

Detecting Abusive Albanian

no code implementations 28 Jul 2021 Erida Nurce, Jorgel Keci, Leon Derczynski

The ever growing usage of social media in the recent years has had a direct impact on the increased presence of hate speech and offensive speech in online platforms.

Hate Speech Detection

Optimal Size-Performance Tradeoffs: Weighing PoS Tagger Models

no code implementations 16 Apr 2021 Magnus Jacobsen, Mikkel H. Sørensen, Leon Derczynski

Improvements in machine learning-based NLP performance are often presented with bigger models and more complex code.

Part-Of-Speech Tagging POS

Detection and Resolution of Rumors and Misinformation with NLP

no code implementations COLING 2020 Leon Derczynski, Arkaitz Zubiaga

Detecting and grounding false and misleading claims on the web has grown to form a substantial sub-field of NLP.

Misinformation

Maintaining Quality in FEVER Annotation

no code implementations WS 2020 Leon Derczynski, Julie Binau, Henri Schulte

We propose two measures for measuring the quality of constructed claims in the FEVER task.

Power Consumption Variation over Activation Functions

1 code implementation 12 Jun 2020 Leon Derczynski

The power that machine learning models consume when making predictions can be affected by a model's architecture.

BIG-bench Machine Learning

Accelerated High-Quality Mutual-Information Based Word Clustering

1 code implementation LREC 2020 Manuel R. Ciosici, Ira Assent, Leon Derczynski

We present efficient implementations of Brown clustering and the alternative Exchange clustering as well as a number of methods to accelerate the computation of both hierarchical and flat clusters.

Clustering Vocal Bursts Intensity Prediction

Directions in Abusive Language Training Data: Garbage In, Garbage Out

no code implementations 3 Apr 2020 Bertie Vidgen, Leon Derczynski

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies.

Abusive Language

Offensive Language and Hate Speech Detection for Danish

no code implementations LREC 2020 Gudbjartur Ingi Sigurbergsson, Leon Derczynski

It contains user generated comments from various social media platforms, and to our knowledge, it is the first of its kind.

Hate Speech Detection

Simple Natural Language Processing Tools for Danish

1 code implementation 27 Jun 2019 Leon Derczynski

This technical note describes a set of baseline tools for automatic processing of Danish text.

BIG-bench Machine Learning

Quantifying the morphosyntactic content of Brown Clusters

no code implementations NAACL 2019 Manuel R. Ciosici, Leon Derczynski, Ira Assent

We show that increases in Average Mutual Information, the clustering algorithms' optimization goal, are highly correlated with improvements in encoding of morphosyntactic information.

Clustering

Stance Prediction for Russian: Data and Analysis

2 code implementations 5 Sep 2018 Nikita Lozhnikov, Leon Derczynski, Manuel Mazzara

As well as presenting this openly-available dataset, the first of its kind for Russian, the paper presents a baseline for stance prediction in the language.

General Classification Stance Classification +2

Helping Crisis Responders Find the Informative Needle in the Tweet Haystack

1 code implementation 29 Jan 2018 Leon Derczynski, Kenny Meesters, Kalina Bontcheva, Diana Maynard

Messages are filtered for informativeness based on a definition of the concept drawn from prior research and crisis response experts.

General Classification Informativeness

Tracking the Diffusion of Named Entities

1 code implementation 22 Dec 2017 Leon Derczynski, Matthew Rowe

Existing studies of how information diffuses across social networks have thus far concentrated on analysing and recovering the spread of deterministic innovations such as URLs, hashtags, and group membership.

Entity Extraction using GAN

Generalisation in Named Entity Recognition: A Quantitative Analysis

no code implementations 11 Jan 2017 Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva

Unseen NEs, in particular, play an important role, which have a higher incidence in diverse genres such as social media than in more regular genres such as newswire.

Diversity named-entity-recognition +2

Broad Twitter Corpus: A Diverse Named Entity Recognition Resource

no code implementations COLING 2016 Leon Derczynski, Kalina Bontcheva, Ian Roberts

One of the main obstacles, hampering method development and comparative evaluation of named entity recognition in social media, is the lack of a sizeable, diverse, high quality annotated corpus, analogous to the CoNLL'2003 news dataset.

Diversity named-entity-recognition +2

Representation and Learning of Temporal Relations

no code implementations COLING 2016 Leon Derczynski

Determining the relative order of events and times described in text is an important problem in natural language processing.

BIG-bench Machine Learning

Desiderata for Vector-Space Word Representations

no code implementations 6 Aug 2016 Leon Derczynski

A plethora of vector-space representations for words is currently available, and the number is growing.

Complementarity, F-score, and NLP Evaluation

no code implementations LREC 2016 Leon Derczynski

This paper addresses the problem of quantifying the differences between entity extraction systems, where in general only a small proportion of a document should be selected.

Entity Extraction using GAN Information Retrieval +1

USFD: Twitter NER with Drift Compensation and Linked Data

no code implementations WS 2015 Leon Derczynski, Isabelle Augenstein, Kalina Bontcheva

This paper describes a pilot NER system for Twitter, comprising the USFD system entry to the W-NUT 2015 NER shared task.

Clustering NER

Analysis of Named Entity Recognition and Linking for Tweets

no code implementations 27 Oct 2014 Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area.

Entity Disambiguation Language Identification +4

Corpus Annotation through Crowdsourcing: Towards Best Practice Guidelines

no code implementations LREC 2014 Marta Sabou, Kalina Bontcheva, Leon Derczynski, Arno Scharl

Crowdsourcing is an emerging collaborative approach that can be used for the acquisition of annotated corpora and a wide range of other linguistic resources.

Domain Adaptation Natural Language Inference +3

Clinical TempEval

no code implementations 19 Mar 2014 Steven Bethard, Leon Derczynski, James Pustejovsky, Marc Verhagen

We describe the Clinical TempEval task which is currently in preparation for the SemEval-2015 evaluation exercise.

Relation

TimeML-strict: clarifying temporal annotation

no code implementations 26 Apr 2013 Leon Derczynski, Hector Llorens, Naushad UzZaman

To unify the state of current resources, and to make progress toward easy adoption of its current incarnation ISO-TimeML, this paper introduces TimeML-strict: a valid, unambiguous, and easy-to-process subset of TimeML.

valid

Question Answering Against Very-Large Text Collections

no code implementations 26 Apr 2013 Leon Derczynski, Richard Shaw, Ben Solway, Jun Wang

Question answering involves developing methods to extract useful information from large collections of documents.

Information Retrieval Question Answering +1

Massively Increasing TIMEX3 Resources: A Transduction Approach

no code implementations LREC 2012 Leon Derczynski, Héctor Llorens, Estela Saquete

Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction.
