Search Results for author: Siyao Peng

Found 22 papers, 10 papers with code

Adpositional Supersenses for Mandarin Chinese

no code implementations • 6 Dec 2018 • Yilun Zhu, Yang Liu, Siyao Peng, Austin Blodgett, Yushi Zhao, Nathan Schneider

This study adapts Semantic Network of Adposition and Case Supersenses (SNACS) annotation to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics.

Machine Translation Translation

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

1 code implementation • WS 2019 • Yue Yu, Yilun Zhu, Yang Liu, Yan Liu, Siyao Peng, Mackenzie Gong, Amir Zeldes

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection.

Connective Detection Discourse Segmentation +2

All Roads Lead to UD: Converting Stanford and Penn Parses to English Universal Dependencies with Multilayer Annotations

no code implementations • COLING 2018 • Siyao Peng, Amir Zeldes

We describe and evaluate different approaches to the conversion of gold standard corpus data from Stanford Typed Dependencies (SD) and Penn-style constituent trees to the latest English Universal Dependencies representation (UD 2.2).

coreference-resolution NER

Modeling Long-Range Context for Concurrent Dialogue Acts Recognition

no code implementations • 2 Sep 2019 • Yue Yu, Siyao Peng, Grace Hui Yang

Previous work on DA recognition either assumes one DA per utterance or fails to realize the sequential nature of dialogues.

Sentence

A Corpus of Adpositional Supersenses for Mandarin Chinese

no code implementations • LREC 2020 • Siyao Peng, Yang Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, Nathan Schneider

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language.

Translation

AMALGUM -- A Free, Balanced, Multilayer English Web Corpus

1 code implementation • LREC 2020 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes

We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory.

coreference-resolution

DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection

1 code implementation • EMNLP (DISRPT) 2021 • Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification.

Classification Connective Detection +6

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

1 code implementation • COLING (LAW) 2020 • Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Bradford Salen, Nathan Schneider

We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish.

Chinese Discourse Annotation Reference Manual

no code implementations • 11 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes

This document provides extensive guidelines and examples for Rhetorical Structure Theory (RST) annotation in Mandarin Chinese.

GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing

1 code implementation • 19 Oct 2022 • Siyao Peng, Yang Janet Liu, Amir Zeldes

A lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese.

Discourse Parsing

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

1 code implementation • 3 Jun 2023 • Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems.

coreference-resolution Dependency Parsing +2

Incorporating Singletons and Mention-based Features in Coreference Resolution via Multi-task Learning for Better Generalization

1 code implementation • 20 Sep 2023 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

Previous attempts to incorporate a mention detection step into end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention span data as well as other entity information.

Ranked #1 on Coreference Resolution on OntoGUM

coreference-resolution Multi-Task Learning

Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

1 code implementation • 2 Feb 2024 • Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank

Named Entity Recognition (NER) is a key information extraction task with a long-standing tradition.

Key Information Extraction named-entity-recognition +2

EEVEE: An Easy Annotation Tool for Natural Language Processing

no code implementations • 5 Feb 2024 • Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot

Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets.

text-classification Text Classification

VariErr NLI: Separating Annotation Error from Human Label Variation

no code implementations • 4 Mar 2024 • Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank

To fill this gap, we introduce a systematic methodology and a new dataset, VariErr (variation versus error), focusing on the NLI task in English.

valid

MaiBaam Annotation Guidelines

no code implementations • 9 Mar 2024 • Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank

This document provides the annotation guidelines for MaiBaam, a Bavarian corpus annotated with part-of-speech (POS) tags and syntactic dependencies.

POS

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

no code implementations • 15 Mar 2024 • Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze, Barbara Plank

Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack in 'within-language breadth': most treebanks focus on standard languages.

POS POS Tagging

Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

1 code implementation • 19 Mar 2024 • Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects.

Dialect Identification Multi-Task Learning +3

eRST: A Signaled Graph Theory of Discourse Relations and Organization

no code implementations • 20 Mar 2024 • Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST).

SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution

1 code implementation • 25 Mar 2024 • Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

We then propose a two-step neural mention and coreference resolution system, named SPLICE, and compare its performance to the end-to-end approach in two scenarios: the OntoNotes test set and the out-of-domain (OOD) OntoGUM corpus.

Avg coreference-resolution +1

Tencent submission for WMT20 Quality Estimation Shared Task

no code implementations • WMT (EMNLP) 2020 • Haijiang Wu, Zixuan Wang, Qingsong Ma, Xinjie Wen, Ruichen Wang, Xiaoli Wang, Yulin Zhang, Zhipeng Yao, Siyao Peng

This paper presents Tencent’s submission to the WMT20 Quality Estimation (QE) Shared Task: Sentence-Level Post-editing Effort for English-Chinese in Task 2.

Machine Translation Sentence +2

Overview of AMALGUM – Large Silver Quality Annotations across English Genres

no code implementations • SCiL 2021 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes
