Search Results for author: Zhenghua Li

Found 57 papers, 23 papers with code

Comparison Study on Data Annotation Approaches: Dependency Tree Annotation as Case Study

no code implementations • CCL 2021 • Mingyue Zhou, Chen Gong, Zhenghua Li, Min Zhang

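"The most important considerations in data annotation are data quality and annotation cost. Our survey finds that data annotation work in natural language processing usually adopts a machine-annotation-plus-human-correction approach to reduce cost, while very few works rigorously compare different annotation approaches to examine how they affect annotation quality and cost. With the help of an experienced annotation team and taking dependency syntax annotation as a case study, this paper experimentally compares machine annotation with human correction, independent double annotation by two annotators, and a newly proposed human-machine independent annotation approach that combines the former two, and draws some preliminary conclusions."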

APGN: Adversarial and Parameter Generation Networks for Multi-Source Cross-Domain Dependency Parsing

no code implementations • Findings (EMNLP) 2021 • Ying Li, Meishan Zhang, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

Thanks to the strong representation learning capability of deep learning, especially pre-training techniques with language model loss, dependency parsing has achieved a great performance boost in the in-domain scenario with abundant labeled training data for target domains.

Dependency Parsing, Language Modelling

A Coarse-to-Fine Labeling Framework for Joint Word Segmentation, POS Tagging, and Constituent Parsing

1 code implementation • CoNLL (EMNLP) 2021 • Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).
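
The coarse labeling stage can be illustrated with a minimal Python sketch, assuming a hypothetical encoding in which each tree node is a (label, children) tuple and each leaf is a single character. Reading off the word segmentation then amounts to concatenating the characters under every "word" node; the POS and constituent labels added in the fine stage are not modeled. This illustrates the tree format described above and is not the authors' implementation.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a node is (label, children); a leaf is a single character.
Node = Union[str, Tuple[str, list]]

def collect_chars(node: Node) -> str:
    """Concatenate all characters under a node, left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(collect_chars(child) for child in children)

def words_from_tree(node: Node) -> List[str]:
    """Read off the segmentation: every 'word' node yields exactly one word."""
    if isinstance(node, str):
        return []  # bare characters outside a 'word' node are not expected here
    label, children = node
    if label == "word":
        return [collect_chars(node)]
    words: List[str] = []
    for child in children:
        words.extend(words_from_tree(child))
    return words

# Toy tree over the characters of "我喜欢北京" (labels are illustrative only).
tree = ("phrase", [
    ("word", ["我"]),
    ("subphrase", [
        ("word", [("subword", ["喜"]), ("subword", ["欢"])]),
        ("word", ["北", "京"]),
    ]),
])

print(words_from_tree(tree))  # ['我', '喜欢', '北京']
```

Subword nodes only matter for word-internal structure; the segmentation itself depends solely on where the word nodes sit.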

Part-Of-Speech Tagging, POS

DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset

no code implementations • EMNLP 2020 • Lijie Wang, Ao Zhang, Kun Wu, Ke Sun, Zhenghua Li, Hua Wu, Min Zhang, Haifeng Wang

This paper describes in detail the construction process and data statistics of DuSQL.

SQL Parsing, Text-To-SQL

Stacked AMR Parsing with Silver Data

1 code implementation • Findings (EMNLP) 2021 • Qingrong Xia, Zhenghua Li, Rui Wang, Min Zhang

In particular, one recent seq-to-seq work directly fine-tunes an encoder-decoder pre-trained language model on AMR graph sequences and achieves new state-of-the-art results, outperforming previous works by a large margin.

AMR Parsing, Language Modelling

How Well Do Large Language Models Understand Syntax? An Evaluation by Asking Natural Language Questions

1 code implementation • 14 Nov 2023 • Houquan Zhou, Yang Hou, Zhenghua Li, Xuebin Wang, Zhefeng Wang, Xinyu Duan, Min Zhang

While recent advancements in large language models (LLMs) bring us closer to achieving artificial general intelligence, the question persists: Do LLMs truly understand language, or do they merely mimic comprehension through pattern recognition?

Prepositional Phrase Attachment, Question Answering

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions

1 code implementation • 23 Oct 2023 • Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang

In this paper, we propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally, and then dynamically influence the choice of the next token.
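
A minimal sketch of critic-guided next-token selection, assuming the decoder exposes log-probabilities for a few candidate tokens and that the critic is a hypothetical function critic(prefix, token) returning a score (higher means more appropriate). The combined score log p + alpha * critic and the greedy choice are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import math
from typing import Callable, Dict, List

def intervened_step(
    log_probs: Dict[str, float],
    prefix: List[str],
    critic: Callable[[List[str], str], float],
    alpha: float = 1.0,
) -> str:
    """Pick the next token by adding a weighted critic score to the model log-probability."""
    def combined(token: str) -> float:
        return log_probs[token] + alpha * critic(prefix, token)
    return max(log_probs, key=combined)

# Toy model distribution over three candidate next tokens.
log_probs = {"go": math.log(0.5), "goes": math.log(0.4), "went": math.log(0.1)}

# Hypothetical critic: penalizes "go" right after the subject "she".
def agreement_critic(prefix: List[str], token: str) -> float:
    return -2.0 if prefix and prefix[-1] == "she" and token == "go" else 0.0

print(intervened_step(log_probs, ["she"], agreement_critic))             # goes
print(intervened_step(log_probs, ["she"], agreement_critic, alpha=0.0))  # go
```

Setting alpha to 0 recovers plain greedy decoding, which makes it easy to see how much the critic changes the output.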

Ranked #1 on Grammatical Error Correction on MuCGEC

Grammatical Error Correction, Language Modelling

High-order Joint Constituency and Dependency Parsing

1 code implementation • 21 Sep 2023 • Yanggan Gu, Yang Hou, Zhefeng Wang, Xinyu Duan, Zhenghua Li

Compared to their work, we make progress in three aspects: (1) adopting a much more efficient decoding algorithm of $O(n^4)$ time complexity, (2) exploring joint modeling at the training phase, instead of only at the inference phase, (3) proposing high-order scoring components to promote constituent-dependency interaction.

Dependency Parsing, Multi-Task Learning

NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts

1 code implementation • 25 May 2023 • Yue Zhang, Bo Zhang, Haochen Jiang, Zhenghua Li, Chen Li, Fei Huang, Min Zhang

We introduce NaSGEC, a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains.

Grammatical Error Correction

CopyNE: Better Contextual ASR by Copying Named Entities

no code implementations • 22 May 2023 • Shilin Zhou, Zhenghua Li, Yu Hong, Min Zhang, Zhefeng Wang, Baoxing Huai

However, traditional token-level ASR models have struggled with accurately transcribing entities due to the problem of homophonic and near-homophonic tokens.

Automatic Speech Recognition (ASR)

CSynGEC: Incorporating Constituent-based Syntax for Grammatical Error Correction with a Tailored GEC-Oriented Parser

no code implementations • 15 Nov 2022 • Yue Zhang, Zhenghua Li

Recently, Zhang et al. (2022) propose a syntax-aware grammatical error correction (GEC) approach, named SynGEC, showing that incorporating tailored dependency-based syntax of the input sentence is quite beneficial to GEC.

Grammatical Error Correction, Sentence

Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data

no code implementations • 31 Oct 2022 • Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang

Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data.
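
One way to picture mining boundaries from parallel speech/text is to treat a long silence between two consecutive characters as a word boundary. The sketch below assumes per-character (start, end) timestamps in seconds (for instance from a forced aligner) and a hypothetical min_pause threshold; both the data format and the threshold are assumptions, not details from the paper. Only the detected boundaries are trusted; positions inside each chunk remain unknown, so the output is best viewed as partial annotation.

```python
from typing import List, Tuple

def pause_boundaries(
    timestamps: List[Tuple[float, float]],
    min_pause: float = 0.2,
) -> List[int]:
    """Indices i such that a boundary is assumed between character i-1 and character i.

    timestamps[i] = (start_sec, end_sec) of the i-th character; a silence of at
    least min_pause seconds is taken as a word boundary (hypothetical rule).
    """
    boundaries = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i][0] - timestamps[i - 1][1]
        if gap >= min_pause:
            boundaries.append(i)
    return boundaries

def split_at(text: str, cuts: List[int]) -> List[str]:
    """Split text into chunks at the given boundary indices."""
    edges = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(edges, edges[1:])]

chars = "我们喜欢北京"
stamps = [(0.00, 0.20), (0.20, 0.40), (0.75, 0.95),
          (0.95, 1.15), (1.50, 1.70), (1.70, 1.90)]
cuts = pause_boundaries(stamps)
print(cuts)                   # [2, 4]
print(split_at(chars, cuts))  # ['我们', '喜欢', '北京']
```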

Chinese Word Segmentation

SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser

1 code implementation • 22 Oct 2022 • Yue Zhang, Bo Zhang, Zhenghua Li, Zuyi Bao, Chen Li, Min Zhang

Then, we obtain parse trees of the source incorrect sentences by projecting trees of the target correct sentences.
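
Projecting a tree across a sentence pair can be sketched as mapping heads through a word alignment. The minimal Python version below assumes a hypothetical one-to-one alignment src2tgt from source to target positions (1-indexed, head 0 = root) and leaves unaligned source tokens, or tokens whose target head is unaligned, with head None. How the paper actually handles such positions (for example, redundant words in the incorrect sentence) is not covered by this sketch.

```python
from typing import Dict, List, Optional

def project_heads(
    tgt_heads: List[int],
    src2tgt: Dict[int, int],
    src_len: int,
) -> List[Optional[int]]:
    """Project target-side heads onto the source through a one-to-one alignment.

    Positions are 1-indexed and head 0 denotes the root.
    tgt_heads[j - 1] is the head of target token j;
    src2tgt maps each aligned source position to its target position.
    Returns one head per source token, or None where projection is undefined.
    """
    tgt2src = {t: s for s, t in src2tgt.items()}
    src_heads: List[Optional[int]] = [None] * src_len
    for s, t in src2tgt.items():
        tgt_head = tgt_heads[t - 1]
        # The root stays the root; otherwise map the head back to the source side.
        src_heads[s - 1] = 0 if tgt_head == 0 else tgt2src.get(tgt_head)
    return src_heads

# Target (correct): "She goes home"; heads: She->goes, goes->root, home->goes.
tgt_heads = [2, 0, 2]
# Source (incorrect): "She goes to home"; "to" (position 3) has no counterpart.
src2tgt = {1: 1, 2: 2, 4: 3}
print(project_heads(tgt_heads, src2tgt, src_len=4))  # [2, 0, None, 2]
```

In the toy pair, the redundant "to" receives no projected head, which is exactly the kind of position a real projection scheme has to resolve separately.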

Ranked #5 on Grammatical Error Correction on CoNLL-2014 Shared Task

Grammatical Error Correction, Syntax Representation

SeSQL: Yet Another Large-scale Session-level Chinese Text-to-SQL Dataset

no code implementations • 26 Aug 2022 • Saihao Huang, Lijie Wang, Zhenghua Li, Zeyang Liu, Chenhui Dou, Fukang Yan, Xinyan Xiao, Hua Wu, Min Zhang

As the first session-level Chinese dataset, CHASE contains two separate parts, i.e., 2,003 sessions manually constructed from scratch (CHASE-C), and 3,456 sessions translated from English SParC (CHASE-T).

SQL Parsing, Text-To-SQL

Mining Error Templates for Grammatical Error Correction

2 code implementations • 23 Jun 2022 • Yue Zhang, Haochen Jiang, Zuyi Bao, Bo Zhang, Chen Li, Zhenghua Li

We have accumulated 1,119 error templates for Chinese GEC based on this method.

Grammatical Error Correction, Language Modelling

MuCPAD: A Multi-Domain Chinese Predicate-Argument Dataset

1 code implementation • NAACL 2022 • Yahui Liu, Haoping Yang, Chen Gong, Qingrong Xia, Zhenghua Li, Min Zhang

1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates.

Semantic Role Labeling

Faster and Better Grammar-based Text-to-SQL Parsing via Clause-level Parallel Decoding and Alignment Loss

no code implementations • 26 Apr 2022 • Kun Wu, Lijie Wang, Zhenghua Li, Xinyan Xiao

Grammar-based parsers have achieved high performance in the cross-domain text-to-SQL parsing task, but suffer from low decoding efficiency due to the much larger number of actions for grammar selection than that of tokens in SQL queries.

SQL Parsing, Text-To-SQL

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

2 code implementations • NAACL 2022 • Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei Huang, Min Zhang

This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources.

Grammatical Error Correction, Sentence

Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

1 code implementation • Findings (ACL) 2022 • Houquan Zhou, Yang Li, Zhenghua Li, Min Zhang

In recent years, large-scale pre-trained language models (PLMs) have made extraordinary progress in most NLP tasks.

POS, POS Tagging

Fast and Accurate End-to-End Span-based Semantic Role Labeling as Word-based Graph Parsing

1 code implementation • COLING 2022 • Shilin Zhou, Qingrong Xia, Zhenghua Li, Yu Zhang, Yu Hong, Min Zhang

Moreover, we propose a simple constrained Viterbi procedure to ensure the legality of the output graph according to the constraints of the SRL structure.
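
As a generic illustration of constrained Viterbi decoding, the sketch below finds the best BIO tag sequence while giving illegal transitions (an I directly after O, or an I at the start) a score of negative infinity. The BIO tag set and emission scores are made up; the paper applies its own constraints to a word-based SRL graph, which this sketch does not reproduce.

```python
import math
from typing import Dict, List

TAGS = ["O", "B", "I"]

def allowed(prev: str, cur: str) -> bool:
    """BIO legality: 'I' may only follow 'B' or 'I'."""
    return not (cur == "I" and prev == "O")

def constrained_viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring legal tag sequence.

    emissions[i][tag] is the model's score for assigning tag to token i;
    illegal transitions are scored -inf so they can never be selected.
    """
    # 'I' cannot open a sequence, so it is illegal at position 0.
    score = {t: (-math.inf if t == "I" else emissions[0][t]) for t in TAGS}
    backptrs: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for cur in TAGS:
            # Best-scoring legal predecessor for the current tag.
            cand = {p: (score[p] if allowed(p, cur) else -math.inf) for p in TAGS}
            best_prev = max(cand, key=cand.get)
            new_score[cur] = cand[best_prev] + em[cur]
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(score, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

emissions = [
    {"O": 1.0, "B": 0.2, "I": 2.0},
    {"O": 0.1, "B": 0.3, "I": 1.5},
    {"O": 1.2, "B": 0.0, "I": 0.1},
]
print(constrained_viterbi(emissions))  # ['B', 'I', 'O']
```

Without the constraints, the first token would be tagged I (its highest emission score); the constrained decoder returns a legal sequence instead.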

Chinese Word Segmentation, Named Entity Recognition

An In-depth Study on Internal Structure of Chinese Words

1 code implementation • ACL 2021 • Chen Gong, Saihao Huang, Houquan Zhou, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information.

Sentence

A Unified Span-Based Approach for Opinion Mining with Syntactic Constituents

1 code implementation • NAACL 2021 • Qingrong Xia, Bo Zhang, Rui Wang, Zhenghua Li, Yue Zhang, Fei Huang, Luo Si, Min Zhang

Fine-grained opinion mining (OM), which aims to find the opinion structures of "Who expressed what opinions towards what" in one sentence, has attracted increasing attention in the natural language processing (NLP) community.

Multi-Task Learning, Opinion Mining

Data Augmentation with Hierarchical SQL-to-Question Generation for Cross-domain Text-to-SQL Parsing

1 code implementation • EMNLP 2021 • Kun Wu, Lijie Wang, Zhenghua Li, Ao Zhang, Xinyan Xiao, Hua Wu, Min Zhang, Haifeng Wang

For better distribution matching, we require that at least 80% of SQL patterns in the training data are covered by generated queries.
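
The 80% requirement reduces to a coverage ratio over SQL patterns. A minimal sketch, assuming patterns have already been abstracted into strings and that coverage counts distinct patterns without frequency weighting (both assumptions; the pattern strings below are invented):

```python
from typing import Iterable

def pattern_coverage(train_patterns: Iterable[str], generated_patterns: Iterable[str]) -> float:
    """Fraction of distinct training SQL patterns that also occur among generated queries."""
    train = set(train_patterns)
    if not train:
        return 1.0  # nothing to cover
    return len(train & set(generated_patterns)) / len(train)

train = ["SELECT col FROM tab", "SELECT col FROM tab WHERE col = val",
         "SELECT COUNT(*) FROM tab", "SELECT col FROM tab ORDER BY col",
         "SELECT col FROM tab GROUP BY col"]
generated = ["SELECT col FROM tab", "SELECT COUNT(*) FROM tab",
             "SELECT col FROM tab WHERE col = val", "SELECT col FROM tab ORDER BY col"]

cov = pattern_coverage(train, generated)
print(cov, cov >= 0.8)  # 0.8 True
```

A frequency-weighted variant would instead count how many training examples have a covered pattern; which variant the paper uses is not stated in this snippet.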

Data Augmentation, Question Generation

Semi-supervised Domain Adaptation for Dependency Parsing via Improved Contextualized Word Representations

no code implementations • COLING 2020 • Ying Li, Zhenghua Li, Min Zhang

The major challenge for current parsing research is to improve parsing performance on out-of-domain texts that are very different from the in-domain training data when only small-scale out-of-domain labeled data is available.

Dependency Parsing, Domain Adaptation

Semantic Role Labeling with Heterogeneous Syntactic Knowledge

1 code implementation • COLING 2020 • Qingrong Xia, Rui Wang, Zhenghua Li, Yue Zhang, Min Zhang

Recently, due to the interplay between syntax and semantics, incorporating syntactic knowledge into neural semantic role labeling (SRL) has attracted much attention.

Semantic Role Labeling

Multi-grained Chinese Word Segmentation with Weakly Labeled Data

no code implementations • COLING 2020 • Chen Gong, Zhenghua Li, Bowei Zou, Min Zhang

Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 F1 on NEWS and BAIKE data, respectively.

Chinese Word Segmentation, Sentence

Fast and Accurate Neural CRF Constituency Parsing

2 code implementations • IJCAI 2020 • Yu Zhang, Houquan Zhou, Zhenghua Li

Estimating probability distribution is one of the core issues in the NLP field.

Ranked #1 on Constituency Parsing on CTB7

Constituency Parsing, Dependency Parsing

Syntax-Aware Opinion Role Labeling with Dependency Graph Convolutional Networks

no code implementations • ACL 2020 • Bo Zhang, Yue Zhang, Rui Wang, Zhenghua Li, Min Zhang

The experimental results show that syntactic information is highly valuable for ORL, and our final MTL model effectively boosts the F1 score by 9.29 over the syntax-agnostic baseline.

Fine-Grained Opinion Analysis, Multi-Task Learning

Efficient Second-Order TreeCRF for Neural Dependency Parsing

2 code implementations • ACL 2020 • Yu Zhang, Zhenghua Li, Min Zhang

Experiments and analysis on 27 datasets from 13 languages clearly show that techniques developed before the DL era, such as structural learning (global TreeCRF loss) and high-order modeling are still useful, and can further boost parsing performance over the state-of-the-art biaffine parser, especially for partially annotated training data.

Ranked #1 on Dependency Parsing on CoNLL-2009

Chinese Dependency Parsing, Dependency Parsing

Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing?

1 code implementation • 6 Mar 2020 • Houquan Zhou, Yu Zhang, Zhenghua Li, Min Zhang

In the pre-deep-learning era, part-of-speech tags were considered indispensable ingredients for feature engineering in dependency parsing.

Dependency Parsing, Feature Engineering

A Syntax-aware Multi-task Learning Framework for Chinese Semantic Role Labeling

1 code implementation • IJCNLP 2019 • Qingrong Xia, Zhenghua Li, Min Zhang

In this paper, we adopt a simple unified span-based model for both span-based and word-based Chinese SRL as a strong baseline.

Chinese Semantic Role Labeling, Multi-Task Learning

SUDA-Alibaba at MRP 2019: Graph-Based Models with BERT

no code implementations • CoNLL 2019 • Yue Zhang, Wei Jiang, Qingrong Xia, Junjie Cao, Rui Wang, Zhenghua Li, Min Zhang

Our final submission ranks third on the overall MRP evaluation metric, first on EDS, and second on UCCA.

Multi-Task Learning, POS

Syntax-aware Neural Semantic Role Labeling

1 code implementation • 22 Jul 2019 • Qingrong Xia, Zhenghua Li, Min Zhang, Meishan Zhang, Guohong Fu, Rui Wang, Luo Si

Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP.

Semantic Parsing, Semantic Role Labeling

Semi-supervised Domain Adaptation for Dependency Parsing

1 code implementation • ACL 2019 • Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, Luo Si

During the past decades, due to the lack of sufficient labeled data, most studies on cross-domain parsing focus on unsupervised domain adaptation, assuming there is no target-domain training data.

Chinese Dependency Parsing, Dependency Parsing

HLT@SUDA at SemEval-2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing

no code implementations • SemEval 2019 • Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang

The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery.

Ranked #1 on UCCA Parsing on SemEval 2019 Task 1

General Classification, Multi-Task Learning

Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations

no code implementations • NAACL 2019 • Meishan Zhang, Zhenghua Li, Guohong Fu, Min Zhang

Syntax has been demonstrated to be highly effective in neural machine translation (NMT).

Ranked #8 on Machine Translation on IWSLT2015 English-Vietnamese

Machine Translation, NMT

HLT@SUDA at SemEval 2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing

no code implementations • 11 Mar 2019 • Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang

The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery.

General Classification, UCCA Parsing