no code implementations • EMNLP 2020 • Lijie Wang, Ao Zhang, Kun Wu, Ke Sun, Zhenghua Li, Hua Wu, Min Zhang, Haifeng Wang
This paper describes in detail the construction process and data statistics of DuSQL.
no code implementations • Findings (EMNLP) 2021 • Ying Li, Meishan Zhang, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
Thanks to the strong representation learning capability of deep learning, especially pre-training techniques with a language model loss, dependency parsing has achieved a great performance boost in the in-domain scenario with abundant labeled training data for target domains.
1 code implementation • Findings (EMNLP) 2021 • Qingrong Xia, Zhenghua Li, Rui Wang, Min Zhang
In particular, one recent seq-to-seq work directly fine-tunes AMR graph sequences on the encoder-decoder pre-trained language model and achieves new state-of-the-art results, outperforming previous works by a large margin.
1 code implementation • CoNLL (EMNLP) 2021 • Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).
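The coarse labeling scheme above can be illustrated with a small sketch (all names here are hypothetical, not the paper's code): every node of the bracketed tree carries exactly one of the four coarse labels, and fine-grained labels are deferred to a later stage.

```python
# Minimal sketch of a coarse-labeled bracketed tree (hypothetical names).
from dataclasses import dataclass, field
from typing import List, Tuple

COARSE_LABELS = {"phrase", "subphrase", "word", "subword"}

@dataclass
class Node:
    label: str                              # one of the four coarse labels
    span: Tuple[int, int]                   # (start, end) character offsets
    children: List["Node"] = field(default_factory=list)

    def __post_init__(self):
        assert self.label in COARSE_LABELS, f"unknown coarse label: {self.label}"

# e.g. a three-character word analysed into subwords under a phrase node
tree = Node("phrase", (0, 3), [
    Node("word", (0, 3), [
        Node("subword", (0, 2)),
        Node("subword", (2, 3)),
    ]),
])

def count_nodes(n: Node) -> int:
    """Count all nodes in the bracketed tree."""
    return 1 + sum(count_nodes(c) for c in n.children)
```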
no code implementations • CCL 2021 • Mingyue Zhou, Chen Gong, Zhenghua Li, Min Zhang
“The most important considerations in data annotation are data quality and annotation cost. Our survey finds that data annotation work in the NLP field usually adopts the machine-annotation-then-human-correction method to reduce cost; meanwhile, few works rigorously compare different annotation methods to investigate their impact on annotation quality and cost. With the help of a mature annotation team, and using dependency syntax annotation as a case study, this paper experimentally compares machine annotation with human correction, independent annotation by two annotators, and a newly proposed human-machine independent annotation method that fuses the former two, and draws some preliminary conclusions.”
1 code implementation • 21 Feb 2025 • Houquan Zhou, Bo Zhang, Zhenghua Li, Ming Yan, Min Zhang
To address this issue, we introduce the task of General Chinese Character Error Correction (C2EC), which focuses on all three types of character errors.
1 code implementation • 20 Feb 2025 • Yanggan Gu, Junzhuo Li, Sirui Huang, Xin Zou, Zhenghua Li, Xuming Hu
Based on this, PAD comprises three key steps: (1) sampling diverse responses at a high temperature; (2) computing rewards for both teacher and student to construct their intrinsic preferences; and (3) training the student's intrinsic preference distribution to align with the teacher's.
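The three PAD steps can be sketched roughly as follows; `sample_fn`, the reward functions, and the softmax-based preference distributions are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the three PAD steps: sample, score, align.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def pad_step(prompt, sample_fn, teacher_reward, student_reward, n=4, temperature=1.3):
    # (1) sample diverse candidate responses at a high temperature
    responses = [sample_fn(prompt, temperature) for _ in range(n)]
    # (2) score every candidate with both models' reward functions
    t_rewards = [teacher_reward(r) for r in responses]
    s_rewards = [student_reward(r) for r in responses]
    # (3) turn rewards into intrinsic preference distributions; training
    # would minimize the divergence between them (sketched here as a KL value)
    t_pref, s_pref = softmax(t_rewards), softmax(s_rewards)
    kl = sum(p * math.log(p / q) for p, q in zip(t_pref, s_pref))
    return responses, t_pref, s_pref, kl
```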
no code implementations • 17 Dec 2024 • Ziheng Qiao, Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang
One key characteristic of the Chinese spelling check (CSC) task is that incorrect characters are usually similar to the correct ones in either phonetics or glyph.
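This phonetic/glyph similarity characteristic can be illustrated with a toy confusion-set lookup; the hand-made sets below are examples for illustration, not the resources used in the paper.

```python
# Toy confusion sets (illustrative examples only): candidate corrections
# are restricted to characters phonetically or visually similar to the
# observed one.
PHONETIC_CONFUSION = {"他": {"她", "它"}, "在": {"再"}}
GLYPH_CONFUSION = {"未": {"末"}, "己": {"已", "巳"}}

def candidate_corrections(char):
    """Union of the phonetic and glyph confusion sets for one character."""
    return PHONETIC_CONFUSION.get(char, set()) | GLYPH_CONFUSION.get(char, set())
```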
1 code implementation • 12 Dec 2024 • Xuebin Wang, Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Yang Hou
Inspired by early research on exploring naturally annotated data for Chinese Word Segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to explicitly mine word boundaries from speech-text parallel data.
1 code implementation • 5 Oct 2024 • Houquan Zhou, Zhenghua Li, Bo Zhang, Chen Li, Shaopeng Lai, Ji Zhang, Fei Huang, Min Zhang
This work proposes a simple training-free prompt-free approach to leverage large language models (LLMs) for the Chinese spelling correction (CSC) task, which is totally different from all previous CSC approaches.
1 code implementation • 6 Jun 2024 • Yang Hou, Zhenghua Li
Revealing the syntactic structure of sentences in Chinese poses significant challenges for word-level parsers due to the absence of clear word boundaries.
1 code implementation • 14 Nov 2023 • Houquan Zhou, Yang Hou, Zhenghua Li, Xuebin Wang, Zhefeng Wang, Xinyu Duan, Min Zhang
While recent advancements in large language models (LLMs) bring us closer to achieving artificial general intelligence, the question persists: Do LLMs truly understand language, or do they merely mimic comprehension through pattern recognition?
1 code implementation • 23 Oct 2023 • Houquan Zhou, Yumeng Liu, Zhenghua Li, Min Zhang, Bo Zhang, Chen Li, Ji Zhang, Fei Huang
In this paper, we propose a unified decoding intervention framework that employs an external critic to assess the appropriateness of the token to be generated incrementally, and then dynamically influence the choice of the next token.
Ranked #1 on Grammatical Error Correction on MuCGEC
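The idea of an external critic influencing the next-token choice can be sketched as follows; the interfaces and the additive score combination are assumptions for illustration, not the paper's framework.

```python
# Sketch of a decoding intervention step: an external critic's scores are
# added to the generator's logits before the next token is chosen.
def intervened_step(logits, critic_scores, alpha=1.0):
    """logits and critic_scores map token -> float; alpha scales the
    critic's influence. Returns the argmax token after the intervention."""
    combined = {t: logits[t] + alpha * critic_scores.get(t, 0.0) for t in logits}
    return max(combined, key=combined.get)
```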
1 code implementation • 21 Sep 2023 • Yanggan Gu, Yang Hou, Zhefeng Wang, Xinyu Duan, Zhenghua Li
Compared to their work, we make progress in three aspects: (1) adopting a much more efficient decoding algorithm of $O(n^4)$ time complexity, (2) exploring joint modeling at the training phase, instead of only at the inference phase, (3) proposing high-order scoring components to promote constituent-dependency interaction.
1 code implementation • 25 May 2023 • Yue Zhang, Bo Zhang, Haochen Jiang, Zhenghua Li, Chen Li, Fei Huang, Min Zhang
We introduce NaSGEC, a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains.
1 code implementation • 22 May 2023 • Shilin Zhou, Zhenghua Li, Yu Hong, Min Zhang, Zhefeng Wang, Baoxing Huai
Previous approaches have attempted to address this by utilizing the NE dictionary.
Automatic Speech Recognition
no code implementations • 15 Nov 2022 • Yue Zhang, Zhenghua Li
Recently, Zhang et al. (2022) propose a syntax-aware grammatical error correction (GEC) approach, named SynGEC, showing that incorporating tailored dependency-based syntax of the input sentence is quite beneficial to GEC.
no code implementations • 31 Oct 2022 • Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang
Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data.
1 code implementation • 22 Oct 2022 • Yue Zhang, Bo Zhang, Zhenghua Li, Zuyi Bao, Chen Li, Min Zhang
Then, we obtain parse trees of the source incorrect sentences by projecting trees of the target correct sentences.
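Tree projection of this kind can be sketched under a simple word-alignment assumption; the data structures below are illustrative, not the paper's.

```python
# Illustrative sketch: project a dependency tree from the corrected target
# sentence back onto the source sentence via a word alignment, so each
# source token inherits the head of its aligned target token.
def project_tree(target_heads, tgt2src):
    """target_heads[i] = head index of target token i (-1 for root);
    tgt2src maps target positions to aligned source positions."""
    src_heads = {}
    for tgt_dep, tgt_head in enumerate(target_heads):
        if tgt_dep not in tgt2src:
            continue                      # target token with no source counterpart
        src_dep = tgt2src[tgt_dep]
        # the root stays the root; otherwise follow the head's alignment
        src_heads[src_dep] = -1 if tgt_head == -1 else tgt2src.get(tgt_head, -1)
    return src_heads
```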
no code implementations • 26 Aug 2022 • Saihao Huang, Lijie Wang, Zhenghua Li, Zeyang Liu, Chenhui Dou, Fukang Yan, Xinyan Xiao, Hua Wu, Min Zhang
As the first session-level Chinese dataset, CHASE contains two separate parts, i.e., 2,003 sessions manually constructed from scratch (CHASE-C), and 3,456 sessions translated from English SParC (CHASE-T).
2 code implementations • 23 Jun 2022 • Yue Zhang, Haochen Jiang, Zuyi Bao, Bo Zhang, Chen Li, Zhenghua Li
We have accumulated 1,119 error templates for Chinese GEC based on this method.
1 code implementation • NAACL 2022 • Yahui Liu, Haoping Yang, Chen Gong, Qingrong Xia, Zhenghua Li, Min Zhang
1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates.
no code implementations • 26 Apr 2022 • Kun Wu, Lijie Wang, Zhenghua Li, Xinyan Xiao
Grammar-based parsers have achieved high performance in the cross-domain text-to-SQL parsing task, but suffer from low decoding efficiency due to the much larger number of actions for grammar selection than that of tokens in SQL queries.
2 code implementations • NAACL 2022 • Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei Huang, Min Zhang
This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources.
1 code implementation • Findings (ACL) 2022 • Houquan Zhou, Yang Li, Zhenghua Li, Min Zhang
In recent years, large-scale pre-trained language models (PLMs) have made extraordinary progress in most NLP tasks.
1 code implementation • COLING 2022 • Shilin Zhou, Qingrong Xia, Zhenghua Li, Yu Zhang, Yu Hong, Min Zhang
Moreover, we propose a simple constrained Viterbi procedure to ensure the legality of the output graph according to the constraints of the SRL structure.
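A constrained Viterbi of this flavor can be illustrated on a toy BIO tagging problem, where illegal transitions are masked out so the output sequence is always well-formed; the tag set and single constraint here are a simplification, not the paper's SRL constraints.

```python
# Toy constrained Viterbi: the transition O -> I is blocked, and starting
# with I is blocked, so every decoded path is a legal BIO sequence.
NEG_INF = float("-inf")
TAGS = ["B", "I", "O"]
ILLEGAL = {("O", "I")}          # cannot enter I directly from O

def constrained_viterbi(emissions):
    """emissions: list of dicts tag -> score. Returns the best legal path."""
    prev = {t: (emissions[0][t] if t != "I" else NEG_INF, [t]) for t in TAGS}
    for em in emissions[1:]:
        cur = {}
        for t in TAGS:
            score, path = max(
                ((s + em[t], path) for p, (s, path) in prev.items()
                 if (p, t) not in ILLEGAL),
                key=lambda x: x[0],
            )
            cur[t] = (score, path + [t])
        prev = cur
    return max(prev.values(), key=lambda x: x[0])[1]
```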
1 code implementation • NAACL 2021 • Qingrong Xia, Bo Zhang, Rui Wang, Zhenghua Li, Yue Zhang, Fei Huang, Luo Si, Min Zhang
Fine-grained opinion mining (OM) has attracted increasing attention in the natural language processing (NLP) community; it aims to find the opinion structures of “Who expressed what opinions towards what” in one sentence.
1 code implementation • ACL 2021 • Chen Gong, Saihao Huang, Houquan Zhou, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan
Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information.
1 code implementation • EMNLP 2021 • Kun Wu, Lijie Wang, Zhenghua Li, Ao Zhang, Xinyan Xiao, Hua Wu, Min Zhang, Haifeng Wang
For better distribution matching, we require that at least 80% of SQL patterns in the training data are covered by generated queries.
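The coverage requirement can be sketched as a simple pattern-match check; the value-masking heuristic below is an illustrative simplification, not the paper's pattern definition.

```python
# Sketch of checking SQL pattern coverage: the fraction of training-set
# patterns that also appear among generated queries must reach 80%.
import re

def sql_pattern(sql):
    """Abstract a SQL query into a coarse pattern by masking literals
    (quoted strings and bare numbers) — a simplified pattern notion."""
    return re.sub(r"'[^']*'|\b\d+\b", "<val>", sql.lower()).strip()

def pattern_coverage(train_sqls, generated_sqls):
    train = {sql_pattern(q) for q in train_sqls}
    gen = {sql_pattern(q) for q in generated_sqls}
    return len(train & gen) / len(train) if train else 0.0
```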
no code implementations • COLING 2020 • Ying Li, Zhenghua Li, Min Zhang
The major challenge for current parsing research is to improve parsing performance on out-of-domain texts that are very different from the in-domain training data when only small-scale out-of-domain labeled data is available.
no code implementations • COLING 2020 • Chen Gong, Zhenghua Li, Bowei Zou, Min Zhang
Detailed evaluation shows that our proposed model with weakly labeled data significantly outperforms the state-of-the-art MWS model by 1.12 and 5.97 F1 on the NEWS and BAIKE data, respectively.
1 code implementation • COLING 2020 • Qingrong Xia, Rui Wang, Zhenghua Li, Yue Zhang, Min Zhang
Recently, due to the interplay between syntax and semantics, incorporating syntactic knowledge into neural semantic role labeling (SRL) has attracted much attention.
2 code implementations • IJCAI 2020 • Yu Zhang, Houquan Zhou, Zhenghua Li
Estimating probability distribution is one of the core issues in the NLP field.
Ranked #1 on Constituency Parsing on CTB7
no code implementations • ACL 2020 • Bo Zhang, Yue Zhang, Rui Wang, Zhenghua Li, Min Zhang
The experimental results show that syntactic information is highly valuable for ORL, and our final MTL model effectively boosts the F1 score by 9.29 over the syntax-agnostic baseline.
2 code implementations • ACL 2020 • Yu Zhang, Zhenghua Li, Min Zhang
Experiments and analysis on 27 datasets from 13 languages clearly show that techniques developed before the DL era, such as structural learning (a global TreeCRF loss) and high-order modeling, are still useful, and can further boost parsing performance over the state-of-the-art biaffine parser, especially for partially annotated training data.
Ranked #1 on Dependency Parsing on CoNLL-2009
1 code implementation • 6 Mar 2020 • Houquan Zhou, Yu Zhang, Zhenghua Li, Min Zhang
In the pre-deep-learning era, part-of-speech tags were considered indispensable ingredients for feature engineering in dependency parsing.
1 code implementation • IJCNLP 2019 • Qingrong Xia, Zhenghua Li, Min Zhang
In this paper, we adopt a simple unified span-based model for both span-based and word-based Chinese SRL as a strong baseline.
no code implementations • CONLL 2019 • Yue Zhang, Wei Jiang, Qingrong Xia, Junjie Cao, Rui Wang, Zhenghua Li, Min Zhang
Our final submission ranks the third on the overall MRP evaluation metric, the first on EDS and the second on UCCA.
1 code implementation • 22 Jul 2019 • Qingrong Xia, Zhenghua Li, Min Zhang, Meishan Zhang, Guohong Fu, Rui Wang, Luo Si
Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP.
1 code implementation • ACL 2019 • Zhenghua Li, Xue Peng, Min Zhang, Rui Wang, Luo Si
During the past decades, due to the lack of sufficient labeled data, most studies on cross-domain parsing focus on unsupervised domain adaptation, assuming there is no target-domain training data.
no code implementations • SEMEVAL 2019 • Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang
The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery.
Ranked #1 on UCCA Parsing on SemEval 2019 Task 1
no code implementations • NAACL 2019 • Meishan Zhang, Zhenghua Li, Guohong Fu, Min Zhang
Syntax has been demonstrated highly effective in neural machine translation (NMT).
Ranked #8 on Machine Translation on IWSLT2015 English-Vietnamese
no code implementations • 11 Mar 2019 • Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang
The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery.
1 code implementation • COLING 2018 • Yaosheng Yang, Wenliang Chen, Zhenghua Li, Zhengqiu He, Min Zhang
A bottleneck problem with Chinese named entity recognition (NER) in new domains is the lack of annotated data.
Chinese Named Entity Recognition
no code implementations • ACL 2018 • Xinzhou Jiang, Zhenghua Li, Bo Zhang, Min Zhang, Sheng Li, Luo Si
Treebank conversion is a straightforward and effective way to exploit various heterogeneous treebanks for boosting parsing performance.
no code implementations • 11 Jan 2018 • Zhengqiu He, Wenliang Chen, Zhenghua Li, Meishan Zhang, Wei zhang, Min Zhang
First, we encode the context of entities on a dependency tree as sentence-level entity embedding based on tree-GRU.
no code implementations • IJCNLP 2017 • Yue Zhang, Zhenghua Li, Jun Lang, Qingrong Xia, Min Zhang
This paper describes and compares two straightforward approaches for dependency parsing with partial annotations (PA).
no code implementations • EMNLP 2017 • Chen Gong, Zhenghua Li, Min Zhang, Xinzhou Jiang
Traditionally, word segmentation (WS) adopts the single-grained formalism, where a sentence corresponds to a single word sequence.
no code implementations • COLING 2016 • Wenliang Chen, Zhenjie Zhang, Zhenghua Li, Min Zhang
In this paper, we propose an approach to learn distributed representations of users and items from text comments for recommendation systems.
no code implementations • 29 Sep 2016 • Zhenghua Li, Yue Zhang, Jiayuan Chao, Min Zhang
The first approach is previously proposed to directly train a log-linear graph-based parser (LLGPar) with PA based on a forest-based objective.
no code implementations • 4 Aug 2016 • Qingrong Xia, Zhenghua Li, Jiayuan Chao, Min Zhang
This paper describes our system designed for the NLPCC 2016 shared task on word segmentation on micro-blog texts.