However, previous work used the biaffine method only as a scorer at the end of the dependency parser, and its application in multi-layer form has been overlooked.
Their experiments show that leveraging answer summaries helps the model attend to the essential information in lengthy original answers and improves answer selection performance under certain circumstances.
Modern writing assistance applications are typically equipped with a Grammatical Error Correction (GEC) model to correct errors in user-entered sentences.
In this paper, we propose Preference Ranking Optimization (PRO) as an alternative to PPO for directly aligning LLMs with the Bradley-Terry comparison.
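As background for the snippet above, the Bradley-Terry model scores pairwise preferences: the probability that one candidate beats another is a logistic function of their score difference. A minimal sketch (the scalar scores here are hypothetical stand-ins for, e.g., reward-model scores; this is not the PRO training objective itself):

```python
import math

def bradley_terry_prob(score_i: float, score_j: float) -> float:
    """Bradley-Terry probability that candidate i is preferred over j,
    given scalar quality scores for each candidate."""
    return math.exp(score_i) / (math.exp(score_i) + math.exp(score_j))

# With equal scores, each candidate is preferred with probability 0.5;
# a higher score shifts the preference probability toward that candidate.
p_equal = bradley_terry_prob(1.0, 1.0)
p_better = bradley_terry_prob(2.0, 1.0)
```

Preference-ranking methods in this vein fit model outputs so that human-preferred responses receive higher scores under this comparison.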
We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways: (1) data-wise scalability: as the model becomes stronger through continual learning, it encounters fewer difficult cases that need to be memorized, so the non-parametric memory grows more slowly over time rather than linearly with the size of the training data; (2) model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart, because a larger model more rarely encounters incomprehensible cases, resulting in a non-parametric memory that does not scale linearly with model size.
Multi-intent Spoken Language Understanding has great potential for widespread implementation.
However, in this paradigm there is a large gap between classification tasks with a sophisticated label hierarchy and the masked language model (MLM) pretraining tasks of PLMs, and thus the potential of PLMs cannot be fully tapped.
Experiments on open datasets verify that our model outperforms existing calibration methods and achieves a significant improvement on the calibration metric.
Hierarchical text classification is a challenging subtask of multi-label classification due to its complex label hierarchy.
Machine Reading Comprehension (MRC) has achieved remarkable results since powerful models such as BERT were proposed.
To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.
Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.
To handle this problem, we propose a density-based dynamic curriculum learning model.
In this paper, we take the first step towards controllable comment generation by building a system that can explicitly control the emotion of the generated comments.
In this paper, we attempt to bridge these two lines of research and propose a joint and domain adaptive approach to SLU.
Recently, researchers have explored using the encoder-decoder framework to tackle dialogue state tracking (DST), which is a key component of task-oriented dialogue systems.
In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
Aspect Sentiment Triplet Extraction (ASTE) aims to extract triplets from a sentence, including target entities, associated sentiment polarities, and opinion spans which rationalize the polarities.
Ranked #8 on Aspect Sentiment Triplet Extraction on ASTE-Data-V2
To improve the efficiency of trying different skip connection architectures, we apply the idea of network morphism to add skip connections as a fine-tuning procedure.
In this paper, we bring syntactic awareness to the model via a graph attention network over the dependency tree structure, and external pre-training knowledge via the BERT language model, which helps to better model the interaction between the context and aspect words.
Recently, researchers have explored graph neural network (GNN) techniques for text classification, since GNNs do well in handling complex structures and preserving global information.
Ranked #2 on Text Classification on Ohsumed
Aspect term extraction (ATE) aims at identifying all aspect terms in a sentence and is usually modeled as a sequence labeling problem.
Ranked #1 on Term Extraction on SemEval 2014 Task 4 Laptop
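Aspect term extraction framed as sequence labeling is typically decoded from per-token BIO tags. A minimal, generic illustration of that decoding step (the tokens, tags, and helper name are illustrative, not from the paper):

```python
def extract_aspect_terms(tokens, tags):
    """Decode BIO tags into aspect-term spans.
    'B' begins a term, 'I' continues it, 'O' is outside any term."""
    terms, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                terms.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms

# "The battery life is great" with "battery life" tagged as an aspect term.
terms = extract_aspect_terms(
    ["The", "battery", "life", "is", "great"],
    ["O", "B", "I", "O", "O"],
)
# → ["battery life"]
```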
As a result, memory consumption is reduced because self-attention is performed at the phrase level instead of the sentence level.
The lack of labeled data is one of the main challenges when building a task-oriented dialogue system.
In the experiments, we use the real-world sememe knowledge base HowNet and the corresponding descriptions of the words in Baidu Wiki for training and evaluation.
Then the system measures the relevance between each question and the candidate table cells, and chooses the most relevant cell as the source of the answer.
Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.
We evaluate our approach on two review datasets, Yelp and Amazon.
Ranked #6 on Unsupervised Text Style Transfer on Yelp
In this work, we supervise the learning of the source content representation with the representation of the summary.
Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text.
The decoding of the complex structure model is regularized by the additionally trained simple structure model.
Based on the sparsified gradients, we further simplify the model by eliminating the rows or columns that are seldom updated, which reduces the computational cost of both training and decoding and can potentially accelerate decoding in real-world applications.
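The row/column elimination described above can be sketched with a simple update-count heuristic (an assumption for illustration; the paper's actual criterion based on sparsified gradients may differ, and `prune_rare_rows` is a hypothetical helper):

```python
import numpy as np

def prune_rare_rows(weight: np.ndarray, update_counts: np.ndarray, min_updates: int):
    """Keep only the rows of `weight` that were updated at least
    `min_updates` times during training; return the pruned matrix
    and the indices of the surviving rows."""
    keep = np.flatnonzero(update_counts >= min_updates)
    return weight[keep], keep

# Hypothetical example: a 4-row parameter matrix where row 2 was rarely updated.
W = np.arange(12, dtype=float).reshape(4, 3)
counts = np.array([10, 7, 1, 5])
W_pruned, kept = prune_rare_rows(W, counts, min_updates=5)
# W_pruned keeps rows 0, 1, and 3.
```

Shrinking the matrix this way reduces both the multiply-accumulate work per step and the memory footprint at decoding time.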
Document-level sentiment classification aims to assign a sentiment polarity to user reviews.
Ranked #5 on Sentiment Analysis on User and product information
Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods as they can utilize unlabeled data to help improve the Out-of-Vocabulary (OOV) word recognition performance.
In this paper, we argue that both targets and contexts deserve special treatment and should learn their own representations via interactive learning.
For the task of relation extraction, distant supervision is an efficient approach to generating labeled data by aligning a knowledge base with free text.
Previous work introduced transition-based algorithms to form a unified architecture of parsing rhetorical structures (including span, nuclearity and relation), but did not achieve satisfactory performance.
Ranked #5 on Discourse Parsing on RST-DT
In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization.
Grammatical Error Diagnosis for Chinese has always been a challenge for both foreign learners and NLP researchers, owing to the variety of grammatical phenomena and the flexibility of expression.
To the best of our knowledge, we are the first to tackle the imbalance problem in multi-label classification with many labels.
Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees.
Ranked #5 on Relation Classification on SemEval 2010 Task 8