Knowledge-enriched text generation poses unique challenges in modeling and learning, driving active research in several core directions: integrated modeling of neural representations and symbolic information in sequential, hierarchical, and graphical structures; learning without direct supervision due to the cost of structured annotation; efficient optimization and inference with massive and global constraints; language grounding on multiple modalities; and generative reasoning with implicit commonsense and background knowledge.
Incorporating knowledge into natural language processing (NLP) has been a rising trend, especially since the advent of large-scale pre-trained models.
AI inequality in gig work is caused by (1) the technology divide, i.e., who has access to AI technologies, and (2) the data divide, i.e., who owns the data. Together they lead to unfair working conditions, a growing pay gap, neglect of workers' diverse preferences, and workers' lack of trust in the platforms.
Multi-task learning (MTL) has become increasingly popular in natural language processing (NLP) because it improves the performance of related tasks by exploiting their commonalities and differences.
A set of knowledge experts performs diverse reasoning over the KG to encourage diverse generation outputs.
In this paper, we present a comprehensive and systematic survey of graph data augmentation that summarizes the literature in a structured manner.
In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions, to enhance language model representations with a dictionary.
Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering.
In this work, we propose a novel link prediction method that enhances graph learning with counterfactual inference.
Ranked #1 on Link Property Prediction on ogbl-ddi
Existing work linearizes table cells and relies heavily on modifying deep language models such as BERT, which capture only information from related cells in the same table.
The recent success of graph neural networks has significantly boosted molecular property prediction, advancing activities such as drug discovery.
Ranked #1 on Molecular Property Prediction (1-shot) on Tox21
In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts.
It can be used to validate label consistency (or catch inconsistencies) across multiple sets of NER data annotation.
The success of graph neural networks (GNNs) in the past years has aroused growing interest and effort in designing the best models to handle graph-structured data.
Multi-hop relation reasoning over a knowledge base aims to generate effective and interpretable relation predictions through reasoning paths.
The training process of scientific NER models is commonly performed in two steps: (i) pre-training a language model with self-supervised tasks on huge data, and (ii) fine-tuning with small labeled data.
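To make "label inconsistency" concrete, the sketch below flags mentions that receive different entity labels across annotation sets. It is a simple string-matching heuristic, not the validation method described in the snippet; the input format (lists of `(mention, label)` pairs) and the function name `find_label_inconsistencies` are assumptions for illustration.

```python
from collections import defaultdict

def find_label_inconsistencies(annotation_sets):
    """Return mentions that received more than one entity label.

    annotation_sets: iterable of annotation sets, each a list of
    (mention_text, label) pairs (an assumed, simplified format).
    """
    labels_by_mention = defaultdict(set)
    for annotations in annotation_sets:
        for mention, label in annotations:
            # Normalize case and whitespace so trivial variants are grouped.
            key = " ".join(mention.lower().split())
            labels_by_mention[key].add(label)
    # Keep only mentions labeled in more than one way.
    return {m: sorted(ls) for m, ls in labels_by_mention.items() if len(ls) > 1}

set_a = [("Washington", "PER"), ("Paris", "LOC")]
set_b = [("washington", "LOC"), ("Paris", "LOC")]
print(find_label_inconsistencies([set_a, set_b]))  # {'washington': ['LOC', 'PER']}
```

Exact-string grouping misses context: "Washington" may legitimately be a person in one sentence and a location in another, so flagged items are candidates for review rather than confirmed errors.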
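A reasoning path here is a sequence of relations linking a head entity to a tail entity. The sketch below enumerates such relation paths up to a hop limit with depth-first search over a toy KG; the triple format, the `max_hops` parameter, and the example facts are assumptions, and a learned reasoner would score or prune these paths rather than list them all.

```python
from collections import defaultdict

def relation_paths(triples, head, tail, max_hops=3):
    """Enumerate relation paths of length <= max_hops from head to tail.

    triples: list of (subject, relation, object) facts. Each returned path
    is a tuple of relation names; no entity is revisited within a path.
    """
    out_edges = defaultdict(list)
    for s, r, o in triples:
        out_edges[s].append((r, o))

    paths = []

    def dfs(node, rels, visited):
        if node == tail and rels:
            paths.append(tuple(rels))
            return
        if len(rels) == max_hops:
            return
        for r, nxt in out_edges[node]:
            if nxt not in visited:
                dfs(nxt, rels + [r], visited | {nxt})

    dfs(head, [], {head})
    return paths

kg = [("Alice", "born_in", "Paris"),
      ("Paris", "capital_of", "France"),
      ("Alice", "citizen_of", "France")]
print(relation_paths(kg, "Alice", "France"))
# [('born_in', 'capital_of'), ('citizen_of',)]
```

The two-hop path `born_in -> capital_of` is the kind of interpretable evidence a multi-hop model can offer for predicting the one-hop relation `citizen_of`.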
With Eland, anomaly detection at an earlier stage outperforms non-augmented methods that need significantly more observed data, by up to 15% in area under the ROC curve (AUC).
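Since the improvement is reported in AUC, it may help to recall what that number measures: the probability that a randomly chosen anomaly is scored higher than a randomly chosen normal instance. The sketch below computes it with the pairwise-comparison form of the Mann-Whitney statistic; it is a generic metric illustration, unrelated to Eland's internals.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney U).

    labels: 1 for anomalies, 0 for normal; scores: higher = more anomalous.
    Tied scores count as half a correctly ordered pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The double loop is O(|pos| * |neg|); for large evaluation sets, a sort-based rank-sum computation gives the same value in O(n log n).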
In this paper, we propose a novel framework of deep transfer learning to effectively address technical QA across tasks and domains.
To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
In recent years, the need for community technical question-answering sites has increased significantly.
Given video data from multiple personal devices or street cameras, can we exploit the structural and dynamic information to learn dynamic representations of objects for applications such as distributed surveillance, without storing data on a central server, which would violate user privacy?
In this work, we present a novel framework called CoEvoGNN for modeling dynamic attributed graph sequences.
For knowledge representation, we use a graph-based spatial temporal logic (GSTL) to capture spatial and temporal information about related skills shown in demonstration videos.
Noun phrases and relational phrases in Open Knowledge Bases are often not canonical, leading to redundant and ambiguous facts.
The user embeddings preserve spatial patterns and temporal patterns of various periodicities (e.g., hourly, weekly, and weekday patterns).
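Hourly, weekly, and weekday periodicity can be made explicit with cyclic time features. The sketch below is a common hand-crafted encoding, assumed here for illustration; it is not the embedding method from the snippet, though features like these could serve as inputs to such a model. The function name `periodic_time_features` is hypothetical.

```python
import math
from datetime import datetime

def periodic_time_features(ts: datetime) -> list:
    """Encode a timestamp with cyclic hour-of-day and day-of-week features.

    Sine/cosine pairs place 23:59 next to 00:00 (and Sunday next to Monday);
    the final flag separates weekdays from weekends.
    """
    hour = ts.hour + ts.minute / 60.0
    day = ts.weekday()  # Monday = 0 ... Sunday = 6
    return [
        math.sin(2 * math.pi * hour / 24), math.cos(2 * math.pi * hour / 24),
        math.sin(2 * math.pi * day / 7), math.cos(2 * math.pi * day / 7),
        1.0 if day < 5 else 0.0,
    ]

print(periodic_time_features(datetime(2024, 1, 1, 0, 0)))  # a Monday at midnight
# [0.0, 1.0, 0.0, 1.0, 1.0]
```

Using a raw hour value instead would put 23:00 and 00:00 at opposite ends of the scale, hiding the daily cycle the embeddings are meant to capture.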
Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction.
Ranked #1 on Node Classification on Flickr
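Augmentation via edge prediction can be pictured as editing the graph with predicted edge probabilities: add likely-but-missing edges (hopefully intra-class) and drop unlikely existing ones (hopefully inter-class). The sketch below, loosely in the spirit of GAug's modification variant, assumes the probabilities come from some already-trained edge predictor; the function name, the `n_add`/`n_remove` budgets, and the toy numbers are hypothetical.

```python
def modify_edges(edges, edge_probs, n_add, n_remove):
    """Add the most probable missing edges and drop the least probable ones.

    edges: set of undirected edges (u, v) with u < v.
    edge_probs: dict mapping candidate pairs (u, v), u < v, to a predicted
    edge probability (assumed to come from a trained edge predictor).
    Edges absent from edge_probs are left untouched.
    """
    existing = sorted((p for p in edge_probs if p in edges), key=edge_probs.get)
    missing = sorted((p for p in edge_probs if p not in edges),
                     key=edge_probs.get, reverse=True)
    to_remove = set(existing[:n_remove])  # lowest-probability existing edges
    to_add = set(missing[:n_add])         # highest-probability non-edges
    return (set(edges) - to_remove) | to_add

edges = {(0, 1), (1, 2), (2, 3)}
probs = {(0, 1): 0.9, (1, 2): 0.2, (2, 3): 0.8,
         (0, 2): 0.7, (0, 3): 0.1, (1, 3): 0.3}
print(sorted(modify_edges(edges, probs, n_add=1, n_remove=1)))
# [(0, 1), (0, 2), (2, 3)]
```

The modified graph would then be passed to an ordinary GNN node classifier; how well this helps depends on how well the predictor separates intra-class from inter-class pairs.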
Textual patterns (e.g., Country's president Person) are specified and/or generated for extracting factual information from unstructured data.
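As a minimal sketch of how a pattern such as "Country's president Person" turns into an extractor, the regex below approximates both entity slots with runs of capitalized words. That approximation is an assumption made for illustration; pattern-based systems normally rely on entity typing rather than capitalization, and the names `PRESIDENT_PATTERN` and `extract_presidents` are hypothetical.

```python
import re

# Hypothetical surface form of the pattern "Country's president Person".
# Both slots are approximated by capitalized word runs; a typed system
# would instead require the spans to be recognized as Country and Person.
PRESIDENT_PATTERN = re.compile(
    r"(?P<country>[A-Z][a-z]+(?: [A-Z][a-z]+)*)'s president "
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def extract_presidents(text):
    """Return (country, person) tuples matched by the pattern."""
    return [(m.group("country"), m.group("person"))
            for m in PRESIDENT_PATTERN.finditer(text)]

print(extract_presidents("France's president Emmanuel Macron met reporters."))
# [('France', 'Emmanuel Macron')]
```

The capitalization heuristic shows why slot typing matters: in "Yesterday France's president ...", the country slot would greedily absorb "Yesterday France".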
Automatic abstractive summaries are found to often distort or fabricate facts in the article.
Path-based relational reasoning over knowledge graphs has become increasingly popular due to a variety of downstream applications such as question answering in dialogue systems, fact prediction, and recommender systems.
Recently, due to the booming influence of online social networks, detecting fake news has been drawing significant attention from both academic communities and the general public.
Knowledge graphs (KGs) serve as useful resources for various natural language processing applications.
In a scientific concept hierarchy, a parent concept may have a few attributes, each of which has multiple values, where each value is a group of child concepts.
In this work, we propose a new sequence labeling framework (as well as a new tag schema) to jointly extract the fact and condition tuples from statement sentences.
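Whatever the exact tag schema, a sequence labeler's output has to be decoded from per-token tags into phrases before tuples can be assembled. The sketch below decodes generic BIO tags into `(role, phrase)` spans; the role names (`FACT-SUBJ`, `COND-OBJ`, and so on) and the example sentence are hypothetical stand-ins, not the schema proposed in the snippet.

```python
def decode_bio(tokens, tags):
    """Group BIO tags into (role, phrase) spans.

    Tags are "B-<role>", "I-<role>", or "O". An "I-" tag whose role differs
    from the open span is treated leniently as the start of a new span.
    """
    spans, role, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != role):
            if role:
                spans.append((role, " ".join(words)))
            role, words = tag[2:], [tok]
        elif tag.startswith("I-"):
            words.append(tok)
        else:  # "O" closes any open span
            if role:
                spans.append((role, " ".join(words)))
            role, words = None, []
    if role:
        spans.append((role, " ".join(words)))
    return spans

tokens = ["Insulin", "regulates", "glucose", "in", "liver", "cells"]
tags = ["B-FACT-SUBJ", "B-FACT-PRED", "B-FACT-OBJ",
        "B-COND-PRED", "B-COND-OBJ", "I-COND-OBJ"]
print(decode_bio(tokens, tags))
```

Grouping the decoded spans by their `FACT`/`COND` prefix then yields a fact tuple and a condition tuple for the sentence.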
The challenging problem of semi-supervised node classification has been studied extensively.
Cardiac magnetic resonance imaging (MRI) is an essential tool for MRI-guided surgery and real-time intervention.
Conditions are essential to statements in the biological literature.
Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion.
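The recursive construction can be sketched as: cluster the term embeddings, make each cluster a child topic, and repeat inside each child until a depth or size limit. Below, plain Euclidean k-means with deterministic initialization stands in for the clustering step; that substitution, the toy 2-D embeddings, and the `k`/`min_size`/`max_depth` parameters are assumptions, and TaxoGen's own refinements are not reproduced.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init (the first k points)."""
    centers = [list(points[i]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign

def build_taxonomy(terms, emb, k=2, min_size=2, depth=0, max_depth=2):
    """Recursively split terms into k clusters to form a topic tree."""
    node = {"terms": terms, "children": []}
    if depth == max_depth or len(terms) < max(k, min_size):
        return node
    assign = kmeans([emb[t] for t in terms], k)
    for c in range(k):
        child_terms = [t for t, a in zip(terms, assign) if a == c]
        # Skip empty clusters and degenerate splits that keep every term.
        if child_terms and len(child_terms) < len(terms):
            node["children"].append(
                build_taxonomy(child_terms, emb, k, min_size, depth + 1, max_depth))
    return node

emb = {"svm": (0.0, 0.0), "neural net": (0.1, 0.0),
       "pcr": (5.0, 5.0), "dna": (5.1, 5.0)}
tree = build_taxonomy(["svm", "pcr", "neural net", "dna"], emb)
print([child["terms"] for child in tree["children"]])
# [['svm', 'neural net'], ['pcr', 'dna']]
```

The depth limit guarantees termination, and the degenerate-split check prevents a child from being identical to its parent when all embeddings coincide.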
This work explores the binarization of the deconvolution-based generator in a GAN for memory saving and speedup of image construction.
We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets: their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise.
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus.
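A natural first step in phrase mining is collecting frequent contiguous n-grams as candidate phrases. The sketch below does only that (whitespace tokenization, an assumed `min_support` threshold); it deliberately omits any quality assessment, which is exactly why it also keeps low-quality candidates such as "a support".

```python
from collections import Counter

def frequent_phrases(docs, max_n=3, min_support=2):
    """Count contiguous n-grams (2..max_n words) and keep frequent ones."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

docs = ["support vector machine is a model",
        "a support vector machine classifier",
        "we train a support vector machine"]
print(frequent_phrases(docs))
```

Here "support vector machine" (3 occurrences) survives alongside the fragment "a support" (2 occurrences); separating such cases is the job of a phrase quality measure, not of frequency alone.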
In the literature, two series of models have been proposed to address prediction problems including classification and regression.