no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.
1 code implementation • ICML 2020 • chengyu dong, Liyuan Liu, Zichao Li, Jingbo Shang
Serving as a crucial factor, the depth of residual networks balances model capacity, performance, and training efficiency.
1 code implementation • EMNLP 2020 • Dheeraj Mekala, Xinyang Zhang, Jingbo Shang
Based on seed words, we rank and filter motif instances to distill highly label-indicative ones as {``}seed motifs{''}, which provide additional weak supervision.
no code implementations • EMNLP 2021 • Zihan Wang, chengyu dong, Jingbo Shang
In this paper, we present an empirical property of these representations—”average” approximates “first principal component”.
no code implementations • Findings (ACL) 2022 • Zihan Wang, Jiuxiang Gu, Jason Kuen, Handong Zhao, Vlad Morariu, Ruiyi Zhang, Ani Nenkova, Tong Sun, Jingbo Shang
We present a comprehensive study of sparse attention patterns in Transformer models.
no code implementations • Findings (ACL) 2022 • Zi Lin, Jeremiah Zhe Liu, Jingbo Shang
Recent work in task-independent graph semantic parsing has shifted from grammar-based symbolic approaches to neural models, showing strong performance on different types of meaning representations.
no code implementations • 1 Jan 2023 • Jiayun Zhang, Xiyuan Zhang, Xinyang Zhang, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i. e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes.
no code implementations • 1 Jan 2023 • Xiyuan Zhang, Ranak Roy Chowdhury, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
We find that many activities in the current HAR datasets have shared label names, e. g., "open door" and "open fridge", "walk upstairs" and "walk downstairs".
no code implementations • 27 Nov 2022 • Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
In contrast, region-level models attempt to encode regions corresponding to paragraphs or text blocks into a single embedding, but they perform worse with additional word-level features.
1 code implementation • 25 Oct 2022 • Sudhanshu Ranjan, Dheeraj Mekala, Jingbo Shang
Instead of training on the entire code-switched corpus at once, we create buckets based on the fraction of words in the resource-rich language and progressively train from resource-rich language dominated samples to low-resource language dominated samples.
no code implementations • 5 Oct 2022 • Yufan Zhuang, Zihan Wang, Fangbo Tao, Jingbo Shang
We propose Waveformer that learns attention mechanism in the wavelet coefficient space, requires only linear time complexity, and enjoys universal approximating power.
no code implementations • 28 Sep 2022 • Jiacheng Li, Zhankui He, Jingbo Shang, Julian McAuley
In this paper, we propose UCEpic, an explanation generation model that unifies aspect planning and lexical constraints for controllable personalized generation.
no code implementations • 14 Jun 2022 • chengyu dong, Liyuan Liu, Jingbo Shang
To fill this gap, we propose a novel student-oriented teacher network training framework SoTeacher, inspired by recent findings that student performance hinges on teacher's capability to approximate the true label distribution of training samples.
1 code implementation • 25 May 2022 • Dheeraj Mekala, chengyu dong, Jingbo Shang
Weakly supervised text classification methods typically train a deep neural classifier based on pseudo-labels.
2 code implementations • 25 May 2022 • Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation.
no code implementations • 25 May 2022 • William Hogan, Jiacheng Li, Jingbo Shang
Recent relation extraction (RE) works have shown encouraging improvements by conducting contrastive learning on silver labels generated by distant supervision before fine-tuning on gold labels.
Ranked #33 on
Relation Extraction
on DocRED
no code implementations • 24 May 2022 • Lesheng Jin, Zihan Wang, Jingbo Shang
Inspired by this observation, in WeDef, we define the reliability of samples based on whether the predictions of the weak classifier agree with their labels in the poisoned training set.
1 code implementation • 24 May 2022 • Zihan Wang, Kewen Zhao, Zilong Wang, Jingbo Shang
Fine-tuning pre-trained language models has recently become a common practice in building NLP models for various tasks, especially few-shot tasks.
1 code implementation • 29 Apr 2022 • Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han
Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data.
1 code implementation • Findings (ACL) 2022 • Zilong Wang, Jingbo Shang
To overcome the data limitation, we propose to leverage the label surface names to better inform the model of the target entity type semantics and also embed the labels into the spatial embedding space to capture the spatial correspondence between regions and labels.
no code implementations • 7 Oct 2021 • chengyu dong, Liyuan Liu, Jingbo Shang
We show that label noise exists in adversarial training.
no code implementations • 29 Sep 2021 • Zichao Li, Liyuan Liu, chengyu dong, Jingbo Shang
While this phenomenon is commonly explained as overfitting, we observe that it is a twin process: not only does the model catastrophic overfits to one type of perturbation, but also the perturbation deteriorates into random noise.
no code implementations • EMNLP 2021 • Dheeraj Mekala, Varun Gangal, Jingbo Shang
Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases.
no code implementations • Findings (EMNLP) 2021 • Zichao Li, Dheeraj Mekala, chengyu dong, Jingbo Shang
To recognize the poisoned subset, we examine the training samples with these identified triggers as the most suspicious token, and check if removing the trigger will change the poisoned model's prediction.
1 code implementation • EMNLP 2021 • Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei
Reading order detection is the cornerstone to understanding visually-rich documents (e. g., receipts and forms).
2 code implementations • ACL 2021 • Jiacheng Li, Haibo Ding, Jingbo Shang, Julian McAuley, Zhe Feng
We study the problem of building entity tagging systems by using a few rules as weak supervision.
no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.
2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang
Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.
Ranked #1 on
Phrase Tagging
on KPTimes
1 code implementation • 18 Apr 2021 • Zihan Wang, chengyu dong, Jingbo Shang
In this paper, we present an empirical property of these representations -- "average" approximates "first principal component".
1 code implementation • 18 Apr 2021 • Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta, Jingbo Shang
Hashtag annotation for microblog posts has been recently formulated as a sequence generation problem to handle emerging hashtags that are unseen in the training set.
1 code implementation • 18 Apr 2021 • Xianjie Shen, Yinghan Wang, Rui Meng, Jingbo Shang
Keyphrase generation aims to summarize long documents with a collection of salient phrases.
no code implementations • 23 Feb 2021 • Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
1 code implementation • 15 Feb 2021 • chengyu dong, Liyuan Liu, Jingbo Shang
Specifically, we first propose a strategy to measure the data quality based on the learning behaviors of the data during adversarial training and find that low-quality data may not be useful and even detrimental to the adversarial robustness.
1 code implementation • Findings (ACL) 2021 • Jiaman Wu, Dezhi Hong, Rajesh Gupta, Jingbo Shang
A sensor name, typically an alphanumeric string, encodes the key context (e. g., function and location) of a sensor needed for deploying smart building applications.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Yang Jiao, Jiacheng Li, Jiaman Wu, Dezhi Hong, Rajesh Gupta, Jingbo Shang
Sensor metadata tagging, akin to the named entity recognition task, provides key contextual information (e. g., measurement type and location) about sensors for running smart building applications.
3 code implementations • NAACL 2021 • Zihan Wang, Dheeraj Mekala, Jingbo Shang
Finally, we pick the most confident documents from each cluster to train a text classifier.
2 code implementations • 15 Oct 2020 • Zichao Li, Liyuan Liu, chengyu dong, Jingbo Shang
Our goal is to understand why the robustness drops after conducting adversarial training for too long.
no code implementations • EMNLP 2020 • Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han
To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.
1 code implementation • ACL 2020 • Dheeraj Mekala, Jingbo Shang
Weakly supervised text classification based on a few user-provided seed words has recently attracted much attention from researchers.
1 code implementation • 30 Apr 2020 • Peiran Li, Fang Guo, Jingbo Shang
Aspect classification, identifying aspects of text segments, facilitates numerous applications, such as sentiment analysis and review summarization.
1 code implementation • ACL 2020 • Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
1 code implementation • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han
In this study, we propose a novel framework, SetExpan, which tackles this problem, with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding entity set based on denoised context features.
1 code implementation • 10 Oct 2019 • Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han
In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.
1 code implementation • IJCNLP 2019 • Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han
Therefore, we manually correct these label mistakes and form a cleaner test set.
Ranked #3 on
Named Entity Recognition
on CoNLL++
(using extra training data)
1 code implementation • 14 Aug 2019 • Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han
Our model neither requires the conversion from character sequences to word sequences, nor assumes tokenizer can correctly detect all word boundaries.
no code implementations • WS 2019 • Liyuan Liu, Jingbo Shang, Jiawei Han
This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder. com.
1 code implementation • EMNLP 2018 • Jingbo Shang, Liyuan Liu, Xiang Ren, Xiaotao Gu, Teng Ren, Jiawei Han
Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.
1 code implementation • 29 Apr 2018 • Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han
Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types.
1 code implementation • 26 Apr 2018 • Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han
However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions.
1 code implementation • EMNLP 2018 • Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han
Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), and brought significant improvements to various applications.
Ranked #45 on
Named Entity Recognition
on CoNLL 2003 (English)
1 code implementation • 21 Feb 2018 • Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han
We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences.
2 code implementations • 30 Jan 2018 • Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, Jiawei Han
Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases.
1 code implementation • 19 Sep 2017 • Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, Jiawei Han
Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network.
3 code implementations • 13 Sep 2017 • Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, Jiawei Han
In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.
Ranked #13 on
Part-Of-Speech Tagging
on Penn Treebank
no code implementations • 13 Mar 2017 • Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M. Kaplan, Timothy P. Hanratty, Jiawei Han
We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets---their types, contexts, and extractions; and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise.
4 code implementations • 15 Feb 2017 • Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, Jiawei Han
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus.
no code implementations • 31 Oct 2016 • Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han
In the literature, two series of models have been proposed to address prediction problems including classification and regression.
1 code implementation • 31 Oct 2016 • Jingbo Shang, Meng Qu, Jialu Liu, Lance M. Kaplan, Jiawei Han, Jian Peng
It models vertices as low-dimensional vectors to explore network structure-embedded similarity.
no code implementations • 22 Oct 2014 • Jingbo Shang, Tianqi Chen, Hang Li, Zhengdong Lu, Yong Yu
In this paper, we tackle this challenge with a novel parallel and efficient algorithm for feature-based matrix factorization.