Search Results for author: Jiawei Han

Found 227 papers, 135 papers with code

Phrase-aware Unsupervised Constituency Parsing

no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han

Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.

Constituency Parsing Language Modelling +1

Paper
Add Code

RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios

1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji

We introduce RESIN-11, a new schema-guided event extraction and prediction framework that can be applied to a large variety of newsworthy scenarios.

Event Extraction

Paper
Code

ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision

1 code implementation • EMNLP 2021 • Xuan Wang, Vivian Hu, Xiangchen Song, Shweta Garg, Jinfeng Xiao, Jiawei Han

For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts.

named-entity-recognition Named Entity Recognition +1

Paper
Code

Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation • Findings (ACL) 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Open-Domain Question Answering

Paper
Code

Few-Shot Named Entity Recognition: An Empirical Baseline Study

no code implementations • EMNLP 2021 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents an empirical study to efficiently build named entity recognition (NER) systems when a small amount of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Paper
Add Code

MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

1 code implementation • 18 Apr 2024 • Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

Generative models for structure-based drug design (SBDD) have shown promising results in recent years.

Paper
Code

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

1 code implementation • 10 Apr 2024 • Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han

Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.

Paper
Code

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale

no code implementations • 15 Mar 2024 • Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han

To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model.

Text Summarization

Paper
Add Code

ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation

1 code implementation • 13 Mar 2024 • Sayar Ghosh Roy, Jiawei Han

We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.

Citation Recommendation Re-Ranking

Paper
Code

Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy

1 code implementation • 7 Mar 2024 • SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, Jiawei Han

Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs).

Retrieval

Paper
Code

TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision

no code implementations • 29 Feb 2024 • Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Jinfeng Xiao, Jiaming Shen, Jiawei Han

On the other hand, previous weakly-supervised hierarchical text classification methods only utilize the raw taxonomy skeleton and ignore the rich information hidden in the text corpus that can serve as additional class-indicative features.

text-classification Text Classification

Paper
Add Code

Multi-LoRA Composition for Image Generation

no code implementations • 26 Feb 2024 • Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images.

Denoising Image Generation

Paper
Add Code

A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion

1 code implementation • 20 Feb 2024 • Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han

Entity Set Expansion, Taxonomy Expansion, and Seed-Guided Taxonomy Construction are three representative tasks that can be used to automatically populate an existing taxonomy with new entities.

Language Modelling Large Language Model +1

Paper
Code

Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction

no code implementations • 17 Feb 2024 • Sizhe Zhou, Yu Meng, Bowen Jin, Jiawei Han

(2) We fine-tune a bidirectional Small Language Model (SLM) using these initial seeds to learn the relations for the target domain.

Few-Shot Learning Language Modelling +3

Paper
Add Code

GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models

1 code implementation • 16 Feb 2024 • Pengcheng Jiang, Jiacheng Lin, Zifeng Wang, Jimeng Sun, Jiawei Han

The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs).

Relation Relation Extraction +1

Paper
Code

Similarity-based Neighbor Selection for Graph LLMs

1 code implementation • 6 Feb 2024 • Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang

Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification.

Node Classification

Paper
Code

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

1 code implementation • 23 Jan 2024 • Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han

In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities).

Entity Typing Natural Language Inference

Paper
Code

Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction

1 code implementation • 18 Jan 2024 • Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji

Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges.

Chemical Entity Recognition Few-shot NER +1

Paper
Code

Investigating Data Contamination for Pre-training Language Models

no code implementations • 11 Jan 2024 • Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo

Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks.

Language Modelling

Paper
Add Code

TrustLLM: Trustworthiness in Large Language Models

1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao

This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.

Ethics Fairness

Paper
Code

RESIN-EDITOR: A Schema-guided Hierarchical Event Graph Visualizer and Editor

1 code implementation • 5 Dec 2023 • Khanh Duy Nguyen, Zixuan Zhang, Reece Suchocki, Sha Li, Martha Palmer, Susan Brown, Jiawei Han, Heng Ji

In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events.

Paper
Code

Large Language Models on Graphs: A Comprehensive Survey

1 code implementation • 5 Dec 2023 • Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, Jiawei Han

Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graphs (i.e., graph-based reasoning).

Language Modelling

Paper
Code

SCStory: Self-supervised and Continual Online Story Discovery

1 code implementation • 27 Nov 2023 • Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han

With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information in news articles and uses it to discover stories.

Continual Learning Contrastive Learning +1

Paper
Code

Structured Chemistry Reasoning with Large Language Models

1 code implementation • 16 Nov 2023 • Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin

Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in the field of chemistry.

General Knowledge

Paper
Code

MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

no code implementations • 13 Nov 2023 • Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao

Specifically, an adversarial LLM and a target LLM interplay with each other in an iterative manner, where the adversarial LLM aims to generate challenging prompts that elicit unsafe responses from the target LLM, while the target LLM is fine-tuned with safety aligned data on these adversarial prompts.

Instruction Following Response Generation

Paper
Add Code

Don't Make Your LLM an Evaluation Benchmark Cheater

no code implementations • 3 Nov 2023 • Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, Jiawei Han

Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity.

Paper
Add Code

Instruct and Extract: Instruction Tuning for On-Demand Information Extraction

1 code implementation • 24 Oct 2023 • Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han

However, when it comes to information extraction, a classic task in natural language processing, most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users.

Instruction Following

Paper
Code

"Why Should I Review This Paper?" Unifying Semantic, Topic, and Citation Factors for Paper-Reviewer Matching

no code implementations • 23 Oct 2023 • Yu Zhang, Yanzhen Shen, Xiusi Chen, Bowen Jin, Jiawei Han

As many academic conferences are overwhelmed by a rapidly increasing number of paper submissions, automatically finding appropriate reviewers for each submission becomes a more urgent need than ever.

Information Retrieval Language Modelling +1

Paper
Add Code

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

1 code implementation • 19 Oct 2023 • Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks.

Paper
Code

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

no code implementations • 17 Oct 2023 • Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.

Transfer Learning

Paper
Add Code

Language Models As Semantic Indexers

no code implementations • 11 Oct 2023 • Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, Xianfeng Tang

Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs.

Contrastive Learning Information Retrieval +2

Paper
Add Code

Ontology Enrichment for Effective Fine-grained Entity Typing

no code implementations • 11 Oct 2023 • Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han

In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two types of extra information: instance information for training sample augmentation and topic information to relate types to contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples.

Entity Typing

Paper
Add Code

Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder

no code implementations • 10 Oct 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han

Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.

Representation Learning

Paper
Add Code

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

no code implementations • 3 Oct 2023 • Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao

In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs).

Paper
Add Code

Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification

1 code implementation • 5 Jul 2023 • Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch, Jiawei Han

Event schemas are a form of world knowledge about the typical progression of events.

Event Expansion World Knowledge

Paper
Code

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

no code implementations • 4 Jul 2023 • Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han

Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.

Paper
Add Code

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).

Multi-Label Classification

Paper
Code

Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond

1 code implementation • 16 Jun 2023 • Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria

Firstly, to offer systematic evaluations, we select fifteen typical logical reasoning datasets and organize them into deductive, inductive, abductive and mixed-form reasoning settings.

Benchmarking Evidence Selection +2

Paper
Code

Explaining and Adapting Graph Conditional Shift

no code implementations • 5 Jun 2023 • Qi Zhu, Yizhu Jiao, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

Graph Neural Networks (GNNs) have shown remarkable performance on graph-structured data.

Graph Classification Node Classification +1

Paper
Add Code

Text-Augmented Open Knowledge Graph Completion via Pre-Trained Language Models

1 code implementation • 24 May 2023 • Pengcheng Jiang, Shivam Agarwal, Bowen Jin, Xuan Wang, Jimeng Sun, Jiawei Han

The mission of open knowledge graph (KG) completion is to draw new findings from known facts.

Knowledge Graph Completion Language Modelling

Paper
Code

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.

Pseudo Label Sentiment Analysis +3

Paper
Code

Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.

Continual Learning

Paper
Code

OntoType: Ontology-Guided Zero-Shot Fine-Grained Entity Typing with Weak Supervision from Pre-Trained Language Models

no code implementations • 21 May 2023 • Tanay Komarlu, Minhao Jiang, Xuan Wang, Jiawei Han

In this study, we envision that an ontology provides a semantics-rich, hierarchical structure, which will help select the best results generated by multiple PLM models and head words.

Entity Typing Natural Language Inference +1

Paper
Add Code

Patton: Language Model Pretraining on Text-Rich Networks

no code implementations • 20 May 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han

A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).

Language Modelling Masked Language Modeling +1

Paper
Add Code

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

1 code implementation • 8 Apr 2023 • Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han

Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.

Sentence

Paper
Code

MEGClass: Extremely Weakly Supervised Text Classification via Mutually-Enhancing Text Granularities

1 code implementation • 4 Apr 2023 • Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han

By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier.

text-classification Text Classification

Paper
Code

GLEN: General-Purpose Event Detection for Thousands of Types

1 code implementation • 16 Mar 2023 • Qiusi Zhan, Sha Li, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han

Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance for this new dataset.

Event Detection Event Extraction

Paper
Code

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks

1 code implementation • 21 Feb 2023 • Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han

Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews).

Edge Classification Link Prediction +1

Paper
Code

PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream

1 code implementation • 10 Feb 2023 • Susik Yoon, Hou Pong Chan, Jiawei Han

Summarizing text-rich documents has been long studied in the literature, but most of the existing efforts have been made to summarize a static and predefined multi-document set.

Document Summarization Multi-Document Summarization

Paper
Code

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

1 code implementation • 7 Feb 2023 • Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han

Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track the fields of study they are interested in rather than drowning in the whole literature.

Language Modelling Multi Label Text Classification +3

Paper
Code

Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories

no code implementations • 7 Feb 2023 • Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett

In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.

Retrieval Zero-shot Generalization

Paper
Add Code

Representation Deficiency in Masked Language Modeling

1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer

In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing [MASK] tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without [MASK] tokens.

Language Modelling Masked Language Modeling

Paper
Code

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.

Language Modelling Word Embeddings

Paper
Code

Entity Set Co-Expansion in StackOverflow

no code implementations • 5 Dec 2022 • Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han

Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.

graph construction Management

Paper
Add Code

Open Relation and Event Type Discovery with Type Abstraction

1 code implementation • 30 Nov 2022 • Sha Li, Heng Ji, Jiawei Han

To tackle this problem, we introduce the idea of type abstraction, where the model is prompted to generalize and name the type.

Event Extraction Relation +2

Paper
Code

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

1 code implementation • 6 Nov 2022 • Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han

In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set.

Few-Shot Learning

Paper
Code

Open-Vocabulary Argument Role Prediction for Event Extraction

1 code implementation • 3 Nov 2022 • Yizhu Jiao, Sha Li, Yiqing Xie, Ming Zhong, Heng Ji, Jiawei Han

Specifically, we formulate the role prediction problem as an in-filling task and construct prompts for a pre-trained language model to generate candidate roles.

Event Extraction Language Modelling

Paper
Code

PALT: Parameter-Lite Transfer of Language Models for Knowledge Graph Completion

1 code implementation • 25 Oct 2022 • Jianhao Shen, Chenguang Wang, Ye Yuan, Jiawei Han, Heng Ji, Koushik Sen, Ming Zhang, Dawn Song

For instance, we outperform full fine-tuning approaches on a KG completion benchmark by tuning only 1% of the parameters.

Ranked #5 on Link Prediction on UMLS

Knowledge Graph Completion Link Prediction +1

Paper
Code

Large Language Models Can Self-Improve

no code implementations • 20 Oct 2022 • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74.4%->82.1% on GSM8K, 78.2%->83.0% on DROP, 90.0%->94.4% on OpenBookQA, and 63.4%->67.9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label.

Ranked #1 on Question Answering on DROP

Arithmetic Reasoning Common Sense Reasoning +3

Paper
Add Code

Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation

no code implementations • 18 Oct 2022 • Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han

Topic taxonomies display hierarchical topic structures of a text corpus and provide topical knowledge to enhance various NLP applications.

Relation Taxonomy Expansion

Paper
Add Code

Towards a Unified Multi-Dimensional Evaluator for Text Generation

2 code implementations • 13 Oct 2022 • Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, Pengfei Liu, Chenguang Zhu, Heng Ji, Jiawei Han

We re-frame NLG evaluation as a Boolean Question Answering (QA) task, and by guiding the model with different questions, we can use one evaluator to evaluate from multiple dimensions.

nlg evaluation Question Answering +4

Paper
Code

Few-shot Text Classification with Dual Contrastive Consistency

no code implementations • 29 Sep 2022 • Liwen Sun, Jiawei Han

In this paper, we explore how to utilize pre-trained language models to perform few-shot text classification where only a few annotated examples are given for each class.

Contrastive Learning Few-Shot Text Classification +3

Paper
Add Code

TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter

1 code implementation • 15 Sep 2022 • Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky

Most existing PLMs are not tailored to the noisy user-generated text on social media, and the pre-training does not factor in the valuable social engagement logs available in a social network.

Language Modelling

Paper
Code

MentorGNN: Deriving Curriculum for Pre-Training GNNs

1 code implementation • 21 Aug 2022 • Dawei Zhou, Lecheng Zheng, Dongqi Fu, Jiawei Han, Jingrui He

To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weighs graph signals in order to ensure a good generalization in the target domain.

Domain Adaptation Graph Mining

Paper
Code

Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation

1 code implementation • 28 Jun 2022 • Jiaxin Huang, Yu Meng, Jiawei Han

We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type.

Entity Typing Language Modelling +1

Paper
Code

TeKo: Text-Rich Graph Neural Networks with External Knowledge

no code implementations • 15 Jun 2022 • Zhizhi Yu, Di Jin, Jianguo Wei, Ziyang Liu, Yue Shang, Yun Xiao, Jiawei Han, Lingfei Wu

Graph Neural Networks (GNNs) have gained great popularity in tackling various analytical tasks on graph-structured data (i.e., networks).

Paper
Add Code

Unsupervised Key Event Detection from Massive Text Corpora

1 code implementation • 8 Jun 2022 • Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han

Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge.

Event Detection

Paper
Code

Schema-Guided Event Graph Completion

no code implementations • 6 Jun 2022 • Hongwei Wang, Zixuan Zhang, Sha Li, Jiawei Han, Yizhou Sun, Hanghang Tong, Joseph P. Olive, Heng Ji

Existing link prediction or graph completion methods have difficulty dealing with event graphs because they are usually designed for a single large graph such as a social network or a knowledge graph, rather than multiple small dynamic event graphs.

Link Prediction

Paper
Add Code

All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass

no code implementations • 22 May 2022 • Jiaxin Huang, Tianqi Liu, Jialu Liu, Adam D. Lelkes, Cong Yu, Jiawei Han

Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks.

Multi-Task Learning text-classification +1

Paper
Add Code

Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks

1 code implementation • 20 May 2022 • Bowen Jin, Yu Zhang, Qi Zhu, Jiawei Han

In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure.

Clustering Graph Attention +5

Paper
Code

CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision

1 code implementation • 12 May 2022 • Yuning Mao, Ming Zhong, Jiawei Han

Scientific extreme summarization (TLDR) aims to form ultra-short summaries of scientific papers.

Ranked #1 on Extreme Summarization on CiteSum

Domain Adaptation Extreme Summarization +1

Paper
Code

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

1 code implementation • NAACL 2022 • Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han

Discovering latent topics from text corpora has been studied for decades.

General Knowledge Topic Models

Paper
Code

OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

1 code implementation • 29 Apr 2022 • Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han

Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data.

Attribute Language Modelling

Paper
Code

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation • ICLR 2022 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

Paper
Code

Shift-Robust Node Classification via Graph Adversarial Clustering

no code implementations • 7 Mar 2022 • Qi Zhu, Chao Zhang, Chanyoung Park, Carl Yang, Jiawei Han

Then a shift-robust classifier is optimized on the training graph and on adversarial samples on the target graph, which are generated by the cluster GNN.

Classification Clustering +2

Paper
Add Code

P4E: Few-Shot Event Detection as Prompt-Guided Identification and Localization

no code implementations • 15 Feb 2022 • Sha Li, Liyuan Liu, Yiqing Xie, Heng Ji, Jiawei Han

Our framework decomposes event detection into an identification task and a localization task.

Event Detection Event Extraction +3

Paper
Add Code

Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification

1 code implementation • 11 Feb 2022 • Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, Jiawei Han

Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant labels from a large candidate set.

Contrastive Learning Multi Label Text Classification +3

Paper
Code

TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations

no code implementations • 10 Feb 2022 • Minhao Jiang, Xiangchen Song, Jieyu Zhang, Jiawei Han

Taxonomies are fundamental to many real-world applications in various domains, serving as structural representations of knowledge.

Position

Paper
Add Code

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

1 code implementation • 9 Feb 2022 • Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.

Ranked #5 on Zero-Shot Text Classification on AG News

Few-Shot Learning MNLI-m +5

Paper
Code

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

1 code implementation • 9 Feb 2022 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models.

Clustering Language Modelling +1

Paper
Code

Unsupervised Multi-Granularity Summarization

2 code implementations • 29 Jan 2022 • Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han

In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum.

Abstractive Text Summarization

Paper
Code

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters

no code implementations • 18 Jan 2022 • Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu

Topic taxonomies, which represent the latent topic (or category) structure of document collections, provide valuable knowledge of contents in many applications such as web search and information filtering.

Clustering Topic coverage

Paper
Add Code

Universal Graph Convolutional Networks

1 code implementation • NeurIPS 2021 • Di Jin, Zhizhi Yu, Cuiying Huo, Rui Wang, Xiao Wang, Dongxiao He, Jiawei Han

So can we reasonably utilize these segmentation rules to design a universal propagation mechanism independent of the network structural assumption?

Paper
Code

Out-of-Category Document Identification Using Target-Category Names as Weak Supervision

no code implementations • 24 Nov 2021 • Dongha Lee, Dongmin Hyun, Jiawei Han, Hwanjo Yu

To address this challenge, we introduce a new task referred to as out-of-category detection, which aims to distinguish the documents according to their semantic relevance to the inlier (or target) categories by using the category names as weak supervision.

Paper
Add Code

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

1 code implementation • 7 Nov 2021 • Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided.

text-classification Text Classification

Paper
Code

Fine-Grained Opinion Summarization with Minimal Supervision

no code implementations • 17 Oct 2021 • Suyu Ge, Jiaxin Huang, Yu Meng, Sharon Wang, Jiawei Han

Opinion summarization aims to profile a target by extracting opinions from multiple documents.

Fine-Grained Opinion Analysis Opinion Summarization

Paper
Add Code

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

1 code implementation • ACL 2022 • Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa

Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with much fewer trainable parameters and perform especially well when training data is limited.

Language Modelling Model Selection

Paper
Code

Entity Linking Meets Deep Learning: Techniques and Solutions

no code implementations • 26 Sep 2021 • Wei Shen, Yuhan Li, Yinan Liu, Jiawei Han, Jianyong Wang, Xiaojie Yuan

Entity linking (EL) is the process of linking entity mentions appearing in web text with their corresponding entities in a knowledge base.

Entity Linking Knowledge Base Population +2

Paper
Add Code

SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction

1 code implementation • NAACL 2022 • Yuxin Xiao, Zecheng Zhang, Yuning Mao, Carl Yang, Jiawei Han

Consequently, it is more challenging to encode the key information sources--relevant contexts and entity types.

Ranked #1 on Relation Extraction on CDR

Data Augmentation Document-level Relation Extraction +3

Paper
Code

Chemical-Reaction-Aware Molecule Representation Learning

1 code implementation • ICLR 2022 • Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, Martin D. Burke

Molecule representation learning (MRL) methods aim to embed molecules into a real vector space.

Chemical Reaction Prediction Property Prediction +1

Paper
Code

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

1 code implementation • EMNLP 2021 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.

Language Modelling named-entity-recognition +2

Paper
Code

Corpus-based Open-Domain Event Type Induction

1 code implementation • EMNLP 2021 • Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han

As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.

Event Extraction Object +1

Paper
Code

Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data

1 code implementation • NeurIPS 2021 • Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

In this work we present a method, Shift-Robust GNN (SR-GNN), designed to account for distributional differences between biased training data and the graph's true inference distribution.

Paper
Code

Multi-head or Single-head? An Empirical Comparison for Transformer Training

1 code implementation • 17 Jun 2021 • Liyuan Liu, Jialu Liu, Jiawei Han

Multi-head attention plays a crucial role in the recent success of Transformer models, which leads to consistent performance improvements over conventional attention in various applications.

Paper
Code

Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion

1 code implementation • Findings (ACL) 2022 • Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han

Typical DocRE methods blindly take the full document as input, while a subset of the sentences in the document, noted as the evidence, are often sufficient for humans to predict the relation of an entity pair.

Ranked #5 on Relation Extraction on DocRED

Document-level Relation Extraction Relation

Paper
Code

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System

1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).

coreference-resolution Event Extraction +1

Paper
Code

TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names

no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han

Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.

Multi Label Text Classification Multi-Label Text Classification +3

Paper
Add Code

Event Time Extraction and Propagation via Graph Attention Networks

1 code implementation • NAACL 2021 • Haoyang Wen, Yanru Qu, Heng Ji, Qiang Ning, Jiawei Han, Avi Sil, Hanghang Tong, Dan Roth

Grounding events into a precise timeline is important for natural language understanding but has received limited attention in recent work.

Graph Attention Natural Language Understanding +3

Paper
Code

Training ELECTRA Augmented with Multi-word Selection

no code implementations • Findings (ACL) 2021 • Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han

In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.

Binary Classification Multi-Task Learning

Paper
Add Code

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.

Ranked #1 on Phrase Tagging on KPTimes

Keyphrase Extraction Language Modelling +3

Paper
Code

Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation

2 code implementations • EMNLP 2021 • Yuning Mao, Wenchang Ma, Deren Lei, Jiawei Han, Xiang Ren

In this paper, we present a systematic analysis that studies whether current seq2seq models, especially pre-trained language models, are good enough for preserving important input concepts and to what extent explicitly guiding generation with the concepts as lexical constraints is beneficial.

Conditional Text Generation Denoising

Paper
Code

The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

1 code implementation • EMNLP 2021 • Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss

We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations.

Paper
Code

Document-Level Event Argument Extraction by Conditional Generation

1 code implementation • NAACL 2021 • Sha Li, Heng Ji, Jiawei Han

On the task of argument extraction, we achieve an absolute gain of 7.6% F1 and 5.7% F1 over the next best model on the RAMS and WikiEvents datasets respectively.

Document-level Event Extraction Event Argument Extraction +2

Paper
Code

Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion

no code implementations • 8 Apr 2021 • Xiangchen Song, Jiaming Shen, Jieyu Zhang, Jiawei Han

Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks.

Taxonomy Expansion

Paper
Add Code

Toward Tweet Entity Linking with Heterogeneous Information Networks

1 code implementation • IEEE Transactions on Knowledge and Data Engineering 2021 • Wei Shen, Yuwei Yin, Yang Yang, Jiawei Han, Jianyong Wang, Xiaojie Yuan

The task of linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is of great importance, for the purpose of enriching heterogeneous information networks with the abundant and fresh knowledge embedded in tweets.

Entity Linking Metric Learning

Paper
Code

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

no code implementations • 23 Feb 2021 • Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han

Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.

Product Categorization Text Categorization

Paper
Add Code

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations • NeurIPS 2021 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning Language Modelling +1

Paper
Code

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

1 code implementation • 15 Feb 2021 • Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set.

General Classification Multi Label Text Classification +2

Paper
Code

Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup

5 code implementations • ACL (RepL4NLP) 2021 • Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan

Contrastive learning has been applied successfully to learn vector representations of text.

Contrastive Learning

Paper
Code

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Natural Questions Open-Domain Question Answering +2

Paper
Code

Few-Shot Named Entity Recognition: A Comprehensive Study

2 code implementations • 29 Dec 2020 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when a small number of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Paper
Code

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

1 code implementation • 26 Oct 2020 • Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han

Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.

Data Augmentation Document Classification +1

Paper
Code

Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation

2 code implementations • 24 Oct 2020 • Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han

Despite significant progress, state-of-the-art abstractive summarization methods are still prone to hallucinate content inconsistent with the source document.

Abstractive Text Summarization Keyphrase Extraction

Paper
Code

On the Transformer Growth for Progressive BERT Training

no code implementations • NAACL 2021 • Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han

Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively -- start from an inferior but low-cost model and gradually grow the model to increase the computational complexity.

Language Modelling

Paper
Add Code

BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich Networks

no code implementations • 23 Oct 2020 • Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, Jiawei Han

We propose BiTe-GCN, a novel GCN architecture with bidirectional convolution of both topology and features on text-rich networks to solve these limitations.

Paper
Add Code

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

no code implementations • ICLR 2021 • Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.

Data Augmentation Natural Language Understanding

Paper
Add Code

Text Classification Using Label Names Only: A Language Model Self-Training Approach

2 code implementations • EMNLP 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.

Document Classification General Classification +6

Paper
Code

Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

1 code implementation • EMNLP 2020 • Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, Jiawei Han

Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Paper
Code

CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring

1 code implementation • 13 Oct 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han

Taxonomy is not only a fundamental form of knowledge representation, but also crucial to vast knowledge-rich applications, such as question answering and web search.

Question Answering Relation

Paper
Code

A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling

1 code implementation • 5 Oct 2020 • Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han

We propose SHMM, a multi-modal spherical hidden Markov model for semantics-rich human mobility modeling.

Paper
Code

Near-imperceptible Neural Linguistic Steganography via Self-Adjusting Arithmetic Coding

1 code implementation • EMNLP 2020 • Jiaming Shen, Heng Ji, Jiawei Han

Linguistic steganography studies how to hide secret messages in natural language cover texts.

Language Modelling Linguistic steganography

Paper
Code

Multi-document Summarization with Maximal Marginal Relevance-guided Reinforcement Learning

1 code implementation • EMNLP 2020 • Yuning Mao, Yanru Qu, Yiqing Xie, Xiang Ren, Jiawei Han

Additionally, the explicit redundancy measure in MMR helps the neural representation of the summary to better capture redundancy.

Document Summarization Multi-Document Summarization +3

Paper
Code

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

no code implementations • EMNLP 2020 • Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han

To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.

Paper
Add Code

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Ranked #9 on Passage Retrieval on Natural Questions

Natural Questions Open-Domain Question Answering +4

Paper
Code

Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization

1 code implementation • NeurIPS 2021 • Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, Jiawei Han

Graph neural networks (GNNs) have achieved superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs.

Knowledge Graphs Transfer Learning

Paper
Code

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

1 code implementation • 18 Jul 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.

Ranked #1 on Topic Models on Arxiv HEP-TH citation graph

text-classification Topic Models

Paper
Code

GCN for HIN via Implicit Utilization of Attention and Meta-paths

no code implementations • 6 Jul 2020 • Di Jin, Zhizhi Yu, Dongxiao He, Carl Yang, Philip S. Yu, Jiawei Han

Graph neural networks for HIN embeddings typically adopt a hierarchical attention (including node-level and meta-path-level attentions) to capture the information from meta-path-based neighbors.

Paper
Add Code

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

no code implementations • NAACL 2021 • Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss, Cynthia Schneider, Boyan Onyshkevych

To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions.

graph construction Knowledge Graphs +1

Paper
Add Code

EVIDENCEMINER: Textual Evidence Discovery for Life Sciences

no code implementations • ACL 2020 • Xuan Wang, Yingjun Guan, Weili Liu, Aabhas Chauhan, Enyi Jiang, Qi Li, David Liem, Dibakar Sigdel, John Caufield, Peipei Ping, Jiawei Han

EVIDENCEMINER is constructed in a completely automated way without any human effort for training data annotation.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

no code implementations • 24 Jun 2020 • Xin Luna Dong, Xiang He, Andrey Kan, Xi-An Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han

Can one build a knowledge graph (KG) for all products in the world?

Anomaly Detection Knowledge Graphs +2

Paper
Add Code

Octet: Online Catalog Taxonomy Enrichment with Self-Supervision

no code implementations • 18 Jun 2020 • Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han

We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment.

Term Extraction

Paper
Add Code

Unsupervised Differentiable Multi-aspect Network Embedding

1 code implementation • 7 Jun 2020 • Chanyoung Park, Carl Yang, Qi Zhu, Donghyun Kim, Hwanjo Yu, Jiawei Han

To capture the multiple aspects of each node, existing studies mainly rely on offline graph clustering performed prior to the actual embedding, which results in the cluster membership of each node (i.e., node aspect distribution) fixed throughout training of the embedding model.

Clustering Graph Clustering +2

Paper
Code

Open-Domain Question Answering with Pre-Constructed Question Spaces

no code implementations • NAACL 2021 • Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han

Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.

Information Retrieval Knowledge Graphs +2

Paper
Add Code

Minimally Supervised Categorization of Text with Metadata

1 code implementation • 1 May 2020 • Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han

Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.

Document Classification

Paper
Code

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

no code implementations • 1 May 2020 • Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a part of them.

named-entity-recognition Named Entity Recognition +1

Paper
Add Code

Empower Entity Set Expansion via Language Model Probing

1 code implementation • ACL 2020 • Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han

Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.

Language Modelling Question Answering

Paper
Code

Automatic Textual Evidence Mining in COVID-19 Literature

no code implementations • 27 Apr 2020 • Xuan Wang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Jiawei Han

We created this EVIDENCEMINER system for automatic textual evidence mining in COVID-19 literature.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

Understanding the Difficulty of Training Transformers

2 code implementations • EMNLP 2020 • Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han

Transformers have proved effective in many NLP tasks.

Ranked #5 on Machine Translation on WMT2014 English-French

Machine Translation

Paper
Code

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

1 code implementation • 1 Apr 2020 • Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han

Since there has already been a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms.

Attribute Network Embedding

Paper
Code

Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision

no code implementations • 27 Mar 2020 • Xuan Wang, Xiangchen Song, Bangzheng Li, Yingjun Guan, Jiawei Han

We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13).

named-entity-recognition Named Entity Recognition +1

Paper
Add Code

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han

Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.

Paper
Code

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

3 code implementations • 26 Jan 2020 • Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han

Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.

Position Product Recommendation +1

Paper
Code

Generating Representative Headlines for News Stories

2 code implementations • 26 Jan 2020 • Xiaotao Gu, Yuning Mao, Jiawei Han, Jialu Liu, Hongkun Yu, You Wu, Cong Yu, Daniel Finnie, Jiaqi Zhai, Nicholas Zukoski

In this work, we study the problem of generating representative headlines for news stories.

Paper
Code

cube2net: Efficient Query-Specific Network Construction with Data Cube Organization

no code implementations • 18 Jan 2020 • Carl Yang, Mengxiong Liu, Frank He, Jian Peng, Jiawei Han

With extensive experiments of two classic network mining tasks on different real-world large datasets, we show that our proposed cube2net pipeline is general, and much more effective and efficient in query-specific network construction, compared with other methods without the leverage of data cube or reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction

2 code implementations • 1 Jan 2020 • Aravind Sankar, Xinyang Zhang, Adit Krishnan, Jiawei Han

Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc.

Paper
Code

Unsupervised Attributed Multiplex Network Embedding

2 code implementations • 15 Nov 2019 • Chanyoung Park, Donghyun Kim, Jiawei Han, Hwanjo Yu

Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph.

Network Embedding Relation

Paper
Code

Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach

no code implementations • 14 Nov 2019 • Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han

This proximity network captures the corpus-level co-occurrence statistics for candidate event descriptors, event attributes, as well as their connections.

Attribute News Summarization

Paper
Add Code

Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders

no code implementations • 4 Nov 2019 • Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Myungwan Kim, Matt Walker, Yiou Xiao, Jiawei Han

While node semantics have been extensively explored in social networks, little research attention has been paid to profile edge semantics, i.e., social relations.

Relation

Paper
Add Code

Spherical Text Embedding

1 code implementation • NeurIPS 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han

While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding.

Clustering Riemannian optimization +1

Paper
Code

SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

1 code implementation • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han

In this study, we propose a novel framework, SetExpan, which tackles this problem, with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding entity set based on denoised context features.

feature selection Question Answering

Paper
Code

HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion

no code implementations • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler, Jiawei Han

Taxonomies are of great value to many knowledge-rich applications.

Relation Relation Extraction

Paper
Add Code

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

2 code implementations • 16 Oct 2019 • Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han

With the massive number of repositories available, there is a pressing need for topic-based search.

Classification General Classification +1

Paper
Code

FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams

1 code implementation • 10 Oct 2019 • Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han

In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.

Clustering Language Modelling

Paper
Code

Place Deduplication with Embeddings

no code implementations • 29 Sep 2019 • Carl Yang, Do Huy Hoang, Tomas Mikolov, Jiawei Han

Thanks to the advancing mobile location services, people nowadays can post about places to share visiting experience on-the-go.

Paper
Add Code

Neural Embedding Propagation on Heterogeneous Networks

1 code implementation • 29 Sep 2019 • Carl Yang, Jieyu Zhang, Jiawei Han

While generalizing LP as a simple instance, NEP is far more powerful in its natural awareness of different types of objects and links, and the ability to automatically capture their important interaction patterns.

Network Embedding

Paper
Code

Meta-Graph Based HIN Spectral Embedding: Methods, Analyses, and Insights

no code implementations • 29 Sep 2019 • Carl Yang, Yichen Feng, Pan Li, Yu Shi, Jiawei Han

In this work, we propose to study the utility of different meta-graphs, as well as how to simultaneously leverage multiple meta-graphs for HIN embedding in an unsupervised manner.

Paper
Add Code

I Know You'll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application

no code implementations • 29 Sep 2019 • Carl Yang, Xiaolin Shi, Jie Luo, Jiawei Han

Then we design a novel deep learning pipeline based on LSTM and attention to accurately predict user churn with very limited initial behavior data, by leveraging the correlations among users' multi-dimensional activities and the underlying user types.

Clustering

Paper
Add Code

Query-Specific Knowledge Summarization with Entity Evolutionary Networks

no code implementations • 29 Sep 2019 • Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao, Jiawei Han

Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization.

Paper
Add Code

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

1 code implementation • 4 Sep 2019 • Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han

Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity.

Knowledge Graphs

Paper
Code

CrossWeigh: Training Named Entity Tagger from Imperfect Annotations

1 code implementation • IJCNLP 2019 • Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han

Therefore, we manually correct these label mistakes and form a cleaner test set.

Ranked #3 on Named Entity Recognition (NER) on CoNLL++ (using extra training data)

named-entity-recognition Named Entity Recognition +1

172

Paper
Code

Hierarchical Text Classification with Reinforced Label Assignment

1 code implementation • IJCNLP 2019 • Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren

While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference.

Ranked #1 on Text Classification on RCV1 (Macro F1 metric)

General Classification text-classification +1

139

Paper
Code

Facet-Aware Evaluation for Extractive Summarization

1 code implementation • ACL 2020 • Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.

Extractive Summarization Sentence +1

Paper
Code

Discriminative Topic Mining via Category-Name Guided Text Embedding

1 code implementation • 20 Aug 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han

We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora.

Document Classification General Classification +3

Paper
Code

Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings

no code implementations • 18 Aug 2019 • Ahmed El-Kishky, Frank Xu, Aston Zhang, Jiawei Han

However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures.

Language Modelling Segmentation +1

Paper
Add Code

Raw-to-End Name Entity Recognition in Social Media

1 code implementation • 14 Aug 2019 • Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han

Our model neither requires the conversion from character sequences to word sequences, nor assumes tokenizer can correctly detect all word boundaries.

named-entity-recognition Named Entity Recognition +1

Paper
Code

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

47,519

Paper
Code

Arabic Named Entity Recognition: What Works and What's Next

no code implementations • WS 2019 • Liyuan Liu, Jingbo Shang, Jiawei Han

This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com.

Ensemble Learning Feature Engineering +4

Paper
Add Code

Constrained Sequence-to-sequence Semitic Root Extraction for Enriching Word Embeddings

no code implementations • WS 2019 • Ahmed El-Kishky, Xingyu Fu, Aseel Addawood, Nahil Sobh, Clare Voss, Jiawei Han

In this paper, we tackle the problem of "root extraction" from words in the Semitic language family.

Language Modelling Word Embeddings +1

Paper
Add Code

Reliability-aware Dynamic Feature Composition for Name Tagging

1 code implementation • ACL 2019 • Ying Lin, Liyuan Liu, Heng Ji, Dong Yu, Jiawei Han

We design a set of word frequency-based reliability signals to indicate the quality of each word embedding.

Named Entity Recognition (NER) Word Embeddings

Paper
Code

Task-Guided Pair Embedding in Heterogeneous Network

1 code implementation • 4 Jun 2019 • Chanyoung Park, Donghyun Kim, Qi Zhu, Jiawei Han, Hwanjo Yu

In this paper, we propose a novel task-guided pair embedding framework in heterogeneous network, called TaPEm, that directly models the relationship between a pair of nodes that are related to a specific task (e.g., paper-author relationship in author identification).

Network Embedding

Paper
Code

Biomedical Event Extraction based on Knowledge-driven Tree-LSTM

no code implementations • NAACL 2019 • Diya Li, Lifu Huang, Heng Ji, Jiawei Han

Event extraction for the biomedical domain is more challenging than that in the general news domain since it requires broader acquisition of domain-specific knowledge and deeper understanding of complex contexts.

Entity Linking Event Extraction

Paper
Add Code

STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks

1 code implementation • 21 Feb 2019 • Shuochao Yao, Ailing Piao, Wenjun Jiang, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Jinyang Li, Tianshi Wang, Shaohan Hu, Lu Su, Jiawei Han, Tarek Abdelzaher

IoT applications, however, often measure physical phenomena, where the underlying physics (such as inertia, wireless signal propagation, or the natural frequency of oscillation) are fundamentally a function of signal frequencies, offering better features in the frequency domain.

speech-recognition Speech Recognition

Paper
Code

Weakly-Supervised Hierarchical Text Classification

1 code implementation • 29 Dec 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.

Blocking Feature Engineering +3

Paper
Code

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

2 code implementations • 22 Dec 2018 • Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han

Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion.

Databases

Paper
Code

User-Guided Clustering in Heterogeneous Information Networks via Motif-Based Comprehensive Transcription

1 code implementation • 28 Nov 2018 • Yu Shi, Xinwei He, Naijing Zhang, Carl Yang, Jiawei Han

We therefore approach the problem of user-guided clustering in HINs with network motifs.

Clustering

Paper
Code

Mining Entity Synonyms with Efficient Neural Set Generation

1 code implementation • 16 Nov 2018 • Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han

Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications.

Paper
Code

End-to-End Hierarchical Text Classification with Label Assignment Policy

no code implementations • 27 Sep 2018 • Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren

We present an end-to-end reinforcement learning approach to hierarchical text classification where documents are labeled by placing them at the right positions in a given hierarchy.

text-classification Text Classification

Paper
Add Code

Learning Named Entity Tagger using Domain-Specific Dictionary

1 code implementation • EMNLP 2018 • Jingbo Shang, Liyuan Liu, Xiang Ren, Xiaotao Gu, Teng Ren, Jiawei Han

Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.

named-entity-recognition Named Entity Recognition +1

483

Paper
Code

Weakly-Supervised Neural Text Classification

1 code implementation • 2 Sep 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types.

Feature Engineering General Classification +2

Paper
Code

Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks

1 code implementation • 10 Jul 2018 • Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han

To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics.

Feature Engineering Network Embedding

Paper
Code

Entropy-Based Subword Mining with an Application to Word Embeddings

no code implementations • WS 2018 • Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke, Jiawei Han

Recent literature has shown a wide variety of benefits to mapping traditional one-hot representations of words and phrases to lower-dimensional real-valued vectors known as word embeddings.

Language Modelling Machine Translation +3

Paper
Add Code

End-to-End Reinforcement Learning for Automatic Taxonomy Induction

1 code implementation • ACL 2018 • Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han

We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach

1 code implementation • 29 Apr 2018 • Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han

Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types.

Model Selection

Paper
Code

Integrating Local Context and Global Cohesiveness for Open Information Extraction

1 code implementation • 26 Apr 2018 • Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han

However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions.

Open Information Extraction Relation +1

Paper
Code

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

1 code implementation • EMNLP 2018 • Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), and brought significant improvements to various applications.

Ranked #47 on Named Entity Recognition (NER) on CoNLL 2003 (English)

Language Modelling Named Entity Recognition (NER)

146

Paper
Code

Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

no code implementations • 9 Mar 2018 • Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han

We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking.

Paper
Add Code

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

no code implementations • 5 Mar 2018 • Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han

Therefore, we are motivated to propose a novel embedding learning framework---AspEm---to preserve the semantic information in HINs based on multiple aspects.

Link Prediction Network Embedding

Paper
Add Code

Investigating Rumor News Using Agreement-Aware Search

1 code implementation • 21 Feb 2018 • Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han

We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences.

Paper
Code

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

2 code implementations • 30 Jan 2018 • Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, Jiawei Han

Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases.

Feature Engineering Multi-Task Learning +4

129

Paper
Code

mvn2vec: Preservation and Collaboration in Multi-View Network Embedding

1 code implementation • 19 Jan 2018 • Yu Shi, Fangqiu Han, Xinwei He, Xinran He, Carl Yang, Jie Luo, Jiawei Han

With experiments on a series of synthetic datasets, a large-scale internal Snapchat dataset, and two public datasets, we confirm the validity and importance of preservation and collaboration as two objectives for multi-view network embedding.

Network Embedding

Paper
Code

Graph Clustering with Dynamic Embedding

1 code implementation • 21 Dec 2017 • Carl Yang, Mengxiong Liu, Zongyi Wang, Liyuan Liu, Jiawei Han

Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner.

Social and Information Networks Physics and Society

Paper
Code

Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning

no code implementations • 9 Nov 2017 • Meng Qu, Xiang Ren, Yu Zhang, Jiawei Han

We propose a novel co-training framework with a distributional module and a pattern module.

Knowledge Base Completion Relation +1

Paper
Add Code

Indirect Supervision for Relation Extraction using Question-Answer Pairs

2 code implementations • 30 Oct 2017 • Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li, Jiawei Han

However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy.

Question Answering Relation +1

418

Paper
Code

An Attention-based Collaboration Framework for Multi-View Network Representation Learning

1 code implementation • 19 Sep 2017 • Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, Jiawei Han

Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network.

Representation Learning

Paper
Code

Empower Sequence Labeling with Task-Aware Neural Language Model

3 code implementations • 13 Sep 2017 • Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, Jiawei Han

In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.

Ranked #13 on Part-Of-Speech Tagging on Penn Treebank

Language Modelling named-entity-recognition +5

845

Paper
Code

Identifying Semantically Deviating Outlier Documents

no code implementations • EMNLP 2017 • Honglei Zhuang, Chi Wang, Fangbo Tao, Lance Kaplan, Jiawei Han

A document outlier is a document that substantially deviates in semantics from the majority ones in a corpus.

Outlier Detection

Paper
Add Code

Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach

1 code implementation • EMNLP 2017 • Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han

These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance.

Relation Relation Extraction +1

Paper
Code

Life-iNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences

no code implementations • ACL 2017 • Xiang Ren, Jiaming Shen, Meng Qu, Xuan Wang, Zeqiu Wu, Qi Zhu, Meng Jiang, Fangbo Tao, Saurabh Sinha, David Liem, Peipei Ping, Richard Weinshilboum, Jiawei Han

Efficient Exploration

Paper
Add Code

Automatic Synonym Discovery with Knowledge Bases

1 code implementation • 25 Jun 2017 • Meng Qu, Xiang Ren, Jiawei Han

In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus.

Paper
Code

PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks

no code implementations • 5 Jun 2017 • Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, Jiawei Han

We also identify, from real-world data, and propose to model cross-meta-path synergy, which is a characteristic important for defining path-based HIN relevance and has not been modeled by existing methods.

Paper
Add Code
