Search Results for author: Jiawei Han

Found 161 papers, 89 papers with code

ChemNER: Fine-Grained Chemistry Named Entity Recognition with Ontology-Guided Distant Supervision

no code implementations EMNLP 2021 Xuan Wang, Vivian Hu, Xiangchen Song, Shweta Garg, Jinfeng Xiao, Jiawei Han

For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts.

Named Entity Recognition NER

Phrase-aware Unsupervised Constituency Parsing

no code implementations ACL 2022 Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han

Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.

Constituency Parsing Language Modelling +1

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

no code implementations 4 May 2022 Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han

Discovering latent topics from text corpora has been studied for decades.

Topic Models

OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

1 code implementation 29 Apr 2022 Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han

Most prior work on this problem mines new values for a set of known attributes but cannot handle new attributes that arise from constantly changing data.

Language Modelling

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation ICLR 2022 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

Shift-Robust Node Classification via Graph Adversarial Clustering

no code implementations 7 Mar 2022 Qi Zhu, Chao Zhang, Chanyoung Park, Carl Yang, Jiawei Han

Then a shift-robust classifier is optimized on the training graph and on adversarial samples on the target graph, which are generated by a cluster GNN.

Classification Domain Adaptation +1

PILED: An Identify-and-Localize Framework for Few-Shot Event Detection

no code implementations 15 Feb 2022 Sha Li, Liyuan Liu, Yiqing Xie, Heng Ji, Jiawei Han

Practical applications of event extraction systems have long been hindered by their need for heavy human annotation.

Event Detection Event Extraction +2

TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations

no code implementations 10 Feb 2022 Minhao Jiang, Xiangchen Song, Jieyu Zhang, Jiawei Han

Taxonomies are fundamental to many real-world applications in various domains, serving as structural representations of knowledge.

Pretrained Language Models

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

1 code implementation 9 Feb 2022 Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.

Few-Shot Learning Natural Language Understanding +3

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

1 code implementation 9 Feb 2022 Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models.

Language Modelling Pretrained Language Models +1

Unsupervised Summarization with Customized Granularities

no code implementations 29 Jan 2022 Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han

We take events as the basic semantic units of the source documents and propose to rank these events by their salience.

Abstractive Text Summarization

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters

no code implementations 18 Jan 2022 Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu

Topic taxonomies, which represent the latent topic (or category) structure of document collections, provide valuable knowledge of contents in many applications such as web search and information filtering.

Topic coverage

Universal Graph Convolutional Networks

1 code implementation NeurIPS 2021 Di Jin, Zhizhi Yu, Cuiying Huo, Rui Wang, Xiao Wang, Dongxiao He, Jiawei Han

So can we reasonably utilize these segmentation rules to design a universal propagation mechanism independent of the network structural assumption?

Out-of-Category Document Identification Using Target-Category Names as Weak Supervision

no code implementations 24 Nov 2021 Dongha Lee, Dongmin Hyun, Jiawei Han, Hwanjo Yu

To address this challenge, we introduce a new task referred to as out-of-category detection, which aims to distinguish the documents according to their semantic relevance to the inlier (or target) categories by using the category names as weak supervision.

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

1 code implementation 7 Nov 2021 Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided.

Text Classification

Fine-Grained Opinion Summarization with Minimal Supervision

no code implementations 17 Oct 2021 Suyu Ge, Jiaxin Huang, Yu Meng, Sharon Wang, Jiawei Han

Opinion summarization aims to profile a target by extracting opinions from multiple documents.

Fine-Grained Opinion Analysis

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

1 code implementation ACL 2022 Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa

Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited.

Language Modelling Model Selection

Entity Linking Meets Deep Learning: Techniques and Solutions

no code implementations 26 Sep 2021 Wei Shen, Yuhan Li, Yinan Liu, Jiawei Han, Jianyong Wang, Xiaojie Yuan

Entity linking (EL) is the process of linking entity mentions appearing in web text with their corresponding entities in a knowledge base.

Entity Linking Knowledge Base Population +2
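
Entity linking as defined above usually begins with candidate generation. The sketch below is a minimal, hypothetical illustration of that step only: it assumes an alias table (`ALIAS_COUNTS`, invented for this example) mapping lowercase surface forms to knowledge-base entity IDs with link counts, and ranks candidates by the prior P(entity | mention). It is not the method of the survey; real systems add context-aware ranking on top.

```python
# Hypothetical alias table: surface form -> {KB entity id: times the alias links to it}.
ALIAS_COUNTS = {
    "jordan": {"Michael_Jordan": 120, "Jordan_(country)": 80, "Michael_I._Jordan": 15},
    "chicago bulls": {"Chicago_Bulls": 200},
}


def link(mention, alias_counts=ALIAS_COUNTS):
    """Return KB candidates for a mention, ranked by prior probability P(entity | alias)."""
    candidates = alias_counts.get(mention.lower().strip())
    if not candidates:
        return []  # NIL: the mention has no entry in the knowledge base
    total = sum(candidates.values())
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return [(entity, count / total) for entity, count in ranked]
```

`link("Jordan")` returns `Michael_Jordan` first with a prior of 120/215 (about 0.56); an unknown mention returns an empty list, i.e., NIL.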

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

1 code implementation EMNLP 2021 Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.

Language Modelling Named Entity Recognition +1
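
Distant labels of the kind described above can be produced by string matching. A minimal sketch, assuming whitespace-tokenized sentences and a hypothetical toy knowledge base (`EXAMPLE_KB`) keyed by lowercase token tuples; it uses greedy longest-match and emits BIO tags. This illustrates the labeling setup only, not the paper's training method.

```python
# Hypothetical KB: tuple of lowercase tokens -> entity type.
EXAMPLE_KB = {
    ("barack", "obama"): "PER",
    ("hawaii",): "LOC",
    ("new", "york"): "LOC",
    ("new", "york", "city"): "LOC",
}


def distant_label(tokens, kb):
    """Tag tokens with BIO labels by greedy longest-match against KB surface forms."""
    max_len = max((len(k) for k in kb), default=0)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest span first so "new york city" wins over "new york".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in kb:
                match = (n, kb[span])
                break
        if match:
            n, etype = match
            labels[i] = f"B-{etype}"
            for j in range(i + 1, i + n):
                labels[j] = f"I-{etype}"
            i += n  # skip past the matched mention
        else:
            i += 1
    return labels
```

Labels built this way are incomplete and noisy (a KB rarely lists every mention, and many strings are ambiguous), which is the kind of noise that motivates the noise-robust learning named in the title.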

Corpus-based Open-Domain Event Type Induction

1 code implementation EMNLP 2021 Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han

As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.

Event Extraction

Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data

1 code implementation NeurIPS 2021 Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

In this work we present a method, Shift-Robust GNN (SR-GNN), designed to account for distributional differences between biased training data and the graph's true inference distribution.

Multi-head or Single-head? An Empirical Comparison for Transformer Training

1 code implementation 17 Jun 2021 Liyuan Liu, Jialu Liu, Jiawei Han

Multi-head attention plays a crucial role in the recent success of Transformer models, which leads to consistent performance improvements over conventional attention in various applications.

Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion

1 code implementation Findings (ACL) 2022 Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han

Typical DocRE methods blindly take the full document as input, while a subset of the sentences in the document, referred to as the evidence, is often sufficient for humans to predict the relation of an entity pair.

Relation Extraction

Event Time Extraction and Propagation via Graph Attention Networks

1 code implementation NAACL 2021 Haoyang Wen, Yanru Qu, Heng Ji, Qiang Ning, Jiawei Han, Avi Sil, Hanghang Tong, Dan Roth

Grounding events into a precise timeline is important for natural language understanding but has received limited attention in recent work.

Graph Attention Natural Language Understanding +1

RESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System

1 code implementation NAACL 2021 Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, Dídac Surís, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).

Coreference Resolution Event Extraction

Training ELECTRA Augmented with Multi-word Selection

no code implementations Findings (ACL) 2021 Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han

In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.

Multi-Task Learning

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

2 code implementations 28 May 2021 Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.

Keyphrase Extraction Language Modelling +2

Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation

2 code implementations EMNLP 2021 Yuning Mao, Wenchang Ma, Deren Lei, Jiawei Han, Xiang Ren

In this paper, we present a systematic analysis that studies whether current seq2seq models, especially pre-trained language models, are good enough for preserving important input concepts and to what extent explicitly guiding generation with the concepts as lexical constraints is beneficial.

Conditional Text Generation Denoising

The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction

1 code implementation EMNLP 2021 Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss

We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations.

Document-Level Event Argument Extraction by Conditional Generation

1 code implementation NAACL 2021 Sha Li, Heng Ji, Jiawei Han

On the task of argument extraction, we achieve an absolute gain of 7.6% F1 and 5.7% F1 over the next best model on the RAMS and WikiEvents datasets, respectively.

Document-level Event Extraction Event Extraction

Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion

no code implementations 8 Apr 2021 Xiangchen Song, Jiaming Shen, Jieyu Zhang, Jiawei Han

Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks.

Toward Tweet Entity Linking with Heterogeneous Information Networks

1 code implementation IEEE Transactions on Knowledge and Data Engineering 2021 Wei Shen, Yuwei Yin, Yang Yang, Jiawei Han, Jianyong Wang, Xiaojie Yuan

The task of linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is of great importance, for the purpose of enriching heterogeneous information networks with the abundant and fresh knowledge embedded in tweets.

Entity Linking Metric Learning

Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks

no code implementations 23 Feb 2021 Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han

Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.

Product Categorization Text Categorization

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations NeurIPS 2021 Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning Language Modelling +1
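
The Corrective Language Modeling task above needs per-token training targets. A minimal sketch, assuming whitespace-tokenized input: `corrupt` is a hypothetical stand-in that replaces tokens uniformly at random instead of sampling from an auxiliary language model, and `corrective_lm_targets` derives the detection label and correction target for each position. Model training itself is omitted.

```python
import random


def corrupt(tokens, vocab, rate=0.15, seed=0):
    """Stand-in for the auxiliary generator: replace a random fraction of tokens.
    (Uniform sampling from `vocab` is an assumption, not COCO-LM's generator.)"""
    rng = random.Random(seed)
    out = list(tokens)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.choice(vocab)
    return out


def corrective_lm_targets(original, corrupted):
    """Per-position targets: (was the token replaced?, token the model should output)."""
    assert len(original) == len(corrupted)
    # A "replacement" that happens to equal the original token counts as unreplaced.
    replaced = [int(o != c) for o, c in zip(original, corrupted)]
    # The correction target is always the original token; for unreplaced
    # positions this amounts to copying the input.
    return replaced, list(original)
```

For `["the", "cat", "sat"]` corrupted to `["the", "dog", "sat"]`, the detection labels are `[0, 1, 0]` and the correction targets are the original three tokens.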

MATCH: Metadata-Aware Text Classification in A Large Hierarchy

1 code implementation 15 Feb 2021 Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han

Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set.

Classification General Classification +2

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation 1 Jan 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Open-Domain Question Answering

Few-Shot Named Entity Recognition: A Comprehensive Study

2 code implementations 29 Dec 2020 Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when only a small amount of in-domain labeled data is available.

Few-Shot Learning Named Entity Recognition +1

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

1 code implementation 26 Oct 2020 Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han

Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.

Data Augmentation Document Classification +1

Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation

2 code implementations 24 Oct 2020 Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han

Despite significant progress, state-of-the-art abstractive summarization methods are still prone to hallucinating content inconsistent with the source document.

Abstractive Text Summarization Keyphrase Extraction

BiTe-GCN: A New GCN Architecture via Bidirectional Convolution of Topology and Features on Text-Rich Networks

no code implementations 23 Oct 2020 Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, Jiawei Han

We propose BiTe-GCN, a novel GCN architecture with bidirectional convolution of both topology and features on text-rich networks to address these limitations.

On the Transformer Growth for Progressive BERT Training

no code implementations NAACL 2021 Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han

Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively: starting from an inferior but low-cost model and gradually growing the model to increase the computational complexity.

Language Modelling

Text Classification Using Label Names Only: A Language Model Self-Training Approach

1 code implementation EMNLP 2020 Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.

Classification Document Classification +4
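
To make the "label names only" setting concrete, here is a deliberately naive sketch: unlabeled documents receive pseudo-labels by counting occurrences of each single-word label name. The class names and documents are hypothetical, and this keyword baseline is not the paper's approach, which, per its title, relies on language-model self-training.

```python
import re


def pseudo_label(docs, label_names):
    """Assign each document the class whose label name occurs most often in it.
    `label_names` maps class -> single-word name; documents mentioning no
    label name stay unlabeled (None). Ties go to the earlier class in the dict."""
    labels = []
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        counts = {c: words.count(name) for c, name in label_names.items()}
        best = max(counts, key=counts.get)
        labels.append(best if counts[best] > 0 else None)
    return labels
```

Documents left as `None` illustrate why pure keyword matching falls short: most documents never mention a label name verbatim.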

CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring

1 code implementation 13 Oct 2020 Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han

Taxonomy is not only a fundamental form of knowledge representation, but also crucial to vast knowledge-rich applications, such as question answering and web search.

Question Answering

A Spherical Hidden Markov Model for Semantics-Rich Human Mobility Modeling

1 code implementation 5 Oct 2020 Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han

We propose SHMM, a multi-modal spherical hidden Markov model for semantics-rich human mobility modeling.

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

no code implementations EMNLP 2020 Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han

To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation ACL 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Open-Domain Question Answering Passage Retrieval +1
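
GAR's claim is that appending generated contexts to a query helps sparse retrieval. A minimal sketch of the retrieval side, assuming pre-tokenized documents and a hand-written stand-in for the generated context (no generator model is included): `bm25_scores` implements Okapi BM25 with k1 = 1.2 and b = 0.75 (Lucene's defaults) and the +1 inside the IDF logarithm, and `expand_query` simply concatenates question and context. `TOY_DOCS` is a hypothetical corpus.

```python
import math
from collections import Counter

# Hypothetical toy corpus of pre-tokenized passages.
TOY_DOCS = [
    ["eiffel", "tower", "paris", "france"],
    ["tower", "of", "london"],
    ["paris", "is", "the", "capital"],
]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (list of tokens) against a query (list of tokens) with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            s += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def expand_query(question, generated_context):
    """GAR-style expansion: append generated text to the question before sparse retrieval."""
    return question.lower().split() + generated_context.lower().split()
```

With the bare question `which tower`, the shorter London passage scores highest because of length normalization; after appending the stand-in context `located in paris france`, the Eiffel Tower passage ranks first.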

Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization

1 code implementation NeurIPS 2021 Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, Jiawei Han

Graph neural networks (GNNs) have achieved superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs.

Knowledge Graphs Transfer Learning

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

1 code implementation 18 Jul 2020 Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.

Topic Models

GCN for HIN via Implicit Utilization of Attention and Meta-paths

no code implementations 6 Jul 2020 Di Jin, Zhizhi Yu, Dongxiao He, Carl Yang, Philip S. Yu, Jiawei Han

Graph neural networks for HIN embeddings typically adopt a hierarchical attention (including node-level and meta-path-level attentions) to capture the information from meta-path-based neighbors.

Octet: Online Catalog Taxonomy Enrichment with Self-Supervision

no code implementations 18 Jun 2020 Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han

We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment.

Term Extraction

Unsupervised Differentiable Multi-aspect Network Embedding

1 code implementation 7 Jun 2020 Chanyoung Park, Carl Yang, Qi Zhu, Donghyun Kim, Hwanjo Yu, Jiawei Han

To capture the multiple aspects of each node, existing studies mainly rely on offline graph clustering performed prior to the actual embedding, which leaves the cluster membership of each node (i.e., node aspect distribution) fixed throughout training of the embedding model.

Graph Clustering Graph Mining +1

Open-Domain Question Answering with Pre-Constructed Question Spaces

no code implementations NAACL 2021 Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han

Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.

Information Retrieval Knowledge Graphs +1
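
The reader-retriever split can be illustrated with a toy version. This is a sketch under loud assumptions: the (question, answer) pairs stand in for the offline reader's output and are hand-written, and the online retriever matches user queries to stored questions by token-set Jaccard similarity rather than any learned model. All function names are hypothetical.

```python
def tokenize(text):
    """Lowercase bag of tokens, ignoring question marks."""
    return set(text.lower().replace("?", " ").split())


def build_question_space(qa_pairs):
    """Offline step: index (generated question, answer) pairs by their token sets."""
    return [(tokenize(q), q, a) for q, a in qa_pairs]


def answer(query, space):
    """Online step: return the answer of the stored question most similar to the query."""
    qtok = tokenize(query)

    def jaccard(tokens):
        union = qtok | tokens
        return len(qtok & tokens) / len(union) if union else 0.0

    best = max(space, key=lambda entry: jaccard(entry[0]))
    return best[2]
```

The expensive reading happens once, offline; answering a query only requires a lookup over the pre-constructed question space.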

Partially-Typed NER Datasets Integration: Connecting Practice to Theory

no code implementations 1 May 2020 Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han

While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only some of them.

Named Entity Recognition NER

Minimally Supervised Categorization of Text with Metadata

1 code implementation 1 May 2020 Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han

Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.

Document Classification

Empower Entity Set Expansion via Language Model Probing

1 code implementation ACL 2020 Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han

Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.

Language Modelling Question Answering

Heterogeneous Network Representation Learning: A Unified Framework with Survey and Benchmark

1 code implementation 1 Apr 2020 Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han

Since a broad body of HNE algorithms already exists, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis of the merits of various existing HNE algorithms.

Network Embedding

Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision

no code implementations 27 Mar 2020 Xuan Wang, Xiangchen Song, Bangzheng Li, Yingjun Guan, Jiawei Han

We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13).

Named Entity Recognition NER

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

1 code implementation 27 Jan 2020 Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han

Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion aims to induce an extensive set of entities that share the same semantic class (Country in this example) from a given corpus.
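
A minimal sketch of the task definition, assuming each entity has already been reduced to a set of context features (hypothetical skip-gram patterns such as `president of _`): non-seed entities are ranked by their average Jaccard similarity to the seeds. This illustrates corpus-based set expansion in general, not any particular paper's method.

```python
def expand(seeds, contexts, top_k=2):
    """Rank non-seed entities by average Jaccard similarity of their context-feature
    sets to the seed entities and return the top_k as the expansion."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    scores = {
        e: sum(jaccard(feats, contexts[s]) for s in seeds) / len(seeds)
        for e, feats in contexts.items()
        if e not in seeds
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_k]
```

With seeds "USA" and "Russia", an entity sharing patterns such as `president of _` and `_ economy` outranks one that shares only `_ economy`.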

Generating Representative Headlines for News Stories

2 code implementations 26 Jan 2020 Xiaotao Gu, Yuning Mao, Jiawei Han, Jialu Liu, Hongkun Yu, You Wu, Cong Yu, Daniel Finnie, Jiaqi Zhai, Nicholas Zukoski

In this work, we study the problem of generating representative headlines for news stories.

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

2 code implementations 26 Jan 2020 Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han

Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.

Product Recommendation

cube2net: Efficient Query-Specific Network Construction with Data Cube Organization

no code implementations 18 Jan 2020 Carl Yang, Mengxiong Liu, Frank He, Jian Peng, Jiawei Han

With extensive experiments on two classic network mining tasks over different real-world large datasets, we show that our proposed cube2net pipeline is general, and much more effective and efficient in query-specific network construction than methods that do not leverage data cubes or reinforcement learning.


Inf-VAE: A Variational Autoencoder Framework to Integrate Homophily and Influence in Diffusion Prediction

2 code implementations 1 Jan 2020 Aravind Sankar, Xinyang Zhang, Adit Krishnan, Jiawei Han

Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc.

Unsupervised Attributed Multiplex Network Embedding

1 code implementation 15 Nov 2019 Chanyoung Park, Donghyun Kim, Jiawei Han, Hwanjo Yu

Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph.

Network Embedding

Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach

no code implementations 14 Nov 2019 Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han

This proximity network captures the corpus-level co-occurrence statistics for candidate event descriptors, event attributes, as well as their connections.

News Summarization

Spherical Text Embedding

1 code implementation NeurIPS 2019 Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han

While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding.

Riemannian optimization Word Similarity
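
The train/usage gap described above can be seen numerically: Euclidean distance on raw vectors can disagree with cosine similarity, whereas for vectors projected onto the unit sphere the identity ||u - v||² = 2 - 2 cos(u, v) makes the two orderings coincide. A minimal sketch with plain Python lists; it does not implement the paper's spherical training procedure.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Directional similarity: dot product of the normalized vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def sq_dist_on_sphere(u, v):
    """Squared Euclidean distance after projecting both vectors onto the unit sphere."""
    return sum((a - b) ** 2 for a, b in zip(normalize(u), normalize(v)))
```

For u = [1, 0], `math.dist` puts [0.5, 0.5] closer than [10, 1], while `cosine` prefers [10, 1] (about 0.995 versus 0.707); after normalization, `sq_dist_on_sphere` equals 2 - 2 · `cosine` up to rounding.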

Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders

no code implementations 4 Nov 2019 Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Myungwan Kim, Matt Walker, Yiou Xiao, Jiawei Han

While node semantics have been extensively explored in social networks, little research attention has been paid to profiling edge semantics, i.e., social relations.

SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

1 code implementation 17 Oct 2019 Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han

In this study, we propose a novel framework, SetExpan, which tackles this problem, with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding entity set based on denoised context features.

Question Answering
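
The rank-ensemble idea in (2) can be sketched generically. Assumption: each input list is a ranking of candidate entities produced from a different sample of context features, and the lists are fused by summing reciprocal ranks; SetExpan's exact ensemble scoring may differ.

```python
from collections import defaultdict


def rank_ensemble(rankings, top_k=None):
    """Fuse several ranked candidate lists by summing reciprocal ranks (1/rank), so
    entities ranked highly by many lists float to the top. top_k=None keeps all."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, entity in enumerate(ranking, start=1):
            scores[entity] += 1.0 / rank
    # Descending score, alphabetical tie-break for determinism.
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return ordered[:top_k] if top_k else ordered
```

An entity that one noisy feature subset happens to rank first but others ignore accumulates less score than an entity ranked near the top by most subsets.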

FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams

1 code implementation 10 Oct 2019 Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han

In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.

Language Modelling

Meta-Graph Based HIN Spectral Embedding: Methods, Analyses, and Insights

no code implementations 29 Sep 2019 Carl Yang, Yichen Feng, Pan Li, Yu Shi, Jiawei Han

In this work, we propose to study the utility of different meta-graphs, as well as how to simultaneously leverage multiple meta-graphs for HIN embedding in an unsupervised manner.

Query-Specific Knowledge Summarization with Entity Evolutionary Networks

no code implementations 29 Sep 2019 Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao, Jiawei Han

Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization.

Neural Embedding Propagation on Heterogeneous Networks

1 code implementation 29 Sep 2019 Carl Yang, Jieyu Zhang, Jiawei Han

While generalizing LP as a simple instance, NEP is far more powerful in its natural awareness of different types of objects and links, and the ability to automatically capture their important interaction patterns.

Network Embedding

I Know You'll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application

no code implementations 29 Sep 2019 Carl Yang, Xiaolin Shi, Jie Luo, Jiawei Han

Then we design a novel deep learning pipeline based on LSTM and attention to accurately predict user churn with very limited initial behavior data, by leveraging the correlations among users' multi-dimensional activities and the underlying user types.

Place Deduplication with Embeddings

no code implementations 29 Sep 2019 Carl Yang, Do Huy Hoang, Tomas Mikolov, Jiawei Han

Thanks to advances in mobile location services, people nowadays can post about places to share their visiting experiences on the go.

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

1 code implementation 4 Sep 2019 Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han

Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity.

Knowledge Graphs

Facet-Aware Evaluation for Extractive Summarization

1 code implementation ACL 2020 Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han

In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.

Extractive Summarization Text Summarization

Hierarchical Text Classification with Reinforced Label Assignment

1 code implementation IJCNLP 2019 Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren

While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference.

Ranked #1 on Text Classification on RCV1 (Macro F1 metric)

Classification General Classification +1

Discriminative Topic Mining via Category-Name Guided Text Embedding

1 code implementation 20 Aug 2019 Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han

We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora.

Document Classification General Classification +3

Parsimonious Morpheme Segmentation with an Application to Enriching Word Embeddings

no code implementations 18 Aug 2019 Ahmed El-Kishky, Frank Xu, Aston Zhang, Jiawei Han

However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures.

Language Modelling Word Embeddings

Raw-to-End Name Entity Recognition in Social Media

1 code implementation 14 Aug 2019 Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han

Our model neither requires the conversion from character sequences to word sequences, nor assumes that the tokenizer can correctly detect all word boundaries.

Named Entity Recognition NER

On the Variance of the Adaptive Learning Rate and Beyond

20 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3
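
For reference, the warmup heuristic the paper analyzes can be as simple as a linear ramp. A minimal sketch with hypothetical defaults (`base_lr=1e-3`, `warmup_steps=1000`); it shows the heuristic only, not the paper's proposed remedy.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linear warmup: ramp the learning rate from base_lr / warmup_steps up to
    base_lr over the first warmup_steps updates (step counts from 0), then hold it."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In PyTorch, the same ramp can serve as the multiplicative factor returned by the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`, by calling `warmup_lr(step, base_lr=1.0)`.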

Arabic Named Entity Recognition: What Works and What's Next

no code implementations WS 2019 Liyuan Liu, Jingbo Shang, Jiawei Han

This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com.

Ensemble Learning Feature Engineering +2

Task-Guided Pair Embedding in Heterogeneous Network

1 code implementation 4 Jun 2019 Chanyoung Park, Donghyun Kim, Qi Zhu, Jiawei Han, Hwanjo Yu

In this paper, we propose a novel task-guided pair embedding framework in heterogeneous networks, called TaPEm, that directly models the relationship between a pair of nodes that are related to a specific task (e.g., the paper-author relationship in author identification).

Network Embedding

Biomedical Event Extraction based on Knowledge-driven Tree-LSTM

no code implementations NAACL 2019 Diya Li, Lifu Huang, Heng Ji, Jiawei Han

Event extraction for the biomedical domain is more challenging than that in the general news domain since it requires broader acquisition of domain-specific knowledge and deeper understanding of complex contexts.

Entity Linking Event Extraction

STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks

no code implementations 21 Feb 2019 Shuochao Yao, Ailing Piao, Wenjun Jiang, Yiran Zhao, Huajie Shao, Shengzhong Liu, Dongxin Liu, Jinyang Li, Tianshi Wang, Shaohan Hu, Lu Su, Jiawei Han, Tarek Abdelzaher

IoT applications, however, often measure physical phenomena whose underlying physics (such as inertia, wireless signal propagation, or the natural frequency of oscillation) is fundamentally a function of signal frequencies, so the frequency domain offers better features.

Speech Recognition

Weakly-Supervised Hierarchical Text Classification

1 code implementation 29 Dec 2018 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.

Classification Feature Engineering +2

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

1 code implementation 22 Dec 2018 Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han

Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion.


Mining Entity Synonyms with Efficient Neural Set Generation

1 code implementation 16 Nov 2018 Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han

Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications.

End-to-End Hierarchical Text Classification with Label Assignment Policy

no code implementations 27 Sep 2018 Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren

We present an end-to-end reinforcement learning approach to hierarchical text classification where documents are labeled by placing them at the right positions in a given hierarchy.

Classification Text Classification

Learning Named Entity Tagger using Domain-Specific Dictionary

1 code implementation EMNLP 2018 Jingbo Shang, Liyuan Liu, Xiang Ren, Xiaotao Gu, Teng Ren, Jiawei Han

Recent advances in deep neural models allow us to build reliable named entity recognition (NER) systems without handcrafting features.

Named Entity Recognition NER

Weakly-Supervised Neural Text Classification

1 code implementation 2 Sep 2018 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited supervision types.

Classification Feature Engineering +2

Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks

1 code implementation 10 Jul 2018 Yu Shi, Qi Zhu, Fang Guo, Chao Zhang, Jiawei Han

To cope with the challenges in the comprehensive transcription of HINs, we propose the HEER algorithm, which embeds HINs via edge representations that are further coupled with properly-learned heterogeneous metrics.

Feature Engineering Network Embedding

Entropy-Based Subword Mining with an Application to Word Embeddings

no code implementations WS 2018 Ahmed El-Kishky, Frank Xu, Aston Zhang, Stephen Macke, Jiawei Han

Recent literature has shown a wide variety of benefits to mapping traditional one-hot representations of words and phrases to lower-dimensional real-valued vectors known as word embeddings.

Language Modelling Machine Translation +3

End-to-End Reinforcement Learning for Automatic Taxonomy Induction

1 code implementation ACL 2018 Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han

We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms.


Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach

1 code implementation 29 Apr 2018 Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han

Unlike Web or general-domain search, a large portion of queries in scientific literature search are entity-set queries, that is, queries containing multiple entities of possibly different types.

Model Selection

Integrating Local Context and Global Cohesiveness for Open Information Extraction

1 code implementation 26 Apr 2018 Qi Zhu, Xiang Ren, Jingbo Shang, Yu Zhang, Ahmed El-Kishky, Jiawei Han

However, current Open IE systems focus on modeling local context information in a sentence to extract relation tuples, while ignoring the fact that global statistics in a large corpus can be collectively leveraged to identify high-quality sentence-level extractions.

Open Information Extraction

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

1 code implementation EMNLP 2018 Liyuan Liu, Xiang Ren, Jingbo Shang, Jian Peng, Jiawei Han

Many efforts have been made to facilitate natural language processing tasks with pre-trained language models (LMs), and these efforts have brought significant improvements to various applications.

Language Modelling Named Entity Recognition

Expert Finding in Heterogeneous Bibliographic Networks with Locally-trained Embeddings

no code implementations 9 Mar 2018 Huan Gui, Qi Zhu, Liyuan Liu, Aston Zhang, Jiawei Han

We study the task of expert finding in heterogeneous bibliographical networks based on two aspects: textual content analysis and authority ranking.

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

no code implementations 5 Mar 2018 Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han

Therefore, we are motivated to propose a novel embedding learning framework, AspEm, to preserve the semantic information in HINs based on multiple aspects.

Link Prediction Network Embedding

Investigating Rumor News Using Agreement-Aware Search

1 code implementation 21 Feb 2018 Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han

We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences.

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

2 code implementations 30 Jan 2018 Xuan Wang, Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, Jiawei Han

Motivation: State-of-the-art biomedical named entity recognition (BioNER) systems often require handcrafted features specific to each entity type, such as genes, chemicals and diseases.

Feature Engineering Multi-Task Learning +1

mvn2vec: Preservation and Collaboration in Multi-View Network Embedding

1 code implementation 19 Jan 2018 Yu Shi, Fangqiu Han, Xinwei He, Xinran He, Carl Yang, Jie Luo, Jiawei Han

With experiments on a series of synthetic datasets, a large-scale internal Snapchat dataset, and two public datasets, we confirm the validity and importance of preservation and collaboration as two objectives for multi-view network embedding.

Network Embedding

Graph Clustering with Dynamic Embedding

1 code implementation 21 Dec 2017 Carl Yang, Mengxiong Liu, Zongyi Wang, Liyuan Liu, Jiawei Han

Unlike most existing embedding methods that are task-agnostic, we simultaneously solve for the underlying node representations and the optimal clustering assignments in an end-to-end manner.

Social and Information Networks Physics and Society

Indirect Supervision for Relation Extraction using Question-Answer Pairs

2 code implementations 30 Oct 2017 Zeqiu Wu, Xiang Ren, Frank F. Xu, Ji Li, Jiawei Han

However, due to the incompleteness of knowledge bases and the context-agnostic labeling, the training data collected via distant supervision (DS) can be very noisy.

Question Answering Relation Extraction

An Attention-based Collaboration Framework for Multi-View Network Representation Learning

1 code implementation 19 Sep 2017 Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, Jiawei Han

Existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network.

Representation Learning

Empower Sequence Labeling with Task-Aware Neural Language Model

3 code implementations 13 Sep 2017 Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng, Jiawei Han

In this study, we develop a novel neural framework to extract abundant knowledge hidden in raw texts to empower the sequence labeling task.

Language Modelling Named Entity Recognition +4

Identifying Semantically Deviating Outlier Documents

no code implementations EMNLP 2017 Honglei Zhuang, Chi Wang, Fangbo Tao, Lance Kaplan, Jiawei Han

A document outlier is a document that deviates substantially in semantics from the majority of documents in a corpus.

Outlier Detection

Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach

1 code implementation EMNLP 2017 Liyuan Liu, Xiang Ren, Qi Zhu, Shi Zhi, Huan Gui, Heng Ji, Jiawei Han

These annotations, referred to as heterogeneous supervision, often conflict with each other, which brings a new challenge to the original relation extraction task: how to infer the true label from noisy labels for a given instance.

Relation Extraction Representation Learning

Automatic Synonym Discovery with Knowledge Bases

1 code implementation 25 Jun 2017 Meng Qu, Xiang Ren, Jiawei Han

In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus.

PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks

no code implementations 5 Jun 2017 Yu Shi, Po-Wei Chan, Honglei Zhuang, Huan Gui, Jiawei Han

We also identify from real-world data, and propose to model, cross-meta-path synergy, a characteristic that is important for defining path-based HIN relevance and has not been modeled by existing methods.

MetaPAD: Meta Pattern Discovery from Massive Text Corpora

no code implementations 13 Mar 2017 Meng Jiang, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M. Kaplan, Timothy P. Hanratty, Jiawei Han

We propose an efficient framework, called MetaPAD, which discovers meta patterns from massive corpora with three techniques: (1) it develops a context-aware segmentation method to carefully determine the boundaries of patterns with a learnt pattern quality assessment function, which avoids costly dependency parsing and generates high-quality patterns; (2) it identifies and groups synonymous meta patterns from multiple facets (their types, contexts, and extractions); and (3) it examines type distributions of entities in the instances extracted by each group of patterns, and looks for appropriate type levels to make discovered patterns precise.

Dependency Parsing

Automated Phrase Mining from Massive Text Corpora

4 code implementations 15 Feb 2017 Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, Jiawei Han

As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality phrases from a text corpus.


FastHybrid: A Hybrid Model for Efficient Answer Selection

no code implementations COLING 2016 Lidan Wang, Ming Tan, Jiawei Han

In this paper, we propose an extremely efficient hybrid model (FastHybrid) that tackles the problem from both an accuracy and scalability point of view.

Answer Selection Information Retrieval +1

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

no code implementations 31 Oct 2016 Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han

In the literature, two series of models have been proposed to address prediction problems including classification and regression.

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

2 code implementations 27 Oct 2016 Xiang Ren, Zeqiu Wu, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Tarek F. Abdelzaher, Jiawei Han

We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations.

Joint Entity and Relation Extraction Text Segmentation

pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

no code implementations 22 Oct 2016 Julie Yixuan Zhu, Chao Zhang, Huichu Zhang, Shi Zhi, Victor O. K. Li, Jiawei Han, Yu Zheng

Therefore, we present pg-Causality, a novel pattern-aided causality analysis approach that combines the strengths of pattern mining and Bayesian learning to efficiently and faithfully identify the spatiotemporal (ST) causal pathways.

World Knowledge as Indirect Supervision for Document Clustering

no code implementations 30 Jul 2016 Chenguang Wang, Yangqiu Song, Dan Roth, Ming Zhang, Jiawei Han

We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and represent the data with world knowledge as a heterogeneous information network.

Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding

3 code implementations 17 Feb 2016 Xiang Ren, Wenqi He, Meng Qu, Clare R. Voss, Heng Ji, Jiawei Han

Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions.

Entity Typing Semantic Similarity +1

Robust Tensor Decomposition with Gross Corruption

no code implementations NeurIPS 2014 Quanquan Gu, Huan Gui, Jiawei Han

In this paper, we study the statistical performance of robust tensor decomposition with gross corruption.

Tensor Decomposition

Scalable Topical Phrase Mining from Text Corpora

no code implementations 24 Jun 2014 Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han

Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition.

Topic Models

Scalable and Robust Construction of Topical Hierarchies

no code implementations 13 Mar 2014 Chi Wang, Xueqing Liu, Yanglei Song, Jiawei Han

Automated generation of high-quality topical hierarchies for a text collection is a dream problem in knowledge engineering with many valuable applications.

Selective Labeling via Error Bound Minimization

no code implementations NeurIPS 2012 Quanquan Gu, Tong Zhang, Jiawei Han, Chris H. Ding

In particular, we derive a deterministic generalization error bound for LapRLS trained on subsampled data, and propose to select a subset of data points to label by minimizing this upper bound.

Generalized Fisher Score for Feature Selection

1 code implementation 14 Feb 2012 Quanquan Gu, Zhenhui Li, Jiawei Han

Fisher score is one of the most widely used supervised feature selection methods.
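As a reference point for the baseline this paper generalizes, here is a minimal sketch of the classic per-feature Fisher score in pure Python. It is not the generalized variant the paper proposes; the function name `fisher_scores`, the toy data, and the handling of zero within-class variance are our own illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Classic (per-feature) Fisher score; hypothetical helper for illustration.

    For feature j:  F_j = sum_k n_k * (mu_kj - mu_j)^2 / sum_k n_k * var_kj
    where n_k is the number of samples in class k, mu_kj and var_kj are the
    mean and population variance of feature j within class k, and mu_j is
    the mean of feature j over all samples. Larger is more discriminative.
    """
    n_samples = len(X)
    n_features = len(X[0])

    # Group the rows of X by their class label.
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores: List[float] = []
    for j in range(n_features):
        overall_mean = sum(row[j] for row in X) / n_samples
        between = 0.0  # weighted spread of class means around the overall mean
        within = 0.0   # weighted spread of samples around their class mean
        for rows in groups.values():
            n_k = len(rows)
            mean_k = sum(r[j] for r in rows) / n_k
            var_k = sum((r[j] - mean_k) ** 2 for r in rows) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        # Guard against division by zero when the feature is constant within
        # every class: infinite if it still separates classes, else 0.
        if within == 0.0:
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the classes, feature 1 is constant.
    X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
    y = [0, 0, 1, 1]
    scores = fisher_scores(X, y)
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    print(scores, ranking)
```

Feature selection then keeps the top-ranked features. Because each feature is scored independently, this classic form cannot account for redundancy between features, which is one reason generalized formulations exist.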

Graph Regularized Nonnegative Matrix Factorization for Data Representation

no code implementations IEEE Transactions on Pattern Analysis and Machine Intelligence 2011 Deng Cai, Xiaofei He, Jiawei Han, Thomas S. Huang

In GNMF, an affinity graph is constructed to encode the geometrical information, and we seek a matrix factorization that respects the graph structure.

Information Retrieval
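For orientation, a Frobenius-norm objective of the graph-regularized kind described above is usually written as follows. The notation is ours and is not taken from the snippet. It assumes $X \in \mathbb{R}^{M \times N}$ holds $N$ nonnegative data points as columns, $U \in \mathbb{R}^{M \times K}$ and $V \in \mathbb{R}^{N \times K}$ are the nonnegative factors (row $v_j$ of $V$ is the low-dimensional representation of point $j$), $W$ is the affinity matrix of the graph, and $\lambda \ge 0$ trades reconstruction error against smoothness on the graph.

```latex
\min_{U \ge 0,\; V \ge 0}\;
  \lVert X - U V^{\top} \rVert_F^{2}
  + \lambda\, \operatorname{Tr}\!\left(V^{\top} L V\right),
\qquad L = D - W,\quad D_{jj} = \sum_{l} W_{jl},

\operatorname{Tr}\!\left(V^{\top} L V\right)
  = \frac{1}{2} \sum_{j,l} W_{jl}\, \lVert v_j - v_l \rVert^{2}.
```

The second identity shows why the regularizer "respects the graph structure": pairs of points with a large affinity $W_{jl}$ are penalized for having distant representations $v_j$ and $v_l$.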