no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.
no code implementations • EMNLP 2021 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han
This paper presents an empirical study to efficiently build named entity recognition (NER) systems when a small amount of in-domain labeled data is available.
1 code implementation • NAACL (ACL) 2022 • Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Jie Lei, Hyounghun Kim, Rotem Dror, Haoyu Wang, Michael Regan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer, Heng Ji
We introduce RESIN-11, a new schema-guided event extraction&prediction framework that can be applied to a large variety of newsworthy scenarios.
1 code implementation • EMNLP 2021 • Xuan Wang, Vivian Hu, Xiangchen Song, Shweta Garg, Jinfeng Xiao, Jiawei Han
For example, chemistry research needs to study dozens to hundreds of distinct, fine-grained entity types, making consistent and accurate annotation difficult even for crowds of domain experts.
1 code implementation • 12 Mar 2025 • Bowen Jin, Hansi Zeng, Zhenrui Yue, Dong Wang, Hamed Zamani, Jiawei Han
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs).
no code implementations • 24 Feb 2025 • Ayush Kumar Shah, Abhisek Dey, Leo Luo, Bryan Amador, Patrick Philippy, Ming Zhong, Siru Ouyang, David Mark Friday, David Bianchi, Nick Jackson, Richard Zanibbi, Jiawei Han
We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature.
1 code implementation • 20 Feb 2025 • Priyanka Kargupta, Ishika Agarwal, Tal August, Jiawei Han
With the exponential growth of research facilitated by modern technology and improved accessibility, scientific discoveries have become increasingly fragmented within and across fields.
no code implementations • 17 Feb 2025 • Yi Fang, Bowen Jin, Jiacheng Shen, Sirui Ding, Qiaoyu Tan, Jiawei Han
The rapid development of Multimodal Large Language Models (MLLMs) has enabled the integration of multiple modalities, including texts and images, within the large language model (LLM) framework.
no code implementations • 16 Feb 2025 • SeongKu Kang, Bowen Jin, Wonbin Kweon, Yu Zhang, Dongha Lee, Jiawei Han, Hwanjo Yu
In specialized fields like the scientific domain, constructing large-scale human-annotated datasets poses a significant challenge due to the need for domain expertise.
1 code implementation • 16 Feb 2025 • Pengcheng Jiang, Lang Cao, Ruike Zhu, Minhao Jiang, Yunyi Zhang, Jimeng Sun, Jiawei Han
Retrieval-augmented language models often struggle with knowledge-intensive tasks due to inefficient retrieval, unstructured knowledge integration, and single-pass architectures.
no code implementations • 6 Feb 2025 • Bowen Jin, Jinsung Yoon, Zhen Qin, Ziqi Wang, Wei Xiong, Yu Meng, Jiawei Han, Sercan O. Arik
In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles.
1 code implementation • 3 Feb 2025 • Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu
All of these findings imply that preference leakage is a widespread and challenging problem in the area of LLM-as-a-judge.
no code implementations • 12 Jan 2025 • Jimeng Shi, Azam Shirali, Bowen Jin, Sizhe Zhou, Wei Hu, Rahuul Rangaraj, Shaowen Wang, Jiawei Han, Zhaonan Wang, Upmanu Lall, Yanzhao Wu, Leonardo Bobadilla, Giri Narasimhan
Physics-based numerical models have been the bedrock of atmospheric sciences for decades, offering robust solutions but often at the cost of significant computational resources.
no code implementations • 11 Dec 2024 • Zihao Li, Lecheng Zheng, Bowen Jin, Dongqi Fu, Baoyu Jing, Yikun Ban, Jingrui He, Jiawei Han
In this work, to address these issues, we leverage multi-modal prompt learning to effectively adapt pre-trained GNN to downstream tasks and data, given only a few semantically labeled samples, each with extremely weak text supervision.
no code implementations • 25 Oct 2024 • SeongKu Kang, Yunyi Zhang, Pengcheng Jiang, Dongha Lee, Jiawei Han, Hwanjo Yu
Academic paper search is an essential task for efficient literature discovery and scientific advancement.
no code implementations • 24 Oct 2024 • Sha Li, Revanth Gangi Reddy, Khanh Duy Nguyen, Qingyun Wang, May Fung, Chi Han, Jiawei Han, Kartik Natarajan, Clare R. Voss, Heng Ji
Complex news events, such as natural disasters and socio-political conflicts, require swift responses from the government and society.
1 code implementation • 23 Oct 2024 • Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models.
no code implementations • 14 Oct 2024 • Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen
This paper delves into the effects of decoding temperatures on speculative decoding's efficacy.
1 code implementation • 9 Oct 2024 • Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han
In this paper, we approach an overlooked yet critical task Graph2Image: generating images from multimodal attributed graphs (MMAGs).
no code implementations • 8 Oct 2024 • Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik
Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources.
1 code implementation • 6 Oct 2024 • Pengcheng Jiang, Cao Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han
KARE constructs a comprehensive multi-source KG by integrating biomedical databases, clinical literature, and LLM-generated insights, and organizes it using hierarchical graph community detection and summarization for precise and contextually relevant information retrieval.
1 code implementation • 3 Oct 2024 • Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, Dong Yu
Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories.
no code implementations • 2 Oct 2024 • Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, Hao Peng
To address this, two critical steps are often required: a pretrained LLM typically undergoes a separate stage for context length extension by training on long-context data, followed by architectural modifications to reduce the overhead of KV cache during serving.
1 code implementation • 30 Sep 2024 • Ming Zhong, Aston Zhang, Xuewei Wang, Rui Hou, Wenhan Xiong, Chenguang Zhu, Zhengxing Chen, Liang Tan, Chloe Bi, Mike Lewis, Sravya Popuri, Sharan Narang, Melanie Kambadur, Dhruv Mahajan, Sergey Edunov, Jiawei Han, Laurens van der Maaten
The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities.
no code implementations • 9 Sep 2024 • Jimeng Shi, Bowen Jin, Jiawei Han, Sundararaman Gopalakrishnan, Giri Narasimhan
Our conditional diffusion model, CoDiCast, can generate 6-day global weather forecasts, at 6-hour steps and $5. 625^\circ$ latitude-longitude resolution, for over 5 variables, in about 12 minutes on a commodity A100 GPU machine with 80GB memory.
1 code implementation • 20 Aug 2024 • Jiawei Han, Kaiqi Liu, Wei Li, Guangzhi Chen
Specifically, the point cloud is initially separated into independent point sets by category to provide initial conditions for the generation of feature subspaces.
Ranked #10 on
Semantic Segmentation
on S3DIS Area5
1 code implementation • 10 Aug 2024 • Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chang, Shou-De Lin, Jiawei Han
Inspired by the recent advancements of Large Language Models (LLMs) in NLP tasks, there's growing interest in applying LLMs to graph-related tasks.
1 code implementation • 9 Aug 2024 • Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao, Siru Ouyang, Jiawei Han
Episodic structures are inherently interpretable and adaptable to evolving large-scale key events.
no code implementations • 17 Jul 2024 • Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han
Language models are known to encode a great amount of factual knowledge through pretraining.
no code implementations • 23 Jun 2024 • Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, huan zhang, Jiawei Han
In this paper, we propose a novel approach that leverages strategically chosen proxy data reflective of potential sales patterns from similar entities during non-peak periods, enriched by features learned from a graph neural networks (GNNs)-based forecasting model, to predict demand during peak events.
1 code implementation • 17 Jun 2024 • Priyanka Kargupta, Ishika Agarwal, Dilek Hakkani-Tur, Jiawei Han
We tackle this issue in the code debugging domain with TreeInstruct, an Instructor agent guided by a novel state space-based planning algorithm.
1 code implementation • 16 Jun 2024 • Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han
In many scientific fields, large language models (LLMs) have revolutionized the way text and other modalities of data (e. g., molecules and proteins) are handled, achieving superior performance in various applications and augmenting the scientific discovery process.
no code implementations • 3 Jun 2024 • Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang
Our critical survey based on the newly categorized research questions shows that (1) no prior work demonstrates successful self-correction with feedback from prompted LLMs, except for studies in tasks that are exceptionally suited for self-correction, (2) self-correction works well in tasks that can use reliable external feedback, and (3) large-scale fine-tuning enables self-correction.
no code implementations • 29 May 2024 • Fangzhi Xu, Qika Lin, Tianzhe Zhao, Jiawei Han, Jun Liu
In particular, we propose a path-attention module to joint model in-atom and cross-atom relations with the high-order diffusion strategy.
1 code implementation • 26 May 2024 • Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han
Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery.
no code implementations • 29 Apr 2024 • Linyi Ding, Sizhe Zhou, Jinfeng Xiao, Jiawei Han
Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we then generate candidate relations by Large Language Models (LLMs) to construct a relation ontology.
2 code implementations • 18 Apr 2024 • Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma
Generative models for structure-based drug design (SBDD) have shown promising results in recent years.
1 code implementation • 10 Apr 2024 • Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, Jiawei Han
Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
no code implementations • 15 Mar 2024 • Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han
To overcome this, we introduce TriSum, a framework for distilling LLMs' text summarization abilities into a compact, local model.
1 code implementation • 13 Mar 2024 • Sayar Ghosh Roy, Jiawei Han
We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.
1 code implementation • 7 Mar 2024 • SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, Jiawei Han
Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs).
no code implementations • 29 Feb 2024 • Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Rui Li, Jinfeng Xiao, Jiaming Shen, Jiawei Han
On the other hand, previous weakly-supervised hierarchical text classification methods only utilize the raw taxonomy skeleton and ignore the rich information hidden in the text corpus that can serve as additional class-indicative features.
no code implementations • 26 Feb 2024 • Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen
Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images.
1 code implementation • 20 Feb 2024 • Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han
To be specific, we identify two common skills needed for entity set expansion, taxonomy expansion, and seed-guided taxonomy construction: finding "siblings" and finding "parents".
1 code implementation • 17 Feb 2024 • Sizhe Zhou, Yu Meng, Bowen Jin, Jiawei Han
(3) We expand pattern coverage and mitigate bias from initial seeds by integrating feedback from the SLM's predictions on the unlabeled corpus and the synthesis history.
1 code implementation • 16 Feb 2024 • Pengcheng Jiang, Jiacheng Lin, Zifeng Wang, Jimeng Sun, Jiawei Han
The field of relation extraction (RE) is experiencing a notable shift towards generative relation extraction (GRE), leveraging the capabilities of large language models (LLMs).
1 code implementation • 6 Feb 2024 • Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang
Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification.
1 code implementation • 23 Jan 2024 • Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han
In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i. e., those without seed entities).
1 code implementation • 18 Jan 2024 • Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji
Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges.
no code implementations • 11 Jan 2024 • Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo
Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks.
1 code implementation • 10 Jan 2024 • Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao
This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.
1 code implementation • 5 Dec 2023 • Khanh Duy Nguyen, Zixuan Zhang, Reece Suchocki, Sha Li, Martha Palmer, Susan Brown, Jiawei Han, Heng Ji
In this paper, we present RESIN-EDITOR, an interactive event graph visualizer and editor designed for analyzing complex events.
1 code implementation • 5 Dec 2023 • Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, Jiawei Han
Besides, although LLMs have shown their pure text-based reasoning ability, it is underexplored whether such ability can be generalized to graphs (i. e., graph-based reasoning).
1 code implementation • 27 Nov 2023 • Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han
With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information of news articles and uses them to discover stories.
1 code implementation • 16 Nov 2023 • Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin
Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in the field of chemistry.
no code implementations • 13 Nov 2023 • Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao
Specifically, an adversarial LLM and a target LLM interplay with each other in an iterative manner, where the adversarial LLM aims to generate challenging prompts that elicit unsafe responses from the target LLM, while the target LLM is fine-tuned with safety aligned data on these adversarial prompts.
no code implementations • 3 Nov 2023 • Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, Jiawei Han
Large language models~(LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity.
1 code implementation • 24 Oct 2023 • Yizhu Jiao, Ming Zhong, Sha Li, Ruining Zhao, Siru Ouyang, Heng Ji, Jiawei Han
However, when it comes to information extraction - a classic task in natural language processing - most task-specific systems cannot align well with long-tail ad hoc extraction use cases for non-expert users.
1 code implementation • 23 Oct 2023 • Yu Zhang, Yanzhen Shen, SeongKu Kang, Xiusi Chen, Bowen Jin, Jiawei Han
To address this issue, we propose a unified model for paper-reviewer matching that jointly considers semantic, topic, and citation factors.
1 code implementation • 19 Oct 2023 • Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han
Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks.
1 code implementation • 17 Oct 2023 • Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He
In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
no code implementations • 11 Oct 2023 • Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han
In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two types of extra information: instance information for training sample augmentation and topic information to relate types to contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples.
1 code implementation • 11 Oct 2023 • Bowen Jin, Hansi Zeng, Guoyin Wang, Xiusi Chen, Tianxin Wei, Ruirui Li, Zhengyang Wang, Zheng Li, Yang Li, Hanqing Lu, Suhang Wang, Jiawei Han, Xianfeng Tang
Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs.
1 code implementation • 10 Oct 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han
Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.
2 code implementations • 3 Oct 2023 • Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao
In this study, we introduce adaptive KV cache compression, a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs).
1 code implementation • 5 Jul 2023 • Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch, Jiawei Han
Event schemas are a form of world knowledge about the typical progression of events.
no code implementations • 4 Jul 2023 • Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han
Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design.
1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e. g., category names, category-indicative keywords).
1 code implementation • 16 Jun 2023 • Fangzhi Xu, Qika Lin, Jiawei Han, Tianzhe Zhao, Jun Liu, Erik Cambria
Logical reasoning consistently plays a fundamental and significant role in the domains of knowledge engineering and artificial intelligence.
no code implementations • 5 Jun 2023 • Qi Zhu, Yizhu Jiao, Natalia Ponomareva, Jiawei Han, Bryan Perozzi
Graph Neural Networks (GNNs) have shown remarkable performance on graph-structured data.
no code implementations • 2 Jun 2023 • Pengcheng Jiang, Cao Xiao, Tianfan Fu, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, Jiawei Han
In this paper, we introduce a novel method called GODE, which accounts for the dual-level structure inherent in molecules.
1 code implementation • 24 May 2023 • Pengcheng Jiang, Shivam Agarwal, Bowen Jin, Xuan Wang, Jimeng Sun, Jiawei Han
The mission of open knowledge graph (KG) completion is to draw new findings from known facts.
1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang
Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.
1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.
no code implementations • 21 May 2023 • Tanay Komarlu, Minhao Jiang, Xuan Wang, Jiawei Han
Fine-grained entity typing (FET), which assigns entities in text with context-sensitive, fine-grained semantic types, is a basic but important task for knowledge extraction from unstructured text.
no code implementations • 20 May 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han
A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e. g., academic papers in a bibliographic network are linked by citations and co-authorships).
1 code implementation • 8 Apr 2023 • Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han
Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
1 code implementation • 4 Apr 2023 • Priyanka Kargupta, Tanay Komarlu, Susik Yoon, Xuan Wang, Jiawei Han
By preserving the heterogeneity of potential classes, MEGClass can select the most informative class-indicative documents as iterative feedback to enhance the initial word-based class representations and ultimately fine-tune a pre-trained text classifier.
1 code implementation • 16 Mar 2023 • Qiusi Zhan, Sha Li, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han
Finally, we perform error analysis and show that label noise is still the largest challenge for improving performance for this new dataset.
1 code implementation • 21 Feb 2023 • Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han
Edges in many real-world social/information networks are associated with rich text information (e. g., user-user communications or user-product reviews).
1 code implementation • 10 Feb 2023 • Susik Yoon, Hou Pong Chan, Jiawei Han
Summarizing text-rich documents has been long studied in the literature, but most of the existing efforts have been made to summarize a static and predefined multi-document set.
1 code implementation • 7 Feb 2023 • Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han
Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track their interested fields of study rather than drowning in the whole literature.
no code implementations • 7 Feb 2023 • Suyu Ge, Chenyan Xiong, Corby Rosset, Arnold Overwijk, Jiawei Han, Paul Bennett
In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the option to "plug in" new memory at inference time.
1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens.
1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.
no code implementations • 5 Dec 2022 • Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han
Given a few seed entities of a certain type (e. g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.
1 code implementation • 30 Nov 2022 • Sha Li, Heng Ji, Jiawei Han
To tackle this problem, we introduce the idea of type abstraction, where the model is prompted to generalize and name the type.
1 code implementation • 6 Nov 2022 • Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han
In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set.
1 code implementation • 3 Nov 2022 • Yizhu Jiao, Sha Li, Yiqing Xie, Ming Zhong, Heng Ji, Jiawei Han
Specifically, we formulate the role prediction problem as an in-filling task and construct prompts for a pre-trained language model to generate candidate roles.
1 code implementation • 25 Oct 2022 • Jianhao Shen, Chenguang Wang, Ye Yuan, Jiawei Han, Heng Ji, Koushik Sen, Ming Zhang, Dawn Song
For instance, we outperform the fully finetuning approaches on a KG completion benchmark by tuning only 1% of the parameters.
Ranked #5 on
Link Prediction
on UMLS
no code implementations • 20 Oct 2022 • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
We show that our approach improves the general reasoning ability of a 540B-parameter LLM (74. 4%->82. 1% on GSM8K, 78. 2%->83. 0% on DROP, 90. 0%->94. 4% on OpenBookQA, and 63. 4%->67. 9% on ANLI-A3) and achieves state-of-the-art-level performance, without any ground truth label.
Ranked #1 on
Question Answering
on DROP
no code implementations • 18 Oct 2022 • Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han
Topic taxonomies display hierarchical topic structures of a text corpus and provide topical knowledge to enhance various NLP applications.
2 code implementations • 13 Oct 2022 • Ming Zhong, Yang Liu, Da Yin, Yuning Mao, Yizhu Jiao, PengFei Liu, Chenguang Zhu, Heng Ji, Jiawei Han
We re-frame NLG evaluation as a Boolean Question Answering (QA) task, and by guiding the model with different questions, we can use one evaluator to evaluate from multiple dimensions.
no code implementations • 29 Sep 2022 • Liwen Sun, Jiawei Han
In this paper, we explore how to utilize pre-trained language model to perform few-shot text classification where only a few annotated examples are given for each class.
no code implementations • 17 Sep 2022 • Hang Yu, Jiawei Han
Query-based text summarization is an important real world problem that requires to condense the prolix text data into a summary under the guidance of the query information provided by users.
1 code implementation • 15 Sep 2022 • Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky
Most existing PLMs are not tailored to the noisy user-generated text on social media, and the pre-training does not factor in the valuable social engagement logs available in a social network.
1 code implementation • 21 Aug 2022 • Dawei Zhou, Lecheng Zheng, Dongqi Fu, Jiawei Han, Jingrui He
To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weighs graph signals in order to ensure a good generalization in the target domain.
1 code implementation • 28 Jun 2022 • Jiaxin Huang, Yu Meng, Jiawei Han
We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type.
no code implementations • 15 Jun 2022 • Zhizhi Yu, Di Jin, Jianguo Wei, Ziyang Liu, Yue Shang, Yun Xiao, Jiawei Han, Lingfei Wu
Graph Neural Networks (GNNs) have gained great popularity in tackling various analytical tasks on graph-structured data (i. e., networks).
1 code implementation • 8 Jun 2022 • Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge.
no code implementations • 6 Jun 2022 • Hongwei Wang, Zixuan Zhang, Sha Li, Jiawei Han, Yizhou Sun, Hanghang Tong, Joseph P. Olive, Heng Ji
Existing link prediction or graph completion methods have difficulty dealing with event graphs because they are usually designed for a single large graph such as a social network or a knowledge graph, rather than multiple small dynamic event graphs.
no code implementations • 22 May 2022 • Jiaxin Huang, Tianqi Liu, Jialu Liu, Adam D. Lelkes, Cong Yu, Jiawei Han
Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks.
1 code implementation • 20 May 2022 • Bowen Jin, Yu Zhang, Qi Zhu, Jiawei Han
In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure.
1 code implementation • 12 May 2022 • Yuning Mao, Ming Zhong, Jiawei Han
Scientific extreme summarization (TLDR) aims to form ultra-short summaries of scientific papers.
Ranked #1 on
Extreme Summarization
on CiteSum
1 code implementation • NAACL 2022 • Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han
Discovering latent topics from text corpora has been studied for decades.
1 code implementation • 29 Apr 2022 • Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han
Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that arose from constantly changing data.
1 code implementation • ICLR 2022 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song
We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.
no code implementations • 7 Mar 2022 • Qi Zhu, Chao Zhang, Chanyoung Park, Carl Yang, Jiawei Han
Then a shift-robust classifier is optimized on training graph and adversarial samples on target graph, which are generated by cluster GNN.
no code implementations • 15 Feb 2022 • Sha Li, Liyuan Liu, Yiqing Xie, Heng Ji, Jiawei Han
Our framework decomposes event detection into an identification task and a localization task.
1 code implementation • 11 Feb 2022 • Yu Zhang, Zhihong Shen, Chieh-Han Wu, Boya Xie, Junheng Hao, Ye-Yi Wang, Kuansan Wang, Jiawei Han
Large-scale multi-label text classification (LMTC) aims to associate a document with its relevant labels from a large candidate set.
no code implementations • 10 Feb 2022 • Minhao Jiang, Xiangchen Song, Jieyu Zhang, Jiawei Han
Taxonomies are fundamental to many real-world applications in various domains, serving as structural representations of knowledge.
1 code implementation • 9 Feb 2022 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han
Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models.
1 code implementation • 9 Feb 2022 • Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e. g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e. g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.
Ranked #5 on
Zero-Shot Text Classification
on AG News
2 code implementations • 29 Jan 2022 • Ming Zhong, Yang Liu, Suyu Ge, Yuning Mao, Yizhu Jiao, Xingxing Zhang, Yichong Xu, Chenguang Zhu, Michael Zeng, Jiawei Han
In this paper, we propose the first unsupervised multi-granularity summarization framework, GranuSum.
no code implementations • 18 Jan 2022 • Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu
Topic taxonomies, which represent the latent topic (or category) structure of document collections, provide valuable knowledge of contents in many applications such as web search and information filtering.
1 code implementation • NeurIPS 2021 • Di Jin, Zhizhi Yu, Cuiying Huo, Rui Wang, Xiao Wang, Dongxiao He, Jiawei Han
So can we reasonably utilize these segmentation rules to design a universal propagation mechanism independent of the network structural assumption?
no code implementations • 24 Nov 2021 • Dongha Lee, Dongmin Hyun, Jiawei Han, Hwanjo Yu
To address this challenge, we introduce a new task referred to as out-of-category detection, which aims to distinguish the documents according to their semantic relevance to the inlier (or target) categories by using the category names as weak supervision.
1 code implementation • 7 Nov 2021 • Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided.
no code implementations • 17 Oct 2021 • Suyu Ge, Jiaxin Huang, Yu Meng, Sharon Wang, Jiawei Han
Opinion summarization aims to profile a target by extracting opinions from multiple documents.
1 code implementation • ACL 2022 • Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa
Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with much fewer trainable parameters and perform especially well when training data is limited.
no code implementations • 26 Sep 2021 • Wei Shen, Yuhan Li, Yinan Liu, Jiawei Han, Jianyong Wang, Xiaojie Yuan
Entity linking (EL) is the process of linking entity mentions appearing in web text with their corresponding entities in a knowledge base.
1 code implementation • NAACL 2022 • Yuxin Xiao, Zecheng Zhang, Yuning Mao, Carl Yang, Jiawei Han
Consequently, it is more challenging to encode the key information sources--relevant contexts and entity types.
Ranked #1 on
Relation Extraction
on CDR
1 code implementation • ICLR 2022 • Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, Martin D. Burke
Molecule representation learning (MRL) methods aim to embed molecules into a real vector space.
1 code implementation • EMNLP 2021 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han
We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.
1 code implementation • EMNLP 2021 • Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han
As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.
1 code implementation • NeurIPS 2021 • Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi
In this work we present a method, Shift-Robust GNN (SR-GNN), designed to account for distributional differences between biased training data and the graph's true inference distribution.
1 code implementation • 17 Jun 2021 • Liyuan Liu, Jialu Liu, Jiawei Han
Multi-head attention plays a crucial role in the recent success of Transformer models, which leads to consistent performance improvements over conventional attention in various applications.
1 code implementation • Findings (ACL) 2022 • Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han
Typical DocRE methods blindly take the full document as input, while a subset of the sentences in the document, noted as the evidence, are often sufficient for humans to predict the relation of an entity pair.
Ranked #5 on
Relation Extraction
on DocRED
no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.
Multi Label Text Classification
Multi-Label Text Classification
+3
1 code implementation • NAACL 2021 • Haoyang Wen, Yanru Qu, Heng Ji, Qiang Ning, Jiawei Han, Avi Sil, Hanghang Tong, Dan Roth
Grounding events into a precise timeline is important for natural language understanding but has received limited attention in recent work.
1 code implementation • NAACL 2021 • Haoyang Wen, Ying Lin, Tuan Lai, Xiaoman Pan, Sha Li, Xudong Lin, Ben Zhou, Manling Li, Haoyu Wang, Hongming Zhang, Xiaodong Yu, Alexander Dong, Zhenhailong Wang, Yi Fung, Piyush Mishra, Qing Lyu, D{\'\i}dac Sur{\'\i}s, Brian Chen, Susan Windisch Brown, Martha Palmer, Chris Callison-Burch, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Heng Ji
We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video).
no code implementations • Findings (ACL) 2021 • Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han
In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang
Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.
Ranked #1 on
Phrase Tagging
on KPTimes
2 code implementations • EMNLP 2021 • Yuning Mao, Wenchang Ma, Deren Lei, Jiawei Han, Xiang Ren
In this paper, we present a systematic analysis that studies whether current seq2seq models, especially pre-trained language models, are good enough for preserving important input concepts and to what extent explicitly guiding generation with the concepts as lexical constraints is beneficial.
1 code implementation • EMNLP 2021 • Manling Li, Sha Li, Zhenhailong Wang, Lifu Huang, Kyunghyun Cho, Heng Ji, Jiawei Han, Clare Voss
We introduce a new concept of Temporal Complex Event Schema: a graph-based schema representation that encompasses events, arguments, temporal connections and argument relations.
1 code implementation • NAACL 2021 • Sha Li, Heng Ji, Jiawei Han
On the task of argument extraction, we achieve an absolute gain of 7. 6% F1 and 5. 7% F1 over the next best model on the RAMS and WikiEvents datasets respectively.
Document-level Event Extraction
Event Argument Extraction
+2
no code implementations • 8 Apr 2021 • Xiangchen Song, Jiaming Shen, Jieyu Zhang, Jiawei Han
Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks.
1 code implementation • IEEE Transactions on Knowledge and Data Engineering 2021 • Wei Shen, Yuwei Yin, Yang Yang, Jiawei Han, Jianyong Wang, Xiaojie Yuan
The task of linking an entity mention in a tweet with its corresponding entity in a heterogeneous information network is of great importance, for the purpose of enriching heterogeneous information networks with the abundant and fresh knowledge embedded in tweets.
no code implementations • 23 Feb 2021 • Xinyang Zhang, Chenwei Zhang, Luna Xin Dong, Jingbo Shang, Jiawei Han
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
2 code implementations • NeurIPS 2021 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song
The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.
1 code implementation • 15 Feb 2021 • Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han
Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set.
6 code implementations • ACL (RepL4NLP) 2021 • Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan
Contrastive learning has been applied successfully to learn vector representations of text.
1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.
2 code implementations • 29 Dec 2020 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han
This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when a small number of in-domain labeled data is available.
1 code implementation • 26 Oct 2020 • Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han
Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.
2 code implementations • 24 Oct 2020 • Yuning Mao, Xiang Ren, Heng Ji, Jiawei Han
Despite significant progress, state-of-the-art abstractive summarization methods are still prone to hallucinate content inconsistent with the source document.
no code implementations • NAACL 2021 • Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chen Chen, Jiawei Han
Due to the excessive cost of large-scale language model pre-training, considerable efforts have been made to train BERT progressively -- start from an inferior but low-cost model and gradually grow the model to increase the computational complexity.
no code implementations • 23 Oct 2020 • Di Jin, Xiangchen Song, Zhizhi Yu, Ziyang Liu, Heling Zhang, Zhaomeng Cheng, Jiawei Han
We propose BiTe-GCN, a novel GCN architecture with bidirectional convolution of both topology and features on text-rich networks to solve these limitations.
no code implementations • ICLR 2021 • Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen
To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.
2 code implementations • EMNLP 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han
In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.
1 code implementation • EMNLP 2020 • Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, Jiawei Han
Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner.
Aspect-Based Sentiment Analysis
Aspect-Based Sentiment Analysis (ABSA)
+2
1 code implementation • 13 Oct 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han
Taxonomy is not only a fundamental form of knowledge representation, but also crucial to vast knowledge-rich applications, such as question answering and web search.
1 code implementation • 5 Oct 2020 • Wanzheng Zhu, Chao Zhang, Shuochao Yao, Xiaobin Gao, Jiawei Han
We propose SHMM, a multi-modal spherical hidden Markov model for semantics-rich human mobility modeling.
1 code implementation • EMNLP 2020 • Jiaming Shen, Heng Ji, Jiawei Han
Linguistic steganography studies how to hide secret messages in natural language cover texts.
1 code implementation • EMNLP 2020 • Yuning Mao, Yanru Qu, Yiqing Xie, Xiang Ren, Jiawei Han
Additionally, the explicit redundancy measure in MMR helps the neural representation of the summary to better capture redundancy.
no code implementations • EMNLP 2020 • Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han
To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.
1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.
Ranked #9 on
Passage Retrieval
on Natural Questions
1 code implementation • NeurIPS 2021 • Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, Jiawei Han
Graph neural networks (GNNs) have achieved superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs.
1 code implementation • 18 Jul 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.
Ranked #1 on
Topic Models
on Arxiv HEP-TH citation graph
no code implementations • 6 Jul 2020 • Di Jin, Zhizhi Yu, Dongxiao He, Carl Yang, Philip S. Yu, Jiawei Han
Graph neural networks for HIN embeddings typically adopt a hierarchical attention (including node-level and meta-path-level attentions) to capture the information from meta-path-based neighbors.
no code implementations • NAACL 2021 • Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions.
no code implementations • ACL 2020 • Xuan Wang, Yingjun Guan, Weili Liu, Aabhas Chauhan, Enyi Jiang, Qi Li, David Liem, Dibakar Sigdel, John Caufield, Peipei Ping, Jiawei Han
EVIDENCEMINER is constructed in a completely automated way without any human effort for training data annotation.
no code implementations • 24 Jun 2020 • Xin Luna Dong, Xiang He, Andrey Kan, Xi-An Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han
Can one build a knowledge graph (KG) for all products in the world?
no code implementations • 18 Jun 2020 • Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han
We propose to distantly train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure as well as the query-item-taxonomy interactions for term attachment.
1 code implementation • 7 Jun 2020 • Chanyoung Park, Carl Yang, Qi Zhu, Donghyun Kim, Hwanjo Yu, Jiawei Han
To capture the multiple aspects of each node, existing studies mainly rely on offline graph clustering performed prior to the actual embedding, which results in the cluster membership of each node (i. e., node aspect distribution) fixed throughout training of the embedding model.
no code implementations • NAACL 2021 • Jinfeng Xiao, Lidan Wang, Franck Dernoncourt, Trung Bui, Tong Sun, Jiawei Han
Our reader-retriever first uses an offline reader to read the corpus and generate collections of all answerable questions associated with their answers, and then uses an online retriever to respond to user queries by searching the pre-constructed question spaces for answers that are most likely to be asked in the given way.
no code implementations • 1 May 2020 • Shi Zhi, Liyuan Liu, Yu Zhang, Shiyin Wang, Qi Li, Chao Zhang, Jiawei Han
While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available datasets may only cover a part of them.
1 code implementation • 1 May 2020 • Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han
Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.
1 code implementation • ACL 2020 • Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
no code implementations • 27 Apr 2020 • Xuan Wang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Jiawei Han
We created this EVIDENCEMINER system for automatic textual evidence mining in COVID-19 literature.
2 code implementations • EMNLP 2020 • Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Transformers have proved effective in many NLP tasks.
Ranked #5 on
Machine Translation
on WMT2014 English-French
1 code implementation • 1 Apr 2020 • Carl Yang, Yuxin Xiao, Yu Zhang, Yizhou Sun, Jiawei Han
Since there has already been a broad body of HNE algorithms, as the first contribution of this work, we provide a generic paradigm for the systematic categorization and analysis over the merits of various existing HNE algorithms.
no code implementations • 27 Mar 2020 • Xuan Wang, Xiangchen Song, Bangzheng Li, Yingjun Guan, Jiawei Han
We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13).
1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han
Given a small set of seed entities (e. g., ``USA'', ``Russia''), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.
2 code implementations • 26 Jan 2020 • Xiaotao Gu, Yuning Mao, Jiawei Han, Jialu Liu, Hongkun Yu, You Wu, Cong Yu, Daniel Finnie, Jiaqi Zhai, Nicholas Zukoski
In this work, we study the problem of generating representative headlines for news stories.
3 code implementations • 26 Jan 2020 • Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
no code implementations • 18 Jan 2020 • Carl Yang, Mengxiong Liu, Frank He, Jian Peng, Jiawei Han
With extensive experiments of two classic network mining tasks on different real-world large datasets, we show that our proposed cube2net pipeline is general, and much more effective and efficient in query-specific network construction, compared with other methods without the leverage of data cube or reinforcement learning.
2 code implementations • 1 Jan 2020 • Aravind Sankar, Xinyang Zhang, Adit Krishnan, Jiawei Han
Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc.
2 code implementations • 15 Nov 2019 • Chanyoung Park, Donghyun Kim, Jiawei Han, Hwanjo Yu
Even for those that consider the multiplexity of a network, they overlook node attributes, resort to node labels for training, and fail to model the global properties of a graph.
no code implementations • 14 Nov 2019 • Hyungsul Kim, Ahmed El-Kishky, Xiang Ren, Jiawei Han
This proximity network captures the corpus-level co-occurence statistics for candidate event descriptors, event attributes, as well as their connections.
no code implementations • 4 Nov 2019 • Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Myungwan Kim, Matt Walker, Yiou Xiao, Jiawei Han
While node semantics have been extensively explored in social networks, little research attention has been paid to profile edge semantics, i. e., social relations.
1 code implementation • NeurIPS 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han
While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding.
no code implementations • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler, Jiawei Han
Taxonomies are of great value to many knowledge-rich applications.
1 code implementation • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han
In this study, we propose a novel framework, SetExpan, which tackles this problem, with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding entity set based on denoised context features.
2 code implementations • 16 Oct 2019 • Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han
With the massive number of repositories available, there is a pressing need for topic-based search.
2 code implementations • 10 Oct 2019 • Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han
In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.
no code implementations • 29 Sep 2019 • Carl Yang, Yichen Feng, Pan Li, Yu Shi, Jiawei Han
In this work, we propose to study the utility of different meta-graphs, as well as how to simultaneously leverage multiple meta-graphs for HIN embedding in an unsupervised manner.
no code implementations • 29 Sep 2019 • Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao, Jiawei Han
Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization.
no code implementations • 29 Sep 2019 • Carl Yang, Do Huy Hoang, Tomas Mikolov, Jiawei Han
Thanks to the advancing mobile location services, people nowadays can post about places to share visiting experience on-the-go.
no code implementations • 29 Sep 2019 • Carl Yang, Xiaolin Shi, Jie Luo, Jiawei Han
Then we design a novel deep learning pipeline based on LSTM and attention to accurately predict user churn with very limited initial behavior data, by leveraging the correlations among users' multi-dimensional activities and the underlying user types.
1 code implementation • 29 Sep 2019 • Carl Yang, Jieyu Zhang, Jiawei Han
While generalizing LP as a simple instance, NEP is far more powerful in its natural awareness of different types of objects and links, and the ability to automatically capture their important interaction patterns.
1 code implementation • 4 Sep 2019 • Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han
Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity.
1 code implementation • IJCNLP 2019 • Zihan Wang, Jingbo Shang, Liyuan Liu, Lihao Lu, Jiacheng Liu, Jiawei Han
Therefore, we manually correct these label mistakes and form a cleaner test set.
Ranked #6 on
Named Entity Recognition (NER)
on CoNLL++
(using extra training data)
1 code implementation • IJCNLP 2019 • Yuning Mao, Jingjing Tian, Jiawei Han, Xiang Ren
While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference.
Ranked #1 on
Text Classification
on RCV1
(Macro F1 metric)
1 code implementation • ACL 2020 • Yuning Mao, Liyuan Liu, Qi Zhu, Xiang Ren, Jiawei Han
In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries.
1 code implementation • 20 Aug 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han
We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora.
no code implementations • 18 Aug 2019 • Ahmed El-Kishky, Frank Xu, Aston Zhang, Jiawei Han
However, in many languages and specialized corpora, words are composed by concatenating semantically meaningful subword structures.
1 code implementation • 14 Aug 2019 • Liyuan Liu, Zihan Wang, Jingbo Shang, Dandong Yin, Heng Ji, Xiang Ren, Shaowen Wang, Jiawei Han
Our model neither requires the conversion from character sequences to word sequences, nor assumes tokenizer can correctly detect all word boundaries.
21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.