no code implementations • 25 Oct 2024 • SeongKu Kang, Yunyi Zhang, Pengcheng Jiang, Dongha Lee, Jiawei Han, Hwanjo Yu
Academic paper search is an essential task for efficient literature discovery and scientific advancement.
no code implementations • 11 Oct 2024 • Luyu Gao, Yunyi Zhang, Jamie Callan
The short-context LMs are further tuned using these self-generated data to obtain task-specific long-context capabilities.
1 code implementation • 9 Aug 2024 • Priyanka Kargupta, Yunyi Zhang, Yizhu Jiao, Siru Ouyang, Jiawei Han
Episodic structures are inherently interpretable and adaptable to evolving large-scale key events.
1 code implementation • 14 May 2024 • Chenghao Zhu, Nuo Chen, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang
The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing.
no code implementations • 29 Feb 2024 • Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Rui Li, Jinfeng Xiao, Jiaming Shen, Jiawei Han
On the other hand, previous weakly-supervised hierarchical text classification methods only utilize the raw taxonomy skeleton and ignore the rich information hidden in the text corpus that can serve as additional class-indicative features.
1 code implementation • 20 Feb 2024 • Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han
To be specific, we identify two common skills needed for entity set expansion, taxonomy expansion, and seed-guided taxonomy construction: finding "siblings" and finding "parents".
1 code implementation • 23 Jan 2024 • Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han
In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities).
no code implementations • 11 Oct 2023 • Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han
In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two types of extra information: instance information for training sample augmentation and topic information to relate types to contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples.
1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).
1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.
1 code implementation • 8 Apr 2023 • Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han
Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations.
1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.
no code implementations • 5 Dec 2022 • Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.
1 code implementation • 8 Jun 2022 • Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge.
1 code implementation • 9 Feb 2022 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han
Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models.
1 code implementation • EMNLP 2021 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han
We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.
1 code implementation • EMNLP 2021 • Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han
As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.
6 code implementations • ACL (RepL4NLP) 2021 • Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan
Contrastive learning has been applied successfully to learn vector representations of text.
2 code implementations • EMNLP 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han
In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.
1 code implementation • 13 Oct 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han
Taxonomy is not only a fundamental form of knowledge representation, but also crucial to vast knowledge-rich applications, such as question answering and web search.
no code implementations • 17 Sep 2020 • Yunyi Zhang, Dimitris N. Politis
Extensive numerical simulations further show that the debiased and thresholded ridge regression has favorable finite sample performance and may be preferable in some settings.
1 code implementation • 18 Jul 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.
Ranked #1 on Topic Models on Arxiv HEP-TH citation graph
1 code implementation • ACL 2020 • Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han
Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.