1 code implementation • 10 Apr 2024 • Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han
Then, we propose a simple and effective framework called Graph Chain-of-Thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
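As a minimal sketch of the reason-interact loop this implies, the code below alternates between asking an LLM what it needs from the graph and executing that lookup; `call_llm`, the prompt wording, and the graph operations (via networkx) are illustrative assumptions, not the paper's API.

```python
# Minimal sketch of an iterative reason-interact loop in the spirit of
# Graph-CoT. `call_llm` is a placeholder for any chat-completion API.
import networkx as nx

def call_llm(prompt: str) -> str:
    """Placeholder: route `prompt` to an LLM and return its reply."""
    raise NotImplementedError

def graph_cot(question: str, graph: nx.Graph, max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):
        # Step 1: ask the LLM what it needs from the graph next.
        plan = call_llm(context + "\nName the node to inspect next, "
                                  "or reply ANSWER: <answer> if done.")
        if plan.startswith("ANSWER:"):
            return plan[len("ANSWER:"):].strip()
        node = plan.strip()
        # Step 2: execute the graph interaction and append the result.
        if node in graph:
            context += (f"\nNode {node}: attrs={graph.nodes[node]}, "
                        f"neighbors={list(graph.neighbors(node))}")
        else:
            context += f"\nNode {node} not found."
    return call_llm(context + "\nGive your best final answer.")
```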
no code implementations • 17 Feb 2024 • Sizhe Zhou, Yu Meng, Bowen Jin, Jiawei Han
(2) We fine-tune a bidirectional Small Language Model (SLM) using these initial seeds to learn the relations for the target domain.
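As a rough illustration of this fine-tuning step, the sketch below trains a BERT-style SLM as a relation classifier on a handful of seed examples; the seed data, label set, and hyperparameters are assumptions, and only the HuggingFace calls are standard API.

```python
# Sketch: fine-tune a bidirectional SLM (BERT here) on seed relation
# examples. Seeds and labels are hypothetical toy data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

seeds = [("(Paris) is the capital of (France)", 0),   # e.g. capital_of
         ("(Einstein) was born in (Ulm)", 1)]         # e.g. born_in

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for text, label in seeds:
        batch = tok(text, return_tensors="pt", truncation=True)
        out = model(**batch, labels=torch.tensor([label]))
        out.loss.backward()
        opt.step()
        opt.zero_grad()
```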
1 code implementation • 27 Nov 2023 • Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han
With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information in news articles and uses it to discover stories.
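A minimal sketch of the two-level idea, sentence embeddings pooled into an article embedding, is shown below; the encoder choice (a sentence-transformers MiniLM model) and mean pooling are assumptions, not SCStory's exact module.

```python
# Sketch: encode sentences, then pool into an article representation.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def embed_article(sentences: list[str]) -> np.ndarray:
    sent_vecs = encoder.encode(sentences)     # (n_sents, dim)
    article_vec = sent_vecs.mean(axis=0)      # simple mean pooling
    return article_vec / np.linalg.norm(article_vec)

# Articles close in this space can then be grouped into stories, e.g.
# by assigning each new article to the nearest story centroid.
```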
no code implementations • 25 Oct 2023 • Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma
Accurate extraction of invisible building footprints from very-high-resolution (VHR) aerial images relies on roof segmentation and roof-to-footprint offset extraction.
1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.
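A bare-bones version of pairwise LLM-based evaluation looks like the sketch below; `call_llm` is a placeholder for any chat-completion API, and the judge prompt is illustrative rather than taken from the paper.

```python
# Sketch: an LLM judge picks the better of two responses.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in any chat-completion API here

def judge(instruction: str, output_a: str, output_b: str) -> str:
    prompt = (
        "You are comparing two responses to the same instruction.\n"
        f"Instruction: {instruction}\n"
        f"Response A: {output_a}\n"
        f"Response B: {output_b}\n"
        "Which response is better? Answer with exactly 'A' or 'B'."
    )
    verdict = call_llm(prompt).strip().upper()
    return verdict if verdict in {"A", "B"} else "tie"
```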
no code implementations • 10 Oct 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han
Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.
no code implementations • 28 Sep 2023 • Yu Meng, Szabolcs Horvát, Carl D. Modes, Pierre A. Haas
Here, we therefore develop a different approach, the exhaustive analysis of small ecological communities, to show that this arrangement of interactions can influence the stability of coexistence more than these general trends.
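To make the idea of exhaustive analysis concrete, the toy sketch below enumerates all interaction sign patterns of a three-species Lotka-Volterra community and tests linear stability at a fixed equilibrium; the interaction magnitudes and equilibrium are arbitrary choices for illustration, not the paper's setup.

```python
# Toy exhaustive scan: enumerate sign patterns of pairwise interactions
# in a 3-species Lotka-Volterra model and test linear stability.
import itertools
import numpy as np

n = 3
x_star = np.ones(n)          # assume a feasible equilibrium at x* = 1
stable = 0
patterns = list(itertools.product([-1.0, 1.0], repeat=n * (n - 1)))
for pattern in patterns:
    A = -np.eye(n)           # self-regulation on the diagonal
    k = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j] = 0.5 * pattern[k]
                k += 1
    J = np.diag(x_star) @ A  # Jacobian of Lotka-Volterra at x*
    if np.max(np.linalg.eigvals(J).real) < 0:
        stable += 1
print(f"{stable}/{len(patterns)} sign patterns are linearly stable")
```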
1 code implementation • NeurIPS 2023 • Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks.
1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han
Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).
1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which greatly reduces human annotation effort.
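A generic baseline for this label-name-only setting (not the paper's method) embeds documents and label names with the same encoder and assigns each document to its most similar label name:

```python
# Sketch: label-name-only classification via embedding similarity.
from sentence_transformers import SentenceTransformer

labels = ["sports", "politics", "technology"]   # label surface names
docs = ["The striker scored twice in the final.",
        "The senate passed the new budget bill."]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
label_vecs = encoder.encode(labels, normalize_embeddings=True)
doc_vecs = encoder.encode(docs, normalize_embeddings=True)
preds = (doc_vecs @ label_vecs.T).argmax(axis=1)  # cosine similarity
for doc, p in zip(docs, preds):
    print(labels[p], "<-", doc)
```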
no code implementations • 20 May 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han
A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).
1 code implementation • 18 May 2023 • Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang
With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks.
Ranked #1 on Zero-Shot Text Classification on AG News
1 code implementation • 21 Feb 2023 • Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han
Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews).
1 code implementation • 7 Feb 2023 • Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han
Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track the fields they are interested in rather than drowning in the whole literature.
1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing $\texttt{[MASK]}$ tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without $\texttt{[MASK]}$ tokens.
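A rough way to probe this claim (illustrative only, not the paper's analysis) is to compare the hidden state of a [MASK] token against those of real tokens and look for dimensions where [MASK] is an extreme outlier:

```python
# Sketch: flag hidden dimensions where [MASK] is an outlier vs real tokens.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

batch = tok("The capital of France is [MASK] .", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state[0]      # (seq_len, 768)

mask_pos = (batch["input_ids"][0] == tok.mask_token_id).nonzero().item()
mask_vec = hidden[mask_pos]
real_vecs = torch.cat([hidden[:mask_pos], hidden[mask_pos + 1:]])

# Dimensions where [MASK] sits far outside the spread of real-token
# activations hint at capacity dedicated to [MASK] representations.
z = (mask_vec - real_vecs.mean(0)).abs() / real_vecs.std(0)
print("dims with |z| > 3:", int((z > 3).sum()))
```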
1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han
Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.
1 code implementation • 6 Nov 2022 • Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han
In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large number of novel training samples that augment the original training set.
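A stripped-down version of this tune-then-generate recipe, using GPT-2 as the autoregressive PLM and an assumed "label | text" sample format, might look like:

```python
# Sketch: fine-tune GPT-2 on few-shot samples, then sample new ones.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

few_shot = ["label: positive | text: A delightful, moving film.",
            "label: negative | text: Dull and far too long."]

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):                                   # a few epochs
    for sample in few_shot:
        batch = tok(sample, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward(); opt.step(); opt.zero_grad()

# Generate novel samples to augment the original training set.
model.eval()
prompt = tok("label: positive | text:", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=30, do_sample=True,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```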
1 code implementation • 28 Jun 2022 • Jiaxin Huang, Yu Meng, Jiawei Han
We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type.
1 code implementation • NAACL 2022 • Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han
Discovering latent topics from text corpora has been studied for decades.
1 code implementation • ICLR 2022 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song
We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.
1 code implementation • 9 Feb 2022 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han
Interestingly, there have been no standard approaches to deploying PLMs for topic discovery as better alternatives to topic models.
1 code implementation • 9 Feb 2022 • Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.
Ranked #5 on Zero-Shot Text Classification on AG News
no code implementations • 13 Dec 2021 • Tong Su, Yu Meng, Yan Xu
As a core technology of autonomous driving systems, pedestrian trajectory prediction can significantly enhance active vehicle safety functions and reduce road traffic injuries.
1 code implementation • 7 Nov 2021 • Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories using only category surface names, without any annotated training documents.
no code implementations • 17 Oct 2021 • Suyu Ge, Jiaxin Huang, Yu Meng, Sharon Wang, Jiawei Han
Opinion summarization aims to profile a target by extracting opinions from multiple documents.
1 code implementation • EMNLP 2021 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han
We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.
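The distant-labeling step can be illustrated with a toy dictionary standing in for the knowledge base, string-matched against raw text to produce BIO tags:

```python
# Sketch: produce distant NER labels by dictionary matching.
kb = {"New York": "LOC", "Barack Obama": "PER"}   # toy knowledge base

def distant_label(tokens: list[str]) -> list[str]:
    tags = ["O"] * len(tokens)
    for name, etype in kb.items():
        parts = name.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + len(parts)):
                    tags[j] = f"I-{etype}"
    return tags

print(distant_label("Barack Obama visited New York today".split()))
# ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'O']
```

Such labels are noisy (incomplete coverage, ambiguous matches), which is exactly the challenge the paper addresses.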
no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.
Multi-Label Text Classification +3
2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang
Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.
Ranked #1 on Phrase Tagging on KPTimes
2 code implementations • NeurIPS 2021 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song
The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.
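Ignoring the paper's copy mechanism and other details, a corrective-LM-style objective reduces to recovering the original token at every position of the corrupted sequence; a minimal sketch under that simplification:

```python
# Sketch: train the main model to recover original tokens from a
# sequence in which an auxiliary generator replaced some tokens.
import torch
import torch.nn.functional as F

def corrective_lm_loss(model, corrupted_ids, original_ids):
    """model maps token ids (batch, seq) -> logits (batch, seq, vocab)."""
    logits = model(corrupted_ids)
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           original_ids.view(-1))

# Toy usage with a random "model" just to show the shapes:
vocab, batch, seq = 100, 2, 8
model = lambda ids: torch.randn(ids.size(0), ids.size(1), vocab)
orig = torch.randint(0, vocab, (batch, seq))
corrupt = orig.clone()
corrupt[0, 3] = (orig[0, 3] + 1) % vocab     # one replaced token
print(corrective_lm_loss(model, corrupt, orig))
```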
1 code implementation • 26 Oct 2020 • Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han
Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.
2 code implementations • EMNLP 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han
In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.
1 code implementation • EMNLP 2020 • Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, Jiawei Han
Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner.
Aspect-Based Sentiment Analysis (ABSA) +2
1 code implementation • 13 Oct 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han
Taxonomy is not only a fundamental form of knowledge representation but also crucial to a vast range of knowledge-rich applications, such as question answering and web search.
1 code implementation • 18 Jul 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han
Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.
Ranked #1 on Topic Models on Arxiv HEP-TH citation graph
1 code implementation • 1 May 2020 • Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han
Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.
1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han
Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.
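A common embedding-based baseline for this task (not necessarily the paper's method) averages the seed embeddings and ranks candidate entities by cosine similarity to the centroid:

```python
# Sketch: expand a seed set by similarity to the seed centroid.
import numpy as np

def expand(seeds, candidates, embed, k=5):
    """embed: dict mapping entity name -> unit-norm embedding vector."""
    centroid = np.mean([embed[s] for s in seeds], axis=0)
    centroid /= np.linalg.norm(centroid)
    scored = [(c, float(embed[c] @ centroid))
              for c in candidates if c not in seeds]
    return sorted(scored, key=lambda x: -x[1])[:k]

# With embeddings for country names, expand(["USA", "Russia"], ...)
# should surface other countries near the top of the ranking.
```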
no code implementations • 21 Nov 2019 • Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler
In personal email search, user queries often impose different requirements on different aspects of the retrieved emails.
1 code implementation • NeurIPS 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han
While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding.
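Training embeddings directly on the unit sphere typically uses Riemannian updates rather than plain gradient descent; a minimal sketch of one such step (with arbitrary illustrative values) is:

```python
# Sketch: one Riemannian gradient-descent step on the unit sphere.
import numpy as np

def sphere_step(x: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
    tangent = grad - (grad @ x) * x       # remove the radial component
    x_new = x - lr * tangent              # descend in the tangent space
    return x_new / np.linalg.norm(x_new)  # retract back onto the sphere

x = np.array([1.0, 0.0, 0.0])
print(sphere_step(x, np.array([0.2, -0.3, 0.1]), lr=0.5))
```

Keeping every update on the sphere makes the training geometry match the cosine-similarity usage stage, which is the gap the paper targets.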
2 code implementations • 16 Oct 2019 • Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han
With the massive number of repositories available, there is a pressing need for topic-based search.
no code implementations • 17 Sep 2019 • Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao
Intensive computation is entering data centers in the form of multiple deep learning workloads.
1 code implementation • 20 Aug 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han
We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora.
no code implementations • 15 Mar 2019 • Kevin Meng, Yu Meng
Overcoming the visual barrier and developing "see-through vision" has been one of mankind's long-standing dreams.
1 code implementation • 29 Dec 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.
1 code implementation • 2 Sep 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited types of supervision.
no code implementations • 28 Aug 2018 • Shang Yu, F. Albarran-Arriagada, J. C. Retamal, Yi-Tao Wang, Wei Liu, Zhi-Jin Ke, Yu Meng, Zhi-Peng Li, Jian-Shun Tang, E. Solano, L. Lamata, Chuan-Feng Li, Guang-Can Guo
An experiment is performed to reconstruct an unknown photonic quantum state with a limited amount of copies.
Quantum Physics