Search Results for author: Yu Meng

Found 45 papers, 33 papers with code

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

1 code implementation • 10 Apr 2024 • Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han

Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.

Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction

no code implementations • 17 Feb 2024 • Sizhe Zhou, Yu Meng, Bowen Jin, Jiawei Han

(2) We fine-tune a bidirectional Small Language Model (SLM) using these initial seeds to learn the relations for the target domain.

Few-Shot Learning • Language Modelling • +3

SCStory: Self-supervised and Continual Online Story Discovery

1 code implementation • 27 Nov 2023 • Susik Yoon, Yu Meng, Dongha Lee, Jiawei Han

With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies story-relevant information of news articles and uses them to discover stories.

Continual Learning • Contrastive Learning • +1

Prompt-Driven Building Footprint Extraction in Aerial Images with Offset-Building Model

no code implementations • 25 Oct 2023 • Kai Li, Yupeng Deng, Yunlong Kong, Diyou Liu, Jingbo Chen, Yu Meng, Junxian Ma

More accurate extraction of invisible building footprints from very-high-resolution (VHR) aerial images relies on roof segmentation and roof-to-footprint offset extraction.

Instance Segmentation • Region Proposal • +1

Evaluating Large Language Models at Evaluating Instruction Following

1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever increasing list of models.

Instruction Following

Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder

no code implementations • 10 Oct 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han

Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.

Representation Learning

Impossible ecologies: Interaction networks and stability of coexistence in ecological communities

no code implementations • 28 Sep 2023 • Yu Meng, Szabolcs Horvát, Carl D. Modes, Pierre A. Haas

Here, we therefore develop a different approach, of exhaustive analysis of small ecological communities, to show that this arrangement of interactions can influence stability of coexistence more than these general trends.

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

1 code implementation • NeurIPS 2023 • Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang

Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks.

Attribute • Language Modelling • +1

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

1 code implementation • 24 Jun 2023 • Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).

Multi-Label Classification

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

1 code implementation • 23 May 2023 • Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.

Pseudo Label • Sentiment Analysis • +3

Patton: Language Model Pretraining on Text-Rich Networks

no code implementations • 20 May 2023 • Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han

A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).

Language Modelling • Masked Language Modeling • +1

ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

1 code implementation • 18 May 2023 • Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang

With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks.

Ranked #1 on Zero-Shot Text Classification on AG News

Descriptive • Retrieval • +6

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks

1 code implementation • 21 Feb 2023 • Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han

Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews).

Edge Classification • Link Prediction • +1

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

1 code implementation • 7 Feb 2023 • Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han

Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track their interested fields of study rather than drowning in the whole literature.

Language Modelling • Multi Label Text Classification • +3

Representation Deficiency in Masked Language Modeling

1 code implementation • 4 Feb 2023 • Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer

In this work, we offer a new perspective on the consequence of such a discrepancy: We demonstrate empirically and theoretically that MLM pretraining allocates some model dimensions exclusively for representing [MASK] tokens, resulting in a representation deficiency for real tokens and limiting the pretrained model's expressiveness when it is adapted to downstream data without [MASK] tokens.

Language Modelling • Masked Language Modeling

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

1 code implementation • 12 Dec 2022 • Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.

Language Modelling • Word Embeddings

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

1 code implementation • 6 Nov 2022 • Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han

In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set.

Few-Shot Learning

Few-Shot Fine-Grained Entity Typing with Automatic Label Interpretation and Instance Generation

1 code implementation • 28 Jun 2022 • Jiaxin Huang, Yu Meng, Jiawei Han

We study the problem of few-shot Fine-grained Entity Typing (FET), where only a few annotated entity mentions with contexts are given for each entity type.

Entity Typing • Language Modelling • +1

Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds

1 code implementation • NAACL 2022 • Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han

Discovering latent topics from text corpora has been studied for decades.

General Knowledge • Topic Models

Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

1 code implementation • ICLR 2022 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators.

Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

1 code implementation • 9 Feb 2022 • Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks.

Ranked #5 on Zero-Shot Text Classification on AG News

Few-Shot Learning • MNLI-m • +5

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations

1 code implementation • 9 Feb 2022 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Jiawei Han

Interestingly, there have not been standard approaches to deploy PLMs for topic discovery as better alternatives to topic models.

Clustering • Language Modelling • +1

Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network

no code implementations • 13 Dec 2021 • Tong Su, Yu Meng, Yan Xu

As a core technology of the autonomous driving system, pedestrian trajectory prediction can significantly enhance the function of active vehicle safety and reduce road traffic injuries.

Autonomous Driving • Pedestrian Trajectory Prediction • +1

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

1 code implementation • 7 Nov 2021 • Yu Zhang, Shweta Garg, Yu Meng, Xiusi Chen, Jiawei Han

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided.

text-classification • Text Classification

Fine-Grained Opinion Summarization with Minimal Supervision

no code implementations • 17 Oct 2021 • Suyu Ge, Jiaxin Huang, Yu Meng, Sharon Wang, Jiawei Han

Opinion summarization aims to profile a target by extracting opinions from multiple documents.

Fine-Grained Opinion Analysis • Opinion Summarization

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

1 code implementation • EMNLP 2021 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Xuan Wang, Yu Zhang, Heng Ji, Jiawei Han

We study the problem of training named entity recognition (NER) models using only distantly-labeled data, which can be automatically obtained by matching entity mentions in the raw text with entity types in a knowledge base.

Language Modelling • named-entity-recognition • +2

TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names

no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han

Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.

Multi Label Text Classification • Multi-Label Text Classification • +3

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

2 code implementations • 28 May 2021 • Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names.

Ranked #1 on Phrase Tagging on KPTimes

Keyphrase Extraction • Language Modelling • +3

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

2 code implementations • NeurIPS 2021 • Yu Meng, Chenyan Xiong, Payal Bajaj, Saurabh Tiwary, Paul Bennett, Jiawei Han, Xia Song

The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics.

Contrastive Learning • Language Modelling • +1

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

1 code implementation • 26 Oct 2020 • Yu Zhang, Xiusi Chen, Yu Meng, Jiawei Han

Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.

Data Augmentation • Document Classification • +1

Text Classification Using Label Names Only: A Language Model Self-Training Approach

2 code implementations • EMNLP 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Chenyan Xiong, Heng Ji, Chao Zhang, Jiawei Han

In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents.

Document Classification • General Classification • +6

Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding

1 code implementation • EMNLP 2020 • Jiaxin Huang, Yu Meng, Fang Guo, Heng Ji, Jiawei Han

Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner.

Aspect-Based Sentiment Analysis • Aspect-Based Sentiment Analysis (ABSA) • +2

CoRel: Seed-Guided Topical Taxonomy Construction by Concept Learning and Relation Transferring

1 code implementation • 13 Oct 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Yunyi Zhang, Jiawei Han

Taxonomy is not only a fundamental form of knowledge representation, but also crucial to vast knowledge-rich applications, such as question answering and web search.

Question Answering • Relation

Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding

1 code implementation • 18 Jul 2020 • Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang, Chao Zhang, Jiawei Han

Mining a set of meaningful topics organized into a hierarchy is intuitively appealing since topic correlations are ubiquitous in massive text corpora.

Ranked #1 on Topic Models on Arxiv HEP-TH citation graph

text-classification • Topic Models

Minimally Supervised Categorization of Text with Metadata

1 code implementation • 1 May 2020 • Yu Zhang, Yu Meng, Jiaxin Huang, Frank F. Xu, Xuan Wang, Jiawei Han

Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity.

Document Classification

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han

Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.

Separate and Attend in Personal Email Search

no code implementations • 21 Nov 2019 • Yu Meng, Maryam Karimzadehgan, Honglei Zhuang, Donald Metzler

In personal email search, user queries often impose different requirements on different aspects of the retrieved emails.

Learning-To-Rank

Spherical Text Embedding

1 code implementation • NeurIPS 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, Jiawei Han

While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding.

Clustering • Riemannian optimization • +1

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

2 code implementations • 16 Oct 2019 • Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han

With the massive number of repositories available, there is a pressing need for topic-based search.

Classification • General Classification • +1

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

no code implementations • 17 Sep 2019 • Xiaoyu Yu, Yuwei Wang, Jie Miao, Ephrem Wu, Heng Zhang, Yu Meng, Bo Zhang, Biao Min, Dewei Chen, Jianlin Gao

Intensive computation is entering data centers with multiple workloads of deep learning.

Discriminative Topic Mining via Category-Name Guided Text Embedding

1 code implementation • 20 Aug 2019 • Yu Meng, Jiaxin Huang, Guangyuan Wang, Zihan Wang, Chao Zhang, Yu Zhang, Jiawei Han

We propose a new task, discriminative topic mining, which leverages a set of user-provided category names to mine discriminative topics from text corpora.

Document Classification • General Classification • +3

Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm

no code implementations • 15 Mar 2019 • Kevin Meng, Yu Meng

Overcoming the visual barrier and developing "see-through vision" has been one of mankind's long-standing dreams.

Region Proposal

Weakly-Supervised Hierarchical Text Classification

1 code implementation • 29 Dec 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.

Blocking • Feature Engineering • +3

Weakly-Supervised Neural Text Classification

1 code implementation • 2 Sep 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types.

Feature Engineering • General Classification • +2

Reconstruction of a Photonic Qubit State with Reinforcement Learning

no code implementations • 28 Aug 2018 • Shang Yu, F. Albarran-Arriagada, J. C. Retamal, Yi-Tao Wang, Wei Liu, Zhi-Jin Ke, Yu Meng, Zhi-Peng Li, Jian-Shun Tang, E. Solano, L. Lamata, Chuan-Feng Li, Guang-Can Guo

An experiment is performed to reconstruct an unknown photonic quantum state with a limited amount of copies.

Quantum Physics
