Search Results for author: Jiaming Shen

Found 45 papers, 25 papers with code

Phrase-aware Unsupervised Constituency Parsing

no code implementations ACL 2022 Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han

Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.

Constituency Parsing Language Modelling +1

Bridging the Gap: Fine-to-Coarse Sketch Interpolation Network for High-Quality Animation Sketch Inbetweening

no code implementations25 Aug 2023 Jiaming Shen, Kun Hu, Wei Bao, Chang Wen Chen, Zhiyong Wang

The 2D animation workflow is typically initiated with the creation of keyframes using sketch-based drawing.

Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

no code implementations30 Jun 2023 Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky

On TREC-DL2019, PRP is only inferior to the GPT-4 solution on the NDCG@5 and NDCG@10 metrics, while outperforming other existing solutions, such as InstructGPT which has 175B parameters, by over 10% for nearly all ranking metrics.

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

2 code implementations28 Jun 2023 Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang

Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5\% of the querying cost of ChatGPT associated with the latter.

Language Modelling Large Language Model

Local Boosting for Weakly-Supervised Learning

no code implementations5 Jun 2023 Rongzhi Zhang, Yue Yu, Jiaming Shen, Xiquan Cui, Chao Zhang

In this work, we show that the standard implementation of the convex combination of base learners can hardly work due to the presence of noisy labels.

Weakly-supervised Learning

Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation

no code implementations8 May 2023 Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jialu Liu, Michael Bendersky, Marc Najork, Chao Zhang

In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's output distribution and the ground truth label distribution.

Knowledge Distillation

HiPrompt: Few-Shot Biomedical Knowledge Fusion via Hierarchy-Oriented Prompting

no code implementations12 Apr 2023 Jiaying Lu, Jiaming Shen, Bo Xiong, Wenjing Ma, Steffen Staab, Carl Yang

Medical decision-making processes can be enhanced by comprehensive biomedical knowledge bases, which require fusing knowledge graphs constructed from different sources via a uniform index system.

Decision Making Knowledge Graphs

"Why is this misleading?": Detecting News Headline Hallucinations with Explanations

no code implementations12 Feb 2023 Jiaming Shen, Jialu Liu, Dan Finnie, Negar Rahmati, Michael Bendersky, Marc Najork

With the growing need for news headline generation, we argue that the hallucination issue, namely the generated headlines being not supported by the original news stories, is a critical challenge for the deployment of this feature in web-scale systems Meanwhile, due to the infrequency of hallucination cases and the requirement of careful reading for raters to reach the correct consensus, it is difficult to acquire a large dataset for training a model to detect such hallucinations through human curation.

Headline Generation Natural Language Inference

Towards Disentangling Relevance and Bias in Unbiased Learning to Rank

no code implementations28 Dec 2022 Yunan Zhang, Le Yan, Zhen Qin, Honglei Zhuang, Jiaming Shen, Xuanhui Wang, Michael Bendersky, Marc Najork

We give both theoretical analysis and empirical results to show the negative effects on relevance tower due to such a correlation.


Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

1 code implementation20 Dec 2022 Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun

Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).

Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation

no code implementations18 Oct 2022 Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han

Topic taxonomies display hierarchical topic structures of a text corpus and provide topical knowledge to enhance various NLP applications.

Taxonomy Expansion

OmniNeRF: Hybriding Omnidirectional Distance and Radiance fields for Neural Surface Reconstruction

no code implementations27 Sep 2022 Jiaming Shen, Bolin Song, Zirui Wu, Yi Xu

3D reconstruction from images has wide applications in Virtual Reality and Automatic Driving, where the precision requirement is very high.

3D Reconstruction 3D Scene Reconstruction +2

Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

1 code implementation15 Sep 2022 Yue Yu, Rongzhi Zhang, ran Xu, Jieyu Zhang, Jiaming Shen, Chao Zhang

Large Language Models have demonstrated remarkable few-shot performance, but the performance can be sensitive to the selection of few-shot instances.

Language Modelling Text Classification

Unsupervised Key Event Detection from Massive Text Corpora

1 code implementation8 Jun 2022 Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han

Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge.

Event Detection

TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters

no code implementations18 Jan 2022 Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu

Topic taxonomies, which represent the latent topic (or category) structure of document collections, provide valuable knowledge of contents in many applications such as web search and information filtering.

Clustering Topic coverage

Corpus-based Open-Domain Event Type Induction

1 code implementation EMNLP 2021 Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han

As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.

Event Extraction Vocal Bursts Type Prediction

Eider: Empowering Document-level Relation Extraction with Efficient Evidence Extraction and Inference-stage Fusion

1 code implementation Findings (ACL) 2022 Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han

Typical DocRE methods blindly take the full document as input, while a subset of the sentences in the document, noted as the evidence, are often sufficient for humans to predict the relation of an entity pair.

Document-level Relation Extraction

Who Should Go First? A Self-Supervised Concept Sorting Model for Improving Taxonomy Expansion

no code implementations8 Apr 2021 Xiangchen Song, Jiaming Shen, Jieyu Zhang, Jiawei Han

Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks.

Taxonomy Expansion

Taxonomy Completion via Triplet Matching Network

1 code implementation6 Jan 2021 Jieyu Zhang, Xiangchen Song, Ying Zeng, Jiaze Chen, Jiaming Shen, Yuning Mao, Lei LI

Previous approaches focus on the taxonomy expansion, i. e. finding an appropriate hypernym concept from the taxonomy for a new query concept.

Taxonomy Expansion

SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

no code implementations EMNLP 2020 Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han

To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.

STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

1 code implementation18 Jun 2020 Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang

We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.

Taxonomy Expansion

Empower Entity Set Expansion via Language Model Probing

1 code implementation ACL 2020 Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han

Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.

Language Modelling Question Answering

Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

1 code implementation27 Jan 2020 Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han

Given a small set of seed entities (e. g., ``USA'', ``Russia''), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.

SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

1 code implementation17 Oct 2019 Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han

In this study, we propose a novel framework, SetExpan, which tackles this problem, with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding entity set based on denoised context features.

feature selection Question Answering

FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams

1 code implementation10 Oct 2019 Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han

In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.

Clustering Language Modelling

Query-Specific Knowledge Summarization with Entity Evolutionary Networks

no code implementations29 Sep 2019 Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao, Jiawei Han

Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization.

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

1 code implementation4 Sep 2019 Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han

Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity.

Knowledge Graphs

Eliciting Knowledge from Experts:Automatic Transcript Parsing for Cognitive Task Analysis

2 code implementations26 Jun 2019 Junyi Du, He Jiang, Jiaming Shen, Xiang Ren

To reduce human efforts and scale the process, automated CTA transcript parsing is desirable.

Relation Extraction

Weakly-Supervised Hierarchical Text Classification

1 code implementation29 Dec 2018 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.

Blocking Feature Engineering +3

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

2 code implementations22 Dec 2018 Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han

Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion.


Mining Entity Synonyms with Efficient Neural Set Generation

1 code implementation16 Nov 2018 Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han

Mining entity synonym sets (i. e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications.

Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering

no code implementations15 Sep 2018 Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler

In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.

Clustering Multi-Task Learning +1

Weakly-Supervised Neural Text Classification

1 code implementation2 Sep 2018 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han

Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and meanwhile support limited supervision types.

Feature Engineering General Classification +2

Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach

1 code implementation29 Apr 2018 Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han

Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types.

Model Selection

Investigating Rumor News Using Agreement-Aware Search

1 code implementation21 Feb 2018 Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han

We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences.

Text Network Exploration via Heterogeneous Web of Topics

no code implementations2 Oct 2016 Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, Xinbing Wang

A text network refers to a data type that each vertex is associated with a text document and the relationship between documents is represented by edges.

Cannot find the paper you are looking for? You can Submit a new open access paper.