no code implementations • ACL 2022 • Xiaotao Gu, Yikang Shen, Jiaming Shen, Jingbo Shang, Jiawei Han
Recent studies have achieved inspiring success in unsupervised grammar induction using masked language modeling (MLM) as the proxy task.
no code implementations • 25 Aug 2023 • Jiaming Shen, Kun Hu, Wei Bao, Chang Wen Chen, Zhiyong Wang
The 2D animation workflow is typically initiated with the creation of keyframes using sketch-based drawing.
no code implementations • 30 Jun 2023 • Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky
On TREC-DL2019, PRP is only inferior to the GPT-4 solution on the NDCG@5 and NDCG@10 metrics, while outperforming other existing solutions, such as InstructGPT which has 175B parameters, by over 10% for nearly all ranking metrics.
2 code implementations • 28 Jun 2023 • Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang
Additionally, we present a comprehensive empirical study on data generation encompassing vital aspects like bias, diversity, and efficiency, and highlight three key observations: firstly, synthetic datasets generated by simple prompts exhibit significant biases, such as regional bias; secondly, attribute diversity plays a pivotal role in enhancing model performance; lastly, attributed prompts achieve the performance of simple class-conditional prompts while utilizing only 5% of the querying cost of ChatGPT associated with the latter.
no code implementations • 5 Jun 2023 • Rongzhi Zhang, Yue Yu, Jiaming Shen, Xiquan Cui, Chao Zhang
In this work, we show that the standard implementation of the convex combination of base learners can hardly work due to the presence of noisy labels.
1 code implementation • 18 May 2023 • Yue Yu, Yuchen Zhuang, Rongzhi Zhang, Yu Meng, Jiaming Shen, Chao Zhang
With the development of large language models (LLMs), zero-shot learning has attracted much attention for various NLP tasks.
Ranked #1 on Zero-Shot Text Classification on AG News
no code implementations • 8 May 2023 • Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jialu Liu, Michael Bendersky, Marc Najork, Chao Zhang
In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's output distribution and the ground truth label distribution.
no code implementations • 12 Apr 2023 • Jiaying Lu, Jiaming Shen, Bo Xiong, Wenjing Ma, Steffen Staab, Carl Yang
Medical decision-making processes can be enhanced by comprehensive biomedical knowledge bases, which require fusing knowledge graphs constructed from different sources via a uniform index system.
no code implementations • 12 Feb 2023 • Jiaming Shen, Jialu Liu, Dan Finnie, Negar Rahmati, Michael Bendersky, Marc Najork
With the growing need for news headline generation, we argue that the hallucination issue, namely the generated headlines not being supported by the original news stories, is a critical challenge for the deployment of this feature in web-scale systems. Meanwhile, due to the infrequency of hallucination cases and the requirement of careful reading for raters to reach the correct consensus, it is difficult to acquire a large dataset for training a model to detect such hallucinations through human curation.
no code implementations • 28 Dec 2022 • Yunan Zhang, Le Yan, Zhen Qin, Honglei Zhuang, Jiaming Shen, Xuanhui Wang, Michael Bendersky, Marc Najork
We give both theoretical analysis and empirical results to show the negative effects of such a correlation on the relevance tower.
1 code implementation • 20 Dec 2022 • Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
no code implementations • 18 Oct 2022 • Dongha Lee, Jiaming Shen, Seonghyeon Lee, Susik Yoon, Hwanjo Yu, Jiawei Han
Topic taxonomies display hierarchical topic structures of a text corpus and provide topical knowledge to enhance various NLP applications.
no code implementations • 27 Sep 2022 • Jiaming Shen, Bolin Song, Zirui Wu, Yi Xu
3D reconstruction from images has wide applications in Virtual Reality and Automatic Driving, where the precision requirement is very high.
1 code implementation • 15 Sep 2022 • Yue Yu, Rongzhi Zhang, Ran Xu, Jieyu Zhang, Jiaming Shen, Chao Zhang
Large Language Models have demonstrated remarkable few-shot performance, but the performance can be sensitive to the selection of few-shot instances.
1 code implementation • 8 Jun 2022 • Yunyi Zhang, Fang Guo, Jiaming Shen, Jiawei Han
Automated event detection from news corpora is a crucial task towards mining fast-evolving structured knowledge.
no code implementations • 18 Jan 2022 • Dongha Lee, Jiaming Shen, SeongKu Kang, Susik Yoon, Jiawei Han, Hwanjo Yu
Topic taxonomies, which represent the latent topic (or category) structure of document collections, provide valuable knowledge of contents in many applications such as web search and information filtering.
1 code implementation • EMNLP 2021 • Jiaming Shen, Yunyi Zhang, Heng Ji, Jiawei Han
As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of <predicate sense, object head> pairs.
1 code implementation • Findings (ACL) 2022 • Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han
Typical DocRE methods blindly take the full document as input, while a subset of the sentences in the document, denoted as the evidence, is often sufficient for humans to predict the relation of an entity pair.
Ranked #5 on Relation Extraction on DocRED
no code implementations • NAACL 2021 • Jiaming Shen, Wenda Qiu, Yu Meng, Jingbo Shang, Xiang Ren, Jiawei Han
Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a taxonomic class hierarchy.
Multi-Label Text Classification
no code implementations • Findings (ACL) 2021 • Jiaming Shen, Jialu Liu, Tianqi Liu, Cong Yu, Jiawei Han
In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
no code implementations • 8 Apr 2021 • Xiangchen Song, Jiaming Shen, Jieyu Zhang, Jiawei Han
Taxonomies have been widely used in various machine learning and text mining systems to organize knowledge and facilitate downstream tasks.
1 code implementation • 6 Jan 2021 • Jieyu Zhang, Xiangchen Song, Ying Zeng, Jiaze Chen, Jiaming Shen, Yuning Mao, Lei Li
Previous approaches focus on taxonomy expansion, i.e., finding an appropriate hypernym concept from the taxonomy for a new query concept.
1 code implementation • EMNLP 2020 • Jiaming Shen, Heng Ji, Jiawei Han
Linguistic steganography studies how to hide secret messages in natural language cover texts.
no code implementations • EMNLP 2020 • Jiaming Shen, Wenda Qiu, Jingbo Shang, Michelle Vanni, Xiang Ren, Jiawei Han
To facilitate the research on studying the interplays of these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing.
1 code implementation • 18 Jun 2020 • Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang
We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.
1 code implementation • ACL 2020 • Yunyi Zhang, Jiaming Shen, Jingbo Shang, Jiawei Han
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
1 code implementation • 27 Jan 2020 • Jiaxin Huang, Yiqing Xie, Yu Meng, Jiaming Shen, Yunyi Zhang, Jiawei Han
Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion is to induce an extensive set of entities which share the same semantic class (Country in this example) from a given corpus.
3 code implementations • 26 Jan 2020 • Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
no code implementations • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler, Jiawei Han
Taxonomies are of great value to many knowledge-rich applications.
1 code implementation • 17 Oct 2019 • Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han
In this study, we propose a novel framework, SetExpan, which tackles this problem with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding the entity set based on denoised context features.
1 code implementation • 10 Oct 2019 • Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han
In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.
no code implementations • 29 Sep 2019 • Carl Yang, Lingrui Gan, Zongyi Wang, Jiaming Shen, Jinfeng Xiao, Jiawei Han
Given a query, unlike traditional IR that finds relevant documents or entities, in this work, we focus on retrieving both entities and their connections for insightful knowledge summarization.
1 code implementation • 4 Sep 2019 • Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han
Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity.
1 code implementation • ACL 2019 • Junyi Du, He Jiang, Jiaming Shen, Xiang Ren
To reduce human efforts and scale the process, automated CTA transcript parsing is desirable.
2 code implementations • 26 Jun 2019 • Junyi Du, He Jiang, Jiaming Shen, Xiang Ren
To reduce human efforts and scale the process, automated CTA transcript parsing is desirable.
1 code implementation • 29 Dec 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
During the training process, our model features a hierarchical neural structure, which mimics the given hierarchy and is capable of determining the proper levels for documents with a blocking mechanism.
2 code implementations • 22 Dec 2018 • Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han
Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion.
1 code implementation • 16 Nov 2018 • Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han
Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications.
no code implementations • 15 Sep 2018 • Jiaming Shen, Maryam Karimzadehgan, Michael Bendersky, Zhen Qin, Donald Metzler
In this paper, we study how to obtain query type in an unsupervised fashion and how to incorporate this information into query-dependent ranking models.
1 code implementation • 2 Sep 2018 • Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
Although many semi-supervised and weakly-supervised text classification models exist, they cannot be easily applied to deep neural models and support only limited supervision types.
1 code implementation • ACL 2018 • Yuning Mao, Xiang Ren, Jiaming Shen, Xiaotao Gu, Jiawei Han
We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms.
1 code implementation • 29 Apr 2018 • Jiaming Shen, Jinfeng Xiao, Xinwei He, Jingbo Shang, Saurabh Sinha, Jiawei Han
Different from Web or general domain search, a large portion of queries in scientific literature search are entity-set queries, that is, multiple entities of possibly different types.
1 code implementation • 21 Feb 2018 • Jingbo Shang, Tianhang Sun, Jiaming Shen, Xingbang Liu, Anja Gruenheid, Flip Korn, Adam Lelkes, Cong Yu, Jiawei Han
We build Maester based on the following two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and the related news article can often be decided by a few key sentences.
no code implementations • 2 Oct 2016 • Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, Xinbing Wang
A text network refers to a data type in which each vertex is associated with a text document and the relationships between documents are represented by edges.