1 code implementation • EMNLP 2021 • Carl Edwards, ChengXiang Zhai, Heng Ji
Moreover, this can be viewed as an especially challenging cross-lingual retrieval problem by treating molecules as a language with a highly distinctive grammar.
Ranked #2 on Cross-Modal Retrieval on ChEBI-20
1 code implementation • 27 Jun 2023 • Liliang Ren, Mankeerat Sidhu, Qi Zeng, Revanth Gangi Reddy, Heng Ji, ChengXiang Zhai
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system.
1 code implementation • 19 Jun 2023 • Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai
Linear State Space Models (SSMs) have demonstrated strong performance in a variety of sequence modeling tasks due to their efficient encoding of the recurrent structure.
Ranked #2 on Long-range modeling on LRA
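The recurrence behind linear SSMs is compact enough to sketch directly. Below is a minimal discretized SSM scan in NumPy; the names and toy parameters are illustrative, not the architecture from this paper.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Minimal discretized linear SSM: x_{k+1} = A x_k + B u_k, y_k = C x_k."""
    x = np.zeros((A.shape[0], 1))
    ys = []
    for u_k in u:
        x = A @ x + B * u_k          # recurrent state update
        ys.append((C @ x).item())    # linear readout
    return np.array(ys)

# toy example: a stable random SSM applied to a short input sequence
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                  # contractive state transition
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
print(ssm_scan(A, B, C, rng.normal(size=8)))
```

Because the recurrence is linear, the same computation can be unrolled as a convolution over the input, which is the source of the efficient encoding the abstract refers to.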
no code implementations • 14 Jun 2023 • Krisztian Balog, ChengXiang Zhai
Information access systems, such as search engines, recommender systems, and conversational assistants, have become integral to our daily lives as they help us satisfy our information needs.
1 code implementation • 25 May 2023 • Chenkai Sun, Jinning Li, Hou Pong Chan, ChengXiang Zhai, Heng Ji
Our analysis shows that the best-performing models can predict responses consistent with the personas. As a byproduct, the task formulation also enables many interesting applications in the analysis of social network groups and their opinions, such as the discovery of extreme opinion groups.
no code implementations • 6 Apr 2023 • Daniel Campos, ChengXiang Zhai, Alessandro Magnani
The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking.
no code implementations • 5 Apr 2023 • Daniel Campos, ChengXiang Zhai
Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise.
no code implementations • 31 Mar 2023 • Daniel Campos, Alessandro Magnani, ChengXiang Zhai
In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders.
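To make the size asymmetry concrete, here is a minimal PyTorch sketch of a dual encoder with a small query tower and a larger context tower sharing one embedding space; the layer counts and dimensions are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AsymmetricDualEncoder(nn.Module):
    """Small (fast) query tower paired with a large context tower,
    both projecting into a shared space for dot-product scoring."""

    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # query encoder: 2 layers (runs per request, dominates latency)
        self.query_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        # context encoder: 8 layers (runs offline at indexing time)
        self.ctx_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=8)

    def encode(self, ids, encoder):
        h = encoder(self.embed(ids))
        return h.mean(dim=1)          # mean-pool tokens into one vector

    def score(self, query_ids, ctx_ids):
        q = self.encode(query_ids, self.query_enc)
        c = self.encode(ctx_ids, self.ctx_enc)
        return q @ c.T                # dot-product relevance
```

The design bet is that context encoding happens offline at indexing time, so a large context tower is cheap, while the query encoder runs on every request and sets the serving latency.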
no code implementations • 31 Mar 2023 • Daniel Campos, ChengXiang Zhai
Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending the search to leverage contextual representations for documents and queries.
no code implementations • 30 Mar 2023 • Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
In this paper, we introduce oBERTa, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models that are 3.8 to 24.3 times faster without expertise in model compression.
no code implementations • 1 Mar 2023 • Adam Davies, Jize Jiang, ChengXiang Zhai
Despite the recent success of large pretrained language models (LMs) on a variety of prompting tasks, these models can be alarmingly brittle to small changes in inputs or application contexts.
no code implementations • 11 Feb 2023 • Jiayu Liu, Zhenya Huang, ChengXiang Zhai, Qi Liu
In LeAp, we perform knowledge learning in a novel problem-knowledge-expression paradigm, with a Knowledge Encoder to acquire knowledge from problem data and a Knowledge Decoder to apply knowledge for expression reasoning.
no code implementations • 5 Dec 2022 • Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han
Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.
no code implementations • 15 Nov 2022 • Kevin Pei, Ishan Jindal, Kevin Chen-Chuan Chang, ChengXiang Zhai, Yunyao Li
Open Information Extraction (OpenIE) has been used in the pipelines of various NLP tasks.
1 code implementation • 23 Oct 2022 • Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, ChengXiang Zhai, Heng Ji
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks.
Ranked #3 on Few-shot NER on Few-NERD (INTRA) (using extra training data)
1 code implementation • 9 Oct 2022 • Bhavya Bhavya, JinJun Xiong, ChengXiang Zhai
We propose a novel application of prompting Pre-trained Language Models (PLMs) to generate analogies, and we study how to design effective prompts for two task settings: generating a source concept analogous to a given target concept (Analogous Concept Generation, ACG), and generating an explanation of the similarity between a given pair of target and source concepts (Analogous Explanation Generation, AEG).
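As a rough illustration of the two settings, the following hypothetical prompt templates show the shape of ACG and AEG queries; the actual prompts studied in the paper may be worded differently.

```python
# Hypothetical templates for the two task settings (not the paper's exact prompts)
ACG_TEMPLATE = "What is analogous to {target}?"
AEG_TEMPLATE = "Explain how {target} is similar to {source}."

def build_prompt(task, target, source=None):
    if task == "ACG":                 # generate a source concept
        return ACG_TEMPLATE.format(target=target)
    if task == "AEG":                 # explain a given analogy
        return AEG_TEMPLATE.format(target=target, source=source)
    raise ValueError(f"unknown task: {task}")

print(build_prompt("ACG", "an electron orbiting a nucleus"))
print(build_prompt("AEG", "an electron orbiting a nucleus",
                   "a planet orbiting the sun"))
```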
1 code implementation • COLING 2022 • Kung-Hsiang Huang, ChengXiang Zhai, Heng Ji
Given the absence of cross-lingual information retrieval datasets with claim-like queries, we train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a self-supervised algorithm that creates training instances by translating the title of a passage.
Ranked #1 on Zero-shot Cross-lingual Fact-checking on X-Fact
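The X-ICT construction described above can be sketched in a few lines: a passage title, translated into another language, becomes a claim-like query paired with its passage as the positive. The `translate` callable and field names here are illustrative stand-ins, not the paper's code.

```python
def make_xict_pairs(passages, translate, target_lang):
    """Build self-supervised (query, positive passage) training pairs
    by translating each passage title into the target language."""
    pairs = []
    for p in passages:
        query = translate(p["title"], target_lang)  # cross-lingual pseudo-claim
        pairs.append({"query": query, "positive": p["body"]})
    return pairs

passages = [{"title": "Vaccines reduce transmission",
             "body": "Multiple studies indicate that vaccination lowers ..."}]
pairs = make_xict_pairs(passages, lambda t, lang: f"[{lang}] {t}", "de")
```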
1 code implementation • 31 Aug 2022 • Chenkai Sun, Tie XU, ChengXiang Zhai, Heng Ji
In this paper, we present Tetris, a new task of Goal-Oriented Script Completion.
no code implementations • 25 May 2022 • Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai
Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches.
1 code implementation • Findings (ACL) 2022 • Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang, Yunyao Li, Lucian Popa, ChengXiang Zhai
We propose a probabilistic approach to select a subset of a \textit{target domain representative keywords} from a candidate set, contrasting with a context domain.
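A minimal sketch of the contrastive idea, assuming a simple log-ratio of smoothed unigram probabilities rather than the paper's full probabilistic model:

```python
import math
from collections import Counter

def keyword_scores(target_docs, context_docs, alpha=1.0):
    """Score keywords by how much more likely they are under the target
    domain than under the context domain (add-alpha smoothed log-ratio)."""
    t, c = Counter(), Counter()
    for d in target_docs:
        t.update(d.split())
    for d in context_docs:
        c.update(d.split())
    vocab = set(t) | set(c)
    nt, nc, v = sum(t.values()), sum(c.values()), len(vocab)
    return {w: math.log((t[w] + alpha) / (nt + alpha * v))
               - math.log((c[w] + alpha) / (nc + alpha * v))
            for w in vocab}

scores = keyword_scores(["gradient descent converges"],
                        ["the meeting is at noon"])
print(max(scores, key=scores.get))  # a target-domain-representative word
```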
1 code implementation • Findings (ACL) 2022 • Tuan Manh Lai, Heng Ji, ChengXiang Zhai
We use the profile to query the indexed search engine to retrieve candidate entities.
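A toy version of this candidate-retrieval step, with plain token overlap standing in for the indexed search engine's scoring; the knowledge-base fields are illustrative.

```python
def retrieve_candidates(profile, kb_entities, k=5):
    """Query a (toy) entity index with a mention profile and return the
    top-k candidate entity ids by token overlap."""
    q = set(profile.lower().split())
    scored = [(len(q & set(e["description"].lower().split())), e["id"])
              for e in kb_entities]
    return [eid for overlap, eid in sorted(scored, reverse=True)[:k]
            if overlap]

kb = [{"id": "MESH:D001241", "description": "aspirin analgesic drug"},
      {"id": "MESH:D005947", "description": "glucose simple sugar"}]
print(retrieve_candidates("aspirin pain relief drug", kb))
```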
1 code implementation • Findings (EMNLP) 2021 • Tuan Lai, Heng Ji, ChengXiang Zhai
Biomedical entity linking is the task of linking entity mentions in a biomedical document to referent entities in a knowledge base.
no code implementations • 29 Aug 2021 • Chenkai Sun, Weijiang Li, Jinfeng Xiao, Nikolaus Nova Parulian, ChengXiang Zhai, Heng Ji
Automated knowledge discovery from trending chemical literature is essential for more efficient biomedical research.
1 code implementation • ACL 2021 • Tuan Lai, Heng Ji, ChengXiang Zhai, Quan Hung Tran
It then uses an entity linker to form a knowledge graph containing relevant background knowledge for the entity mentions in the text.
no code implementations • 13 May 2021 • Safa Messaoud, Ismini Lourentzou, Assma Boughoula, Mona Zehni, Zhizhen Zhao, ChengXiang Zhai, Alexander G. Schwing
The recent growth of web video sharing platforms has increased the demand for systems that can efficiently browse, retrieve and summarize video content.
no code implementations • 1 Jan 2021 • Yufeng Zhang, Yunan Zhang, ChengXiang Zhai
To classify images, neural networks extract features from raw inputs and then sum them up with fixed weights via the fully connected layer.
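The claim about the classification head can be stated in a few lines of PyTorch: the final logits are a fixed (learned) weighted sum of the extracted features.

```python
import torch

features = torch.randn(1, 512)   # pooled features from some backbone
fc = torch.nn.Linear(512, 10)    # fixed (learned) weights W and bias b
logits = fc(features)            # logits = features @ W.T + b
assert torch.allclose(logits, features @ fc.weight.T + fc.bias)
```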
no code implementations • 5 Nov 2020 • Dominic Seyler, Wei Liu, XiaoFeng Wang, ChengXiang Zhai
Dark jargons are benign-looking words that have hidden, sinister meanings and are used by participants of underground forums for illicit behavior.
no code implementations • 21 Oct 2020 • Shubhra Kanti Karmaker Santu, Md. Mahadi Hassan, Micah J. Smith, Lei Xu, ChengXiang Zhai, Kalyan Veeramachaneni
AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research.
1 code implementation • 17 Oct 2020 • Arkin Dharawat, Ismini Lourentzou, Alex Morales, ChengXiang Zhai
Several works study health misinformation detection, yet little attention has been given to the perceived severity of misinformation posts.
no code implementations • EMNLP 2020 • Yiren Wang, ChengXiang Zhai, Hany Hassan Awadalla
In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
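Schematically, one training step combines the three objectives. The model interface and the weighting below are assumptions for illustration, not the paper's exact recipe.

```python
def mtl_step(model, bitext_batch, mono_src_batch, mono_tgt_batch, w=0.5):
    """One multi-task step: translation loss on bitext plus a denoising
    (corrupt-and-reconstruct) loss on each monolingual stream.
    `model`'s methods here are hypothetical stand-ins."""
    l_mt = model.translation_loss(bitext_batch)
    l_src = model.denoising_loss(mono_src_batch, side="source")
    l_tgt = model.denoising_loss(mono_tgt_batch, side="target")
    loss = l_mt + w * (l_src + l_tgt)
    loss.backward()
    return loss
```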
no code implementations • 20 Feb 2020 • Yinan Zhang, Parikshit Sondhi, Anjan Goswami, ChengXiang Zhai
Faceted browsing is a commonly supported feature of user interfaces for access to information.
1 code implementation • IEEE International Conference on Big Data (Big Data) 2019 • Yuxin Xiao, Zecheng Zhang, Carl Yang, ChengXiang Zhai
In this way, it leverages both local and non-local information simultaneously.
Ranked #1 on Heterogeneous Node Classification on DBLP (PACT) 14k (Macro-F1, 60% training data)
no code implementations • 4 Dec 2019 • Yunan Zhang, Xiang Cheng, Heting Gao, ChengXiang Zhai
We model the question answering on KG as a cooperative task between two agents, a knowledge graph reasoning agent and an information extraction agent.
no code implementations • 22 Nov 2019 • Yiren Wang, Hongzhao Huang, Zhe Liu, Yutong Pang, Yongqiang Wang, ChengXiang Zhai, Fuchun Peng
Although n-gram language models (LMs) have been outperformed by state-of-the-art neural LMs, they are still widely used in speech recognition due to their high inference efficiency.
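The efficiency argument is easy to see in code: scoring a word under an n-gram LM is a table lookup, with no matrix products. A tiny add-one-smoothed bigram model:

```python
from collections import Counter

class BigramLM:
    """Tiny add-one-smoothed bigram LM; scoring is two count lookups."""
    def __init__(self, sentences):
        self.bigrams, self.unigrams = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len(self.unigrams)

    def prob(self, prev, word):
        return ((self.bigrams[(prev, word)] + 1)
                / (self.unigrams[prev] + self.vocab))

lm = BigramLM(["the cat sat", "the dog sat"])
print(lm.prob("the", "cat"))  # constant-time lookup per query
```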
no code implementations • 11 Nov 2019 • Yunan Zhang, Xiang Cheng, Yufeng Zhang, Zihan Wang, Zhengqi Fang, Xiaoyan Wang, Zhenya Huang, ChengXiang Zhai
Answering complex questions involving multiple entities and relations is a challenging task.
no code implementations • CONLL 2019 • Shubhra Kanti Karmaker Santu, Kalyan Veeramachaneni, ChengXiang Zhai
Specifically, we propose a novel language model called Topical Influence Language Model (TILM), which is a novel extension of a neural language model to capture the influences on the contents in one text stream by the evolving topics in another related (or possibly same) text stream.
no code implementations • ICLR 2019 • Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
Dual learning has attracted much attention in machine learning, computer vision and natural language processing communities.
Ranked #1 on Machine Translation on WMT2016 English-German
1 code implementation • 12 Apr 2019 • Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai
Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks.
Ranked #3 on Lexical Normalization on LexNorm
no code implementations • 1 Mar 2019 • Shubhra Kanti Karmaker Santu, Parikshit Sondhi, ChengXiang Zhai
In this paper, we discuss the practical challenges in applying learning to rank methods to E-Com search, including the challenges in feature representation, obtaining reliable relevance judgments, and optimally exploiting multiple user feedback signals such as click rates, add-to-cart ratios, order rates, and revenue.
no code implementations • 1 Mar 2019 • Shubhra Kanti Karmaker Santu, Liangda Li, Yi Chang, ChengXiang Zhai
This assumption is unrealistic: many real-world events are correlated and influence each other, and thus exert a joint influence on user search behavior rather than influencing it independently.
no code implementations • 22 Feb 2019 • Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states).
2 code implementations • SIGIR '18 2018 • Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, ChengXiang Zhai, Xue-Qi Cheng
The local matching layer focuses on producing a set of local relevance signals by modeling the semantic matching between a query and each passage of a document.
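A minimal sketch of producing per-passage local relevance signals, with cosine similarity of mean-pooled vectors standing in for the paper's matching network:

```python
import torch
import torch.nn.functional as F

def local_relevance_signals(query_vecs, passage_vecs_list):
    """One local relevance score per passage of a document,
    from a semantic match between the query and that passage."""
    q = query_vecs.mean(dim=0, keepdim=True)            # (1, d) query vector
    return [F.cosine_similarity(q, p.mean(dim=0, keepdim=True)).item()
            for p in passage_vecs_list]

q = torch.randn(4, 64)                        # 4 query token vectors
doc = [torch.randn(30, 64) for _ in range(3)]  # a document with 3 passages
print(local_relevance_signals(q, doc))         # 3 local signals
```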
1 code implementation • 19 Apr 2018 • Dominic Seyler, Lunan Li, ChengXiang Zhai
We propose to use the difference of language models of users and adversaries to define novel interpretable semantic features for measuring semantic incoherence in a message stream.
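A toy version of such a feature: per-token log-likelihood under the user's LM minus that under the adversary's, where the two LMs are illustrative unigram stand-ins for the learned models.

```python
import math

def incoherence_score(message, user_lm, adversary_lm, floor=1e-6):
    """Average per-token log-likelihood gap between the user LM and the
    adversary LM; a large negative gap flags a message that fits the
    adversary model better than the claimed user."""
    toks = message.lower().split()
    gap = sum(math.log(user_lm.get(t, floor))
              - math.log(adversary_lm.get(t, floor)) for t in toks)
    return gap / max(len(toks), 1)

user_lm = {"meeting": 0.02, "lunch": 0.03}
adversary_lm = {"wire": 0.04, "transfer": 0.05}
print(incoherence_score("wire transfer now", user_lm, adversary_lm))
```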
no code implementations • EMNLP 2017 • Alex Morales, ChengXiang Zhai
We study the problem of automatically identifying humorous text from a new kind of text data, i.e., online reviews.
no code implementations • ICML 2017 • Rongda Zhu, Lingxiao Wang, ChengXiang Zhai, Quanquan Gu
We apply our generic algorithm to two illustrative latent variable models: Gaussian mixture model and mixture of linear regression, and demonstrate the advantages of our algorithm by both theoretical analysis and numerical experiments.