Understanding and Mitigating the Uncertainty in Zero-Shot Translation

20 May 2022 Wenxuan Wang, Wenxiang Jiao, Shuo Wang, Zhaopeng Tu, Michael R. Lyu

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system.

Machine Translation

AEON: A Method for Automatic Evaluation of NLP Test Cases

13 May 2022 Jen-tse Huang, Jianping Zhang, Wenxuan Wang, Pinjia He, Yuxin Su, Michael R. Lyu

However, in practice, many of the generated test cases fail to preserve similar semantic meaning and are unnatural (e.g., grammar errors), which leads to a high false alarm rate and unnatural test cases.

Natural Language Processing

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

14 Feb 2022 Jianqiao Zhao, Yanyang Li, Wanyu Du, Yangfeng Ji, Dong Yu, Michael R. Lyu, LiWei Wang

Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.

Dialogue Evaluation

Towards Efficient Post-training Quantization of Pre-trained Language Models

30 Sep 2021 Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael R. Lyu

Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.


Graph-based Incident Aggregation for Large-Scale Online Service Systems

27 Aug 2021 Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, Michael R. Lyu

The proposed framework is evaluated with real-world incident data collected from a large-scale online service system of Huawei Cloud.

Graph Representation Learning

Experience Report: Deep Learning-based System Log Analysis for Anomaly Detection

13 Jul 2021 Zhuangbin Chen, Jinyang Liu, Wenwei Gu, Yuxin Su, Michael R. Lyu

To better understand the characteristics of different anomaly detectors, in this paper, we provide a comprehensive review and evaluation of five popular neural networks used by six state-of-the-art methods.

Anomaly Detection

Improving the Transferability of Adversarial Samples With Adversarial Transformations

CVPR 2021 Weibin Wu, Yuxin Su, Michael R. Lyu, Irwin King

Although deep neural networks (DNNs) have achieved tremendous performance in diverse vision challenges, they are surprisingly susceptible to adversarial examples, which are born of intentionally perturbing benign samples in a human-imperceptible fashion.

Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation

8 Jun 2021 Pengpeng Liu, Michael R. Lyu, Irwin King, Jia Xu

Then, a self-supervised learning framework is constructed: confident predictions from teacher models serve as annotations to guide the student model to learn optical flow for those less confident predictions.

Optical Flow Estimation

Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation

ACL 2021 Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Shuming Shi, Michael R. Lyu, Irwin King

In this work, we propose to improve the sampling procedure by selecting the most informative monolingual sentences to complement the parallel data.

Machine Translation

Open-Retrieval Conversational Machine Reading

17 Feb 2021 Yifan Gao, Jingjing Li, Chien-Sheng Wu, Michael R. Lyu, Irwin King

On our created OR-ShARC dataset, MUDERN achieves the state-of-the-art performance, outperforming existing single-passage conversational machine reading models as well as a new multi-passage conversational machine reading baseline by a large margin.

Reading Comprehension

Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings

EMNLP 2020 Yue Wang, Jing Li, Michael R. Lyu, Irwin King

Further analyses show that our multi-head attention is able to attend to information from various aspects and boost classification or generation in diverse scenarios.

Effective Data-aware Covariance Estimator from Compressed Data

10 Oct 2020 Xixian Chen, Haiqin Yang, Shenglin Zhao, Michael R. Lyu, Irwin King

Estimating the covariance matrix from massive high-dimensional and distributed data is significant for various real-world applications.

Making Online Sketching Hashing Even Faster

10 Oct 2020 Xixian Chen, Haiqin Yang, Shenglin Zhao, Michael R. Lyu, Irwin King

Data-dependent hashing methods have demonstrated good performance in various machine learning applications to learn a low-dimensional representation from the original data.

Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation

EMNLP 2020 Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael R. Lyu, Zhaopeng Tu

First, we train an identification model on the original training data, and use it to distinguish inactive examples and active examples by their sentence-level output probabilities.

Machine Translation

Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading

EMNLP 2020 Yifan Gao, Chien-Sheng Wu, Jingjing Li, Shafiq Joty, Steven C. H. Hoi, Caiming Xiong, Irwin King, Michael R. Lyu

Based on the learned EDU and entailment representations, we either reply to the user with our final decision "yes/no/irrelevant" on the initial question, or generate a follow-up question to inquire for more information.

Decision Making

Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing

23 Aug 2020 Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, Michael R. Lyu

Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT in identifying emerging app issues, improving the state-of-the-art method by 22.3% in terms of F1-score.

Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

14 Aug 2020 Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu

To fill this significant gap between academia and industry and also facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets.

Software Engineering

EMT: Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading

26 May 2020 Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael R. Lyu, Steven C. H. Hoi

The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions.

Reading Comprehension

Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models

28 Apr 2020 Shilin He, Xing Wang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu

In this paper, we bridge the gap by assessing the bilingual knowledge learned by NMT models with a phrase table, an interpretable table of bilingual lexicons.

Machine Translation

VD-BERT: A Unified Vision and Dialog Transformer with BERT

EMNLP 2020 Yue Wang, Shafiq Joty, Michael R. Lyu, Irwin King, Caiming Xiong, Steven C. H. Hoi

By contrast, in this work, we propose VD-BERT, a simple yet effective framework of unified vision-dialog Transformer that leverages the pretrained BERT language models for Visual Dialog tasks.

Visual Dialog

Why an Android App is Classified as Malware? Towards Malware Classification Interpretation

24 Apr 2020 Bozhi Wu, Sen Chen, Cuiyun Gao, Lingling Fan, Yang Liu, Weiping Wen, Michael R. Lyu

In this paper, to fill this gap, we propose a novel and interpretable ML-based approach (named XMal) to classify malware with high accuracy and simultaneously explain the classification result.

Android Malware Detection

What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process

10 Feb 2020 Jichuan Zeng, Jing Li, Yulan He, Cuiyun Gao, Michael R. Lyu, Irwin King

In a world full of uncertainty, debates and argumentation contribute to the progress of science and society.

Automating App Review Response Generation

10 Feb 2020 Cuiyun Gao, Jichuan Zeng, Xin Xia, David Lo, Michael R. Lyu, Irwin King

Previous studies showed that replying to a user review usually has a positive effect on the rating that is given by the user to the app.

Response Generation

Neuron Interaction Based Representation Composition for Neural Machine Translation

22 Nov 2019 Jian Li, Xing Wang, Baosong Yang, Shuming Shi, Michael R. Lyu, Zhaopeng Tu

Starting from this intuition, we propose a novel approach to compose representations learned by different components in neural machine translation (e.g., multi-layer networks or multi-head attention), based on modeling strong interactions among neurons in the representation vectors.

Machine Translation

Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network

20 Nov 2019 Wenxiang Jiao, Michael R. Lyu, Irwin King

We propose an Attention Gated Hierarchical Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used convolutional neural networks (CNNs) for utterance feature extraction are less compatible in the memory modules; (2) Unidirectional gated recurrent units (GRUs) only allow each historical utterance to have context before it, preventing information propagation in the opposite direction; (3) The Soft Attention for summarizing loses the positional and ordering information of memories, regardless of how the memory bank is built.

Emotion Recognition in Conversation

Improving Word Representations: A Sub-sampled Unigram Distribution for Negative Sampling

21 Oct 2019 Wenxiang Jiao, Irwin King, Michael R. Lyu

Word2Vec is the most popular model for word representation and has been widely investigated in the literature.

Sentence Completion

PT-CoDE: Pre-trained Context-Dependent Encoder for Utterance-level Emotion Recognition

20 Oct 2019 Wenxiang Jiao, Michael R. Lyu, Irwin King

Witnessing the success of transfer learning in natural language processing (NLP), we propose to pre-train a context-dependent encoder (CoDE) for ULER by learning from unlabeled conversation data.

Emotion Recognition

Improving Question Generation With to the Point Context

IJCNLP 2019 Jingjing Li, Yifan Gao, Lidong Bing, Irwin King, Michael R. Lyu

Question generation (QG) is the task of generating a question from a reference sentence and a specified answer within the sentence.

Question Generation

Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression

24 Sep 2019 Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, Michael R. Lyu

Data compression is essential to reduce the cost of log storage.

Software Engineering

Towards Understanding Neural Machine Translation with Word Importance

IJCNLP 2019 Shilin He, Zhaopeng Tu, Xing Wang, Long-Yue Wang, Michael R. Lyu, Shuming Shi

Although neural machine translation (NMT) has advanced the state-of-the-art on various language pairs, the interpretability of NMT remains unsatisfactory.

Machine Translation

An Online Topic Modeling Framework with Topics Automatically Labeled

WS 2019 Fenglei Jin, Cuiyun Gao, Michael R. Lyu

In this paper, we propose a novel online topic tracking framework, named IEDL, for tracking the topic changes related to deep learning techniques on Stack Exchange and automatically interpreting each identified topic.

Interconnected Question Generation with Coreference Alignment and Conversation Flow Modeling

ACL 2019 Yifan Gao, Piji Li, Irwin King, Michael R. Lyu

The coreference alignment modeling explicitly aligns coreferent mentions in conversation history with corresponding pronominal references in generated questions, which makes generated questions interconnected to conversation history.

Question Generation

Topic-Aware Neural Keyphrase Generation for Social Media Language

ACL 2019 Yue Wang, Jing Li, Hou Pong Chan, Irwin King, Michael R. Lyu, Shuming Shi

Further discussions show that our model learns meaningful topics, which explains its superiority in social media keyphrase generation.

Keyphrase Generation

Doctor of Crosswise: Reducing Over-parametrization in Neural Networks

24 May 2019 J. D. Curtó, I. C. Zarza, Kris Kitani, Irwin King, Michael R. Lyu

Dr. of Crosswise proposes a new architecture to reduce over-parametrization in Neural Networks.

Microblog Hashtag Generation via Encoding Conversation Contexts

NAACL 2019 Yue Wang, Jing Li, Irwin King, Michael R. Lyu, Shuming Shi

Automatic hashtag annotation plays an important role in content understanding for microblog posts.

Topic Models

HiGRU: Hierarchical Gated Recurrent Units for Utterance-level Emotion Recognition

NAACL 2019 Wenxiang Jiao, Haiqin Yang, Irwin King, Michael R. Lyu

In this paper, we address three challenges in utterance-level emotion recognition in dialogue systems: (1) the same word can deliver different emotions in different contexts; (2) some emotions are rarely seen in general dialogues; (3) long-range contextual information is hard to capture effectively.

Emotion Recognition

Information Aggregation for Multi-Head Attention with Routing-by-Agreement

NAACL 2019 Jian Li, Baosong Yang, Zi-Yi Dou, Xing Wang, Michael R. Lyu, Zhaopeng Tu

Multi-head attention is appealing for its ability to jointly extract different types of information from multiple representation subspaces.

Machine Translation

DDFlow: Learning Optical Flow with Unlabeled Data Distillation

25 Feb 2019 Pengpeng Liu, Irwin King, Michael R. Lyu, Jia Xu

We present DDFlow, a data distillation approach to learning optical flow estimation from unlabeled data.

Optical Flow Estimation

Tools and Benchmarks for Automated Log Parsing

8 Nov 2018 Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu

Logs are imperative in the development and maintenance process of many software systems.

Software Engineering

Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs

NeurIPS 2018 Han Shao, Xiaotian Yu, Irwin King, Michael R. Lyu

In this paper, under a weaker assumption on noises, we study the problem of linear stochastic bandits with heavy-tailed payoffs (LinBET), where the distributions have finite moments of order $1+\epsilon$, for some $\epsilon\in (0, 1]$.

Multi-Head Attention with Disagreement Regularization

EMNLP 2018 Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, Tong Zhang

Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions.


Generating Distractors for Reading Comprehension Questions from Real Examinations

8 Sep 2018 Yifan Gao, Lidong Bing, Piji Li, Irwin King, Michael R. Lyu

We investigate the task of distractor generation for multiple choice reading comprehension questions from examinations.

Distractor Generation

Title-Guided Encoding for Keyphrase Generation

26 Aug 2018 Wang Chen, Yifan Gao, Jiani Zhang, Irwin King, Michael R. Lyu

Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP).

Natural Language Processing

Difficulty Controllable Generation of Reading Comprehension Questions

10 Jul 2018 Yifan Gao, Lidong Bing, Wang Chen, Michael R. Lyu, Irwin King

We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG).

Question Generation

DeepObfuscation: Securing the Structure of Convolutional Neural Networks via Knowledge Distillation

27 Jun 2018 Hui Xu, Yuxin Su, Zirui Zhao, Yangfan Zhou, Michael R. Lyu, Irwin King

Our obfuscation approach is very effective to protect the critical structure of a deep learning model from being exposed to attackers.

Cryptography and Security

A Directed Acyclic Graph Approach to Online Log Parsing

12 Jun 2018 Pinjia He, Jieming Zhu, Pengcheng Xu, Zibin Zheng, Michael R. Lyu

A typical log-based system reliability management procedure is to first parse log messages because of their unstructured format, and then apply data mining techniques on the parsed logs to obtain critical system behavior information.

Software Engineering

Code Completion with Neural Attention and Pointer Networks

27 Nov 2017 Jian Li, Yue Wang, Michael R. Lyu, Irwin King

Intelligent code completion has become an essential research task to accelerate modern software development.

Code Completion

Semantically Consistent Image Completion with Fine-grained Details

26 Nov 2017 Pengpeng Liu, Xiaojuan Qi, Pinjia He, Yikang Li, Michael R. Lyu, Irwin King

Image completion has achieved significant progress due to advances in generative adversarial networks (GANs).

Image Inpainting

High-Resolution Deep Convolutional Generative Adversarial Networks

17 Nov 2017 Joachim D. Curtó, Irene C. Zarza, Fernando de la Torre, Irwin King, Michael R. Lyu

Convergence of Generative Adversarial Networks (GANs) in a high-resolution setting under a computational constraint of GPU memory capacity (from 12 GB to 24 GB) has been beset with difficulty due to the known lack of convergence rate stability.

Ranked #1 on Image Generation on CelebA 128x128 (MS-SSIM metric)

Image Generation

On Secure and Usable Program Obfuscation: A Survey

3 Oct 2017 Hui Xu, Yangfan Zhou, Yu Kang, Michael R. Lyu

On the other hand, the performance requirement for model-oriented obfuscation approaches is too weak to develop practical program obfuscation solutions.

Software Engineering

Toward Efficient and Accurate Covariance Matrix Estimation on Compressed Data

ICML 2017 Xixian Chen, Michael R. Lyu, Irwin King

Estimating covariance matrices is a fundamental technique in various domains, most notably in machine learning and signal processing.

Data Compression

A Survey of Point-of-interest Recommendation in Location-based Social Networks

3 Jul 2016 Shenglin Zhao, Irwin King, Michael R. Lyu

Then, we present a comprehensive review in three aspects: influential factors for POI recommendation, methodologies employed for POI recommendation, and different tasks in POI recommendation.

Recommendation Systems

N-Version Obfuscation: Impeding Software Tampering Replication with Program Diversity

8 Jun 2015 Hui Xu, Yangfan Zhou, Michael R. Lyu

Our idea is to impede the replication of tampering via program diversification, thus increasing the complexity of breaking the whole software system.

Programming Languages

Exact and Stable Recovery of Pairwise Interaction Tensors

NeurIPS 2013 Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu

For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors.

Collaborative Filtering

