no code implementations • NAACL (ACL) 2022 • Tao Zhu, Zhe Zhao, Weijie Liu, Jiachi Liu, Yiren Chen, Weiquan Mao, Haoyan Liu, Kunbo Ding, Yudong Li, Xuefeng Yang
Catastrophic forgetting is a challenge for model deployment in industrial real-time systems, where a model must quickly master a new task without forgetting the old ones.
no code implementations • 23 May 2023 • Guangke Chen, Yedi Zhang, Zhe Zhao, Fu Song
Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS, thus still falling behind practical attacks against proprietary commercial APIs and voice-controlled devices.
no code implementations • 19 May 2023 • Xingyu Bai, Taiqiang Wu, Han Guo, Zhe Zhao, Xuefeng Yang, Jiayi Li, Weijie Liu, Qi Ju, Weigang Guo, Yujiu Yang
Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs).
no code implementations • 16 May 2023 • Taiqiang Wu, Cheng Hou, Zhe Zhao, Shanshan Lao, Jiayi Li, Ngai Wong, Yujiu Yang
Knowledge Distillation (KD) is a predominant approach for BERT compression.
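To make the technique concrete, here is a minimal sketch of the standard soft-target distillation loss; the temperature `T` and mixing weight `alpha` are illustrative hyperparameters, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD: KL divergence between temperature-softened teacher
    and student distributions, mixed with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard loss in magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```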
no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic.
no code implementations • 28 Feb 2023 • Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning.
2 code implementations • 13 Dec 2022 • Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan
Recently proposed pre-training models across different modalities show a rising trend of structural homogeneity, which creates the opportunity to implement diverse pre-training models within a uniform framework.
1 code implementation • 10 Dec 2022 • Yedi Zhang, Zhe Zhao, Fu Song, Min Zhang, Taolue Chen, Jun Sun
Experimental results on QNNs with different quantization bits confirm the effectiveness and efficiency of our approach, e.g., it is two orders of magnitude faster and solves more verification tasks within the same time limit than state-of-the-art methods.
1 code implementation • COLING 2022 • Kunbo Ding, Weijie Liu, Yuejian Fang, Weiquan Mao, Zhe Zhao, Tao Zhu, Haoyan Liu, Rong Tian, Yiren Chen
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages.
no code implementations • 19 Sep 2022 • Rong Tian, Zijing Zhao, Weijie Liu, Haoyan Liu, Weiquan Mao, Zhe Zhao, Kimmo Yan
The latest industrial inference engines, such as FasterTransformer and TurboTransformers, have verified that half-precision floating point (FP16) and 8-bit integer (INT8) quantization can greatly improve model inference speed.
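As a rough illustration of what INT8 quantization does, here is a symmetric per-tensor scheme; real inference engines typically use calibrated, often per-channel, scales:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]
    with a single scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = q.astype(np.float32) * scale  # dequantized approximation
print(np.abs(x - x_hat).max())        # worst-case rounding error
```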
1 code implementation • Findings (NAACL) 2022 • Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang
Previous studies have proved that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models for cross-lingual similarity matching tasks.
2 code implementations • COLING 2022 • Yudong Li, Yuqing Zhang, Zhe Zhao, Linlin Shen, Weijie Liu, Weiquan Mao, Hui Zhang
CSL can serve as a large-scale Chinese scientific literature corpus.
1 code implementation • 12 Jul 2022 • Derek Hansen, Ismael Mendoza, Runjing Liu, Ziteng Pang, Zhe Zhao, Camille Avestruz, Jeffrey Regier
We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS).
no code implementations • 7 Jun 2022 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu
Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks, raising significant security concerns in deploying SRSs.
1 code implementation • 7 Jun 2022 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang
According to the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.
no code implementations • 19 May 2022 • Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi
First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could be spurious for another.
no code implementations • 1 Mar 2022 • Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi
Prompt-Tuning is a new paradigm for fine-tuning pre-trained language models in a parameter-efficient way.
1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.
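A hedged sketch of the inference pattern this implies, assuming a hypothetical seq2seq checkpoint fine-tuned to map queries to document-identifier strings; the checkpoint name below is a placeholder, not a released model:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder name for a T5-style model whose decoder emits docid strings.
name = "my-org/t5-docid-retriever"  # hypothetical, for illustration only
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

query = "who proposed mixture-of-experts for multi-task learning?"
ids = model.generate(**tok(query, return_tensors="pt"),
                     num_beams=4, num_return_sequences=4, max_new_tokens=8)
print(tok.batch_decode(ids, skip_special_tokens=True))  # ranked docids
```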
1 code implementation • 14 Feb 2022 • Weijie Liu, Tao Zhu, Weiquan Mao, Zhe Zhao, Weigang Guo, Xuefeng Yang, Qi Ju
In this paper, we pay attention to an issue that is usually overlooked, i.e., similarity should be determined from different perspectives.
1 code implementation • 15 Oct 2021 • Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Jiayu Tang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Ye Liu, Qilong Zhang, Lianli Gao, Yunrui Yu, Xitong Gao, Zhe Zhao, Daquan Lin, Jiadong Lin, Chuanbiao Song, ZiHao Wang, Zhennan Wu, Yang Guo, Jiequan Cui, Xiaogang Xu, Pengguang Chen
Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years.
1 code implementation • NeurIPS 2021 • Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
1 code implementation • 4 Sep 2021 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu
To bridge this gap, we present SEC4SR, the first platform enabling researchers to systematically and comprehensively evaluate adversarial attacks and defenses in SR. SEC4SR incorporates 4 white-box and 2 black-box attacks, and 24 defenses, including our novel feature-level transformations.
no code implementations • 14 Jul 2021 • Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.
3 code implementations • NeurIPS 2021 • Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi
State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example.
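A minimal sketch of the conventional trainable top-k gate the abstract describes (not the paper's proposed differentiable alternative): score all experts, keep the k highest, and renormalize their weights.

```python
import torch
import torch.nn as nn

class TopKGate(nn.Module):
    """Sparse MoE gate: a linear scorer followed by top-k selection."""
    def __init__(self, d_model, n_experts, k=2):
        super().__init__()
        self.w_gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                      # x: (batch, d_model)
        scores = self.w_gate(x)                # (batch, n_experts)
        topk, idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(topk, dim=-1)  # renormalize over kept experts
        return weights, idx                    # mix only the selected experts
```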
1 code implementation • 13 Mar 2021 • Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, Jun Sun
Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks.
no code implementations • 12 Mar 2021 • Yedi Zhang, Zhe Zhao, Guangke Chen, Fu Song, Taolue Chen
Verifying and explaining the behavior of neural networks is becoming increasingly important, especially when they are deployed in safety-critical applications.
no code implementations • 1 Jan 2021 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
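For reference, the scaled dot-product self-attention being referred to, in a few lines of NumPy:

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Token-token affinities come from a query-key dot product,
    normalized row-wise with a softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 5, 8
X = np.random.randn(n, d)
print(dot_product_attention(X, X, X).shape)  # (5, 8)
```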
no code implementations • 1 Jan 2021 • Chris Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
no code implementations • ICLR 2021 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.
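A loose sketch of the grid-wise idea under illustrative assumptions: a task embedding gates block regions of a shared weight matrix, so different tasks specialize different regions. This is not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class GridHyperProjection(nn.Module):
    """Sketch: a task embedding produces a (rows x cols) gating grid
    that modulates blocks of a shared base weight matrix."""
    def __init__(self, d_in, d_out, n_tasks, grid_rows=4, grid_cols=4):
        super().__init__()
        assert d_out % grid_rows == 0 and d_in % grid_cols == 0
        self.base = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.task_emb = nn.Embedding(n_tasks, 32)
        self.to_grid = nn.Linear(32, grid_rows * grid_cols)
        self.grid_rows, self.grid_cols = grid_rows, grid_cols

    def forward(self, x, task_id):
        g = torch.sigmoid(self.to_grid(self.task_emb(task_id)))
        g = g.view(self.grid_rows, self.grid_cols)
        # Expand the coarse grid to the full weight shape, block by block.
        mask = g.repeat_interleave(self.base.shape[0] // self.grid_rows, dim=0) \
                .repeat_interleave(self.base.shape[1] // self.grid_cols, dim=1)
        return x @ (self.base * mask).t()

layer = GridHyperProjection(d_in=16, d_out=8, n_tasks=3)
print(layer(torch.randn(2, 16), torch.tensor(1)).shape)  # (2, 8)
```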
no code implementations • 29 Oct 2020 • Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn
Multi-task learning can leverage information learned by one task to benefit the training of other tasks.
no code implementations • 13 Aug 2020 • Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi
A delicate balance between multi-task generalization and multi-objective optimization is therefore needed to find a better trade-off between efficiency and generalization.
no code implementations • 12 Jul 2020 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.
no code implementations • 9 Jun 2020 • Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed H. Chi, Qiaozhu Mei
We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown.
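For background, the vanilla Plackett-Luce log-likelihood of a full ranking is computed as below; the paper's contribution is handling partitioned preferences, where items within a partition are unordered, and that extension is omitted here.

```python
import numpy as np

def pl_log_likelihood(scores, ranking):
    """Plackett-Luce log-likelihood: at each step, the chosen item
    competes against all items not yet placed."""
    remaining = list(ranking)
    ll = 0.0
    for item in ranking:
        logits = np.array([scores[j] for j in remaining])
        ll += scores[item] - np.log(np.exp(logits).sum())
        remaining.remove(item)
    return ll

scores = {0: 2.0, 1: 0.5, 2: -1.0}
print(pl_log_likelihood(scores, [0, 1, 2]))
```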
1 code implementation • 2 May 2020 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.
Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric)
3 code implementations • COLING 2020 • Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, Zhenzhong Lan
The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks.
3 code implementations • ACL 2020 • Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
Pre-trained language models like BERT have proven to be highly performant.
no code implementations • 10 Feb 2020 • Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain
Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget.
1 code implementation • 3 Nov 2019 • Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu
In this paper, we conduct the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical black-box setting.
2 code implementations • arXiv 2019 • Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang
For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge.
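A toy sketch of the injection step follows; real K-BERT also adds soft positions and a visibility matrix so the injected branches do not disturb the original word order, both of which are omitted here. The KG below is a made-up example.

```python
# Hypothetical toy KG: entity -> list of (relation, object) triples.
KG = {"Tim Cook": [("CEO_of", "Apple")], "Beijing": [("capital_of", "China")]}

def inject_triples(sentence_tokens):
    """Expand each recognized entity with its KG triples, producing the
    flattened 'sentence tree' used as model input."""
    out = []
    for tok in sentence_tokens:
        out.append(tok)
        for rel, obj in KG.get(tok, []):
            out.extend([rel, obj])  # branch attached right after the entity
    return out

print(inject_triples(["Tim Cook", "is", "visiting", "Beijing"]))
# ['Tim Cook', 'CEO_of', 'Apple', 'is', 'visiting', 'Beijing', 'capital_of', 'China']
```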
no code implementations • ACM Conference on Recommender Systems 2019 • Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Ajit Kumthekar, Zhe Zhao, Li Wei, Ed Chi
However, batch loss is subject to sampling bias, which could severely restrict model performance, particularly in the case of power-law distribution of items.
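A common remedy, sketched below, is the logQ correction applied to the in-batch softmax: subtract each candidate item's log sampling probability from its logit, so frequently sampled head items are not over-penalized as negatives. Estimating those probabilities is the harder part and is assumed given here.

```python
import torch

def corrected_batch_softmax(query_emb, item_emb, sampling_probs):
    """In-batch softmax with logQ correction; positives sit on the
    diagonal, sampling_probs[j] is item j's probability of appearing
    in the batch."""
    logits = query_emb @ item_emb.t()            # (batch, batch)
    logits = logits - torch.log(sampling_probs)  # logQ correction per item
    labels = torch.arange(logits.shape[0])       # diagonal positives
    return torch.nn.functional.cross_entropy(logits, labels)
```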
1 code implementation • IJCNLP 2019 • Zhe Zhao, Hui Chen, Jinbin Zhang, Xin Zhao, Tao Liu, Wei Lu, Xi Chen, Haotang Deng, Qi Ju, Xiaoyong Du
Existing works, including ELMo and BERT, have revealed the importance of pre-training for NLP tasks.
no code implementations • RecSys 2019 • Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi
In this paper, we introduce a large scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform.
1 code implementation • 19 May 2019 • Lei Bu, Yuchao Duan, Fu Song, Zhe Zhao
In this work, we first conduct a comprehensive study of existing methods and tools for crafting adversarial examples.
no code implementations • 2 Mar 2019 • Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, Cristos Goodrow
Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.
7 code implementations • 19 Jul 2018 • Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed Chi
In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.
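A minimal sketch of the MMoE wiring, with illustrative dimensions and per-task tower heads: experts are shared, but each task owns a softmax gate that mixes them differently.

```python
import torch
import torch.nn as nn

class MMoE(nn.Module):
    """Multi-gate Mixture-of-Experts: shared experts, per-task gates."""
    def __init__(self, d_in, d_expert, n_experts, n_tasks):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU())
             for _ in range(n_experts)])
        self.gates = nn.ModuleList(
            [nn.Linear(d_in, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList(
            [nn.Linear(d_expert, 1) for _ in range(n_tasks)])

    def forward(self, x):                             # x: (batch, d_in)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1)        # (batch, n_experts)
            mixed = (w.unsqueeze(-1) * expert_out).sum(dim=1)
            outputs.append(tower(mixed))              # one head per task
        return outputs
```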
2 code implementations • ACL 2018 • Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du
Analogical reasoning is effective in capturing linguistic regularities.
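Analogical reasoning over word vectors reduces to vector arithmetic plus nearest-neighbor search; here is a toy sketch where random vectors stand in for the pre-trained Chinese embeddings the paper releases:

```python
import numpy as np

# Toy vectors; real use would load pre-trained word embeddings.
emb = {w: np.random.randn(50) for w in ["king", "man", "woman", "queen"]}

def analogy(a, b, c, emb):
    """Solve a : b :: c : ? by vector offset and cosine similarity."""
    target = emb[b] - emb[a] + emb[c]
    sims = {w: target @ v / (np.linalg.norm(target) * np.linalg.norm(v))
            for w, v in emb.items() if w not in (a, b, c)}
    return max(sims, key=sims.get)

print(analogy("man", "king", "woman", emb))  # ideally "queen"
```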
no code implementations • EMNLP 2017 • Shen Li, Zhe Zhao, Tao Liu, Renfen Hu, Xiaoyong Du
Convolutional Neural Networks (CNNs) are widely used in NLP tasks.
no code implementations • EMNLP 2017 • Bofang Li, Tao Liu, Zhe Zhao, Buzhou Tang, Aleksandr Drozd, Anna Rogers, Xiaoyong Du
The number of word embedding models is growing every year.
no code implementations • EMNLP 2017 • Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du
The existing word representation methods mostly limit their information source to word co-occurrence statistics.
no code implementations • 1 Jul 2017 • Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi
How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier belongs to the protected group?
1 code implementation • COLING 2016 • Bofang Li, Zhe Zhao, Tao Liu, Puwei Wang, Xiaoyong Du
We train n-gram embeddings and use Naive Bayes (NB) weighting to guide the neural models to focus on important words.
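A minimal sketch of NB (log-count ratio) weighting in the style popularized by NBSVM, assuming binary labels and raw n-gram counts:

```python
import numpy as np

def nb_weights(X, y, alpha=1.0):
    """Naive Bayes log-count ratios: features frequent in the positive
    class get large positive weights, and vice versa."""
    p = alpha + X[y == 1].sum(axis=0)   # positive-class feature counts
    q = alpha + X[y == 0].sum(axis=0)   # negative-class feature counts
    return np.log((p / p.sum()) / (q / q.sum()))

X = np.array([[2, 0, 1], [1, 0, 2], [0, 3, 0]])   # toy n-gram counts
y = np.array([1, 1, 0])
print(nb_weights(X, y))
```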
1 code implementation • 27 Dec 2015 • Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao
Many document embedding methods have been proposed to capture semantics, but they still cannot outperform bag-of-n-gram based methods on this task.