Search Results for author: Zhe Zhao

Found 67 papers, 33 papers with code

Parameter-efficient Continual Learning Framework in Industrial Real-time Text Classification System

no code implementations NAACL (ACL) 2022 Tao Zhu, Zhe Zhao, Weijie Liu, Jiachi Liu, Yiren Chen, Weiquan Mao, Haoyan Liu, Kunbo Ding, Yudong Li, Xuefeng Yang

Catastrophic forgetting is a challenge for model deployment in industrial real-time systems, which requires the model to quickly master a new task without forgetting the old one.

Continual Learning text-classification +1

Mixture-of-Subspaces in Low-Rank Adaptation

1 code implementation 16 Jun 2024 Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong

In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models.

Common Sense Reasoning Question Answering +3

Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

no code implementations 11 Jun 2024 Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu

Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges.

Benchmarking Fairness

Dynamic data sampler for cross-language transfer learning in large language models

1 code implementation 17 May 2024 Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications.

Language Modelling Transfer Learning +1

Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks

no code implementations 23 Apr 2024 Zhe Zhao, Pengkun Wang, Xu Wang, Haibin Wen, Xiaolong Xie, Zhengyang Zhou, Qingfu Zhang, Yang Wang

Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning.

Graph Representation Learning

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

no code implementations 3 Apr 2024 Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs).

Diversity Knowledge Distillation

KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

1 code implementation 22 Mar 2024 Xindi Luo, Zequn Sun, Jing Zhao, Zhe Zhao, Wei Hu

Parameter-efficient finetuning (PEFT) is a key technique for adapting large language models (LLMs) to downstream tasks.

Knowledge Graph Embeddings Knowledge Graphs

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

1 code implementation 28 Feb 2024 Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen

We evaluate DRA across various open-source and closed-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency.

Chatbot Reconstruction Attack

Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model

no code implementations 21 Feb 2024 Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge.

Knowledge Distillation Transfer Learning

LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

no code implementations 7 Feb 2024 Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks.

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

no code implementations 4 Oct 2023 Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi

In this paper, we extend KD with an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models.

Decoder Knowledge Distillation +1

Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning

1 code implementation 20 Sep 2023 Wenhang Shi, Yiren Chen, Zhe Zhao, Wei Lu, Kimmo Yan, Xiaoyong Du

Therefore, we shift the attention to the current task learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance.

Continual Learning

SP³: Enhancing Structured Pruning via PCA Projection

no code implementations 31 Aug 2023 Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen

Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency.

EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

no code implementations 27 Jun 2023 Haowei Li, Wenqing Yan, Du Liu, Long Qian, Yuxing Yang, Yihao Liu, Zhe Zhao, Hui Ding, Guangzhi Wang

The head surface is reconstructed using depth data for spatial registration, avoiding fixing tracking targets rigidly on the patient's skull.


COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search

1 code implementation 5 Jun 2023 Shibal Ibrahim, Wenyu Chen, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

To deal with this challenge, we propose a novel, permutation-based local search method that can complement first-order methods in training any sparse gate, e.g., Hash routing, Top-k, DSelect-k, and COMET.

Language Modelling Recommendation Systems

QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems

no code implementations 23 May 2023 Guangke Chen, Yedi Zhang, Zhe Zhao, Fu Song

Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS, thus still falling behind practical attacks against proprietary commercial APIs and voice-controlled devices.

Speaker Recognition

Recouple Event Field via Probabilistic Bias for Event Extraction

no code implementations 19 May 2023 Xingyu Bai, Taiqiang Wu, Han Guo, Zhe Zhao, Xuefeng Yang, Jiayi Li, Weijie Liu, Qi Ju, Weigang Guo, Yujiu Yang

Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs).

Event Extraction

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

no code implementations 24 Mar 2023 Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic.

Knowledge Distillation

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

no code implementations 28 Feb 2023 Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning.

Combinatorial Optimization Network Pruning

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

3 code implementations 13 Dec 2022 Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan

The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework.


QVIP: An ILP-based Formal Verification Approach for Quantized Neural Networks

1 code implementation 10 Dec 2022 Yedi Zhang, Zhe Zhao, Fu Song, Min Zhang, Taolue Chen, Jun Sun

Experimental results on QNNs with different quantization bits confirm the effectiveness and efficiency of our approach, e.g., it is two orders of magnitude faster and able to solve more verification tasks within the same time limit than the state-of-the-art methods.


A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

1 code implementation COLING 2022 Kunbo Ding, Weijie Liu, Yuejian Fang, Weiquan Mao, Zhe Zhao, Tao Zhu, Haoyan Liu, Rong Tian, Yiren Chen

Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages.

text-classification Text Classification +3

SAMP: A Model Inference Toolkit of Post-Training Quantization for Text Processing via Self-Adaptive Mixed-Precision

no code implementations 19 Sep 2022 Rong Tian, Zijing Zhao, Weijie Liu, Haoyan Liu, Weiquan Mao, Zhe Zhao, Kan Zhou

The latest industrial inference engines, such as FasterTransformer and TurboTransformers, have verified that half-precision floating point (FP16) and 8-bit integer (INT8) quantization can greatly improve model inference speed.


Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching

1 code implementation Findings (NAACL) 2022 Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang

Previous studies have proved that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models for cross-lingual similarity matching tasks.

Contrastive Learning Knowledge Distillation +3

Scalable Bayesian Inference for Detection and Deblending in Astronomical Images

1 code implementation 12 Jul 2022 Derek Hansen, Ismael Mendoza, Runjing Liu, Ziteng Pang, Zhe Zhao, Camille Avestruz, Jeffrey Regier

We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS).

Bayesian Inference Variational Inference

AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems

no code implementations 7 Jun 2022 Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks, raising significant security concerns in deploying SRSs.

Adversarial Attack Speaker Recognition

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

1 code implementation 7 Jun 2022 Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang

Based on the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.

Speaker Recognition speech-recognition +1

Improving Multi-Task Generalization via Regularizing Spurious Correlation

no code implementations 19 May 2022 Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi

First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could be potentially spurious to the other.

Multi-Task Learning Representation Learning

Semantic Matching from Different Perspectives

1 code implementation 14 Feb 2022 Weijie Liu, Tao Zhu, Weiquan Mao, Zhe Zhao, Weigang Guo, Xuefeng Yang, Qi Ju

In this paper, we focus on an issue that is usually overlooked, i.e., similarity should be determined from different perspectives.

Sentence Text Matching +1

Transformer Memory as a Differentiable Search Index

1 code implementation 14 Feb 2022 Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval Retrieval

SEC4SR: A Security Analysis Platform for Speaker Recognition

1 code implementation 4 Sep 2021 Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

To bridge this gap, we present SEC4SR, the first platform enabling researchers to systematically and comprehensively evaluate adversarial attacks and defenses in speaker recognition (SR). SEC4SR incorporates 4 white-box and 2 black-box attacks, and 24 defenses, including our novel feature-level transformations.

Speaker Recognition

The Benchmark Lottery

no code implementations 14 Jul 2021 Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking BIG-bench Machine Learning +3

Attack as Defense: Characterizing Adversarial Examples using Robustness

1 code implementation 13 Mar 2021 Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, Jun Sun

Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks.

BDD4BNN: A BDD-based Quantitative Analysis Framework for Binarized Neural Networks

no code implementations 12 Mar 2021 Yedi Zhang, Zhe Zhao, Guangke Chen, Fu Song, Taolue Chen

Verifying and explaining the behavior of neural networks is becoming increasingly important, especially when they are deployed in safety-critical applications.


Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations 1 Jan 2021 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling Machine Translation +2

Information Transfer in Multi-Task Learning

no code implementations 1 Jan 2021 Chris Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Multi-task learning can leverage information learned by one task to benefit the training of other tasks.

Multi-Task Learning

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations ICLR 2021 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Measuring and Harnessing Transference in Multi-Task Learning

no code implementations 29 Oct 2020 Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Multi-task learning can leverage information learned by one task to benefit the training of other tasks.

Multi-Task Learning

Small Towers Make Big Differences

no code implementations 13 Aug 2020 Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi

A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization.

Multi-Task Learning

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations 12 Jul 2020 Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

no code implementations 9 Jun 2020 Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed H. Chi, Qiaozhu Mei

We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown.

Extreme Multi-Label Classification Learning-To-Rank +1

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation 2 May 2020 Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization Dialogue Generation +6

Understanding and Improving Knowledge Distillation

no code implementations 10 Feb 2020 Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget.

Knowledge Distillation Model Compression

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

1 code implementation 3 Nov 2019 Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu

In this paper, we conduct the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical blackbox setting.

Adversarial Attack Speaker Recognition +2

K-BERT: Enabling Language Representation with Knowledge Graph

2 code implementations arXiv 2019 Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang

For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge.

Knowledge Graphs Sentence

UER: An Open-Source Toolkit for Pre-training Models

1 code implementation IJCNLP 2019 Zhe Zhao, Hui Chen, Jinbin Zhang, Xin Zhao, Tao Liu, Wei Lu, Xi Chen, Haotang Deng, Qi Ju, Xiaoyong Du

Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks.

Recommending what video to watch next: a multitask ranking system

no code implementations RecSys 2019 Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi

In this paper, we introduce a large scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform.

Fairness in Recommendation Ranking through Pairwise Comparisons

no code implementations 2 Mar 2019 Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, Cristos Goodrow

Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.

Fairness Recommendation Systems

Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

10 code implementations 19 Jul 2018 Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed Chi

In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.

Binary Classification Click-Through Rate Prediction +2

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

no code implementations 1 Jul 2017 Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi

How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier belongs to the protected group?

Attribute Fairness +1

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

1 code implementation 27 Dec 2015 Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao

Many document embedding methods have been proposed to capture semantics, but they still cannot outperform bag-of-ngram based methods on this task.

General Classification Sentiment Analysis +1
