Search Results for author: Zhe Zhao

Found 62 papers, 30 papers with code

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

1 code implementation • 27 Dec 2015 • Bofang Li, Tao Liu, Xiaoyong Du, Deyuan Zhang, Zhe Zhao

Many document embeddings methods have been proposed to capture semantics, but they still can't outperform bag-of-ngram based methods on this task.

General Classification Sentiment Analysis +1

Paper
Code

Weighted Neural Bag-of-n-grams Model: New Baselines for Text Classification

1 code implementation • COLING 2016 • Bofang Li, Zhe Zhao, Tao Liu, Puwei Wang, Xiaoyong Du

We train n-gram embeddings and use NB weighting to guide the neural models to focus on important words.

General Classification text-classification +2

Paper
Code

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

no code implementations • 1 Jul 2017 • Alex Beutel, Jilin Chen, Zhe Zhao, Ed H. Chi

How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier belongs to the protected group?

Attribute Fairness +1

Paper
Add Code

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings

no code implementations • EMNLP 2017 • Bofang Li, Tao Liu, Zhe Zhao, Buzhou Tang, Aleks Drozd, R, Anna Rogers, Xiaoyong Du

The number of word embedding models is growing every year.

Chunking Learning Word Embeddings +4

Paper
Add Code

Initializing Convolutional Filters with Semantic Features for Text Classification

no code implementations • EMNLP 2017 • Shen Li, Zhe Zhao, Tao Liu, Renfen Hu, Xiaoyong Du

Convolutional Neural Networks (CNNs) are widely used in NLP tasks.

General Classification Sentiment Analysis +3

Paper
Add Code

Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

no code implementations • EMNLP 2017 • Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du

The existing word representation methods mostly limit their information source to word co-occurrence statistics.

Language Modelling Word Embeddings

Paper
Add Code

Analogical Reasoning on Chinese Morphological and Semantic Relations

2 code implementations • ACL 2018 • Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du

Analogical reasoning is effective in capturing linguistic regularities.

Word Embeddings

11,586

Paper
Code

Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts

10 code implementations • 19 Jul 2018 • Jiaqi Ma, Zhe Zhao, Xinyang Yi, Jilin Chen, Lichan Hong, Ed Chi

In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data.

Binary Classification Click-Through Rate Prediction +2

7,345

Paper
Code

Fairness in Recommendation Ranking through Pairwise Comparisons

no code implementations • 2 Mar 2019 • Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, Cristos Goodrow

Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information.

Fairness Recommendation Systems

Paper
Add Code

Taking Care of The Discretization Problem: A Comprehensive Study of the Discretization Problem and A Black-Box Adversarial Attack in Discrete Integer Domain

1 code implementation • 19 May 2019 • Lei Bu, Yuchao Duan, Fu Song, Zhe Zhao

In this work, we first conduct a comprehensive study of existing methods and tools for crafting.

Adversarial Attack

Paper
Code

Recommending what video to watch next: a multitask ranking system

no code implementations • RecSys 2019 • Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, Ed Chi

In this paper, we introduce a large scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform.

Paper
Add Code

UER: An Open-Source Toolkit for Pre-training Models

1 code implementation • IJCNLP 2019 • Zhe Zhao, Hui Chen, Jinbin Zhang, Xin Zhao, Tao Liu, Wei Lu, Xi Chen, Haotang Deng, Qi Ju, Xiaoyong Du

Existing works, including ELMO and BERT, have revealed the importance of pre-training for NLP tasks.

2,907

Paper
Code

Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations

2 code implementations • ACM Conference on Recommender Systems 2019 • Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Ajit Kumthekar, Zhe Zhao, Li Wei, Ed Chi

However, batch loss is subject to sampling bias which could severely restrict model performance, particularly in the case of power-law distribution.

Recommendation Systems Retrieval

310

Paper
Code

K-BERT: Enabling Language Representation with Knowledge Graph

2 code implementations • arXiv 2019 • Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang

For machines to achieve this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into the sentences as domain knowledge.

Knowledge Graphs Sentence

1,946

Paper
Code

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

1 code implementation • 3 Nov 2019 • Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu

In this paper, we conduct the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical blackbox setting.

Adversarial Attack Speaker Recognition +2

Paper
Code

Understanding and Improving Knowledge Distillation

no code implementations • 10 Feb 2020 • Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget.

Knowledge Distillation Model Compression

Paper
Add Code

FastBERT: a Self-distilling BERT with Adaptive Inference Time

3 code implementations • ACL 2020 • Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju

Pre-trained language models like BERT have proven to be highly performant.

603

Paper
Code

CLUE: A Chinese Language Understanding Evaluation Benchmark

3 code implementations • COLING 2020 • Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, Zhenzhong Lan

The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE allows new NLU models to be evaluated across a diverse set of tasks.

General Classification Machine Reading Comprehension +4

3,819

Paper
Code

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation • 2 May 2020 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization Dialogue Generation +6

Paper
Code

Learning-to-Rank with Partitioned Preference: Fast Estimation for the Plackett-Luce Model

no code implementations • 9 Jun 2020 • Jiaqi Ma, Xinyang Yi, Weijing Tang, Zhe Zhao, Lichan Hong, Ed H. Chi, Qiaozhu Mei

We investigate the Plackett-Luce (PL) model based listwise learning-to-rank (LTR) on data with partitioned preference, where a set of items are sliced into ordered and disjoint partitions, but the ranking of items within a partition is unknown.

Extreme Multi-Label Classification Learning-To-Rank +1

Paper
Add Code

HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections

no code implementations • 12 Jul 2020 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

The proposed approach is based on a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Small Towers Make Big Differences

no code implementations • 13 Aug 2020 • Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi

A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization.

Multi-Task Learning

Paper
Add Code

Measuring and Harnessing Transference in Multi-Task Learning

no code implementations • 29 Oct 2020 • Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Multi-task learning can leverage information learned by one task to benefit the training of other tasks.

Multi-Task Learning

Paper
Add Code

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

no code implementations • ICLR 2021 • Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Specifically, we propose a decomposable hypernetwork that learns grid-wise projections that help to specialize regions in weight matrices for different tasks.

Multi-Task Learning Natural Language Understanding

Paper
Add Code

Information Transfer in Multi-Task Learning

no code implementations • 1 Jan 2021 • Chris Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Multi-task learning can leverage information learned by one task to benefit the training of other tasks.

Multi-Task Learning

Paper
Add Code

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations • 1 Jan 2021 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling Machine Translation +2

Paper
Add Code

BDD4BNN: A BDD-based Quantitative Analysis Framework for Binarized Neural Networks

no code implementations • 12 Mar 2021 • Yedi Zhang, Zhe Zhao, Guangke Chen, Fu Song, Taolue Chen

Verifying and explaining the behavior of neural networks is becoming increasingly important, especially when they are deployed in safety-critical applications.

Quantization

Paper
Add Code

Attack as Defense: Characterizing Adversarial Examples using Robustness

1 code implementation • 13 Mar 2021 • Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, Jun Sun

Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks.

Paper
Code

DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

3 code implementations • NeurIPS 2021 • Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example.

Multi-Task Learning Recommendation Systems

32,776

Paper
Code

The Benchmark Lottery

no code implementations • 14 Jul 2021 • Mostafa Dehghani, Yi Tay, Alexey A. Gritsenko, Zhe Zhao, Neil Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals

The world of empirical machine learning (ML) strongly relies on benchmarks in order to determine the relative effectiveness of different algorithms and methods.

Benchmarking BIG-bench Machine Learning +3

Paper
Add Code

SEC4SR: A Security Analysis Platform for Speaker Recognition

1 code implementation • 4 Sep 2021 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

To bridge this gap, we present SEC4SR, the first platform enabling researchers to systematically and comprehensively evaluate adversarial attacks and defenses in SR. SEC4SR incorporates 4 white-box and 2 black-box attacks, 24 defenses including our novel feature-level transformations.

Speaker Recognition

Paper
Code

Efficiently Identifying Task Groupings for Multi-Task Learning

1 code implementation • NeurIPS 2021 • Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

Multi-task learning can leverage information learned by one task to benefit the training of other tasks.

Multi-Task Learning

32,783

Paper
Code

Adversarial Attacks on ML Defense Models Competition

1 code implementation • 15 Oct 2021 • Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Jiayu Tang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Ye Liu, Qilong Zhang, Lianli Gao, Yunrui Yu, Xitong Gao, Zhe Zhao, Daquan Lin, Jiadong Lin, Chuanbiao Song, ZiHao Wang, Zhennan Wu, Yang Guo, Jiequan Cui, Xiaogang Xu, Pengguang Chen

Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years.

Adversarial Attack Adversarial Robustness +1

449

Paper
Code

Transformer Memory as a Differentiable Search Index

1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler

In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.

Information Retrieval Retrieval

148

Paper
Code

Semantic Matching from Different Perspectives

1 code implementation • 14 Feb 2022 • Weijie Liu, Tao Zhu, Weiquan Mao, Zhe Zhao, Weigang Guo, Xuefeng Yang, Qi Ju

In this paper, we pay attention to the issue which is usually overlooked, i. e., \textit{similarity should be determined from different perspectives}.

Sentence Text Matching +1

Paper
Code

HyperPrompt: Prompt-based Task-Conditioning of Transformers

no code implementations • 1 Mar 2022 • Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way.

Computational Efficiency Multi-Task Learning +1

Paper
Add Code

Improving Multi-Task Generalization via Regularizing Spurious Correlation

no code implementations • 19 May 2022 • Ziniu Hu, Zhe Zhao, Xinyang Yi, Tiansheng Yao, Lichan Hong, Yizhou Sun, Ed H. Chi

First, the risk of having non-causal knowledge is higher, as the shared MTL model needs to encode all knowledge from different tasks, and causal knowledge for one task could be potentially spurious to the other.

Multi-Task Learning Representation Learning

Paper
Add Code

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

1 code implementation • 7 Jun 2022 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Feng Wang, Jiashui Wang

According to the characteristic of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.

Speaker Recognition speech-recognition +1

Paper
Code

AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems

no code implementations • 7 Jun 2022 • Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu

Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks, raising significant security concerns in deploying SRSs.

Adversarial Attack Speaker Recognition

Paper
Add Code

Scalable Bayesian Inference for Detection and Deblending in Astronomical Images

1 code implementation • 12 Jul 2022 • Derek Hansen, Ismael Mendoza, Runjing Liu, Ziteng Pang, Zhe Zhao, Camille Avestruz, Jeffrey Regier

We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS).

Bayesian Inference Variational Inference

Paper
Code

CSL: A Large-scale Chinese Scientific Literature Dataset

1 code implementation • COLING 2022 • Yudong Li, Yuqing Zhang, Zhe Zhao, Linlin Shen, Weijie Liu, Weiquan Mao, HUI ZHANG

The CSL can serve as a Chinese corpus.

text-classification Text Classification

511

Paper
Code

Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching

1 code implementation • Findings (NAACL) 2022 • Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang

Previous studies have proved that cross-lingual knowledge distillation can significantly improve the performance of pre-trained models for cross-lingual similarity matching tasks.

Contrastive Learning Knowledge Distillation +3

Paper
Code

SAMP: A Model Inference Toolkit of Post-Training Quantization for Text Processing via Self-Adaptive Mixed-Precision

no code implementations • 19 Sep 2022 • Rong Tian, Zijing Zhao, Weijie Liu, Haoyan Liu, Weiquan Mao, Zhe Zhao, Kan Zhou

The latest industrial inference engines, such as FasterTransformer and TurboTransformers, have verified that half-precision floating point (FP16) and 8-bit integer (INT8) quantization can greatly improve model inference speed.

Quantization

Paper
Add Code

A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning

1 code implementation • COLING 2022 • Kunbo Ding, Weijie Liu, Yuejian Fang, Weiquan Mao, Zhe Zhao, Tao Zhu, Haoyan Liu, Rong Tian, Yiren Chen

Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries, which are expensive and impractical for low-resource languages.

text-classification Text Classification +3

Paper
Code

QVIP: An ILP-based Formal Verification Approach for Quantized Neural Networks

1 code implementation • 10 Dec 2022 • Yedi Zhang, Zhe Zhao, Fu Song, Min Zhang, Taolue Chen, Jun Sun

Experimental results on QNNs with different quantization bits confirm the effectiveness and efficiency of our approach, e. g., two orders of magnitude faster and able to solve more verification tasks in the same time limit than the state-of-the-art methods.

Quantization

Paper
Code

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

3 code implementations • 13 Dec 2022 • Zhe Zhao, Yudong Li, Cheng Hou, Jing Zhao, Rong Tian, Weijie Liu, Yiren Chen, Ningyuan Sun, Haoyan Liu, Weiquan Mao, Han Guo, Weigang Guo, Taiqiang Wu, Tao Zhu, Wenhang Shi, Chen Chen, Shan Huang, Sihong Chen, Liqun Liu, Feifei Li, Xiaoshuai Chen, Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan

The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework.

2,978

Paper
Code

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

no code implementations • 28 Feb 2023 • Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning.

Combinatorial Optimization Network Pruning

Paper
Add Code

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Distilling high-accuracy Graph Neural Networks~(GNNs) to low-latency multilayer perceptrons~(MLPs) on graph tasks has become a hot research topic.

Knowledge Distillation

Paper
Add Code

Weight-Inherited Distillation for Task-Agnostic BERT Compression

1 code implementation • 16 May 2023 • Taiqiang Wu, Cheng Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang

Knowledge Distillation (KD) is a predominant approach for BERT compression.

Knowledge Distillation

Paper
Code

Recouple Event Field via Probabilistic Bias for Event Extraction

no code implementations • 19 May 2023 • Xingyu Bai, Taiqiang Wu, Han Guo, Zhe Zhao, Xuefeng Yang, Jiayi Li, Weijie Liu, Qi Ju, Weigang Guo, Yujiu Yang

Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs).

Event Extraction

Paper
Add Code

QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems

no code implementations • 23 May 2023 • Guangke Chen, Yedi Zhang, Zhe Zhao, Fu Song

Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS, thus still falling behind practical attacks against proprietary commercial APIs and voice-controlled devices.

Speaker Recognition

Paper
Add Code

COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search

1 code implementation • 5 Jun 2023 • Shibal Ibrahim, Wenyu Chen, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

To deal with this challenge, we propose a novel, permutation-based local search method that can complement first-order methods in training any sparse gate, e. g., Hash routing, Top-k, DSelect-k, and COMET.

Language Modelling Recommendation Systems

Paper
Code

EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

no code implementations • 27 Jun 2023 • Haowei Li, Wenqing Yan, Du Liu, Long Qian, Yuxing Yang, Yihao Liu, Zhe Zhao, Hui Ding, Guangzhi Wang

The head surface is reconstructed using depth data for spatial registration, avoiding fixing tracking targets rigidly on the patient's skull.

Anatomy

Paper
Add Code

$\rm SP^3$: Enhancing Structured Pruning via PCA Projection

no code implementations • 31 Aug 2023 • Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen

Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency.

Paper
Add Code

Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning

1 code implementation • 20 Sep 2023 • Wenhang Shi, Yiren Chen, Zhe Zhao, Wei Lu, Kimmo Yan, Xiaoyong Du

Therefore, we shift the attention to the current task learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance.

Continual Learning

Paper
Code

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

no code implementations • 4 Oct 2023 • Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi

In this paper, we extend KD with an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models.

Knowledge Distillation Transfer Learning

Paper
Add Code

LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

no code implementations • 7 Feb 2024 • Yuji Roh, Qingyun Liu, Huan Gui, Zhe Yuan, Yujin Tang, Steven Euijong Whang, Liang Liu, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

By combining two complementing models, LEVI effectively suppresses problematic features in both the fine-tuning data and pre-trained model and preserves useful features for new tasks.

Paper
Add Code

Wisdom of Committee: Distilling from Foundation Model to Specialized Application Model

no code implementations • 21 Feb 2024 • Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

Further, to accommodate the dissimilarity among the teachers in the committee, we introduce DiverseDistill, which allows the student to understand the expertise of each teacher and extract task knowledge.

Knowledge Distillation Transfer Learning

Paper
Add Code

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction

no code implementations • 28 Feb 2024 • Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen

We evaluate DRA across various open-source and close-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency.

Reconstruction Attack

Paper
Add Code

KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

1 code implementation • 22 Mar 2024 • Xindi Luo, Zequn Sun, Jing Zhao, Zhe Zhao, Wei Hu

Parameter-efficient finetuning (PEFT) is a key technique for adapting large language models (LLMs) to downstream tasks.

Knowledge Graph Embeddings Knowledge Graphs

Paper
Code

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

no code implementations • 3 Apr 2024 • Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

Kullback-Leiber divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs).

Knowledge Distillation

Paper
Add Code

Parameter-efficient Continual Learning Framework in Industrial Real-time Text Classification System

no code implementations • NAACL (ACL) 2022 • Tao Zhu, Zhe Zhao, Weijie Liu, Jiachi Liu, Yiren Chen, Weiquan Mao, Haoyan Liu, Kunbo Ding, Yudong Li, Xuefeng Yang

Catastrophic forgetting is a challenge for model deployment in industrial real-time systems, which requires the model to quickly master a new task without forgetting the old one.

Continual Learning text-classification +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.