Search Results for author: Xuanli He

Found 20 papers, 8 papers with code

Koala: An Index for Quantifying Overlaps with Pre-training Corpora

no code implementations 26 Mar 2023 Thuy-Trang Vu, Xuanli He, Gholamreza Haffari, Ehsan Shareghi

In recent years, more attention has been placed on probing the role of pre-training data in the downstream behaviour of Large Language Models (LLMs).

Memorization

Rethinking Round-Trip Translation for Machine Translation Evaluation

1 code implementation 15 Sep 2022 Terry Yue Zhuo, Qiongkai Xu, Xuanli He, Trevor Cohn

Round-trip translation could serve as a clever and straightforward technique to alleviate the requirement for a parallel evaluation corpus.

Machine Translation Translation

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

1 code implementation 5 Dec 2021 Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang

Nowadays, due to breakthroughs in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs to serve over half a billion people worldwide and process over one hundred billion word generations per day.

Document Summarization Image Captioning +3

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning

no code implementations 30 Oct 2021 Xuanli He, Iman Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi

To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT.

Text Classification

Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation

no code implementations 29 Sep 2021 Xuanli He, Islam Nassar, Jamie Ryan Kiros, Gholamreza Haffari, Mohammad Norouzi

To obtain strong task-specific generative models, we either fine-tune a large language model (LLM) on inputs from specific tasks, or prompt an LLM with a few input examples to generate more unlabeled examples.

Few-Shot Learning Knowledge Distillation +1

Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection

1 code implementation EMNLP 2021 Thuy-Trang Vu, Xuanli He, Dinh Phung, Gholamreza Haffari

Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks.

Contrastive Learning Machine Translation +3

Killing One Bird with Two Stones: Model Extraction and Attribute Inference Attacks against BERT-based APIs

no code implementations 23 May 2021 Chen Chen, Xuanli He, Lingjuan Lyu, Fangzhao Wu

In this work, we bridge this gap by first presenting an effective model extraction attack, where the adversary can practically steal a BERT-based API (the target/victim model) using only a limited number of queries.

Inference Attack Model extraction +3

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

1 code implementation NAACL 2021 Xuanli He, Lingjuan Lyu, Qiongkai Xu, Lichao Sun

Finally, we investigate two defence strategies to protect the victim model and find that unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.

Model extraction text-classification +2

Exploring Vulnerabilities of BERT-based APIs

no code implementations 1 Jan 2021 Xuanli He, Lingjuan Lyu, Lichao Sun, Xiaojun Chang, Jun Zhao

We then demonstrate how the extracted model can be exploited to develop an effective attribute inference attack that exposes sensitive information in the training data.

Inference Attack Model extraction +3

Differentially Private Representation for NLP: Formal Guarantee and An Empirical Study on Privacy and Fairness

2 code implementations Findings of the Association for Computational Linguistics 2020 Lingjuan Lyu, Xuanli He, Yitong Li

It has been demonstrated that the hidden representations learned by a deep model can encode private information about the input, and hence can be exploited to recover such information with reasonable accuracy.

Fairness

Towards Differentially Private Text Representations

no code implementations 25 Jun 2020 Lingjuan Lyu, Yitong Li, Xuanli He, Tong Xiao

Most deep learning frameworks require users to pool their local data or model updates to a trusted server to train or maintain a global model.

Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation

1 code implementation ACL 2020 Xuanli He, Gholamreza Haffari, Mohammad Norouzi

This paper introduces Dynamic Programming Encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units.

Machine Translation Translation

Sequence to Sequence Mixture Model for Diverse Machine Translation

no code implementations CONLL 2018 Xuanli He, Gholamreza Haffari, Mohammad Norouzi

In this paper, we develop a novel sequence to sequence mixture (S2SMIX) model that improves both translation diversity and quality by adopting a committee of specialized translation models rather than a single translation model.

Machine Translation Translation

Exploring Textual and Speech information in Dialogue Act Classification with Speaker Domain Adaptation

no code implementations ALTA 2018 Xuanli He, Quan Hung Tran, William Havard, Laurent Besacier, Ingrid Zukerman, Gholamreza Haffari

In spite of the recent success of Dialogue Act (DA) classification, the majority of prior works focus on text-based classification with oracle transcriptions, i.e., human transcriptions, instead of Automatic Speech Recognition (ASR) transcriptions.

Automatic Speech Recognition (ASR) +5

Word Representation Models for Morphologically Rich Languages in Neural Machine Translation

no code implementations WS 2017 Ekaterina Vylomova, Trevor Cohn, Xuanli He, Gholamreza Haffari

Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation.

Hard Attention Machine Translation +1
