Search Results for author: Xuebo Liu

Found 34 papers, 26 papers with code

Improving Attributed Text Generation of Large Language Models via Preference Learning

no code implementations • 27 Mar 2024 • Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang

Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content.

Misinformation Retrieval +2

SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection

1 code implementation • 26 Feb 2024 • Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself.

DB-LLM: Accurate Dual-Binarization for Efficient LLMs

no code implementations • 19 Feb 2024 • Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, DaCheng Tao

Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment.

Binarization Computational Efficiency +1

Revisiting Demonstration Selection Strategies in In-Context Learning

no code implementations • 22 Jan 2024 • Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent.

In-Context Learning

Clustering Pseudo Language Family in Multilingual Translation Models with Fisher Information Matrix

1 code implementation • 5 Dec 2023 • Xinyu Ma, Xuebo Liu, Min Zhang

In multilingual translation research, the comprehension and utilization of language families are of paramount importance.

Clustering Translation

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

1 code implementation • 17 Oct 2023 • Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan

For video generation, various open-source models and publicly available services have been developed to generate high-quality videos.

Benchmarking Language Modelling +4

Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm

1 code implementation • 25 Jul 2023 • Hexuan Deng, Xin Zhang, Meishan Zhang, Xuebo Liu, Min Zhang

In this paper, we conduct a holistic exploration of the Universal Decompositional Semantic (UDS) Parsing.

Attribute Data Augmentation +1

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

1 code implementation • 12 Jul 2023 • Yuzhuang Xu, Shuo Wang, Peng Li, Xuebo Liu, Xiaolong Wang, Weidong Liu, Yang Liu

Although neural machine translation (NMT) models perform well in the general domain, it remains rather challenging to control their generation behavior to satisfy the requirement of different users.

Machine Translation NMT +1

Revisiting Token Dropping Strategy in Efficient BERT Pretraining

1 code implementation • 24 May 2023 • Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, DaCheng Tao

Token dropping is a recently-proposed strategy to speed up the pretraining of masked language models, such as BERT, by skipping the computation of a subset of the input tokens at several middle layers.

Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization

1 code implementation • 3 May 2023 • Chi Seng Cheang, Hou Pong Chan, Derek F. Wong, Xuebo Liu, Zhaocong Li, Yanming Sun, Shudong Liu, Lidia S. Chao

Moreover, the knowledge memorized by PLMs may quickly become outdated, which affects the generalization performance of PLMs on future data.

Abstractive Text Summarization

Towards Making the Most of ChatGPT for Machine Translation

1 code implementation • 24 Mar 2023 • Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, DaCheng Tao

We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still need to be highlighted for the MT/NLP community.

In-Context Learning Machine Translation +2

ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation

1 code implementation • 8 Dec 2022 • Zhaocong Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang

In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can continuously transfer knowledge from the parent model during the training of the child model.

Low-Resource Neural Machine Translation NMT +2

Improving Simultaneous Machine Translation with Monolingual Data

1 code implementation • 2 Dec 2022 • Hexuan Deng, Liang Ding, Xuebo Liu, Meishan Zhang, DaCheng Tao, Min Zhang

Preliminary experiments on En-Zh and En-Ja news domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh).

Hallucination Knowledge Distillation +4

Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation with Stroke Sequence Modeling

1 code implementation • 23 Nov 2022 • Zhijun Wang, Xuebo Liu, Min Zhang

Existing research generally treats a Chinese character as the minimum unit for representation.

Ranked #1 on Machine Translation on WMT2017 Chinese-English

Data Augmentation Machine Translation +1

Revisiting Grammatical Error Correction Evaluation and Beyond

1 code implementation • 3 Nov 2022 • Peiyuan Gong, Xuebo Liu, Heyan Huang, Min Zhang

Pretraining-based (PT-based) automatic evaluation metrics (e.g., BERTScore and BARTScore) have been widely used in several sentence generation tasks (e.g., machine translation and text summarization) due to their better correlation with human judgments over traditional overlap-based methods.

Grammatical Error Correction Machine Translation +2

BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation

no code implementations • 16 Apr 2022 • Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao

Data augmentation (DA) is core to achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.

Grammatical Error Correction Machine Translation +1

ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

1 code implementation • ACL 2022 • Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu, Xuebo Liu, Min Zhang

Inspired by this, we design a new architecture, ODE Transformer, which is analogous to the Runge-Kutta method that is well motivated in ODE.

Abstractive Text Summarization Machine Translation +1

Variance-Aware Machine Translation Test Sets

1 code implementation • 7 Nov 2021 • Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

We release 70 small and discriminative test sets for machine translation (MT) evaluation called variance-aware test sets (VAT), covering 35 translation directions from WMT16 to WMT20 competitions.

Machine Translation Translation

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

1 code implementation • Findings (EMNLP) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data for improving the model performance of neural machine translation (NMT).

Machine Translation NMT +2

Difficulty-Aware Machine Translation Evaluation

1 code implementation • ACL 2021 • Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

The high-quality translation results produced by machine translation (MT) systems still pose a huge challenge for automatic evaluation.

Machine Translation Sentence +1

On the Copying Behaviors of Pre-Training for Neural Machine Translation

1 code implementation • Findings (ACL) 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu

In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding.

Machine Translation NMT +1

Progressive Multi-Granularity Training for Non-Autoregressive Translation

no code implementations • Findings (ACL) 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu

Non-autoregressive translation (NAT) significantly accelerates the inference process via predicting the entire target sequence.

Sentence Translation

Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

1 code implementation • ACL 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu

Results demonstrate that the proposed approach can significantly and universally improve translation quality by reducing translation errors on low-frequency words.

Knowledge Distillation Translation

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Detection +1

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation

1 code implementation • 3 Mar 2021 • Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

Meta-learning has been sufficiently validated to be beneficial for low-resource neural machine translation (NMT).

Domain Adaptation Low-Resource Neural Machine Translation +3

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

1 code implementation • ICLR 2021 • Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Zhaopeng Tu

Encoder layer fusion (EncoderFusion) is a technique to fuse all the encoder layers (instead of the uppermost layer) for sequence-to-sequence (Seq2Seq) models, which has proven effective on various NLP tasks.

Grammatical Error Correction Machine Translation +3

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

no code implementations • ICLR 2021 • Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, DaCheng Tao, Zhaopeng Tu

To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data.

Knowledge Distillation Translation

DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding

no code implementations • Findings of the Association for Computational Linguistics 2020 • Zilong Wang, Mingjie Zhan, Xuebo Liu, Ding Liang

The table detection and handcrafted features in previous works cannot apply to all forms because of their requirements on formats.

Optical Character Recognition (OCR) Table Detection

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Norm-Based Curriculum Learning for Neural Machine Translation

1 code implementation • ACL 2020 • Xuebo Liu, Houtim Lai, Derek F. Wong, Lidia S. Chao

We use the norm (aka length or module) of a word embedding as a measure of 1) the difficulty of the sentence, 2) the competence of the model, and 3) the weight of the sentence.

Machine Translation NMT +2

Scene Text Image Super-Resolution in the Wild

4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai

For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.

Image Super-Resolution

Shared-Private Bilingual Word Embeddings for Neural Machine Translation

no code implementations • ACL 2019 • Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu

For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units.

Machine Translation NMT +3