Search Results for author: Jingbo Zhu

Found 92 papers, 32 papers with code

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

no code implementations5 Nov 2024 Bei Li, Tong Zheng, Rui Wang, Jiahao Liu, Qingyan Guo, Junliang Guo, Xu Tan, Tong Xiao, Jingbo Zhu, Jingang Wang, Xunliang Cai

First, we introduce a predictor-corrector learning framework to minimize truncation errors, which consists of a high-order predictor and a multistep corrector.

Abstractive Text Summarization Language Modelling +3

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

1 code implementation7 Oct 2024 Xinyu Liu, Runsong Zhao, Pengcheng Huang, Chunyang Xiao, Bei Li, Jingang Wang, Tong Xiao, Jingbo Zhu

In this work, we provide an extensive survey of limitations and propose a new method, called the forgetting curve, to measure the memorization capability of long-context models.

Memorization

LRHP: Learning Representations for Human Preferences via Preference Pairs

no code implementations6 Oct 2024 Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

These preference pairs are typically used to encode human preferences into a single numerical value through reward modeling, which acts as a reward signal during reinforcement learning from human feedback (RLHF).

Representation Learning

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation

no code implementations24 Sep 2024 Xiaoqian Liu, Yangfan Du, Jianjin Wang, Yuan Ge, Chen Xu, Tong Xiao, Guocheng Chen, Jingbo Zhu

Simultaneous Speech Translation (SimulST) involves generating target language text while continuously processing streaming speech input, presenting significant real-time challenges.

Multi-Task Learning

More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss

no code implementations22 Sep 2024 Runsong Zhao, Pengcheng Huang, Xinyu Liu, Chunyang Xiao, Tong Xiao, Jingbo Zhu

Compressing Transformer inputs into compressed tokens allows running LLMs with improved speed and cost efficiency.

Position

NDP: Next Distribution Prediction as a More Broad Target

no code implementations30 Aug 2024 Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

In our work, we extend the critique of NTP, highlighting that its limitations also stem from training with a narrow objective: the prediction of a sub-optimal one-hot distribution.

Data Compression Domain Adaptation +1

RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

1 code implementation22 Aug 2024 Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM).

Hallucination

Cross-layer Attention Sharing for Large Language Models

no code implementations4 Aug 2024 Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, Qiaozhi He, Murun Yang, Tong Xiao, Jingbo Zhu

Our implementations of LiSA achieve a 6X compression of Q and K, with maximum throughput improvements of 19.5% for LLaMA3-8B and 32.3% for LLaMA2-7B.

Translate-and-Revise: Boosting Large Language Models for Constrained Translation

no code implementations18 Jul 2024 Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu

Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations.

Machine Translation NMT +1

Revisiting Interpolation Augmentation for Speech-to-Text Generation

1 code implementation22 Jun 2024 Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets.

Text Generation

Hybrid Alignment Training for Large Language Models

1 code implementation21 Jun 2024 Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences.

Instruction Following

Teaching Language Models to Self-Improve by Learning from Language Feedback

no code implementations11 Jun 2024 Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu

Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging.

Language Modelling

Recent Advances in End-to-End Simultaneous Speech Translation

no code implementations1 Jun 2024 Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, Yingfeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input.

Translation

Prior Constraints-based Reward Model Training for Aligning Large Language Models

1 code implementation1 Apr 2024 Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.

reinforcement-learning Reinforcement Learning

Efficient Prompting Methods for Large Language Models: A Survey

no code implementations1 Apr 2024 Kaiyan Chang, Songcheng Xu, Chenglong Wang, Yingfeng Luo, Xiaoqian Liu, Tong Xiao, Jingbo Zhu

Prompting is a mainstream paradigm for adapting large language models to specific natural language processing tasks without modifying internal parameters.

In-Context Learning Survey

RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners

no code implementations19 Mar 2024 Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu

Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%.

Revealing the Parallel Multilingual Learning within Large Language Models

2 code implementations14 Mar 2024 Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu

In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.

In-Context Learning

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation

1 code implementation28 Feb 2024 Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Boxing Chen, Hao Yang, Bei Li, Tong Xiao, Jingbo Zhu

Given the significant resource allocation required for training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data.

Clustering Diversity

Introduction to Transformers: an NLP Perspective

1 code implementation29 Nov 2023 Tong Xiao, Jingbo Zhu

Transformers have dominated empirical machine learning models of natural language processing.

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

1 code implementation7 Nov 2023 Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning.

Multi-Task Learning

Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs

1 code implementation26 Oct 2023 Yuxin Zuo, Bei Li, Chuanhao Lv, Tong Zheng, Tong Xiao, Jingbo Zhu

This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete.

Attribute Multimodal Machine Translation +2

PartialFormer: Modeling Part Instead of Whole for Machine Translation

1 code implementation23 Oct 2023 Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu

In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures.

Abstractive Text Summarization Machine Translation +1

Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

1 code implementation21 Sep 2023 Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task.

speech-recognition Speech Recognition +1

Learning Evaluation Models from Large Language Models for Sequence Generation

no code implementations8 Aug 2023 Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu

Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.

Machine Translation Style Transfer +1

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

2 code implementations4 Aug 2023 Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization Language Modelling +6

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

no code implementations24 Jun 2023 Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu

While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

Recent Advances in Direct Speech-to-text Translation

no code implementations20 Jun 2023 Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Recently, speech-to-text translation has attracted increasing attention, and many studies have emerged rapidly.

Data Augmentation Decoder +3

Understanding Parameter Sharing in Transformers

no code implementations15 Jun 2023 Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu

Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner.

Machine Translation

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

1 code implementation13 Jun 2023 Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST).

MobileNMT: Enabling Translation in 15MB and 30ms

1 code implementation7 Jun 2023 Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu

With the co-design of model and engine, compared with the existing system, we speed up 47.0x and save 99.5% of memory with only 11.6% loss of BLEU.

Model Compression NMT +2

Deliberate then Generate: Enhanced Prompting Framework for Text Generation

no code implementations31 May 2023 Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu

Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts.

Text Generation

Bridging the Granularity Gap for Acoustic Modeling

1 code implementation27 May 2023 Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, Jingbo Zhu

While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights.

speech-recognition Speech Recognition

CTC-based Non-autoregressive Speech Translation

1 code implementation27 May 2023 Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency.

Translation

TranSFormer: Slow-Fast Transformer for Machine Translation

no code implementations26 May 2023 Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu

Learning multiscale Transformer models has been evidenced as a viable approach to augmenting machine translation systems.

Machine Translation Translation

Multi-Path Transformer is Better: A Case Study on Neural Machine Translation

no code implementations10 May 2023 Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, Jingbo Zhu

For years the model performance in machine learning obeyed a power-law relationship with the model size.

Machine Translation

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

no code implementations1 Feb 2023 Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu

Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model.

Knowledge Distillation

Prompting Neural Machine Translation with Translation Memories

no code implementations13 Jan 2023 Abudurexiti Reheman, Tao Zhou, Yingfeng Luo, Di Yang, Tong Xiao, Jingbo Zhu

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community.

Machine Translation NMT +1

EIT: Enhanced Interactive Transformer

2 code implementations20 Dec 2022 Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu

Two principles, the complementary principle and the consensus principle, are widely acknowledged in the literature of multi-view learning.

Abstractive Text Summarization Language Modelling +3

Learning Multiscale Transformer Models for Sequence Generation

1 code implementation19 Jun 2022 Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu

In this work, we define those scales in different linguistic units, including sub-words, words and phrases.

On Vision Features in Multimodal Machine Translation

2 code implementations ACL 2022 Bei Li, Chuanhao Lv, Zefan Zhou, Tao Zhou, Tong Xiao, Anxiang Ma, Jingbo Zhu

Previous work on multimodal machine translation (MMT) has focused on the way of incorporating vision features into translation but little attention is on the quality of vision models.

Image Captioning Multimodal Machine Translation +3

The NiuTrans System for the WMT21 Efficiency Task

1 code implementation16 Sep 2021 Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Minyi Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu

This paper describes the NiuTrans system for the WMT21 translation efficiency task (http://statmt.org/wmt21/efficiency-task.html).

Knowledge Distillation Translation

The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task

no code implementations ACL (IWSLT) 2021 Chen Xu, Xiaoqian Liu, Xiaowen Liu, Laohu Wang, Canan Huang, Tong Xiao, Jingbo Zhu

This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription.

Position Translation

Stacked Acoustic-and-Textual Encoding: Integrating the Pre-trained Models into Speech Translation Encoders

no code implementations ACL 2021 Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu

To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than the cascaded ST counterpart when large-scale ASR and MT data is available.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

An Efficient Transformer Decoder with Compressed Sub-layers

no code implementations3 Jan 2021 Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness.

Decoder Machine Translation +1

Learning Light-Weight Translation Models from Deep Transformer

1 code implementation27 Dec 2020 Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu

We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model.

Knowledge Distillation Machine Translation +2

A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction

no code implementations COLING 2020 Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, ShuJian Huang, Tong Xiao, Jingbo Zhu

Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13.64~55.53% between English and four distant languages, i.e., Chinese, Japanese, Vietnamese and Thai.

Dimensionality Reduction Self-Learning

Layer-Wise Multi-View Learning for Neural Machine Translation

no code implementations COLING 2020 Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu

In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view.

Decoder Machine Translation +3

Shallow-to-Deep Training for Neural Machine Translation

1 code implementation EMNLP 2020 Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly.

Machine Translation NMT +2

Towards Fully 8-bit Integer Inference for the Transformer Model

no code implementations17 Sep 2020 Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu

8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently.

Language Modelling Quantization +1

Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation

1 code implementation ACL 2020 Bei Li, Hui Liu, Ziyang Wang, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

In encoder-decoder neural models, multiple encoders are in general used to represent the contextual information in addition to the individual sentence.

Decoder fr-en +4

Learning Architectures from an Extended Search Space for Language Modeling

no code implementations ACL 2020 Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li

Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell.

Chunking Language Modelling +4

Neural Machine Translation with Joint Representation

1 code implementation16 Feb 2020 Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu

Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e.g., alignment, the recent Neural Machine Translation (NMT) systems resort to attention, which partially encodes the interaction for efficiency.

Decoder Machine Translation +2

Shared-Private Bilingual Word Embeddings for Neural Machine Translation

no code implementations ACL 2019 Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu

For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units.

Machine Translation NMT +3

The NiuTrans Machine Translation System for WMT18

no code implementations WS 2018 Qiang Wang, Bei Li, Jiqiang Liu, Bojian Jiang, Zheyang Zhang, Yinqiao Li, Ye Lin, Tong Xiao, Jingbo Zhu

This paper describes the submission of the NiuTrans neural machine translation system for the WMT 2018 Chinese ↔ English news translation tasks.

Machine Translation Translation

A Simple and Effective Approach to Coverage-Aware Neural Machine Translation

no code implementations ACL 2018 Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, Jingbo Zhu

We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT).

Machine Translation NMT +1

Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation

no code implementations EMNLP 2017 Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu

This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder.

Machine Translation Translation
