no code implementations • IWSLT (ACL) 2022 • Yuhao Zhang, Canan Huang, Chen Xu, Xiaoqian Liu, Bei Li, Anxiang Ma, Tong Xiao, Jingbo Zhu
This paper describes NiuTrans’s submission to the IWSLT22 English-to-Chinese (En-Zh) offline speech translation task.
no code implementations • WMT (EMNLP) 2020 • Chi Hu, Hui Liu, Kai Feng, Chen Xu, Nuo Xu, Zefan Zhou, Shiqin Yan, Yingfeng Luo, Chenglong Wang, Xia Meng, Tong Xiao, Jingbo Zhu
This paper describes the submissions of the NiuTrans Team to the WMT 2020 Quality Estimation Shared Task.
no code implementations • WMT (EMNLP) 2020 • Yuhao Zhang, Ziyang Wang, Runzhe Cao, Binghao Wei, Weiqiao Shan, Shuhan Zhou, Abudurexiti Reheman, Tao Zhou, Xin Zeng, Laohu Wang, Yongyu Mu, Jingnan Zhang, Xiaoqian Liu, Xuanjun Zhou, Yinqiao Li, Bei Li, Tong Xiao, Jingbo Zhu
This paper describes NiuTrans neural machine translation systems of the WMT20 news translation tasks.
no code implementations • WMT (EMNLP) 2021 • Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Yimin Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu
This paper describes the NiuTrans system for the WMT21 translation efficiency task.
no code implementations • 5 Nov 2024 • Bei Li, Tong Zheng, Rui Wang, Jiahao Liu, Qingyan Guo, Junliang Guo, Xu Tan, Tong Xiao, Jingbo Zhu, Jingang Wang, Xunliang Cai
First, we introduce a predictor-corrector learning framework to minimize truncation errors, which consists of a high-order predictor and a multistep corrector.
1 code implementation • 7 Oct 2024 • Xinyu Liu, Runsong Zhao, Pengcheng Huang, Chunyang Xiao, Bei Li, Jingang Wang, Tong Xiao, Jingbo Zhu
We provide an extensive survey for limitations in this work and propose a new method called forgetting curve to measure the memorization capability of long-context models.
no code implementations • 6 Oct 2024 • Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu
These preference pairs are typically used to encode human preferences into a single numerical value through reward modeling, which acts as a reward signal during reinforcement learning from human feedback (RLHF).
no code implementations • 24 Sep 2024 • Xiaoqian Liu, Yangfan Du, Jianjin Wang, Yuan Ge, Chen Xu, Tong Xiao, Guocheng Chen, Jingbo Zhu
Simultaneous Speech Translation (SimulST) involves generating target language text while continuously processing streaming speech input, presenting significant real-time challenges.
no code implementations • 22 Sep 2024 • Runsong Zhao, Pengcheng Huang, Xinyu Liu, Chunyang Xiao, Tong Xiao, Jingbo Zhu
Compressing Transformer inputs into compressd tokens allows running LLMs with improved speed and cost efficiency.
no code implementations • 30 Aug 2024 • Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu
In our work, we extend the critique of NTP, highlighting its limitation also due to training with a narrow objective: the prediction of a sub-optimal one-hot distribution.
1 code implementation • 22 Aug 2024 • Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu
However, these techniques face the difficulty arising from the scarcity of visual preference data, which is required to train a visual reward model (VRM).
no code implementations • 4 Aug 2024 • Yongyu Mu, Yuzhang Wu, Yuchun Fan, Chenglong Wang, Hengyu Li, Qiaozhi He, Murun Yang, Tong Xiao, Jingbo Zhu
Our implementations of LiSA achieve a 6X compression of Q and K, with maximum throughput improvements of 19. 5% for LLaMA3-8B and 32. 3% for LLaMA2-7B.
no code implementations • 18 Jul 2024 • Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu
Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations.
1 code implementation • 22 Jun 2024 • Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang
Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets.
1 code implementation • 21 Jun 2024 • Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu
Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences.
no code implementations • 11 Jun 2024 • Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu
Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging.
no code implementations • 1 Jun 2024 • Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, Yingfeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu
Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input.
1 code implementation • 1 Apr 2024 • Hang Zhou, Chenglong Wang, Yimin Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu
Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (namely PCRM) training method to mitigate this problem.
no code implementations • 1 Apr 2024 • Kaiyan Chang, Songcheng Xu, Chenglong Wang, Yingfeng Luo, Xiaoqian Liu, Tong Xiao, Jingbo Zhu
Prompting is a mainstream paradigm for adapting large language models to specific natural language processing tasks without modifying internal parameters.
no code implementations • 19 Mar 2024 • Chi Hu, Yuan Ge, Xiangnan Ma, Hang Cao, Qiang Li, Yonghua Yang, Tong Xiao, Jingbo Zhu
Our experiments across 11 arithmetic and commonsense reasoning tasks show that RankPrompt significantly enhances the reasoning performance of ChatGPT and GPT-4, with improvements of up to 13%.
2 code implementations • 14 Mar 2024 • Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu
In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
1 code implementation • 28 Feb 2024 • Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Boxing Chen, Hao Yang, Bei Li, Tong Xiao, Jingbo Zhu
Given the significant resource allocation required for training and evaluating models, it is advantageous to have an efficient method for selecting high-quality IT data.
no code implementations • 18 Dec 2023 • Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu
End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model.
1 code implementation • 29 Nov 2023 • Tong Xiao, Jingbo Zhu
Transformers have dominated empirical machine learning models of natural language processing.
1 code implementation • 7 Nov 2023 • Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning.
1 code implementation • 26 Oct 2023 • Yuxin Zuo, Bei Li, Chuanhao Lv, Tong Zheng, Tong Xiao, Jingbo Zhu
This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete.
1 code implementation • 23 Oct 2023 • Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu
In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures.
Ranked #23 on Machine Translation on WMT2014 English-German
1 code implementation • 21 Sep 2023 • Chen Xu, Xiaoqian Liu, Erfeng He, Yuhao Zhang, Qianqian Dong, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang
In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task.
no code implementations • 8 Aug 2023 • Chenglong Wang, Hang Zhou, Kaiyan Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Jingbo Zhu
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.
2 code implementations • 4 Aug 2023 • Chenglong Wang, Hang Zhou, Yimin Hu, Yifu Huo, Bei Li, Tongran Liu, Tong Xiao, Jingbo Zhu
Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (\textit{e. g.,} BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.
no code implementations • 24 Jun 2023 • Xinyu Liu, Yan Ding, Kaikai An, Chunyang Xiao, Pranava Madhyastha, Tong Xiao, Jingbo Zhu
While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness.
Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2
no code implementations • 20 Jun 2023 • Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu
Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly.
no code implementations • 15 Jun 2023 • Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu
Inspired by this, we tune the training hyperparameters related to model convergence in a targeted manner.
1 code implementation • 13 Jun 2023 • Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu
Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST).
1 code implementation • 7 Jun 2023 • Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu
With the co-design of model and engine, compared with the existing system, we speed up 47. 0x and save 99. 5% of memory with only 11. 6% loss of BLEU.
no code implementations • 31 May 2023 • Bei Li, Rui Wang, Junliang Guo, Kaitao Song, Xu Tan, Hany Hassan, Arul Menezes, Tong Xiao, Jiang Bian, Jingbo Zhu
Large language models (LLMs) have shown remarkable success across a wide range of natural language generation tasks, where proper prompt designs make great impacts.
1 code implementation • 27 May 2023 • Chen Xu, Yuhao Zhang, Chengbo Jiao, Xiaoqian Liu, Chi Hu, Xin Zeng, Tong Xiao, Anxiang Ma, Huizhen Wang, Jingbo Zhu
While Transformer has become the de-facto standard for speech, modeling upon the fine-grained frame-level features remains an open challenge of capturing long-distance dependencies and distributing the attention weights.
1 code implementation • 27 May 2023 • Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu
Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency.
no code implementations • 27 May 2023 • Yongyu Mu, Abudurexiti Reheman, Zhiquan Cao, Yuchun Fan, Bei Li, Yinqiao Li, Tong Xiao, Chunliang Zhang, Jingbo Zhu
Using translation memories (TMs) as prompts is a promising approach to in-context learning of machine translation models.
no code implementations • 26 May 2023 • Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu
Learning multiscale Transformer models has been evidenced as a viable approach to augmenting machine translation systems.
no code implementations • 10 May 2023 • Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, Jingbo Zhu
For years the model performance in machine learning obeyed a power-law relationship with the model size.
no code implementations • 1 Feb 2023 • Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu
Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model.
no code implementations • 13 Jan 2023 • Abudurexiti Reheman, Tao Zhou, Yingfeng Luo, Di Yang, Tong Xiao, Jingbo Zhu
Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community.
2 code implementations • 20 Dec 2022 • Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu
Two principles: the complementary principle and the consensus principle are widely acknowledged in the literature of multi-view learning.
no code implementations • 4 Dec 2022 • Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems.
1 code implementation • 19 Jun 2022 • Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu
In this work, we define those scales in different linguistic units, including sub-words, words and phrases.
2 code implementations • ACL 2022 • Bei Li, Chuanhao Lv, Zefan Zhou, Tao Zhou, Tong Xiao, Anxiang Ma, Jingbo Zhu
Previous work on multimodal machine translation (MMT) has focused on the way of incorporating vision features into translation but little attention is on the quality of vision models.
1 code implementation • ACL 2022 • Bei Li, Quan Du, Tao Zhou, Yi Jing, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu, Xuebo Liu, Min Zhang
Inspired by this, we design a new architecture, {\it ODE Transformer}, which is analogous to the Runge-Kutta method that is well motivated in ODE.
no code implementations • WMT (EMNLP) 2021 • Shuhan Zhou, Tao Zhou, Binghao Wei, Yingfeng Luo, Yongyu Mu, Zefan Zhou, Chenglong Wang, Xuanjun Zhou, Chuanhao Lv, Yi Jing, Laohu Wang, Jingnan Zhang, Canan Huang, Zhongxiang Yan, Chi Hu, Bei Li, Tong Xiao, Jingbo Zhu
This paper describes NiuTrans neural machine translation systems of the WMT 2021 news translation tasks.
1 code implementation • 16 Sep 2021 • Chenglong Wang, Chi Hu, Yongyu Mu, Zhongxiang Yan, Siming Wu, Minyi Hu, Hang Cao, Bei Li, Ye Lin, Tong Xiao, Jingbo Zhu
This paper describes the NiuTrans system for the WMT21 translation efficiency task (http://statmt. org/wmt21/efficiency-task. html).
2 code implementations • WS 2020 • Chi Hu, Bei Li, Ye Lin, Yinqiao Li, Yanyang Li, Chenglong Wang, Tong Xiao, Jingbo Zhu
This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task.
no code implementations • EMNLP 2021 • Chi Hu, Chenglong Wang, Xiangnan Ma, Xia Meng, Yinqiao Li, Tong Xiao, Jingbo Zhu, Changliang Li
This paper addresses the efficiency challenge of Neural Architecture Search (NAS) by formulating the task as a ranking problem.
1 code implementation • Findings (EMNLP) 2021 • Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu
Improving Transformer efficiency has become increasingly attractive recently.
no code implementations • ACL (IWSLT) 2021 • Chen Xu, Xiaoqian Liu, Xiaowen Liu, Laohu Wang, Canan Huang, Tong Xiao, Jingbo Zhu
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task, which translates from the English audio to German text directly without intermediate transcription.
no code implementations • ACL 2021 • Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu
To our knowledge, we are the first to develop an end-to-end ST system that achieves comparable or even better BLEU performance than the cascaded ST counterpart when large-scale ASR and MT data is available.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 6 Apr 2021 • Bei Li, Quan Du, Tao Zhou, Shuhan Zhou, Xin Zeng, Tong Xiao, Jingbo Zhu
We show that a residual block of layers in Transformer can be described as a higher-order solution to ODEs.
no code implementations • 3 Jan 2021 • Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu
The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness.
1 code implementation • 27 Dec 2020 • Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
We proposed a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model.
no code implementations • COLING 2020 • Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu
This eases training by highlighting easy samples that the current model has enough competence to learn.
Low Resource Neural Machine Translation Low-Resource Neural Machine Translation +3
no code implementations • COLING 2020 • Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, ShuJian Huang, Tong Xiao, Jingbo Zhu
Our experiments show that this simple method does not hamper the performance of similar language pairs and achieves an accuracy of 13. 64~55. 53% between English and four distant languages, i. e., Chinese, Japanese, Vietnamese and Thai.
no code implementations • COLING 2020 • Qiang Wang, Changliang Li, Yue Zhang, Tong Xiao, Jingbo Zhu
In this way, in addition to the topmost encoder layer (referred to as the primary view), we also incorporate an intermediate encoder layer as the auxiliary view.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Qiang Wang, Tong Xiao, Jingbo Zhu
The standard neural machine translation model can only decode with the same depth configuration as training.
1 code implementation • EMNLP 2020 • Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu
We find that stacking layers is helpful in improving the representation ability of NMT models and adjacent layers perform similarly.
no code implementations • ACL 2021 • Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu
Inspired by this, we investigate methods of model acceleration and compression in another line of research.
no code implementations • 17 Sep 2020 • Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
8-bit integer inference, as a promising direction in reducing both the latency and storage of deep neural networks, has made great progress recently.
1 code implementation • ACL 2020 • Bei Li, Hui Liu, Ziyang Wang, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li
In encoder-decoder neural models, multiple encoders are in general used to represent the contextual information in addition to the individual sentence.
no code implementations • ACL 2020 • Yinqiao Li, Chi Hu, Yuhao Zhang, Nuo Xu, Yufan Jiang, Tong Xiao, Jingbo Zhu, Tongran Liu, Changliang Li
Neural architecture search (NAS) has advanced significantly in recent years but most NAS systems restrict search to learning architectures of a recurrent or convolutional cell.
1 code implementation • COLING 2018 • Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, Jingbo Zhu
In this paper, we propose a multi-layer representation fusion (MLRF) approach to fusing stacked layers.
1 code implementation • 16 Feb 2020 • Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu
Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to the explicit modelling of the interaction between any two source and target units, e. g., alignment, the recent Neural Machine Translation (NMT) systems resort to the attention which partially encodes the interaction for efficiency.
1 code implementation • IJCNLP 2019 • Yufan Jiang, Chi Hu, Tong Xiao, Chunliang Zhang, Jingbo Zhu
In this paper, we study differentiable neural architecture search (NAS) methods for natural language processing.
Ranked #1 on Language Modelling on PTB Diagnostic ECG Database
no code implementations • WS 2019 • Bei Li, Yinqiao Li, Chen Xu, Ye Lin, Jiqiang Liu, Hui Liu, Ziyang Wang, Yuhao Zhang, Nuo Xu, Zeyang Wang, Kai Feng, Hexuan Chen, Tengbo Liu, Yanyang Li, Qiang Wang, Tong Xiao, Jingbo Zhu
We participated in 13 translation directions, including 11 supervised tasks, namely EN↔{ZH, DE, RU, KK, LT}, GU→EN and the unsupervised DE↔CS sub-track.
no code implementations • 26 Jun 2019 • Tong Xiao, Yinqiao Li, Jingbo Zhu, Zhengtao Yu, Tongran Liu
This is even 16 times faster than the baseline with no use of the attention cache.
no code implementations • ACL 2019 • Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu
For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units.
2 code implementations • ACL 2019 • Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
Transformer is the state-of-the-art model in recent machine translation evaluations.
no code implementations • WS 2018 • Qiang Wang, Bei Li, Jiqiang Liu, Bojian Jiang, Zheyang Zhang, Yinqiao Li, Ye Lin, Tong Xiao, Jingbo Zhu
This paper describes the submission of the NiuTrans neural machine translation system for the WMT 2018 Chinese ↔ English news translation tasks.
no code implementations • ACL 2018 • Yanyang Li, Tong Xiao, Yinqiao Li, Qiang Wang, Changming Xu, Jingbo Zhu
We offer a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT).
no code implementations • EMNLP 2017 • Baosong Yang, Derek F. Wong, Tong Xiao, Lidia S. Chao, Jingbo Zhu
This paper proposes a hierarchical attentional neural translation model which focuses on enhancing source-side hierarchical representations by covering both local and global semantic information using a bidirectional tree-based encoder.