Search Results for author: Hany Hassan Awadalla

Found 20 papers, 8 papers with code

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

1 code implementation • 20 Sep 2023 • Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on.

Language Modelling • Machine Translation +1
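To illustrate the general idea of fine-tuning an LLM directly for translation, here is a minimal sketch of turning a sentence pair into an instruction-style supervised fine-tuning record. The prompt template and field names are illustrative assumptions, not the paper's actual recipe.

```python
# Hypothetical sketch: formatting a sentence pair into an instruction-style
# prompt/completion record for translation fine-tuning of an LLM.
# The template below is an assumption for illustration only.

def format_translation_example(src_lang, tgt_lang, src_text, tgt_text):
    """Build one supervised fine-tuning record from a sentence pair."""
    prompt = (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )
    return {"prompt": prompt, "completion": " " + tgt_text}

record = format_translation_example(
    "German", "English",
    "Maschinelle Übersetzung ist nützlich.",
    "Machine translation is useful.",
)
print(record["prompt"])
print(record["completion"])
```

A small set of such records, rather than abundant bitext, is the kind of supervision the abstract alludes to.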

Task-Based MoE for Multitask Multilingual Machine Translation

no code implementations • 30 Aug 2023 • Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

The Mixture-of-Experts (MoE) architecture has proven to be a powerful method for training deep models across diverse tasks and applications.

Machine Translation • Translation

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

no code implementations • 16 Aug 2023 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements.
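The memory savings come from storing weights in low precision. A minimal sketch of the general fine-grained (group-wise) weight-only quantization idea: each small group of weights gets its own scale, so one outlier weight cannot ruin the precision of an entire row. Group size and bit width here are illustrative, not the paper's settings.

```python
import numpy as np

# Minimal sketch of group-wise weight-only quantization: each group of
# `group_size` weights gets its own scale, limiting the damage an outlier
# weight can do. Group size and bit width are illustrative choices.

def quantize_groupwise(w, group_size=4, bits=8):
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0             # avoid division by zero
    q = np.round(groups / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

# One group of small weights, one group containing large weights:
w = np.array([0.02, -0.01, 0.03, 0.015, 2.0, -1.5, 0.5, 1.0], dtype=np.float32)
q, scale = quantize_groupwise(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())  # per-group scales keep reconstruction error small
```

With a single scale over all eight weights, the small first group would be crushed to near-zero precision; per-group scales avoid that.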


Do GPTs Produce Less Literal Translations?

1 code implementation • 26 May 2023 • Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla

On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs.

Machine Translation • NMT +3

ResiDual: Transformer with Dual Residual Connections

1 code implementation • 28 Apr 2023 • Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

In this paper, we propose ResiDual, a novel Transformer architecture with Pre-Post-LN (PPLN), which fuses the connections in Post-LN and Pre-LN together, inheriting their advantages while avoiding their limitations.

Machine Translation
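A rough sketch of the dual-residual idea in the spirit of the abstract: one stream applies LayerNorm after each residual addition (Post-LN), while a second stream accumulates raw sublayer outputs (Pre-LN style) and is normalized once at the end. The toy linear sublayer and the exact fusion below are simplifying assumptions, not the paper's precise architecture.

```python
import numpy as np

# Rough sketch of a dual-residual (Pre-Post-LN) stack: a Post-LN stream plus
# a Pre-LN-style unnormalized accumulator fused at the output. The sublayer
# is a toy linear map standing in for attention/FFN.

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_stack(x, weights):
    post = x          # Post-LN stream: normalized after every addition
    dual = x.copy()   # Pre-LN-style stream: raw shortcut, helps gradient flow
    for w in weights:
        out = post @ w               # stand-in for an attention/FFN sublayer
        post = layer_norm(post + out)
        dual = dual + out            # accumulate unnormalized sublayer outputs
    return post + layer_norm(dual)   # fuse the two streams at the output

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
weights = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]
y = residual_stack(x, weights)
print(y.shape)  # (2, 8)
```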

How Good Are GPT Models at Machine Translation? A Comprehensive Evaluation

1 code implementation • 18 Feb 2023 • Amr Hendy, Mohamed Abdelrehim, Amr Sharaf, Vikas Raunak, Mohamed Gabr, Hitokazu Matsushita, Young Jin Kim, Mohamed Afify, Hany Hassan Awadalla

In this paper, we present a comprehensive evaluation of GPT models for machine translation, covering various aspects such as the quality of different GPT models compared with state-of-the-art research and commercial systems, the effect of prompting strategies, robustness to domain shifts, and document-level translation.

Machine Translation • Text Generation +1

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production

no code implementations • 18 Nov 2022 • Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla

Mixture of Experts (MoE) models with conditional execution of sparsely activated layers have enabled training models with a much larger number of parameters.

Machine Translation
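The "conditional execution of sparsely activated layers" described above can be sketched as a minimal top-1-gated MoE layer: only the chosen expert runs per token, so parameters grow with the number of experts while per-token compute stays roughly constant. The softmax gate and shapes below are illustrative, not the production system from the paper.

```python
import numpy as np

# Minimal sketch of a sparsely activated MoE layer with top-1 gating.
# Only one expert is executed per token (conditional execution).

def moe_layer(x, gate_w, expert_ws):
    logits = x @ gate_w                       # (tokens, n_experts)
    chosen = logits.argmax(axis=1)            # top-1 expert per token
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for e, w in enumerate(expert_ws):
        mask = chosen == e
        if mask.any():                        # run expert only on its tokens
            out[mask] = (x[mask] @ w) * probs[mask, e:e + 1]
    return out, chosen

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                   # 6 tokens, hidden size 4
gate_w = rng.normal(size=(4, 3))              # gate over 3 experts
experts = [rng.normal(size=(4, 4)) for _ in range(3)]
y, chosen = moe_layer(x, gate_w, experts)
print(y.shape, chosen)
```

In a real deployment the experts live on different devices, which is exactly what makes serving such models at cloud scale nontrivial.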

Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation

no code implementations • 11 Aug 2022 • Muhammad ElNokrashy, Amr Hendy, Mohamed Maher, Mohamed Afify, Hany Hassan Awadalla

In a WMT evaluation campaign, From-English performance improves by 4.17 BLEU points in the zero-shot setting and by 2.87 BLEU points when direct data is available for training.
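A sketch of the language-token idea for multilingual translation: prepend explicit language tags to the token sequences so one model knows which direction to translate. The `<2xx>` tag format is a common convention in multilingual NMT, assumed here rather than taken from the paper.

```python
# Illustrative sketch of adding language tokens for multilingual NMT.
# The tag scheme (<xx> for language identity, <2xx> for target direction)
# is a common convention, not necessarily the paper's exact scheme.

def tag_example(src_tokens, tgt_tokens, src_lang, tgt_lang):
    src = [f"<{src_lang}>", f"<2{tgt_lang}>"] + src_tokens
    tgt = [f"<{tgt_lang}>"] + tgt_tokens
    return src, tgt

src, tgt = tag_example(["Guten", "Morgen"], ["Good", "morning"], "de", "en")
print(src)  # ['<de>', '<2en>', 'Guten', 'Morgen']
print(tgt)  # ['<en>', 'Good', 'morning']
```

Because the tags, not the training pairs, carry the direction signal, the model can be steered toward directions never seen in training, which is where the zero-shot gains come from.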


Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations

no code implementations • 30 Jun 2022 • Akiko Eriguchi, Shufang Xie, Tao Qin, Hany Hassan Awadalla

Multilingual Neural Machine Translation (MNMT) enables one system to translate sentences from multiple source languages to multiple target languages, greatly reducing deployment costs compared with conventional bilingual systems.

Machine Translation • Translation

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

no code implementations • 28 May 2022 • Rui Liu, Young Jin Kim, Alexandre Muzio, Hany Hassan Awadalla

Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due to their scaling capability, which enables dramatic increases in model size without significant increases in computational cost.

Machine Translation
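A heavily simplified sketch of the gating-dropout intuition: during training, with some probability a token ignores its gate assignment and is routed to an expert on its local device, skipping the expensive cross-device all-to-all exchange while also acting as a regularizer. The single fixed local-expert index below is an assumption for illustration.

```python
import numpy as np

# Hedged sketch of gating dropout: with probability p, override the gate's
# expert choice with a local expert, avoiding cross-device communication.
# Treating "local expert" as one fixed index is a simplification.

def route_with_gating_dropout(gate_choice, local_expert, p, rng):
    drop = rng.random(gate_choice.shape) < p
    return np.where(drop, local_expert, gate_choice)

rng = np.random.default_rng(0)
gate_choice = rng.integers(0, 8, size=16)   # expert picked by the gate per token
routed = route_with_gating_dropout(gate_choice, local_expert=0, p=0.3, rng=rng)
print(routed)
```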

Scalable and Efficient MoE Training for Multitask Multilingual Models

1 code implementation • 22 Sep 2021 • Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla

By combining efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation, which results in a substantial improvement in model accuracy.

Machine Translation • Text Generation

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei

While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).

Abstractive Text Summarization • Machine Translation +5

Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions

no code implementations • WMT (EMNLP) 2020 • Muhammad N. ElNokrashy, Amr Hendy, Mohamed Abdelghaffar, Mohamed Afify, Ahmed Tawfik, Hany Hassan Awadalla

For the mBART fine-tuning setup provided by the organizers, our method shows 7% and 5% relative improvements in sacreBLEU over the baseline on the test sets for Pashto and Khmer, respectively.

FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

2 code implementations • EMNLP (sustainlp) 2020 • Young Jin Kim, Hany Hassan Awadalla

In this paper, we present FastFormers, a set of recipes to achieve efficient inference-time performance for Transformer-based models on various NLU tasks.

Knowledge Distillation • Natural Language Understanding

Multi-task Learning for Multilingual Neural Machine Translation

no code implementations • EMNLP 2020 • Yiren Wang, ChengXiang Zhai, Hany Hassan Awadalla

In this work, we propose a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.

Cross-Lingual Transfer • Denoising +4
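A toy sketch of building a denoising training pair from monolingual text, in the spirit of the auxiliary tasks described above: the model receives a corrupted sentence and must reconstruct the original. Token masking with this mask rate is one common corruption; the paper's exact noise functions are not specified in the snippet.

```python
import random

# Toy sketch of a denoising objective on monolingual data: corrupt the
# input (here, by masking tokens) and train the model to recover the
# clean sentence. Mask rate and mask token are illustrative.

def make_denoising_pair(tokens, mask_rate=0.3, mask_token="<mask>", seed=0):
    rng = random.Random(seed)
    noisy = [mask_token if rng.random() < mask_rate else t for t in tokens]
    return noisy, tokens  # (corrupted input, clean target)

noisy, target = make_denoising_pair(["the", "cat", "sat", "on", "the", "mat"])
print(noisy, "->", target)
```

Pairs like this let the monolingual data contribute a training signal alongside the bitext translation task.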

Detecting Interrogative Utterances with Recurrent Neural Networks

no code implementations • 3 Nov 2015 • Junyoung Chung, Jacob Devlin, Hany Hassan Awadalla

In this paper, we explore different neural network architectures that can predict if a speaker of a given utterance is asking a question or making a statement.

General Classification
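The setup described above can be sketched as a recurrent network run over an utterance's token embeddings, with the final hidden state classified as question vs. statement. A plain tanh RNN with random weights is used here purely to show the data flow; the paper trains the weights and explores several recurrent architectures.

```python
import numpy as np

# Toy sketch: run an RNN over token embeddings and classify the final
# hidden state as question vs. statement. Random weights illustrate the
# data flow only; in practice they are learned.

def rnn_classify(embeddings, w_xh, w_hh, w_out):
    h = np.zeros(w_hh.shape[0])
    for x in embeddings:                 # simple tanh RNN over the utterance
        h = np.tanh(x @ w_xh + h @ w_hh)
    logit = h @ w_out
    return 1 / (1 + np.exp(-logit))      # P(utterance is a question)

rng = np.random.default_rng(0)
utterance = rng.normal(size=(5, 8))      # 5 tokens, 8-dim embeddings
w_xh = rng.normal(scale=0.1, size=(8, 16))
w_hh = rng.normal(scale=0.1, size=(16, 16))
w_out = rng.normal(scale=0.1, size=16)
p = rnn_classify(utterance, w_xh, w_hh, w_out)
print(round(float(p), 3))
```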
