33 papers with code • 1 benchmark • 1 dataset



Most implemented papers

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

google/edward2 ICLR 2020

We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs.
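The core BatchEnsemble idea, sharing one slow weight matrix across ensemble members and giving each member a rank-1 multiplicative perturbation, can be sketched in a few lines. This is a NumPy sketch; the shapes and names are illustrative and not the google/edward2 API:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_members = 4, 3, 2

W = rng.standard_normal((d_in, d_out))      # shared "slow" weight
R = rng.standard_normal((n_members, d_in))  # per-member input factors r_i
S = rng.standard_normal((n_members, d_out)) # per-member output factors s_i

def batchensemble_forward(x, member):
    """Member i's weight is W * outer(r_i, s_i), applied without
    materializing the full matrix: ((x * r_i) @ W) * s_i."""
    return ((x * R[member]) @ W) * S[member]

x = rng.standard_normal((5, d_in))
outs = [batchensemble_forward(x, i) for i in range(n_members)]
```

Because only the rank-1 factors are per-member, the memory overhead per extra ensemble member is O(d_in + d_out) rather than O(d_in * d_out).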

STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework

PaddlePaddle/PaddleNLP ACL 2019

Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences.
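STACL's prefix-to-prefix training is commonly instantiated as a wait-k policy: read k source tokens before emitting the first target token, then alternate one read per write until the source is exhausted. A minimal sketch of that read/write schedule (names are illustrative, not the PaddleNLP API):

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return a list of ('READ', i) / ('WRITE', j) actions for a
    wait-k policy: before emitting target token j, at least
    min(j + k, src_len) source tokens must have been read."""
    actions = []
    read = 0
    for j in range(tgt_len):
        while read < min(j + k, src_len):
            actions.append(("READ", read))
            read += 1
        actions.append(("WRITE", j))
    return actions
```

For example, `wait_k_schedule(5, 5, 2)` reads two source tokens up front and thereafter emits one target token per source token read, giving a controllable latency of k tokens.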

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

nyu-dl/dl4mt-cdec ACL 2016

The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation.

Fully Character-Level Neural Machine Translation without Explicit Segmentation

nyu-dl/dl4mt-c2c TACL 2017

We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of BLEU score and human judgment.

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

nikhil-iyer-97/wide-minima-density-hypothesis 9 Mar 2020

Several papers argue that wide minima generalize better than narrow minima.
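At a high level, the proposed explore-exploit schedule holds a large learning rate for an initial "explore" phase, to land in a wide minimum, and then decays it in an "exploit" phase. A minimal sketch under that assumption; the linear decay and the 50/50 split are illustrative choices, not the paper's exact settings:

```python
def explore_exploit_lr(step, total_steps, peak_lr, explore_frac=0.5, min_lr=0.0):
    """Hold peak_lr for the explore phase, then decay linearly to min_lr."""
    explore_steps = int(explore_frac * total_steps)
    if step < explore_steps:
        return peak_lr
    # exploit phase: linear decay from peak_lr down to min_lr
    t = (step - explore_steps) / max(1, total_steps - explore_steps)
    return peak_lr + t * (min_lr - peak_lr)
```

In practice this would be wrapped as a per-step scheduler callback around the optimizer.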

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

fe1ixxu/BiBERT EMNLP 2021

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems.

Hint-Based Training for Non-Autoregressive Machine Translation

zhuohan123/hint-nart IJCNLP 2019

Due to the unparallelizable nature of the autoregressive factorization, AutoRegressive Translation (ART) models have to generate tokens sequentially during decoding and thus suffer from high inference latency.

ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation

lifu-tu/ENGINE ACL 2020

We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model.
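The energy here is the negative log-probability of an output sequence under the pretrained autoregressive model; the non-autoregressive model is trained so its outputs have low energy. A toy illustration with a stand-in bigram model as the "pretrained AR model" (an assumption for brevity; the paper scores full sequences with an autoregressive transformer):

```python
import math

# Stand-in bigram log-probabilities playing the role of a pretrained
# autoregressive model p_AR(y_t | y_<t).
log_p = {
    ("<s>", "a"): math.log(0.7), ("<s>", "b"): math.log(0.3),
    ("a", "b"): math.log(0.6), ("a", "a"): math.log(0.4),
    ("b", "a"): math.log(0.5), ("b", "b"): math.log(0.5),
}

def energy(tokens):
    """E(y) = -log p_AR(y): lower energy = more probable under the AR model."""
    prev, e = "<s>", 0.0
    for t in tokens:
        e -= log_p[(prev, t)]
        prev = t
    return e
```

Training then pushes the non-autoregressive model's (relaxed) output distribution toward low-energy sequences instead of matching gold tokens position by position.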

Iterative Refinement in the Continuous Space for Non-Autoregressive Neural Machine Translation

zomux/lanmt-ebm EMNLP 2020

Given a continuous latent variable model for machine translation (Shu et al., 2020), we train an inference network to approximate the gradient of the marginal log probability of the target sentence, using only the latent variable as input.
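Inference then refines the latent variable iteratively by following the approximated gradient. A toy sketch where the true gradient is available in closed form, so the learned inference network is replaced by an exact stand-in (an assumption for illustration only):

```python
import numpy as np

# Toy objective: log p(y | z) proportional to -0.5 * ||z - z_star||^2,
# so its gradient is z_star - z. In the paper this gradient is
# approximated by a learned inference network taking only z as input.
z_star = np.array([1.0, -2.0])

def approx_grad(z):
    return z_star - z  # exact stand-in for the inference network

z = np.zeros(2)
for _ in range(100):
    z = z + 0.1 * approx_grad(z)  # gradient ascent in the continuous space
```

Each step moves the latent toward a higher-probability region, after which the decoder maps the refined latent to the target sentence.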

Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses

ntunlp/pronoun-finetuning EMNLP 2020

Our sentence-level model shows a 0.5 BLEU improvement on both the WMT14 and the IWSLT13 De-En test sets, while our contextual model achieves the best results, improving from 31.81 to 32 BLEU on the WMT14 De-En test set, and from 32.10 to 33.13 on the IWSLT13 De-En test set, with corresponding improvements in pronoun translation.