This paper describes the machine translation systems proposed by the University of Technology Sydney Natural Language Processing (UTS_NLP) team for the WMT20 English-Basque biomedical translation tasks.
Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann, Lana Yeganova
Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities.
Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models.
To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective.
Neural machine translation models are often biased toward the limited translation references seen during training.
Document-level machine translation focuses on the translation of entire documents from a source to a target language.
With the rapid growth of the scientific literature, manually selecting appropriate citations for a paper is becoming increasingly challenging and time-consuming.
This is a serious issue for low-resource language pairs and many specialized translation domains that are inherently limited in the amount of available supervised data.
Regularization of neural machine translation is still a significant problem, especially in low-resource settings.
Cluster labeling is the assignment of representative labels to clusters obtained from the organization of a document collection.
Automatic post-editing (APE) systems aim to correct the systematic errors made by machine translators.
Nevertheless, the embeddings need to be retrained on datasets suited to the domain in order to adequately cover the domain-specific vocabulary.
Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources.
Automated extraction of concepts from patient clinical records is an essential facilitator of clinical research.
Extraction of concepts present in patient clinical records is an essential step in clinical research.
In the proposed approach, the action class is predicted by a structural model (learnt by a Latent Structural SVM) based on measurements from the image superpixels and their latent classes.
This infinite adaptive online approach is capable of segmenting and classifying sequential data over an unlimited number of classes, while meeting the memory and delay constraints of streaming contexts.
In this paper, we propose a non-parametric conditional factor regression (NCFR) model for domains with high-dimensional input and response.