As neural language models grow in effectiveness, they are increasingly being applied in real-world settings.
As a result, over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data.
In recent years, large neural networks for natural language generation (NLG) have advanced by leaps and bounds in their ability to generate fluent text.
For open-ended language generation tasks such as storytelling and dialogue, choosing the right decoding algorithm is critical to controlling the tradeoff between generation quality and diversity.
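The quality–diversity tradeoff mentioned above is typically steered by the decoding algorithm's knobs, such as the temperature and top-k cutoff. The sketch below is illustrative only (the function name `top_k_sample` and the toy logits are assumptions, not from any of the papers excerpted here): lowering the temperature or shrinking k concentrates probability on the highest-scoring tokens (higher quality, lower diversity), while raising them spreads it out.

```python
import math
import random

def top_k_sample(logits, k=3, temperature=1.0, rng=random):
    """Sample one token index with top-k + temperature sampling.

    Illustrative sketch: smaller k / lower temperature -> more
    deterministic (quality-leaning); larger k / higher temperature
    -> more varied (diversity-leaning) output.
    """
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax over the kept logits (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution.
    r = rng.random()
    acc = 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]
```

With k=1 this reduces to greedy decoding (always the argmax token); with larger k and temperature it behaves like stochastic sampling over a truncated distribution.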
Importantly, though we discuss potential modifications, this document is not meant as a formal research paper, but instead is a response to some of the privacy characteristics of direct contact tracing apps like TraceTogether and an early-stage Request for Comments to the community.
Recent advancements in neural language modelling make it possible to rapidly generate vast amounts of human-sounding text.
While conditional language models have greatly improved in their ability to output high-quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences.
To facilitate research on the task, we introduce a large-scale multilingual corpus of images, each labeled with the word it represents.
Our results demonstrate that this representation is useful for learning motion in the general setting where explicit labels are difficult to obtain.