This work explores the application of Lambda networks, an alternative framework for capturing long-range interactions without attention, for the keyword spotting task.
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants.
On the other hand, Multilingual Neural Machine Translation (MultiNMT) approaches rely on higher-quality and more massive data sets.
Sounds are an important source of information about our daily interactions with objects.
End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.