PaddleSpeech is an open-source all-in-one speech toolkit.
In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling tasks, guaranteeing both high accuracy and low latency.
In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy thanks to its simplicity and effectiveness in balancing translation quality and latency.
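As background, the read/write schedule that the wait-k policy induces can be sketched in a few lines. This is an illustrative sketch only: it assumes equal source and target lengths for simplicity, and `wait_k_schedule` is a hypothetical helper name, not code from any of the papers listed here.

```python
def wait_k_schedule(src_len, k):
    """Return the READ/WRITE action sequence of the wait-k policy:
    first READ k source tokens, then alternate WRITE/READ; once the
    source is exhausted, WRITE the remaining target tokens.
    Toy assumption: target length equals source length."""
    actions = []
    reads = writes = 0
    tgt_len = src_len  # illustrative assumption only
    while writes < tgt_len:
        if reads < min(writes + k, src_len):
            actions.append("READ")
            reads += 1
        else:
            actions.append("WRITE")
            writes += 1
    return actions
```

For example, with a 4-token source and k=2 the schedule reads two tokens up front and then strictly alternates, which is exactly the constant-lag behavior that makes the policy easy to analyze.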
In this study, we used a whole-brain model to show that heterogeneity in nodal excitability had a significant impact on seizure propagation in the networks, and compromised the prediction accuracy with structural connections.
Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation.
This paper investigates an unmanned aerial vehicle (UAV)-assisted wireless powered mobile-edge computing (MEC) system, where the UAV powers the mobile terminals by wireless power transfer (WPT) and provides computation service for them.
Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition.
Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach.
To this end, we propose a structure-aware interactive graph neural network (SIGN) which consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool).
Simultaneous speech-to-text translation is widely useful in many scenarios.
Recently, representation learning for text and speech has successfully improved many language related tasks.
Furthermore, we have created our own implementation of the algorithm, incorporating additional experiments to evaluate the algorithm's relevance across different dimensionality reduction techniques and differently structured data.
The hierarchical attentive aggregation can capture spatial dependencies among atoms, as well as fuse the position-enhanced information with the capability of discriminating multiple spatial relations among atoms.
Simultaneous translation, which performs translation concurrently with the source speech, is widely useful in many scenarios such as international conferences, negotiations, press releases, legal proceedings, and medicine.
More interestingly, our proposed models perform extremely well in small-sample learning, where only a small training dataset is provided.
End-to-end Speech-to-text Translation (E2E-ST), which directly translates source language speech to target language text, is widely useful in practice, but traditional cascaded approaches (ASR+MT) often suffer from error propagation in the pipeline.
Simultaneous translation is vastly different from full-sentence translation, in the sense that it begins translating before the source sentence ends, with a delay of only a few words.
Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with a delay of only a few seconds.
In particular, we aim to design an online computation offloading algorithm to maximize the network data processing capability subject to the long-term data queue stability and average power constraints.
We show that training on out-of-domain data and fine-tuning with as few as 4,000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en$\to$zh (zh$\to$en) directions.
Extensive numerical results show that both the CNN-based and LSTM-based classifiers extract similar radio features related to modulation reference points.
Simultaneous translation has many important application scenarios and has recently attracted much attention from both academia and industry.
Adaptive policies are better than fixed policies for simultaneous translation, since they can flexibly balance the tradeoff between translation quality and latency based on the current context information.
1 code implementation • 21 Apr 2020 • He Zhang, Liang Zhang, Ang Lin, Congcong Xu, Ziyu Li, Kaibo Liu, Boxiang Liu, Xiaopin Ma, Fanfan Zhao, Weiguo Yao, Hangwen Li, David H. Mathews, Yujian Zhang, Liang Huang
Messenger RNA (mRNA) vaccines are being used for COVID-19, but still suffer from the critical issue of mRNA instability and degradation, which is a major obstacle in the storage, distribution, and efficacy of the vaccine.
Deep learning has recently been applied to automatically classify the modulation categories of received radio signals without manual experience.
Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, with neural methods now capable of producing audio with high naturalness.
Beam search is universally used in full-sentence translation but its application to simultaneous translation remains non-trivial, where output words are committed on the fly.
To make matters worse, the amount of social media parallel corpora is extremely limited.
Simultaneous translation is widely useful but remains one of the most difficult tasks in NLP.
Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences.
Neural machine translation (NMT) is notoriously sensitive to noise, but noise is almost inevitable in practice.
This paper describes multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for WMT 2018 Shared Task on multimodal translation.
In neural text generation such as neural machine translation, summarization, and image captioning, beam search is widely used to improve the output text quality.
Beam search is widely used in neural machine translation, and usually improves translation quality compared to greedy search.
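A generic sketch of beam search may help make this concrete. This is not any listed system's implementation: `step_logprobs` is an assumed model interface that returns a dict of next-token log-probabilities for a given prefix, and all names are illustrative.

```python
def beam_search(step_logprobs, beam_size, max_len, eos):
    """Keep the beam_size highest-scoring partial hypotheses at each
    step; hypotheses ending in `eos` are moved to the finished pool.
    Returns the best (token_tuple, cumulative_logprob) pair."""
    beams = [((), 0.0)]  # (prefix as token tuple, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:  # every surviving hypothesis has finished
            break
    finished.extend(beams)  # include unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[1])
```

With `beam_size=1` this reduces to greedy search; a wider beam can recover a hypothesis whose first token looks locally worse but whose full sequence scores higher, which is why beam search usually improves over greedy.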
Neural text generation, including neural machine translation, image captioning, and summarization, has been quite successful recently.
To tackle this problem, we propose in this paper a Deep Reinforcement learning-based Online Offloading (DROO) framework that implements a deep neural network to generate offloading decisions.
However, the minimal span parser of Stern et al. (2017a), which holds the current state-of-the-art accuracy, is a chart parser running in cubic time, $O(n^3)$, which is too slow for longer sentences and for applications beyond sentence boundaries, such as end-to-end discourse parsing and joint sentence boundary detection and parsing.
This paper describes Oregon State University's submissions to the shared WMT'17 task "multimodal translation task I".
In order to utilize the potential benefits from their correlations, we propose a jointly trained model for learning the two tasks simultaneously via Long Short-Term Memory (LSTM) networks.
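The core idea of such joint training, i.e. that two task losses share one encoder so gradients from both tasks shape the same representation, can be illustrated with a minimal NumPy sketch. Note this uses a stand-in feed-forward encoder rather than the paper's LSTM, and all names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Shared encoder weights (a stand-in for the LSTM) plus one head per task.
W_shared = rng.normal(size=(8, 4))  # illustrative sizes
W_task_a = rng.normal(size=(3, 8))  # e.g. 3 classes for task A
W_task_b = rng.normal(size=(5, 8))  # e.g. 5 labels for task B

def joint_loss(x, y_a, y_b):
    """Joint objective: sum of the two cross-entropy losses. Because both
    terms go through the same h, gradients from either task update W_shared."""
    h = np.tanh(W_shared @ x)       # shared representation
    p_a = softmax(W_task_a @ h)     # task-A prediction
    p_b = softmax(W_task_b @ h)     # task-B prediction
    return -np.log(p_a[y_a]) - np.log(p_b[y_b])

x = rng.normal(size=4)
loss = joint_loss(x, y_a=1, y_b=3)
```

Training then minimizes this single summed loss, which is what lets correlations between the two tasks transfer through the shared parameters.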
We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features.
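The transition-based paradigm mentioned here can be made concrete with a minimal arc-standard sketch; the feature model and the bi-LSTM scorer are omitted, and `arc_standard_parse` with its oracle action names is an illustrative construction, not the paper's code.

```python
def arc_standard_parse(n_words, oracle_actions):
    """Run arc-standard transitions: SHIFT moves the next buffer word
    onto the stack; LEFT/RIGHT attach the top two stack items and pop
    the dependent. Returns {dependent: head} arcs; 0 is the artificial root."""
    stack, buffer = [0], list(range(1, n_words + 1))
    arcs = {}
    for act in oracle_actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT":            # second-top becomes dependent of top
            child = stack.pop(-2)
            arcs[child] = stack[-1]
        elif act == "RIGHT":           # top becomes dependent of second-top
            child = stack.pop()
            arcs[child] = stack[-1]
    return arcs
```

For "He eats fish" (words 1-3), the oracle sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT yields He←eats, fish←eats, and eats←root; in a real parser a classifier over stack/buffer features chooses each action instead of an oracle.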
Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks.
Recently, neural network approaches to parsing have largely automated the combination of individual features, but they still rely on (often a large number of) atomic features created from human linguistic intuition, potentially omitting important global context.
In sentence modeling and classification, convolutional neural network approaches have recently achieved state-of-the-art results, but all such efforts process word vectors sequentially and neglect long-distance dependencies.
Semantic parsing has made significant progress, but most current semantic parsers are extremely slow (CKY-based) and rather primitive in representation.