We present an approach to learning voice-face representations from talking-face videos without any identity labels.
Such sparse and loose matching requires contextual features capturing the geometric structure of the point clouds.
Yet accurately annotating large amounts of audio data is labor-intensive, and in practical settings the dataset may contain noisy labels.
Rapid progress has been made in the field of reading comprehension and question answering, where several systems have achieved human parity in some simplified settings.
Ranked #5 on Question Answering on DROP Test
This paper considers the reading comprehension task in which multiple documents are given as input.
Open-domain targeted sentiment analysis aims to detect opinion targets along with their sentiment polarities from a sentence.
In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet.
Ranked #13 on Object Detection on PASCAL VOC 2007
Although current reading comprehension systems have achieved significant advances, their promising performance is often obtained at the cost of ensembling numerous models.
Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred.
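One common way to realize this kind of abstention is to compare the score of the best candidate answer with a separate "no answer" score. The sketch below is a minimal illustration under that assumption; it is not taken from the paper listed here, and the names `answer_or_abstain`, `span_scores`, `null_score`, and `threshold` are hypothetical.

```python
from typing import Dict, Optional


def answer_or_abstain(
    span_scores: Dict[str, float],
    null_score: float,
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the best candidate answer, or None to abstain.

    span_scores: hypothetical mapping from candidate answer text to a model score.
    null_score:  hypothetical score the model assigns to "no answer".
    threshold:   assumed tunable margin, typically chosen on a development set.
    """
    if not span_scores:
        # No candidates at all: nothing can be inferred, so abstain.
        return None
    best_text, best_score = max(span_scores.items(), key=lambda kv: kv[1])
    # Abstain when "no answer" beats the best span by more than the margin.
    if null_score - best_score > threshold:
        return None
    return best_text
```

Raising `threshold` makes the reader answer more often, and lowering it makes the reader abstain more often, so the value trades off errors on answerable questions against errors on unanswerable ones.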
Ranked #12 on Question Answering on SQuAD2.0 dev
Depthwise convolutions provide significant performance benefits owing to the reduction in both parameters and mult-adds.
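The reduction can be checked with simple arithmetic. The sketch below compares a standard k x k convolution with a depthwise convolution followed by a 1 x 1 pointwise convolution, ignoring biases. The layer sizes are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and multiply-adds of a standard k x k convolution (no bias)."""
    params = k * k * c_in * c_out
    mult_adds = params * h_out * w_out  # every weight is used once per output position
    return params, mult_adds


def depthwise_separable_cost(k, c_in, c_out, h_out, w_out):
    """Parameters and multiply-adds of a depthwise k x k plus pointwise 1 x 1 pair."""
    depthwise_params = k * k * c_in   # one k x k filter per input channel
    pointwise_params = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise_params + pointwise_params
    mult_adds = params * h_out * w_out
    return params, mult_adds


# Illustrative layer (assumed sizes): 3 x 3 kernel, 64 -> 128 channels, 56 x 56 output.
std_params, std_madds = standard_conv_cost(3, 64, 128, 56, 56)
sep_params, sep_madds = depthwise_separable_cost(3, 64, 128, 56, 56)
print(f"standard:  {std_params} params, {std_madds} mult-adds")
print(f"separable: {sep_params} params, {sep_madds} mult-adds")
print(f"reduction: {std_params / sep_params:.2f}x")
```

For these sizes the ratio follows the usual closed form 1 / (1/c_out + 1/k^2), so the saving approaches k^2 as the number of output channels grows.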
To leverage waveform-based and spectrogram-based features in a single model, we introduce a two-phase method to fuse the different features.
Sound; Audio and Speech Processing
Compact neural networks tend to exploit "sparsely-connected" convolutions, such as depthwise convolution and group convolution, for use in mobile applications.
Experiments on the ILSVRC 2012 and PASCAL VOC 2007 datasets demonstrate that FD-MobileNet consistently outperforms MobileNet and achieves results comparable to ShuffleNet under different computational budgets. For instance, it surpasses MobileNet by 5.5% on ILSVRC 2012 top-1 accuracy and by 3.6% on VOC 2007 mAP under a complexity of 12 MFLOPs.
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects.
Ranked #17 on Question Answering on TriviaQA
Most traditional methods struggle to balance precision and computational burden as the amount of data and the number of classes increase.