Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved.
Despite their better bio-plausibility, goal-driven spiking neural networks (SNNs) have not achieved applicable performance in classifying biological spike trains, and have shown little bio-functional similarity compared to traditional artificial neural networks.
A self-attention mechanism is applied within windows to capture temporally important information locally in a fine-grained way.
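The idea of restricting self-attention to local windows can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `windowed_self_attention`, the non-overlapping window layout, and the use of the raw frames as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(x, window):
    """Scaled dot-product self-attention inside non-overlapping windows.

    x      : array of shape (T, d), a sequence of T frames with d features.
    window : window length; T must be divisible by it (assumption).
    Queries, keys, and values are the inputs themselves (assumption:
    no learned projections, to keep the sketch short).
    """
    T, d = x.shape
    if T % window != 0:
        raise ValueError("sequence length must be divisible by the window length")
    xw = x.reshape(T // window, window, d)             # (num_windows, window, d)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(d)   # (num_windows, window, window)
    weights = softmax(scores, axis=-1)                 # each frame attends only within its window
    out = weights @ xw                                 # (num_windows, window, d)
    return out.reshape(T, d)
```

Because attention is computed per window, the cost grows with `T * window` rather than `T * T`, and frames in one window cannot influence the output of another.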
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses.
Ranked #1 on Speech Emotion Recognition on LSSED
To address this problem, we propose a multi-attention network that consists of a dual-path dual-attention module and a query-based cross-modal Transformer module.
Ranked #5 on Referring Expression Segmentation on A2D Sentences
This paper proposes a new unsupervised method for learning depth and ego-motion from monocular video using multiple masks.
Speech emotion recognition is a vital contributor to the next generation of human-computer interaction (HCI).
Ranked #3 on Speech Emotion Recognition on LSSED
Visual localization is a crucial component in mobile robot and autonomous driving applications.
no code implementations • 30 Mar 2020 • Xiyi Wei, Yu-Tian Xiao, Jian Wang, Rui Chen, Wei Zhang, Yue Yang, Daojun Lv, Chao Qin, Di Gu, Bo Zhang, Weidong Chen, Jianquan Hou, Ninghong Song, Guohua Zeng, Shancheng Ren
Objective: To conduct a meta-analysis of current studies that examined sex differences in severity and mortality in patients with COVID-19, and identify potential mechanisms underpinning these differences.
To retrieve a target image from the database, the query image is first encoded using the encoder belonging to the query domain to obtain a domain-invariant feature vector.
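The retrieval step described above might look roughly like the following sketch. The snippet only states that the query is encoded by its domain's encoder. Ranking database entries by cosine similarity, the function name `retrieve`, and the `query_encoder` callable are assumptions introduced here for illustration.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale vectors to unit length along the last axis.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def retrieve(query_image, query_encoder, db_features, k=5):
    """Return indices and scores of the k database items closest to the query.

    query_encoder : hypothetical callable mapping an image to a 1-D
                    domain-invariant feature vector.
    db_features   : array of shape (N, d), precomputed database features
                    assumed to live in the same feature space.
    Cosine similarity is an assumed ranking criterion.
    """
    q = l2_normalize(np.asarray(query_encoder(query_image), dtype=float))
    db = l2_normalize(np.asarray(db_features, dtype=float))
    sims = db @ q                      # cosine similarity to every database item
    order = np.argsort(-sims)[:k]      # highest similarity first
    return order, sims[order]
```

Since both domains are mapped into one shared feature space, the database features can be computed once offline and only the query needs to be encoded at search time.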
In this work, we propose to train CNNs from images annotated with multiple tags, to enhance the quality of the visual representations learned by the CNN model.
In D2IA, we generate a relevant and distinct tag subset, in which the tags are relevant to the image contents and semantically distinct from each other, using sequential sampling from a determinantal point process (DPP) model.
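To show how a DPP trades off relevance against distinctness, here is a small sketch. It does not reproduce D2IA's sequential sampling: it swaps in greedy determinant maximization as a deterministic stand-in, and the quality scores, tag feature vectors, and the function name `greedy_dpp_subset` are hypothetical.

```python
import numpy as np

def greedy_dpp_subset(quality, features, k):
    """Greedily pick up to k items that maximize det(L_Y) of a DPP kernel.

    quality  : per-tag relevance scores, shape (N,) (hypothetical inputs).
    features : per-tag embedding vectors, shape (N, d) (hypothetical inputs).
    The kernel is L = diag(q) S diag(q), with S the cosine similarity
    between tag embeddings. This is greedy selection, not DPP sampling.
    """
    q = np.asarray(quality, dtype=float)
    phi = np.asarray(features, dtype=float)
    phi = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    L = (q[:, None] * (phi @ phi.T)) * q[None, :]

    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(q)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            # A non-positive determinant means the candidate is redundant.
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected
```

The determinant grows with each tag's quality and shrinks when two selected tags have similar embeddings, so a near-duplicate of an already chosen tag is skipped even if its relevance score is high.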
Saliency computation has become a popular research field for many applications due to the useful information provided by saliency maps.