While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks.
Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life.
We also provide insights into the attributes of sound event representations that enable such efficient information transfer.
Magnetic resonance imaging (MRI) noninvasively provides critical information about how human brain structures develop across stages of life.
Then, a Wasserstein coupled dictionary, containing multiple pairs of counterpart graph keys with each key corresponding to one modality, is constructed for further feature learning.
In this system, 940 nm infrared light is used mainly for audio signal transmission, and an inverted pendulum with PID control is used to regulate the angle and stability of the infrared emission.
For instance-aware label diffusion in graph convolution, rather than relying on statistical label correlation alone, an image-dependent label correlation matrix (LCM) is constructed by fusing the statistical LCM with an individual LCM for each image instance; graph inference over labels with this matrix injects adaptive, label-aware information into the features learned by the model.
Abstractive text summarization is a challenging task: one needs to design a mechanism that effectively extracts salient information from the source text and then generates a summary.
Furthermore, although emotion recognition can benefit from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities.
Yimin Wang, Qi Li, Li-Juan Liu, Zhi Zhou, Zongcai Ruan, Lingsheng Kong, Yaoyao Li, Yun Wang, Ning Zhong, Renjie Chai, Xiangfeng Luo, Yike Guo, Michael Hawrylycz, Qingming Luo, Zhongze Gu, Wei Xie, Hongkui Zeng, Hanchuan Peng
Neuron morphology is recognized as a key determinant of cell type, yet the quantitative profiling of a mammalian neuron’s complete three-dimensional (3-D) morphology remains arduous when the neuron has complex arborization and long projections.
2 Aug 2019 • Kun Han, Junwen Chen, Hui Zhang, Haiyang Xu, Yiping Peng, Yun Wang, Ning Ding, Hui Deng, Yonghu Gao, Tingwei Guo, Yi Zhang, Yahao He, Baochang Ma, Yu-Long Zhou, Kangli Zhang, Chao Liu, Ying Lyu, Chenxi Wang, Cheng Gong, Yunbo Wang, Wei Zou, Hui Song, Xiangang Li
In this paper we present DELTA, a deep learning based language technology platform.
Ranked #3 on Text Classification on Yahoo! Answers
This paper compares five types of pooling functions both theoretically and experimentally, with a special focus on their localization performance.
Sound • Audio and Speech Processing
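To make the comparison concrete, here is a minimal sketch of how each pooling function aggregates frame-level event probabilities into one clip-level probability, assuming the usual multiple-instance-learning formulation with max, average, linear softmax, exponential softmax, and attention pooling. The function names and the example probabilities are hypothetical. The attention weights are passed in directly, whereas a real model would predict them with a learned layer.

```python
import numpy as np


def max_pooling(y: np.ndarray) -> float:
    # Clip-level probability is the single highest frame-level probability.
    return float(np.max(y))


def average_pooling(y: np.ndarray) -> float:
    # Every frame contributes equally, whether or not the event is present.
    return float(np.mean(y))


def linear_softmax_pooling(y: np.ndarray, eps: float = 1e-12) -> float:
    # Each frame is weighted by its own probability: sum(y^2) / sum(y).
    return float(np.sum(y * y) / (np.sum(y) + eps))


def exp_softmax_pooling(y: np.ndarray) -> float:
    # Each frame is weighted by exp(y): sum(y * e^y) / sum(e^y).
    w = np.exp(y)
    return float(np.sum(y * w) / np.sum(w))


def attention_pooling(y: np.ndarray, w: np.ndarray) -> float:
    # Weighted average with nonnegative weights w. In a trained model w comes
    # from a learned layer; here it is supplied by the caller (assumption).
    return float(np.sum(y * w) / np.sum(w))


# Hypothetical frame-level probabilities for one event class in one clip.
frame_probs = np.array([0.1, 0.2, 0.9, 0.8])

print("max:           ", max_pooling(frame_probs))             # ≈ 0.9
print("average:       ", average_pooling(frame_probs))         # ≈ 0.5
print("linear softmax:", linear_softmax_pooling(frame_probs))  # ≈ 0.75
print("exp softmax:   ", exp_softmax_pooling(frame_probs))     # ≈ 0.62
print("attention:     ", attention_pooling(frame_probs, np.array([0.0, 0.0, 1.0, 1.0])))  # ≈ 0.85
```

The functions differ in which frames drive the clip-level output, and hence in which frames receive gradient during training. Max pooling relies on a single frame, average pooling spreads credit uniformly, and the softmax and attention variants fall in between. This difference is one plausible reason their localization behavior differs.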
Research on sound event detection (SED) with weak labeling has mostly focused on presence/absence labeling, which provides no temporal information at all about the event occurrences.
Sound • Audio and Speech Processing
5 Feb 2018 • Yun Wang, Massimo Robberto, Mark Dickinson, Lynne A. Hillenbrand, Wesley Fraser, Peter Behroozi, Jarle Brinchmann, Chia-Hsun Chuang, Andrea Cimatti, Robert Content, Emanuele Daddi, Henry C. Ferguson, Christopher Hirata, Michael J. Hudson, J. Davy Kirkpatrick, Alvaro Orsi, Alice Shapley, Mario Ballardini, Robert Barkhouser, James Bartlett, Robert Benjamin, Ranga Chary, Charlie Conroy, Megan Donahue, Olivier Dore, Peter Eisenhardt, Karl Glazebrook, George Helou, Sangeeta Malhotra, Lauro Moscardini, Jeffrey A. Newman, Zoran Ninkov, Michael Ressler, James Rhoads, Jason Rhodes, Daniel Scolnic, Stephen Smee, Francesco Valentino, Risa H. Wechsler
ATLAS Probe will lead to transformative science over the entire range of astrophysics: from galaxy evolution to the dark Universe, from Solar System objects to the dusty regions of the Milky Way.
Instrumentation and Methods for Astrophysics • Cosmology and Nongalactic Astrophysics • Earth and Planetary Astrophysics • Astrophysics of Galaxies • Solar and Stellar Astrophysics
Recently, dictionary screening has been proposed as an effective way to improve the computational efficiency of solving the lasso problem, one of the most commonly used methods for learning sparse representations.
For a given target vector, dictionary screening quickly identifies a subset of dictionary columns that will receive zero weight in a solution of the corresponding lasso problem.
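As an illustration of screening in general, here is a minimal sketch of the basic SAFE sphere test. This is a named technique from the screening literature and not necessarily the specific test proposed in this work. It assumes the lasso is written as min_w 0.5·||x − Bw||² + λ·||w||₁, with dictionary B, target x, and 0 < λ ≤ λ_max = max_j |b_jᵀx|. Under that formulation, column j is certified to get zero weight when |b_jᵀx| < λ − ||b_j||·||x||·(λ_max − λ)/λ_max. The helper names, the random dictionary, and the plain ISTA solver (included only to check the result) are hypothetical.

```python
import numpy as np


def safe_screen(B: np.ndarray, x: np.ndarray, lam: float) -> np.ndarray:
    """Boolean mask of dictionary columns certified to get zero lasso weight.

    Assumed lasso: min_w 0.5 * ||x - B w||^2 + lam * ||w||_1, 0 < lam <= lam_max.
    SAFE sphere test: discard column j if
        |b_j^T x| < lam - ||b_j|| * ||x|| * (lam_max - lam) / lam_max.
    """
    corr = np.abs(B.T @ x)
    lam_max = corr.max()  # for lam >= lam_max the solution is w = 0
    col_norms = np.linalg.norm(B, axis=0)
    margin = col_norms * np.linalg.norm(x) * (lam_max - lam) / lam_max
    return corr < lam - margin


def lasso_ista(B: np.ndarray, x: np.ndarray, lam: float, n_iter: int = 5000) -> np.ndarray:
    """Plain ISTA (proximal gradient) solver for the same lasso objective."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = w - step * (B.T @ (B @ w - x))                         # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-threshold
    return w


def lasso_objective(B: np.ndarray, x: np.ndarray, w: np.ndarray, lam: float) -> float:
    r = x - B @ w
    return float(0.5 * r @ r + lam * np.sum(np.abs(w)))


# Hypothetical example: unit-norm random dictionary, target built from 3 atoms.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 200))
B /= np.linalg.norm(B, axis=0)
x = B[:, :3] @ np.array([1.0, -2.0, 1.5]) + 0.01 * rng.standard_normal(50)

lam = 0.9 * np.max(np.abs(B.T @ x))
discard = safe_screen(B, x, lam)
keep = ~discard
print("columns discarded:", int(discard.sum()), "of", B.shape[1])

# Solve only the smaller problem on the kept columns, then embed the result.
w_reduced = np.zeros(B.shape[1])
w_reduced[keep] = lasso_ista(B[:, keep], x, lam)
```

Because the test only needs one inner product per column, it is far cheaper than solving the lasso. Any column it discards can be dropped before the solver runs without changing the solution.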
In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks.