We propose a soft-label sorting network alongside the counting network, which sorts the given images by their crowd counts.
Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis.
In this paper, we propose an editing test to evaluate users' editing experience of music generation models in a systematic way.
Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize.
We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT).
The proposed eSUSAN extracts the univalue segment assimilating nucleus (USAN) from a circular kernel based on the similarity across timestamps, and distinguishes corner events by the number of pixels in the nucleus area.
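The idea in the eSUSAN entry above can be illustrated with a minimal sketch. It is not the authors' implementation: the timestamp map, the kernel radius, the similarity tolerance `max_dt`, and the half-area threshold `max_fraction` (borrowed from the classic SUSAN corner detector) are all assumptions made for illustration, as are the function names.

```python
import numpy as np

def circular_offsets(radius):
    """Offsets (dy, dx) inside a disc of the given radius, excluding the centre."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius and (dy, dx) != (0, 0)]

def usan_area(timestamps, y, x, radius=3, max_dt=0.01):
    """Count kernel pixels whose latest timestamp is close to the nucleus timestamp.

    `max_dt` is a hypothetical similarity tolerance, not a value from the paper.
    """
    nucleus_t = timestamps[y, x]
    area = 0
    for dy, dx in circular_offsets(radius):
        if abs(timestamps[y + dy, x + dx] - nucleus_t) <= max_dt:
            area += 1
    return area

def is_corner(timestamps, y, x, radius=3, max_dt=0.01, max_fraction=0.5):
    """Flag a corner when the USAN covers less than `max_fraction` of the kernel.

    The half-area rule is assumed here, following the classic SUSAN detector.
    """
    kernel_size = len(circular_offsets(radius))
    return usan_area(timestamps, y, x, radius, max_dt) < max_fraction * kernel_size

# Toy timestamp maps: 1.0 marks recently active pixels, 0.0 marks stale ones.
corner_map = np.zeros((11, 11))
corner_map[5:, 5:] = 1.0   # a quadrant of recent events, corner tip at (5, 5)

edge_map = np.zeros((11, 11))
edge_map[:, 5:] = 1.0      # a half-plane of recent events, straight edge at x = 5

corner_result = is_corner(corner_map, 5, 5)
edge_result = is_corner(edge_map, 5, 5)
```

In the toy maps, the nucleus at the tip of the quadrant sees a small USAN and is flagged, while the nucleus on the straight edge sees more than half of the kernel and is not.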
Inspired by the strong search capability of neural architecture search (NAS) for CNNs, this paper proposes Graph Neural Architecture Search (GNAS) with a newly designed search space.
Its major difference from the traditional image style transfer problem is that the style information is provided by music rather than images.
Weakly supervised referring expression grounding (REG) aims to localize the referential entity in an image according to a linguistic query, where the mapping between the image region (proposal) and the query is unknown during training.
In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection.
Ranked #1 on RGB Salient Object Detection on ISTD
We propose the multi-layered cepstrum (MLC) method to estimate multiple fundamental frequencies (MF0) of a signal under challenging contamination such as high-pass filter noise.
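For background on the entry above, the following is a minimal sketch of the ordinary single-layer real cepstrum used to estimate one fundamental frequency. It is not the multi-layered cepstrum (MLC) method itself and does not handle multiple F0s; the function name, the search range `f_min`/`f_max`, and the synthetic test tone are assumptions chosen for illustration.

```python
import numpy as np

def cepstrum_f0(signal, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate a single F0 from the peak of the real cepstrum.

    The search range [f_min, f_max] is a hypothetical choice for this sketch.
    """
    windowed = signal * np.hanning(len(signal))
    spectrum = np.fft.rfft(windowed)
    # Small constant avoids log(0) in empty spectral bins.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_magnitude)
    # Quefrency index q corresponds to a period of q / sample_rate seconds.
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return sample_rate / peak

# Synthetic harmonic tone: 200 Hz fundamental with equal-amplitude harmonics
# up to 3800 Hz, sampled at 8 kHz.
sample_rate = 8000
t = np.arange(4096) / sample_rate
tone = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 20))

f0 = cepstrum_f0(tone, sample_rate)
```

The harmonic peaks in the log spectrum repeat every 200 Hz, so the cepstrum is expected to peak near a quefrency of 1/200 s, and `f0` should land close to 200 Hz.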
Our experiments on both vocal melody extraction and general melody extraction validate the effectiveness of the proposed model.
This paper presents a patch-based convolutional neural network (CNN) model for vocal melody extraction in polyphonic music, inspired by object detection in image processing.
This paper presents a new approach to understanding how deep neural networks (DNNs) work by applying homomorphic signal processing techniques.