To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music.
Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons.
In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively.
Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies.
Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end left-to-right note-by-note generative paradigm and treat each note equally.
In this paper, we propose SongDriver, a real-time music accompaniment generation system with neither logical latency nor exposure bias.
This paper develops automatic song translation (AST) for tonal languages and addresses the unique challenge of aligning words' tones with the melody of a song while also conveying the original meaning.
To our knowledge, S3T is the first method combining the Swin Transformer with a self-supervised learning method for music classification.
In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with a music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module).
Considering that there is a large amount of automatic speech recognition (ASR) training data, a straightforward method is to leverage ASR data to enhance automatic lyrics transcription (ALT) training.
In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.
In this paper, we attempt to fuse each subject's individual EDA features with the external evoked music features.