Search Results for author: Kejun Zhang

Found 19 papers, 10 papers with code

A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives

no code implementations · 1 Apr 2025 · Shuyu Li, Shulei Ji, ZiHao Wang, Songruoyao Wu, Jiaxing Yu, Kejun Zhang

Multi-modal music generation, using multiple modalities like text, images, and video alongside musical scores and audio as guidance, is an emerging research area with broad applications.

Music Generation

A Comprehensive Survey on Generative AI for Video-to-Music Generation

no code implementations · 18 Feb 2025 · Shulei Ji, Songruoyao Wu, ZiHao Wang, Shuyu Li, Kejun Zhang

The burgeoning growth of video-to-music generation can be attributed to the ascendancy of multimodal generative models.

Music Generation

SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training

no code implementations · 24 Dec 2024 · Jiaxing Yu, Xinda Wu, Yunfei Xu, Tieyao Zhang, Songruoyao Wu, Le Ma, Kejun Zhang

In this paper, we propose SongGLM, a lyric-to-melody generation system that leverages 2D alignment encoding and multi-task pre-training based on the General Language Model (GLM) to guarantee the alignment and harmony between lyrics and melodies.

Risk of Text Backdoor Attacks Under Dataset Distillation

1 code implementation · Information Security Conference 2024 · Kejun Zhang, Yutuo Song, Shaofei Xu, Pengcheng Li, Rong Qian, Pengzhi Han, Lingyun Xu

We propose a framework for backdoor attacks in the context of text dataset distillation, termed Text Backdoor Attack under Dataset Distillation (TBADD).

Backdoor Attack · Dataset Distillation +2

MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization

no code implementations · 5 Sep 2024 · Haoxuan Liu, ZiHao Wang, HaoRong Hong, Youwei Feng, Jiaxin Yu, Han Diao, Yunfei Xu, Kejun Zhang

This paper introduces MetaBGM, a groundbreaking framework for generating background music that adapts to dynamic scenes and real-time user interactions.

Audio Generation

SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement

1 code implementation · 10 Jul 2024 · ZiHao Wang, Le Ma, Yongsheng Feng, Xin Pan, Yuhang Jin, Kejun Zhang

Singing voice conversion (SVC) aims to convert one singer's voice to that of another, specified by a reference audio, while preserving the original semantic content.

Disentanglement · Voice Conversion

MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation

no code implementations · 3 Jul 2024 · ZiHao Wang, Haoxuan Liu, Jiaxing Yu, Tao Zhang, Yan Liu, Kejun Zhang

This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model, with the ultimate goal of creating songs that accurately satisfy human auditory expectations and structurally align with musical norms.

Descriptive · Rhythm

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

1 code implementation · 15 Feb 2024 · ZiHao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang

To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music.

Information Retrieval · Music Information Retrieval

End-to-end Learnable Clustering for Intent Learning in Recommendation

2 code implementations · 11 Jan 2024 · Yue Liu, Shihao Zhu, Jun Xia, Yingwei Ma, Jian Ma, Xinwang Liu, Shengju Yu, Kejun Zhang, Wenliang Zhong

Concretely, we encode user behavior sequences and initialize the cluster centers (latent intents) as learnable neurons.

Clustering · Contrastive Learning +2

MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation

1 code implementation · 19 Sep 2023 · Xinda Wu, Zhijie Huang, Kejun Zhang, Jiaxing Yu, Xu Tan, Tieyao Zhang, ZiHao Wang, Lingyun Sun

In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively.

Rhythm

REMAST: Real-time Emotion-based Music Arrangement with Soft Transition

1 code implementation · 14 May 2023 · ZiHao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang

Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies.

WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

1 code implementation · 11 Jan 2023 · Kejun Zhang, Xinda Wu, Tieyao Zhang, Zhijie Huang, Xu Tan, Qihao Liang, Songruoyao Wu, Lingyun Sun

Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end left-to-right note-by-note generative paradigm and treat each note equally.

Music Generation

SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias

no code implementations · 13 Sep 2022 · ZiHao Wang, Qihao Liang, Kejun Zhang, Yuxing Wang, Chen Zhang, Pengfei Yu, Yongsheng Feng, Wenbo Liu, Yikai Wang, Yuntai Bao, Yiheng Yang

In this paper, we propose SongDriver, a real-time music accompaniment generation system without logical latency or exposure bias.

Automatic Song Translation for Tonal Languages

no code implementations · Findings (ACL) 2022 · Fenfei Guo, Chen Zhang, Zhirui Zhang, Qixin He, Kejun Zhang, Jun Xie, Jordan Boyd-Graber

This paper develops automatic song translation (AST) for tonal languages and addresses the unique challenge of aligning words' tones with the melody of a song, in addition to conveying the original meaning.

Translation

S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification

1 code implementation · 21 Feb 2022 · Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang

To our knowledge, S3T is the first method combining the Swin Transformer with a self-supervised learning method for music classification.

Classification · Data Augmentation +5

TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method

1 code implementation · 20 Sep 2021 · Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu

In this paper, we develop TeleMelody, a two-stage lyric-to-melody generation system with a music template (e.g., tonality, chord progression, rhythm pattern, and cadence) to bridge the gap between lyrics and melodies (i.e., the system consists of a lyric-to-template module and a template-to-melody module).

Rhythm

Denoising Text to Speech with Frame-Level Noise Modeling

no code implementations · 17 Dec 2020 · Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.

Denoising · text-to-speech +1

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

1 code implementation · 22 Aug 2020 · Guanghao Yin, Shou-qian Sun, Dian Yu, Dejian Li, Kejun Zhang

In this paper, we attempt to fuse subjects' individual EDA features with the external evoked-music features.

Emotion Recognition
