1 code implementation • 10 Oct 2023 • Xiulong Liu, Zhikang Dong, Peng Zhang
In recent years, there has been a growing emphasis on the intersection of the audio, vision, and text modalities, driving forward advances in multimodal research.
no code implementations • 10 Oct 2023 • Zhikang Dong, Bin Chen, Xiulong Liu, Pawel Polak, Peng Zhang
The reasoning module, powered by a Large Language Model (Vicuna-7B) and extended to multi-modal inputs, is able to provide reasonable explanations for the recommended music.
no code implementations • 6 Jun 2023 • Xiulong Liu, Sudipta Paul, Moitreya Chatterjee, Anoop Cherian
Audio-visual navigation of an agent towards an audio goal is a challenging task, especially when the audio is sporadic or the environment is noisy.
no code implementations • 22 Feb 2023 • Zhizhi Yu, Di Jin, Cuiying Huo, Zhiqiang Wang, Xiulong Liu, Heng Qi, Jia Wu, Lingfei Wu
Graph neural networks for trust evaluation typically adopt straightforward encodings such as one-hot vectors or node2vec embeddings to represent node characteristics, which ignores the valuable semantic knowledge attached to nodes.
no code implementations • NeurIPS 2021 • Kun Su, Xiulong Liu, Eli Shlizerman
It is often the case that the experience of watching the video can be enhanced by adding a musical soundtrack that is in-sync with the rhythmic features of these activities.
no code implementations • 7 Dec 2020 • Kun Su, Xiulong Liu, Eli Shlizerman
We propose a novel system that takes as an input body movements of a musician playing a musical instrument and generates music in an unsupervised setting.
1 code implementation • NeurIPS 2020 • Kun Su, Xiulong Liu, Eli Shlizerman
We present a novel system that takes as input video frames of a musician playing the piano and generates the music for that video.
1 code implementation • CVPR 2020 • Kun Su, Xiulong Liu, Eli Shlizerman
Given input sequences of body keypoints obtained during various movements, our system associates the sequences with actions.