Search Results for author: Peiwen Sun

Found 11 papers, 5 papers with code

OmniAudio: Generating Spatial Audio from 360-Degree Video

no code implementations • 21 Apr 2025 • Huadai Liu, Tianyi Luo, Qikai Jiang, Kaicheng Luo, Peiwen Sun, Jialei Wan, Rongjie Huang, Qian Chen, Wen Wang, Xiangtai Li, Shiliang Zhang, Zhijie Yan, Zhou Zhao, Wei Xue

To generate spatial audio from 360-degree video, we propose a novel framework OmniAudio, which leverages self-supervised pre-training using both spatial audio data (in FOA format) and large-scale non-spatial data.

Audio Generation

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

no code implementations • 14 Oct 2024 • Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

However, when it comes to stereo audio generation, the soundscapes often have a complex scene of multiple objects and directions.

Audio Generation, multimodal generation

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

1 code implementation • 30 Aug 2024 • Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

By enhancing the semantic ability of the codec, X-Codec significantly reduces WER in speech synthesis tasks and extends these benefits to non-speech applications, including music and sound generation.

Audio Compression, Audio Generation, +6

Unveiling and Mitigating Bias in Audio Visual Segmentation

no code implementations • 23 Jul 2024 • Peiwen Sun, Honggang Zhang, Di Hu

To address audio priming bias and enhance audio sensitivity to different intensities and semantics, a dedicated audio perception module extracts the latent semantic information and incorporates it into a limited set of queries, namely active queries.

Attribute, Visual Grounding

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

1 code implementation • 16 Jul 2024 • Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS, further pursues semantic understanding of audio-visual scenes.

Decoder, global-optimization, +2

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

no code implementations • 15 Jul 2024 • Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues.

Segmentation

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

1 code implementation • 15 Jul 2024 • Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu

The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues.

Language Modelling, Large Language Model, +3

FlashSpeech: Efficient Zero-Shot Speech Synthesis

1 code implementation • 23 Apr 2024 • Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Wei Xue, Qifeng Liu, Yike Guo

The generation processes of FlashSpeech can be achieved efficiently with one or two sampling steps while maintaining high audio quality and high similarity to the audio prompt for zero-shot speech generation.

Rhythm, Speech Synthesis, +1

FusionINN: Decomposable Image Fusion for Brain Tumor Monitoring

1 code implementation • 23 Mar 2024 • Nishant Kumar, Ziyan Tao, Jaikirat Singh, Yang Li, Peiwen Sun, Binghui Zhao, Stefan Gumhold

Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image.

Denoising, Diagnostic, +1

Learning Audio-Visual embedding for Person Verification in the Wild

no code implementations • 9 Sep 2022 • Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu

It has already been observed that audio-visual embedding is more robust than uni-modality embedding for person verification.

Face Verification
