no code implementations • 17 Aug 2023 • Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang
Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with cross-domain visual similarity input and auto-regressive complementary item generation.
no code implementations • 13 Apr 2023 • Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang
The dominant majority of 3D models that appear in gaming, VR/AR, and those used to train geometric deep learning algorithms are incomplete: they are modeled as surface meshes and lack interior structures.
no code implementations • 18 Mar 2023 • Jiayang Bai, Zhen He, Shan Yang, Jie Guo, Zhenyu Chen, Yan Zhang, Yanwen Guo
Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama.
no code implementations • 7 Feb 2023 • Wangbin Ding, Lei Li, Junyi Qiu, Sihan Wang, Liqin Huang, Yinyin Chen, Shan Yang, Xiahai Zhuang
For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visualize myocardial scar and edema of MI, respectively.
no code implementations • 6 Nov 2022 • Junyi Qiu, Lei Li, Sihan Wang, Ke Zhang, Yinyin Chen, Shan Yang, Xiahai Zhuang
We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences.
1 code implementation • 11 Aug 2022 • Jakob Wasserthal, Hanns-Christian Breit, Manfred T. Meyer, Maurice Pradella, Daniel Hinck, Alexander W. Sauter, Tobias Heye, Daniel Boll, Joshy Cyriac, Shan Yang, Michael Bach, Martin Segeroth
The model significantly outperformed another publicly available segmentation model on a separate dataset (Dice score, 0.932 versus 0.871, respectively).
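The Dice score reported above is the standard overlap metric for segmentation masks. A minimal sketch of how it is computed (toy 1-D masks; a real evaluation runs this per organ over 3-D volumes):

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, gt).sum() / denom

# Toy 1-D "masks": 3 overlapping voxels, 4 predicted and 4 true
pred = np.array([1, 1, 1, 1, 0, 0])
gt = np.array([0, 1, 1, 1, 1, 0])
print(dice_score(pred, gt))  # 2*3 / (4+4) = 0.75
```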
no code implementations • 15 Jun 2022 • Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su
The ideal goal of voice conversion is to make the source speaker's speech sound natural in the target speaker's voice while maintaining the linguistic content and prosody of the source speech.
no code implementations • 18 Feb 2022 • Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng
Though significant progress has been made on speaker-dependent Video-to-Speech (VTS) synthesis, little attention has been devoted to multi-speaker VTS, which maps silent video to speech while allowing flexible control of speaker identity, all in a single system.
no code implementations • 13 Feb 2022 • Jiayang Bai, Jie Guo, Chenchen Wan, Zhenyu Chen, Zhen He, Shan Yang, Piaopiao Yu, Yan Zhang, Yanwen Guo
At its core is a new lighting model (dubbed DSGLight) based on depth-augmented Spherical Gaussians (SG) and a Graph Convolutional Network (GCN) that infers the new lighting representation from a single LDR image of limited field-of-view.
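The DSGLight details are not given in this snippet, but the Spherical Gaussian lobes it builds on follow a standard closed form, G(v) = μ · exp(λ(v·ξ − 1)) with axis ξ, sharpness λ, and amplitude μ. A hedged sketch of evaluating one such lobe:

```python
import numpy as np

def sg_eval(v: np.ndarray, axis: np.ndarray, sharpness: float, amplitude: float) -> float:
    """Evaluate a Spherical Gaussian lobe: amplitude * exp(sharpness * (v·axis - 1))."""
    v = v / np.linalg.norm(v)
    axis = axis / np.linalg.norm(axis)
    return amplitude * np.exp(sharpness * (v @ axis - 1.0))

# A single lobe pointing "up": radiance peaks along the axis and falls off away from it.
up = np.array([0.0, 0.0, 1.0])
side = np.array([1.0, 0.0, 0.0])
print(sg_eval(up, up, sharpness=8.0, amplitude=2.0))    # 2.0 at the lobe axis
print(sg_eval(side, up, sharpness=8.0, amplitude=2.0))  # much dimmer 90° off-axis
```

A full environment map is represented as a sum of such lobes; the "depth-augmented" variant and the GCN inference are specific to the paper and not reproduced here.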
no code implementations • 29 Dec 2021 • Hai Su, Shan Yang, Shuqing Zhang, Songsen Yu
Color image steganography based on deep learning is the art of hiding information in color images.
no code implementations • 8 Sep 2021 • Songxiang Liu, Shan Yang, Dan Su, Dong Yu
The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.
1 code implementation • NeurIPS 2021 • Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.
Ranked #1 on Action Classification on Kinetics-Sounds
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su
Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pause and prolongation, in the acoustic model, and develop a neural-network-based predictor that predicts the occurrences of the two behaviors from text.
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Lei Xie, Dan Su
The current two-stage TTS framework typically integrates an acoustic model with a vocoder: the acoustic model predicts a low-resolution intermediate representation such as a Mel-spectrogram, while the vocoder generates the waveform from that representation.
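The two-stage pipeline just described can be sketched with stand-in components; the shapes are what matter. Both functions below are hypothetical placeholders (random outputs), standing in for a learned acoustic model and vocoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def acoustic_model(phoneme_ids: np.ndarray, n_mels: int = 80) -> np.ndarray:
    """Stand-in acoustic model: maps a phoneme sequence to a Mel-spectrogram
    of shape (frames, n_mels). A real model learns this mapping from data."""
    frames_per_phoneme = 4  # crude fixed duration per phoneme, for illustration
    n_frames = len(phoneme_ids) * frames_per_phoneme
    return rng.standard_normal((n_frames, n_mels))

def vocoder(mel: np.ndarray, hop_length: int = 256) -> np.ndarray:
    """Stand-in vocoder: expands each Mel frame into hop_length waveform samples.
    A real vocoder (e.g. MelGAN) synthesizes audio conditioned on the Mel input."""
    return rng.standard_normal(mel.shape[0] * hop_length)

mel = acoustic_model(np.array([3, 17, 42]))  # low-resolution intermediate
wav = vocoder(mel)                           # high-resolution waveform
print(mel.shape, wav.shape)                  # (12, 80) (3072,)
```

The key design point is the resolution gap: 80 Mel bins per frame versus hundreds of waveform samples per frame, which is why the two stages are trained separately.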
no code implementations • CVPR 2021 • Fengmin Shi, Jie Guo, Haonan Zhang, Shan Yang, Xiying Wang, Yanwen Guo
We demonstrate that local geometry has a greater impact on the sound than global geometry and offers more cues for material recognition.
no code implementations • 17 Jun 2021 • Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, Yi Liu
We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.
1 code implementation • ACL 2021 • Shan Yang, Yongfei Zhang, Guanglin Niu, Qinghua Zhao, ShiLiang Pu
Few-shot relation extraction (FSRE) is of great importance for the long-tail distribution problem, especially in specialized domains with low-resource data.
1 code implementation • ICCV 2021 • Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa
We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music.
Ranked #2 on Motion Synthesis on BRACE
1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu
In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
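The decomposition described above — a speaker embedding expressed as a weighted sum over a bank of timbre-cluster vectors — can be sketched as follows. The cluster count, dimension, and softmax weighting are illustrative assumptions; in the real model the cluster vectors and weights are trained:

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 256  # assumed number of timbre clusters and embedding dimension
cluster_vectors = rng.standard_normal((K, D))  # trainable in the real model

def speaker_embedding(logits: np.ndarray) -> np.ndarray:
    """Compose a speaker embedding as a convex combination of the
    timbre-cluster vectors: softmax(logits) @ cluster_vectors."""
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ cluster_vectors

emb = speaker_embedding(rng.standard_normal(K))
print(emb.shape)  # (256,)
```

Because the embedding is constrained to the span of a few cluster vectors, interpolating the weights moves smoothly between timbres, which is what makes the conversion controllable.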
9 code implementations • Interspeech 2020 • Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie
In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.
Sound • Audio and Speech Processing
no code implementations • 28 Apr 2020 • Shan Yang, Yuxuan Wang, Lei Xie
As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.
no code implementations • ICCV 2017 • Shan Yang, Junbang Liang, Ming C. Lin
To extract information about the cloth, our method characterizes both the motion space and the visual appearance of the cloth geometry.
4 code implementations • 6 Jul 2017 • Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li
In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).
Sound
no code implementations • 3 Aug 2016 • Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara Berg, Ming C. Lin
Most recent garment capturing techniques rely on acquiring multiple views of clothing, which may not always be readily available, especially in the case of pre-existing photographs from the web.
4 code implementations • 31 Jul 2016 • Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg
Humans refer to objects in their environments all the time, especially in dialogue with other people.