no code implementations • 30 Aug 2024 • Ashraya K. Indrakanti, Jakob Wasserthal, Martin Segeroth, Shan Yang, Victor Schulze-Zachau, Joshy Cyriac, Michael Bach, Marios Psychogios, Matthias A. Mutke
Purpose: To develop an open-source nnU-Net-based AI model for combined detection and segmentation of unruptured intracranial aneurysms (UICA) in 3D TOF-MRI, and compare models trained on datasets with aneurysm-like differential diagnoses.
1 code implementation • 29 May 2024 • Tugba Akinci D'Antonoli, Lucas K. Berger, Ashraya K. Indrakanti, Nathan Vishwanathan, Jakob Weiß, Matthias Jung, Zeynep Berkarda, Alexander Rau, Marco Reisert, Thomas Küstner, Alexandra Walter, Elmar M. Merkle, Martin Segeroth, Joshy Cyriac, Shan Yang, Jakob Wasserthal
Materials and Methods: In this study we extended the capabilities of TotalSegmentator to MR images.
no code implementations • 21 Feb 2024 • Zhendong Xiao, Changhao Chen, Shan Yang, Wu Wei
Camera relocalization is pivotal in computer vision, with applications in AR, drones, robotics, and autonomous driving.
no code implementations • 24 Jan 2024 • Shan Yang, Yongfei Zhang
Secondly, we propose a multi-task learning-based synchronization module to ensure that the visual encoder of the MLLM is trained synchronously with the ReID task.
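Training a visual encoder synchronously on two objectives, as the synchronization module above does, amounts to combining per-task losses into one scalar before the shared backward pass. A minimal sketch; the function and weight names are illustrative, not the paper's actual losses:

```python
# Minimal sketch of a weighted multi-task objective for training one shared
# encoder on several tasks at once. All names are hypothetical; the actual
# ReID and MLLM losses are defined in the paper.

def multi_task_loss(task_losses, weights):
    """Combine per-task losses into one scalar for a shared backward pass."""
    assert len(task_losses) == len(weights)
    return sum(w * l for w, l in zip(weights, task_losses))

# Example: a ReID loss and an alignment loss, weighted 1.0 and 0.5.
total = multi_task_loss([2.0, 0.8], [1.0, 0.5])  # 2.0*1.0 + 0.8*0.5 = 2.4
```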
no code implementations • 13 Dec 2023 • Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming Lin, Shan Yang
In this work, we propose an efficient Video-Language Alignment (ViLA) network.
Ranked #1 on Video Question Answering on STAR Benchmark
no code implementations • 6 Oct 2023 • Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou
Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings.
Ranked #12 on Monocular Depth Estimation on NYU-Depth V2
no code implementations • 17 Aug 2023 • Xijun Wang, Anqi Liang, Junbang Liang, Ming Lin, Yu Lou, Shan Yang
Based on this notion, we propose a compatibility learning framework, a category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with the cross-domain visual similarity input and auto-regressive complementary item generation.
no code implementations • 13 Apr 2023 • Akshay Gadi Patil, Yiming Qian, Shan Yang, Brian Jackson, Eric Bennett, Hao Zhang
The vast majority of 3D models that appear in gaming, VR/AR, and those we use to train geometric deep learning algorithms are incomplete, since they are modeled as surface meshes and are missing their interior structures.
no code implementations • 18 Mar 2023 • Jiayang Bai, Zhen He, Shan Yang, Jie Guo, Zhenyu Chen, Yan Zhang, Yanwen Guo
Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama.
no code implementations • 7 Feb 2023 • Wangbin Ding, Lei LI, Junyi Qiu, Sihan Wang, Liqin Huang, Yinyin Chen, Shan Yang, Xiahai Zhuang
For instance, balanced steady-state free precession cine sequences present clear anatomical boundaries, while late gadolinium enhancement and T2-weighted CMR sequences visualize myocardial scar and edema of MI, respectively.
no code implementations • 6 Nov 2022 • Junyi Qiu, Lei LI, Sihan Wang, Ke Zhang, Yinyin Chen, Shan Yang, Xiahai Zhuang
We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences.
1 code implementation • 11 Aug 2022 • Jakob Wasserthal, Hanns-Christian Breit, Manfred T. Meyer, Maurice Pradella, Daniel Hinck, Alexander W. Sauter, Tobias Heye, Daniel Boll, Joshy Cyriac, Shan Yang, Michael Bach, Martin Segeroth
The model significantly outperformed another publicly available segmentation model on a separate dataset (Dice score, 0.932 versus 0.871, respectively).
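The comparison above is reported as a Dice score, the standard overlap metric for segmentation masks. A minimal pure-Python version for flattened binary (0/1) masks:

```python
# Dice = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks.
# Pure-Python sketch; real pipelines compute this on voxel arrays.

def dice(pred, target):
    """Return the Dice coefficient; defined as 1.0 for two empty masks."""
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

print(dice([1, 1, 0, 1], [1, 0, 0, 1]))  # 2*2 / (3+2) = 0.8
```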
no code implementations • 15 Jun 2022 • Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su
The ideal goal of voice conversion is to convert the source speaker's speech to sound naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech.
no code implementations • 18 Feb 2022 • Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng
Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system.
no code implementations • 13 Feb 2022 • Jiayang Bai, Jie Guo, Chenchen Wan, Zhenyu Chen, Zhen He, Shan Yang, Piaopiao Yu, Yan Zhang, Yanwen Guo
At its core is a new lighting model (dubbed DSGLight) based on depth-augmented Spherical Gaussians (SG) and a Graph Convolutional Network (GCN) that infers the new lighting representation from a single LDR image of limited field-of-view.
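A single Spherical Gaussian lobe, the building block of SG-based lighting representations like the one above, is conventionally written G(v) = a·exp(λ(v·μ − 1)) for unit direction v, lobe axis μ, sharpness λ, and amplitude a. A sketch of evaluating one lobe (the standard SG form, not necessarily DSGLight's exact parameterization):

```python
import math

# Evaluate one Spherical Gaussian lobe a * exp(lambda * (dot(v, mu) - 1)).
# mu is the unit lobe axis, lam the sharpness, a the amplitude.

def sg_eval(v, mu, lam, a):
    dot = sum(vi * mi for vi, mi in zip(v, mu))
    return a * math.exp(lam * (dot - 1.0))

# At the lobe center (v == mu) the SG attains its peak value a:
peak = sg_eval((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), lam=8.0, a=2.5)  # -> 2.5
```

Off-axis directions decay exponentially with sharpness λ, which is what makes a small set of lobes a compact environment-lighting model.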
no code implementations • 29 Dec 2021 • Hai Su, Shan Yang, Shuqing Zhang, Songsen Yu
Color image steganography based on deep learning is the art of hiding information in a color image.
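For contrast with the learned approach above, the classical steganography baseline hides message bits in the least-significant bit (LSB) of each 8-bit color channel. A minimal sketch on a flat list of channel values (not the paper's deep-learning method):

```python
# Classical LSB steganography baseline: overwrite the least-significant bit
# of each 8-bit channel value with one message bit, then read it back.

def lsb_embed(pixels, bits):
    """Embed message bits into the LSBs of the first len(bits) channel values."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def lsb_extract(pixels, n):
    """Recover the first n embedded bits."""
    return [p & 1 for p in pixels[:n]]

stego = lsb_embed([200, 17, 96, 255], [1, 0, 1])
print(lsb_extract(stego, 3))  # [1, 0, 1]
```

Each channel value changes by at most 1, which is why LSB embedding is visually imperceptible but easily destroyed by compression, a weakness learned methods try to overcome.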
no code implementations • 8 Sep 2021 • Songxiang Liu, Shan Yang, Dan Su, Dong Yu
The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice.
1 code implementation • NeurIPS 2021 • Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
Humans perceive the world by concurrently processing and fusing high-dimensional inputs from multiple modalities such as vision and audio.
Ranked #2 on Action Classification on Kinetics-Sounds
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Lei Xie, Dan Su
The current two-stage TTS framework typically integrates an acoustic model with a vocoder: the acoustic model predicts a low-resolution intermediate representation such as a Mel-spectrogram, while the vocoder generates the waveform from that intermediate representation.
no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su
Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pause and prolongation, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text.
no code implementations • CVPR 2021 • Fengmin Shi, Jie Guo, Haonan Zhang, Shan Yang, Xiying Wang, Yanwen Guo
We demonstrate that local geometry has a greater impact on the sound than the global geometry and offers more cues in material recognition.
no code implementations • 17 Jun 2021 • Bo Hu, Bryan Seybold, Shan Yang, David Ross, Avneesh Sud, Graham Ruby, Yi Liu
We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.
1 code implementation • ACL 2021 • Shan Yang, Yongfei Zhang, Guanglin Niu, Qinghua Zhao, ShiLiang Pu
Few-shot relation extraction (FSRE) is of great importance for the long-tail distribution problem, especially in specialized domains with low-resource data.
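A common FSRE baseline classifies a query instance by its nearest class prototype, where each prototype is the mean of that relation's few support embeddings (prototypical networks). A pure-Python sketch of that baseline; the paper's actual model builds further on this idea:

```python
import math

# Nearest-prototype few-shot classification: each relation's prototype is the
# mean of its support embeddings; a query takes the label of the closest one.

def mean_vec(vectors):
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, support):
    """support: {relation_label: [embedding, ...]} -> nearest-prototype label."""
    protos = {label: mean_vec(vecs) for label, vecs in support.items()}
    return min(protos, key=lambda label: euclidean(query, protos[label]))

support = {"founder_of": [[1.0, 0.0], [0.8, 0.2]],
           "born_in": [[0.0, 1.0], [0.1, 0.9]]}
print(classify([0.9, 0.1], support))  # founder_of
```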
1 code implementation • ICCV 2021 • RuiLong Li, Shan Yang, David A. Ross, Angjoo Kanazawa
We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion conditioned on music.
Ranked #2 on Motion Synthesis on BRACE
1 code implementation • 3 Dec 2020 • Haohan Guo, Heng Lu, Na Hu, Chunlei Zhang, Shan Yang, Lei Xie, Dan Su, Dong Yu
In order to make timbre conversion more stable and controllable, speaker embedding is further decomposed to the weighted sum of a group of trainable vectors representing different timbre clusters.
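Decomposing a speaker embedding into a weighted sum of timbre-cluster vectors, as described above, can be sketched as softmax-normalizing a set of (learned) logits and mixing the cluster vectors. All names here are illustrative:

```python
import math

# Speaker embedding as a weighted sum of trainable timbre-cluster vectors.
# The logits would be learned per speaker; softmax makes the weights sum to 1.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def speaker_embedding(cluster_vectors, logits):
    """Mix cluster vectors with softmax weights into one embedding."""
    w = softmax(logits)
    dim = len(cluster_vectors[0])
    return [sum(w[k] * cluster_vectors[k][d] for k in range(len(w)))
            for d in range(dim)]

clusters = [[1.0, 0.0], [0.0, 1.0]]
emb = speaker_embedding(clusters, [0.0, 0.0])  # equal weights -> [0.5, 0.5]
```

Shifting the logits toward one cluster moves the embedding toward that timbre, which is what makes the conversion controllable.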
9 code implementations • Interspeech 2020 • Geng Yang, Shan Yang, Kai Liu, Peng Fang, Wei Chen, Lei Xie
In this paper, we propose multi-band MelGAN, a much faster waveform generation model targeting high-quality text-to-speech.
Sound • Audio and Speech Processing
no code implementations • 28 Apr 2020 • Shan Yang, Yuxuan Wang, Lei Xie
As for the speech-side noise, we propose to learn a noise-independent feature in the auto-regressive decoder through adversarial training and data augmentation, which does not need an extra speech enhancement model.
no code implementations • ICCV 2017 • Shan Yang, Junbang Liang, Ming C. Lin
To extract information about the cloth, our method characterizes both the motion space and the visual appearance of the cloth geometry.
4 code implementations • 6 Jul 2017 • Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li
In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).
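A GAN-based SPSS system adds an adversarial loss on top of the usual regression objective. With a least-squares GAN, one common choice (hedged: not necessarily the exact variant used above), the discriminator and generator objectives over raw scores look like:

```python
# Least-squares GAN losses over discriminator scores (lists of floats).
# The discriminator pushes real scores toward 1 and fake scores toward 0;
# the generator tries to make fakes score 1.

def d_loss_lsgan(real_scores, fake_scores):
    n = len(real_scores) + len(fake_scores)
    return (sum((s - 1.0) ** 2 for s in real_scores)
            + sum(s ** 2 for s in fake_scores)) / n

def g_loss_lsgan(fake_scores):
    return sum((s - 1.0) ** 2 for s in fake_scores) / len(fake_scores)

print(d_loss_lsgan([1.0, 1.0], [0.0, 0.0]))  # perfect discriminator -> 0.0
print(g_loss_lsgan([1.0]))                   # fully fooled -> 0.0
```

In SPSS, this adversarial term is typically weighted against the acoustic-feature regression loss so the generator sharpens over-smoothed parameters without drifting from the target speech.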
Sound
no code implementations • 3 Aug 2016 • Shan Yang, Tanya Ambert, Zherong Pan, Ke Wang, Licheng Yu, Tamara Berg, Ming C. Lin
Most recent garment capturing techniques rely on acquiring multiple views of clothing, which may not always be readily available, especially in the case of pre-existing photographs from the web.
4 code implementations • 31 Jul 2016 • Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, Tamara L. Berg
Humans refer to objects in their environments all the time, especially in dialogue with other people.