no code implementations • COLING 2022 • Jianguo Mao, Jiyuan Zhang, Zengfeng Zeng, Weihua Peng, Wenbin Jiang, Xiangdong Wang, Hong Liu, Yajuan Lyu
It then performs dynamic reasoning over hierarchical representations of the evidence to solve complex biomedical problems.
no code implementations • NAACL 2022 • Jianguo Mao, Wenbin Jiang, Xiangdong Wang, Zhifan Feng, Yajuan Lyu, Hong Liu, Yong Zhu
Then, it performs multi-step reasoning over the representations of the question and the video for better answer decisions, and dynamically integrates the reasoning results.
1 code implementation • 3 Sep 2024 • Zekang Yang, Hong Liu, Xiangdong Wang
In this paper, we propose a new training method, multimodal object-level contrastive learning, for cancer survival risk prediction.
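The paper's exact objective is defined over object-level representations from whole slide images; a minimal sketch of the underlying idea, a symmetric InfoNCE loss between paired embeddings from two modalities (function names and dimensions here are illustrative, not from the paper), looks like:

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE loss between paired multimodal embeddings.

    z_a, z_b: (N, D) tensors where row i of each is a matched pair
    (e.g. an object embedding from a slide image and an embedding of
    its paired clinical/genomic data -- hypothetical modalities here).
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature           # (N, N) similarity matrix
    targets = torch.arange(z_a.size(0))            # matched pairs on the diagonal
    loss_a = F.cross_entropy(logits, targets)      # modality A -> modality B
    loss_b = F.cross_entropy(logits.t(), targets)  # modality B -> modality A
    return (loss_a + loss_b) / 2

# toy usage with random embeddings
a = torch.randn(8, 128)
b = torch.randn(8, 128)
loss = info_nce(a, b)
```

Matched pairs sit on the diagonal of the similarity matrix, so each modality's embedding is trained to retrieve its counterpart against all other samples in the batch.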
1 code implementation • 15 Aug 2024 • Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu
Recent advances in audio-language joint learning, such as CLAP, have shown much success in multi-modal understanding tasks.
1 code implementation • 30 Jun 2024 • Zekang Yang, Hong Liu, Xiangdong Wang
Cancer survival prediction is a challenging task that involves analyzing the tumor microenvironment within Whole Slide Images (WSIs).
no code implementations • 15 Sep 2023 • Yiming Li, Xiangdong Wang, Hong Liu
Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associate audio features with human language, making it a natural zero-shot classifier to recognize unseen sound categories.
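The zero-shot recipe described here can be sketched generically: embed the audio and a text prompt per candidate class in the shared space, then pick the class whose prompt is most similar. The encoder callables and the prompt template below are illustrative stand-ins, not the paper's actual API:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(audio_emb, class_names, text_encoder,
                       template="This is a sound of {}."):
    """CLAP-style zero-shot sound classification sketch.

    audio_emb: (D,) embedding from a pretrained audio encoder.
    text_encoder: callable mapping a list of strings to (C, D) embeddings
    (a hypothetical stand-in for the pretrained text branch).
    """
    prompts = [template.format(c) for c in class_names]
    text_emb = text_encoder(prompts)                      # (C, D)
    sims = F.cosine_similarity(audio_emb.unsqueeze(0), text_emb, dim=-1)
    probs = sims.softmax(dim=-1)                          # similarity -> class scores
    return class_names[sims.argmax().item()], probs
```

Because no classifier head is trained, swapping in new class names (and prompts) is all that is needed to recognize previously unseen categories.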
no code implementations • 15 Sep 2023 • Yiming Li, Xiangdong Wang, Hong Liu, Rui Tao, Long Yan, Kazushige Ouchi
Then, local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and global consistency is applied to align features with global prototypes through a specially designed contrastive loss.
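One common way to realize a prototype-alignment objective like the global consistency described above is a cross-entropy over frame-to-prototype similarities; the sketch below assumes normalized features and one prototype per class, and the paper's actual loss may differ:

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Pull each frame feature toward its class prototype, away from others.

    features: (N, D) frame-level features; labels: (N,) class ids;
    prototypes: (C, D) one prototype per class (learnable or EMA-updated).
    """
    f = F.normalize(features, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    logits = f @ p.t() / temperature   # (N, C) frame-to-prototype similarities
    return F.cross_entropy(logits, labels)
```

Treating prototypes as the "classes" of a softmax contrast keeps the loss cheap (N x C rather than N x N comparisons) while still enforcing a globally consistent feature layout.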
no code implementations • 23 Aug 2023 • Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang
To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text.
no code implementations • 22 Aug 2023 • Hualei Wang, Jianguo Mao, Zhifang Guo, Jiarui Wan, Hong Liu, Xiangdong Wang
Large language models exhibit deep comprehension and fluent generation in multi-modal settings.
1 code implementation • 11 Jul 2023 • Yachuan Li, Zongmin Li, Xavier Soria P., Chaozhi Yang, Qian Xiao, Yun Bai, Hua Li, Xiangdong Wang
In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model.
1 code implementation • 18 Oct 2022 • Yiming Li, Zhifang Guo, Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi
For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is used.
2 code implementations • 12 Oct 2021 • Rui Tao, Long Yan, Kazushige Ouchi, Xiangdong Wang
The recently proposed Mean Teacher method, which exploits large-scale unlabeled data in a self-ensembling manner, has achieved state-of-the-art results in several semi-supervised learning benchmarks.
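The core of Mean Teacher is simple to state: the teacher is an exponential moving average (EMA) of the student's weights, and a consistency loss on unlabeled data pushes the student's predictions toward the teacher's. A minimal sketch (model sizes and the squared-error consistency term are illustrative choices):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.999):
    """Update teacher weights as an exponential moving average of the student's."""
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(alpha).add_(s, alpha=1 - alpha)

student = nn.Linear(16, 4)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)          # teacher is never trained by backprop

x = torch.randn(8, 16)               # unlabeled batch (in practice, two augmented views)
cons_loss = F.mse_loss(student(x).softmax(-1), teacher(x).softmax(-1))
cons_loss.backward()                 # gradients flow only through the student
ema_update(teacher, student)         # then refresh the teacher after each step
```

Averaging weights rather than predictions gives the teacher a smoother, more accurate target model, which is what makes the self-ensembling effective with large amounts of unlabeled data.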
1 code implementation • 5 Oct 2021 • Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi
A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction.
no code implementations • 6 Nov 2019 • Yong Ruan, Xiangdong Wang, Hong Liu, Zhigang Ou, Yun Gao, Jianfeng Cheng, Yueliang Qian
For this, we train a Transformer model using audio feature sequences and their phoneme sequences with lexical stress marks.
1 code implementation • 11 Sep 2019 • Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian
In this paper, we describe in detail the system we submitted to DCASE2019 task 4: sound event detection (SED) in domestic environments.
1 code implementation • 6 Jun 2019 • Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian
Instead of designing a single model by considering a trade-off between the two sub-targets, we design a teacher model aiming at audio tagging to guide a student model aiming at boundary detection to learn using the unlabeled data.
1 code implementation • 24 May 2019 • Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian
In this paper, a special decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED are proposed.
1 code implementation • The 13th Asian Conference on Computer Vision 2016 • Haomiao Ni, Hong Liu, Xiangdong Wang, Yueliang Qian
This paper proposes a novel human action recognition method using decision-level fusion of both skeleton and depth sequences.