no code implementations • 28 Jan 2024 • Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu
Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.
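For reference, the standard von Mises-Fisher density on the unit sphere is given below. This is the textbook definition only; how the paper derives the mean direction $\mu$ and concentration $\kappa$ from the pre-trained classifier is not stated in this snippet, so no such link is assumed here.

```latex
% vMF density for a unit vector x in R^d (||x|| = 1), with mean direction mu (||mu|| = 1)
% and concentration kappa > 0; I_v is the modified Bessel function of the first kind.
\[
  p(x \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} x\right),
  \qquad
  C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)}
\]
```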
no code implementations • 15 Sep 2023 • Yiming Li, Xiangdong Wang, Hong Liu, Rui Tao, Long Yan, Kazushige Ouchi
Then, the local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and the global consistency is applied to force features to align with global prototypes through a specially designed contrastive loss.
no code implementations • 23 Aug 2023 • Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang
To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text.
1 code implementation • 18 Oct 2022 • Yiming Li, Zhifang Guo, Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi
For the frame-wise model, the ICT-TOSHIBA system of DCASE 2021 Task 4 is used.
2 code implementations • 12 Oct 2021 • Rui Tao, Long Yan, Kazushige Ouchi, Xiangdong Wang
The recently proposed Mean Teacher method, which exploits large-scale unlabeled data in a self-ensembling manner, has achieved state-of-the-art results in several semi-supervised learning benchmarks.
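A minimal sketch of the self-ensembling idea behind Mean Teacher, assuming weights are flat lists of floats: the teacher's weights are an exponential moving average (EMA) of the student's, and a consistency loss penalizes disagreement between the two models' predictions. The function names, the `alpha` value, and the mean-squared-error choice are illustrative assumptions, not details taken from this paper.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights as an EMA of the student weights.

    alpha is a hypothetical smoothing factor; larger values make the
    teacher change more slowly.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# One illustrative step: the teacher moves a small fraction toward the student.
teacher = ema_update([0.0, 0.0], [1.0, -1.0], alpha=0.9)
```

In a training loop the student would be updated by gradient descent on the supervised loss plus the consistency loss, and `ema_update` would run after each step; the teacher itself receives no gradients.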
1 code implementation • 5 Oct 2021 • Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi
A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction.
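The gap between frame-level and event-level quality can be illustrated with a small sketch. It assumes a simple thresholding decoder, which is a common baseline and not necessarily what this paper uses; `frames_to_events`, `threshold`, and `hop_seconds` are hypothetical names and values.

```python
def frames_to_events(frame_probs, threshold=0.5, hop_seconds=0.02):
    """Turn per-frame probabilities into (onset, offset) events in seconds.

    Consecutive frames at or above the threshold are merged into one event.
    """
    events = []
    start = None
    for i, p in enumerate(frame_probs):
        active = p >= threshold
        if active and start is None:
            start = i  # event begins at this frame
        elif not active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # close an event that runs to the last frame
        events.append((start * hop_seconds, len(frame_probs) * hop_seconds))
    return events


# A single low frame in the middle splits what may be one event into two.
split = frames_to_events([0.9, 0.9, 0.4, 0.9, 0.9], hop_seconds=1.0)
```

If the true label is one continuous event, only one of five frames is wrong here, yet the decoded output contains two events with incorrect boundaries, so a model tuned for frame accuracy alone can still score poorly at the event level.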