no code implementations • 23 Jan 2023 • Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoët, Simon Canales, Suhua Tang
We propose a deep attention-based alignment network that aims to automatically predict lyrics and melody from given incomplete lyrics as input, in a way similar to how humans create music.
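As a rough illustration of how an attention module can align a lyric sequence with melody frames, here is a minimal sketch, assuming hypothetical encoder states; the dimensions and the `align` function are placeholders, not the paper's network:

```python
# Minimal sketch of attention-based lyric-melody alignment.
# NOTE: an illustrative assumption, not the proposed architecture.
import torch
import torch.nn.functional as F

def align(lyric_states: torch.Tensor, melody_states: torch.Tensor):
    """lyric_states: (T_l, d) encoder states for the (incomplete) lyrics;
    melody_states: (T_m, d) decoder states for melody frames.
    Returns an alignment matrix and a lyric context per melody frame."""
    # Scaled dot-product scores between every melody frame and lyric token.
    scores = melody_states @ lyric_states.T / lyric_states.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)        # (T_m, T_l) alignment weights
    context = attn @ lyric_states           # (T_m, d) attended lyric context
    return attn, context

attn, ctx = align(torch.randn(12, 64), torch.randn(20, 64))
```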
no code implementations • 5 Dec 2021 • Jiwei Zhang, Yi Yu, Suhua Tang, Jianming Wu, Wei Li
On the one hand, the audio encoder and the visual encoder separately encode audio data and visual data into two different latent spaces.
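A minimal sketch of this dual-encoder setup might look as follows, assuming simple feed-forward encoders; the layer sizes, input dimensions, and class names are hypothetical, as the excerpt does not specify the actual architecture:

```python
# Minimal sketch of two modality-specific encoders mapping audio and
# visual features into two separate latent spaces (illustrative only).
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    def __init__(self, in_dim=128, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, x):            # x: (batch, in_dim) audio features
        return self.net(x)           # point in the audio latent space

class VisualEncoder(nn.Module):
    def __init__(self, in_dim=512, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, x):            # x: (batch, in_dim) visual features
        return self.net(x)           # point in the visual latent space

audio_z = AudioEncoder()(torch.randn(8, 128))    # two separate latent codes
visual_z = VisualEncoder()(torch.randn(8, 512))
```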
no code implementations • 12 Nov 2020 • Gurunath Reddy Madhumani, Yi Yu, Florian Harscoët, Simon Canales, Suhua Tang
In this paper, we propose a technique to address the most challenging aspect of the algorithmic songwriting process, which enables the human community to discover original lyrics and melodies suitable for the generated lyrics.
1 code implementation • 12 May 2019 • Junjun Jiang, Yi Yu, Zheng Wang, Suhua Tang, Ruimin Hu, Jiayi Ma
In this paper, we present a simple but effective single image SR method based on ensemble learning, which can produce better performance than could be obtained from any of the SR methods being ensembled (called component super-resolvers).
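One simple way to fuse component super-resolvers, shown below purely as an illustrative assumption rather than the paper's actual method, is to choose non-negative fusion weights so that the downsampled fused result best reproduces the observed LR input; the `downsample` operator here is a naive stand-in:

```python
# Minimal sketch of ensemble single-image SR by weighted fusion.
# NOTE: an illustrative assumption; the component super-resolvers and
# the downsampling operator are hypothetical placeholders.
import numpy as np

def downsample(hr: np.ndarray, scale: int) -> np.ndarray:
    """Naive box-filter downsampling (stands in for the true LR operator)."""
    h, w = hr.shape[0] // scale * scale, hr.shape[1] // scale * scale
    return hr[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

def fuse(sr_outputs: list, lr: np.ndarray, scale: int) -> np.ndarray:
    """Fuse component SR outputs so that the downsampled fusion
    best matches the observed LR image in a least-squares sense."""
    # Each candidate's downsampled version becomes a column of A.
    A = np.stack([downsample(sr, scale).ravel() for sr in sr_outputs], axis=1)
    b = lr.ravel()
    w, *_ = np.linalg.lstsq(A, b, rcond=None)   # least-squares weights
    w = np.clip(w, 0, None)                     # keep weights non-negative
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1 / len(w))
    return sum(wi * sr for wi, sr in zip(w, sr_outputs))
```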
2 code implementations • 3 Sep 2018 • Junjun Jiang, Yi Yu, Suhua Tang, Jiayi Ma, Akiko Aizawa, Kiyoharu Aizawa
To this end, this study incorporates the contextual information of image patches and proposes a powerful and efficient context-patch based face hallucination approach, namely Thresholding Locality-constrained Representation and Reproducing learning (TLcR-RL).
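For intuition, a generic locality-constrained patch representation can be sketched as follows: represent an LR patch over its nearest LR training patches, then reproduce the HR patch with the same weights. The top-k neighbor selection and all names here are illustrative assumptions, not the TLcR-RL algorithm itself:

```python
# Minimal sketch of patch-based face hallucination with a
# locality-constrained representation (illustrative only).
import numpy as np

def hallucinate_patch(lr_patch, lr_train, hr_train, k=30, lam=1e-4):
    """lr_patch: (d,); lr_train: (N, d) LR patches; hr_train: (N, D) HR patches."""
    # Keep only the k training patches closest to the input patch.
    dist = np.linalg.norm(lr_train - lr_patch, axis=1)
    idx = np.argsort(dist)[:k]
    X = lr_train[idx]                                  # (k, d)
    # Closed-form locality-constrained weights (ridge-regularized).
    R = X - lr_patch                                   # residuals to the input
    G = R @ R.T                                        # (k, k) Gram matrix
    w = np.linalg.solve(G + lam * np.eye(k), np.ones(k))
    w /= w.sum()                                       # weights sum to one
    # Reproduce the HR patch with the same weights over HR counterparts.
    return w @ hr_train[idx]
```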
1 code implementation • 28 Jun 2018 • Junjun Jiang, Yi Yu, Jinhui Hu, Suhua Tang, Jiayi Ma
Most current face hallucination methods, whether shallow learning-based or deep learning-based, try to learn a relationship model between the Low-Resolution (LR) and High-Resolution (HR) spaces with the help of a training set.
no code implementations • 8 May 2018 • Yi Yu, Suhua Tang, Kiyoharu Aizawa, Akiko Aizawa
Given a photo as input, this model performs (i) exact venue search (finding the venue where the photo was taken) and (ii) group venue search (finding relevant venues in the same category as the photo), using the cross-modal correlation between the input photo and the textual descriptions of venues.
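Below is a minimal sketch of both search modes over a shared embedding space, assuming hypothetical precomputed photo and venue-description embeddings; the function name and ranking scheme are illustrative assumptions, not the paper's model:

```python
# Minimal sketch of cross-modal venue search by cosine similarity.
# NOTE: illustrative only; the encoders producing these embeddings
# and the venue database are hypothetical placeholders.
import numpy as np

def search_venues(photo_emb, venue_embs, venue_categories, top_k=5):
    """photo_emb: (d,) photo embedding in the shared space;
    venue_embs: (N, d) text embeddings of venue descriptions;
    venue_categories: length-N list of category labels."""
    # Cosine similarity between the photo and every venue description.
    sims = venue_embs @ photo_emb / (
        np.linalg.norm(venue_embs, axis=1) * np.linalg.norm(photo_emb) + 1e-12)
    ranked = np.argsort(-sims)
    exact = ranked[0]                        # (i) exact venue: best match
    group = [i for i in ranked               # (ii) group venues: same category
             if venue_categories[i] == venue_categories[exact]][:top_k]
    return exact, group
```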
no code implementations • 14 Dec 2017 • Francisco Raposo, David Martins de Matos, Ricardo Ribeiro, Suhua Tang, Yi Yu
Modeling of music audio semantics has previously been tackled by learning mappings from audio data to high-level tags or to latent unsupervised spaces.