Search Results for author: Yi-Jen Shih

Found 6 papers, 4 papers with code

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

no code implementations2 Nov 2022 Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath

This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.

Image Retrieval Retrieval +1

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

1 code implementation3 Oct 2022 Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling Retrieval +1

Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer

1 code implementation7 Nov 2021 Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang

To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation.

Music Generation Representation Learning Sound Multimedia Audio and Speech Processing

Cannot find the paper you are looking for? You can Submit a new open access paper.