Search Results for author: Lichao Zhang

Found 15 papers, 7 papers with code

Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

no code implementations • 19 Nov 2023 • JIA YU, Lichao Zhang, Zijie Chen, Fayu Pan, Miaomiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan

Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models.

Image Generation

Efficient Human-AI Coordination via Preparatory Language-based Convention

no code implementations • 1 Nov 2023 • Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu

Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence.

Language Modelling, Large Language Model

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

1 code implementation • 12 Oct 2023 • Zijie Chen, Lichao Zhang, Fangsheng Weng, Lili Pan, Zhenzhong Lan

Despite significant progress in the field, it is still challenging to create personalized visual representations that align closely with the desires and preferences of individual users.

Text-to-Image Generation

DisCover: Disentangled Music Representation Learning for Cover Song Identification

no code implementations • 19 Jul 2023 • Jiahao Xun, Shengyu Zhang, Yanting Yang, Jieming Zhu, Liqun Deng, Zhou Zhao, Zhenhua Dong, RuiQi Li, Lichao Zhang, Fei Wu

We analyze the CSI task in a disentanglement view with the causal graph technique, and identify the intra-version and inter-version effects biasing the invariant learning.

Blocking, Cover song identification, +3

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

no code implementations • 24 May 2023 • Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.

Speech-to-Speech Translation, Translation

AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment

no code implementations • 8 May 2023 • RuiQi Li, Rongjie Huang, Lichao Zhang, Jinglin Liu, Zhou Zhao

The speech-to-singing (STS) voice conversion task aims to generate singing samples corresponding to speech recordings while facing a major challenge: the alignment between the target (singing) pitch contour and the source (speech) content is difficult to learn in a text-free situation.

STS, Voice Conversion

Learning Robust Self-attention Features for Speech Emotion Recognition with Label-adaptive Mixup

1 code implementation • 7 May 2023 • Lei Kang, Lichao Zhang, Dazhi Jiang

Speech Emotion Recognition (SER) aims to recognize human emotions in natural verbal interaction with machines, and is considered a challenging problem because human emotions are ambiguous.

Speech Emotion Recognition

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

1 code implementation • 25 May 2022 • Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao

Specifically, a sequence of discrete representations derived in a self-supervised manner is predicted by the model and passed to a vocoder for speech reconstruction, while still facing the following challenges: 1) acoustic multimodality: the discrete units derived from speech with the same content can be non-deterministic due to acoustic properties (e.g., rhythm, pitch, and energy), which degrades translation accuracy; 2) high latency: current S2ST systems use autoregressive models that predict each unit conditioned on the previously generated sequence, failing to take full advantage of parallelism.

Representation Learning, Speech Synthesis, +2

Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking

1 code implementation • 31 Jul 2021 • Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang

To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data.

Transfer Learning

Multi-Modal Fusion for End-to-End RGB-T Tracking

1 code implementation • 30 Aug 2019 • Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost Van de Weijer, Fahad Shahbaz Khan

Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities.

Image-to-Image Translation, RGB-T Tracking

Learning the Model Update for Siamese Trackers

1 code implementation • ICCV 2019 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

In general, this template is linearly combined with the accumulated template from the previous frame, resulting in an exponential decay of information over time.

Visual Tracking
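
The linear template update mentioned in the excerpt above can be sketched as a running average. This is only an illustration under stated assumptions: the function names `linear_template_update` and `first_frame_weight`, the update rate `gamma`, and the list-based toy templates are hypothetical and are not taken from the paper or its code.

```python
def linear_template_update(accumulated, new_template, gamma=0.1):
    """Blend the current frame's template into the accumulated one.

    Hypothetical helper: templates are plain lists of floats here, and
    gamma is an assumed update rate between 0 and 1.
    """
    return [(1.0 - gamma) * a + gamma * n
            for a, n in zip(accumulated, new_template)]


def first_frame_weight(num_updates, gamma=0.1):
    """Weight the first-frame template still carries after num_updates updates.

    Each update multiplies the old content by (1 - gamma), so the
    contribution shrinks geometrically, i.e. decays exponentially.
    """
    return (1.0 - gamma) ** num_updates


if __name__ == "__main__":
    template = [1.0]  # toy first-frame template
    for _ in range(3):
        # Feed an all-zero template so only the first-frame share remains.
        template = linear_template_update(template, [0.0], gamma=0.1)
    print(template[0], first_frame_weight(3, gamma=0.1))
```

Under these assumptions, the share of any past frame shrinks by a factor of (1 - gamma) per update, which is the exponential decay of information the excerpt refers to.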

Synthetic data generation for end-to-end thermal infrared tracking

no code implementations • 4 Jun 2018 • Lichao Zhang, Abel Gonzalez-Garcia, Joost Van de Weijer, Martin Danelljan, Fahad Shahbaz Khan

These methods provide us with a large labeled dataset of synthetic TIR sequences, on which we can train end-to-end optimal features for tracking.

Image-to-Image Translation, Synthetic Data Generation, +2

Ensembles of Generative Adversarial Networks

no code implementations • 3 Dec 2016 • Yaxing Wang, Lichao Zhang, Joost Van de Weijer

The first one is based on the fact that, in the minimax game played to optimize the GAN objective, the generator network keeps changing even after the network can be considered optimal.
