Search Results for author: Jingwen Chen

Found 11 papers, 6 papers with code

Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning

no code implementations • 31 Dec 2024 • Jianjie Luo, Jingwen Chen, Yehao Li, Yingwei Pan, Jianlin Feng, Hongyang Chao, Ting Yao

Additionally, to facilitate model training with synthetic data, a novel CLIP-weighted cross-entropy loss is devised to prioritize high-quality image-text pairs over low-quality counterparts.

Caption Generation • Decoder • +2
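The CLIP-weighted cross-entropy loss in the entry above is only described at a high level. Below is a minimal sketch of one plausible reading: each synthetic pair's caption loss is re-weighted by its CLIP image-text similarity. The softmax normalization and the temperature are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def clip_weighted_cross_entropy(logits, targets, clip_scores, pad_id=0):
    """Token-level cross-entropy, re-weighted per sample by CLIP image-text similarity.

    logits:      (batch, seq_len, vocab) caption-decoder outputs
    targets:     (batch, seq_len) ground-truth token ids of the synthetic captions
    clip_scores: (batch,) cosine similarities between each image and its caption
    """
    # Per-token cross-entropy, ignoring padding positions.
    ce = F.cross_entropy(
        logits.transpose(1, 2), targets, ignore_index=pad_id, reduction="none"
    )  # (batch, seq_len)

    # Turn CLIP similarities into normalized sample weights (softmax over the batch
    # with temperature 0.1 is an assumption; the paper may normalize differently).
    weights = torch.softmax(clip_scores / 0.1, dim=0)  # (batch,)

    # Average tokens within each caption, then take the CLIP-weighted sum over the batch.
    token_mask = (targets != pad_id).float()
    per_sample = (ce * token_mask).sum(dim=1) / token_mask.sum(dim=1).clamp(min=1)
    return (weights * per_sample).sum()
```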

Improving Text-guided Object Inpainting with Semantic Pre-inpainting

1 code implementation • 12 Sep 2024 • Yifu Chen, Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Zhineng Chen, Tao Mei

In this paper, we propose to decompose the typical single-stage object inpainting into two cascaded processes: 1) semantic pre-inpainting, which infers the semantic features of desired objects in a multi-modal feature space; and 2) high-fidelity object generation in the diffusion latent space, which pivots on such inpainted semantic features.

Denoising • Object
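The two cascaded processes described in the entry above can be pictured with a hypothetical control-flow skeleton. The stage interfaces and argument names below are assumptions for illustration, not the paper's implementation.

```python
def two_stage_object_inpainting(image, mask, prompt,
                                semantic_inpainter, latent_diffusion):
    """Hypothetical two-stage pipeline: (1) infer semantic features of the desired
    object in a multi-modal feature space, (2) condition latent diffusion on them."""
    # Stage 1: semantic pre-inpainting — predict object-level features for the
    # masked region from the image context and the text prompt.
    semantic_feat = semantic_inpainter(image=image, mask=mask, text=prompt)

    # Stage 2: high-fidelity generation — a diffusion model in latent space
    # pivots on the inpainted semantic features rather than on raw text alone.
    return latent_diffusion.sample(image=image, mask=mask,
                                   condition=semantic_feat)
```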

Improving Virtual Try-On with Garment-focused Diffusion Models

1 code implementation • 12 Sep 2024 • Siqi Wan, Yehao Li, Jingwen Chen, Yingwei Pan, Ting Yao, Yang Cao, Tao Mei

To address this, we design a new diffusion model, namely GarDiff, which triggers the garment-focused diffusion process with amplified guidance of both the basic visual appearance and the detailed textures (i.e., high-frequency details) derived from the given garment.

Image Generation • Virtual Try-on
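The amplified garment guidance mentioned above can be sketched as a classifier-free-guidance-style composition of noise predictions. The additive form and the guidance scales below are assumptions and may differ from GarDiff's actual appearance and high-frequency guidance terms.

```python
def garment_amplified_guidance(eps_uncond, eps_text, eps_garment,
                               text_scale=7.5, garment_scale=2.0):
    """Hypothetical guidance rule: amplify the garment-conditioned direction on top
    of standard classifier-free guidance.

    Each eps_* tensor is a noise prediction from the same denoiser under a
    different condition (unconditional, text-only, text + garment).
    """
    guided = (eps_uncond
              + text_scale * (eps_text - eps_uncond)        # standard CFG term
              + garment_scale * (eps_garment - eps_text))   # extra garment emphasis
    return guided
```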

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

no code implementations • 9 Nov 2023 • Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

To achieve this, we present a new diffusion model (ControlStyle) by upgrading a pre-trained text-to-image model with a trainable modulation network that enables additional conditions of text prompts and style images.

Style Transfer • Text-to-Image Generation
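One way to picture a trainable modulation network on top of a frozen text-to-image model is an adapter that predicts per-channel scale and shift from the style image, as in the hedged sketch below. The module design and where it hooks into the UNet are assumptions, not ControlStyle's actual architecture.

```python
import torch.nn as nn

class StyleModulation(nn.Module):
    """Hypothetical trainable modulation branch added to a frozen text-to-image UNet.

    The style image is encoded into a vector that predicts per-channel scale/shift
    used to modulate an intermediate UNet feature map.
    """
    def __init__(self, style_dim=512, feat_channels=320):
        super().__init__()
        self.style_encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, style_dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_scale_shift = nn.Linear(style_dim, 2 * feat_channels)

    def forward(self, unet_feat, style_image):
        # unet_feat: (B, C, H, W) feature from the frozen UNet; style_image: (B, 3, h, w)
        style = self.style_encoder(style_image)                   # (B, style_dim)
        scale, shift = self.to_scale_shift(style).chunk(2, dim=1)  # (B, C) each
        return unet_feat * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
```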

X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics

2 code implementations • 18 Aug 2021 • Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei

Nevertheless, there has not been an open-source codebase that supports training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.

Cross-Modal Retrieval • Decoder • +6

Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network

1 code implementation • 27 Jan 2021 • Yehao Li, Yingwei Pan, Ting Yao, Jingwen Chen, Tao Mei

Despite the impressive progress of vision-language (VL) pretraining with BERT-based encoders for VL understanding, pretraining a universal encoder-decoder for both VL understanding and generation remains challenging.

Decoder
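For context, generic scheduled sampling mixes ground-truth tokens with the model's own predictions according to a decaying probability. The sketch below uses a standard inverse-sigmoid schedule and does not reproduce the paper's decoupled encoder-decoder variant; the constant k is an assumption.

```python
import torch

def scheduled_sampling_inputs(gold_tokens, predicted_tokens, step, k=1000.0):
    """Generic scheduled sampling: with probability (1 - p) replace each gold input
    token with the model's previous prediction, where p decays with the training step.

    gold_tokens, predicted_tokens: (batch, seq_len) integer token tensors
    """
    # Inverse-sigmoid decay of the teacher-forcing probability p.
    p = k / (k + torch.exp(torch.tensor(step / k)))
    use_gold = torch.rand_like(gold_tokens, dtype=torch.float) < p
    return torch.where(use_gold, gold_tokens, predicted_tokens)
```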

Physical-Virtual Collaboration Modeling for Intra- and Inter-Station Metro Ridership Prediction

2 code implementations • 14 Jan 2020 • Lingbo Liu, Jingwen Chen, Hefeng Wu, Jiajie Zhen, Guanbin Li, Liang Lin

To address this problem, we model a metro system as graphs with various topologies and propose a unified Physical-Virtual Collaboration Graph Network (PVCGN), which can effectively learn the complex ridership patterns from the tailor-designed graphs.

Representation Learning
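The "graphs with various topologies" in the entry above can be pictured as a physical graph from track connectivity plus a virtual graph from ridership-pattern similarity. The sketch below is one plausible construction; PVCGN's actual similarity and correlation graphs may be defined differently.

```python
import numpy as np

def build_metro_graphs(edges, ridership, num_stations, top_k=5):
    """Hypothetical construction of two graph topologies over metro stations.

    edges:     list of (station_i, station_j) physical track connections
    ridership: (num_timesteps, num_stations) historical inflow counts
    """
    # Physical graph: 1 where two stations are directly connected by track.
    physical = np.zeros((num_stations, num_stations))
    for i, j in edges:
        physical[i, j] = physical[j, i] = 1.0

    # Virtual similarity graph: connect each station to its top-k most
    # correlated stations in historical ridership.
    corr = np.corrcoef(ridership.T)            # (num_stations, num_stations)
    virtual = np.zeros_like(corr)
    for i in range(num_stations):
        neighbors = np.argsort(-corr[i])[1:top_k + 1]   # skip self
        virtual[i, neighbors] = corr[i, neighbors]
    return physical, virtual
```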

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

1 code implementation • 3 May 2019 • Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei

Moreover, the inherent recurrent dependency in RNNs prevents parallelization within a sequence during training and therefore limits computational efficiency.

Decoder • Sentence • +1
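The parallelization argument above motivates replacing the recurrent encoder with temporal convolutions, which process all frames at once. The minimal sketch below shows a plain (non-deformable) temporal convolutional encoder over per-frame features; it is an illustration of the general idea, not the paper's architecture.

```python
import torch.nn as nn

class TemporalConvEncoder(nn.Module):
    """Stacked 1D temporal convolutions over per-frame features: unlike an RNN,
    every frame position is computed in parallel during training."""
    def __init__(self, feat_dim=2048, hidden=512, layers=2, kernel=3):
        super().__init__()
        blocks, in_dim = [], feat_dim
        for _ in range(layers):
            blocks += [nn.Conv1d(in_dim, hidden, kernel, padding=kernel // 2), nn.ReLU()]
            in_dim = hidden
        self.net = nn.Sequential(*blocks)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) per-frame CNN features
        x = frame_feats.transpose(1, 2)        # (batch, feat_dim, num_frames)
        return self.net(x).transpose(1, 2)     # (batch, num_frames, hidden)
```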

Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding

no code implementations • 16 Nov 2016 • Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang

Creating aesthetically pleasing pieces of art, including music, has been a long-term goal for artificial intelligence research.

Music Generation
