Search Results for author: Jinbin Bai

Found 17 papers, 11 papers with code

An Empirical Study of GPT-4o Image Generation Capabilities

1 code implementation 8 Apr 2025 Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi

The landscape of image generation has rapidly evolved, from early GAN-based approaches to diffusion models and, most recently, to unified generative architectures that seek to bridge understanding and generation tasks.

Benchmarking Image Generation +3

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer

1 code implementation 21 Mar 2025 Qingyu Shi, Jianzong Wu, Jinbin Bai, Jiangning Zhang, Lu Qi, Xiangtai Li, Yunhai Tong

In contrast, state-of-the-art video Diffusion Transformers (DiT) models use 3D full attention, which does not explicitly separate temporal and spatial information.

Benchmarking Video Generation

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation

no code implementations 7 Jan 2025 Tianyu Cui, Jinbin Bai, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ye Shi

Recent research has revealed that the modality gap generally exists in the representation of contrastive learning-based multi-modal systems, undermining the reliability of cross-modality metrics like CLIPScore.

Contrastive Learning Diversity +2

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

1 code implementation 5 Dec 2024 Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.

RelationBooth: Towards Relation-Aware Customized Object Generation

no code implementations 30 Oct 2024 Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li, Ming-Hsuan Yang

Instead, this work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts.

Image Generation Object +1

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

no code implementations 17 Oct 2024 Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng

Recent text-to-image models generate high-quality images from text prompts but lack precise control over specific components within visual concepts.

Image Generation

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

1 code implementation 10 Oct 2024 Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.

Feature Compression Image Generation

An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control

2 code implementations 7 Mar 2024 Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas

Building on the success of text-to-image diffusion models (DPMs), image editing is an important application to enable human interaction with AI-generated content.

Descriptive

Integrating View Conditions for Image Synthesis

1 code implementation 24 Oct 2023 Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou

In the field of image processing, applying intricate semantic modifications within existing images remains an enduring challenge.

Image Generation Object

Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks

1 code implementation ICCV 2023 Sixiang Chen, Tian Ye, Jinbin Bai, ErKang Chen, Jun Shi, Lei Zhu

In the real world, image degradations caused by rain often exhibit a combination of rain streaks and raindrops, thereby increasing the challenges of recovering the underlying clean image.

Rain Removal

Taming Diffusion Models for Music-driven Conducting Motion Generation

1 code implementation 15 Jun 2023 Zhuoran Zhao, Jinbin Bai, Delong Chen, Debang Wang, Yubo Pan

Generating the motion of orchestral conductors from a given piece of symphony music is a challenging task since it requires a model to learn semantic music features and capture the underlying distribution of real conducting motion.

Diversity Motion Generation

Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

1 code implementation 15 May 2023 Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, ErKang Chen

In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time.

Computational Efficiency Image Enhancement

RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

no code implementations 23 Feb 2023 Jingxia Jiang, Jinbin Bai, Yun Liu, Junjie Yin, Sixiang Chen, Tian Ye, ErKang Chen

Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles.

Image Enhancement

Translating Natural Language to Planning Goals with Large-Language Models

1 code implementation 10 Feb 2023 Yaqi Xie, Chen Yu, Tongyao Zhu, Jinbin Bai, Ze Gong, Harold Soh

Recent large language models (LLMs) have demonstrated remarkable performance on a variety of natural language processing (NLP) tasks, leading to intense excitement about their applicability across various domains.

Spatial Reasoning Translation

Adverse Weather Removal with Codebook Priors

no code implementations ICCV 2023 Tian Ye, Sixiang Chen, Jinbin Bai, Jun Shi, Chenghao Xue, Jingxia Jiang, Junjie Yin, ErKang Chen, Yun Liu

Inspired by recent advancements in codebook and vector quantization (VQ) techniques, we present a novel Adverse Weather Removal network with Codebook Priors (AWRCP) to address the problem of unified adverse weather removal.

Quantization

LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval

no code implementations 11 Jul 2022 Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, Lele Cheng

To overcome the above issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, which bridges the gap between visual and textual domains.

Representation Learning Text Retrieval +3
