Search Results for author: Jianmin Bao

Found 55 papers, 38 papers with code

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

no code implementations12 Jun 2024 Xinzhi Mu, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan

This task essentially requires generating coherent and consistent visual content within the confines of a font-shaped canvas, as opposed to a traditional rectangular canvas.

Text-to-Image Generation

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

1 code implementation22 Apr 2024 Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.

Language Modelling

Towards More Unified In-context Visual Understanding

no code implementations CVPR 2024 Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines.

Decoder Image Captioning +2

PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

no code implementations8 Nov 2023 Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID).

Person Re-Identification

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations CVPR 2024 Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

AltFreezing for More General Video Face Forgery Detection

1 code implementation CVPR 2023 Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li

In this paper, we propose to capture both spatial and temporal artifacts in one model for face forgery detection.

Data Augmentation

HQ-50K: A Large-scale, High-quality Dataset for Image Restoration

1 code implementation8 Jun 2023 Qinhong Yang, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Lu Yuan, Gang Hua, Nenghai Yu

This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50, 000 high-quality images with rich texture details and semantic diversity.

Denoising Diversity +3

Designing a Better Asymmetric VQGAN for StableDiffusion

2 code implementations7 Jun 2023 Zixin Zhu, Xuelu Feng, Dongdong Chen, Jianmin Bao, Le Wang, Yinpeng Chen, Lu Yuan, Gang Hua

The training cost of our asymmetric VQGAN is cheap, and we only need to retrain a new asymmetric decoder while keeping the vanilla VQGAN encoder and StableDiffusion unchanged.

Decoder Image Inpainting

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation NeurIPS 2023 Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.

CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

1 code implementation CVPR 2023 Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Wenqiang Zhang

Our framework, termed as domain-aware sign language retrieval via Cross-lingual Contrastive learning or CiCo for short, outperforms the pioneering method by large margins on various datasets, e. g., +22. 4 T2V and +28. 0 V2T R@1 improvements on How2Sign dataset, and +13. 7 T2V and +17. 1 V2T R@1 improvements on PHOENIX-2014T dataset.

Contrastive Learning Sign Language Retrieval +4

DIRE for Diffusion-Generated Image Detection

2 code implementations ICCV 2023 Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, Houqiang Li

We find that existing detectors struggle to detect images generated by diffusion models, even if we include generated images from a specific diffusion model in their training data.

Efficient Diffusion Training via Min-SNR Weighting Strategy

2 code implementations ICCV 2023 Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, Baining Guo

Denoising diffusion models have been a mainstream approach for image generation, however, training these models often suffers from slow convergence.

Denoising Image Generation +2

Improving CLIP Fine-tuning Performance

1 code implementation ICCV 2023 Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.

object-detection Object Detection +1

Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion

no code implementations CVPR 2023 Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo

This paper presents a 3D generative model that uses diffusion models to automatically generate 3D digital avatars represented as neural radiance fields.

Computational Efficiency

CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

1 code implementation12 Dec 2022 Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Shuyang Gu, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Recent studies have shown that CLIP has achieved remarkable success in performing zero-shot inference while its fine-tuning performance is not satisfactory.

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

2 code implementations7 Dec 2022 Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.

Data Augmentation Instance Segmentation +5

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

no code implementations28 Nov 2022 YiXuan Wang, Wengang Zhou, Jianmin Bao, Weilun Wang, Li Li, Houqiang Li

The key idea of our CLIP2GAN is to bridge the output feature embedding space of CLIP and the input latent space of StyleGAN, which is realized by introducing a mapping network.

Attribute Image Generation +1

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

1 code implementation22 Nov 2022 Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.

Denoising Diversity +2

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

no code implementations CVPR 2023 Xiaoyi Dong, Jianmin Bao, Yinglin Zheng, Ting Zhang, Dongdong Chen, Hao Yang, Ming Zeng, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

Second, masked self-distillation is also consistent with vision-language contrastive from the perspective of training objective as both utilize the visual encoder for feature aligning, and thus is able to learn local semantics getting indirect supervision from the language.

Representation Learning

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

1 code implementation14 Jul 2022 Xiaoyi Dong, Jianmin Bao, Ting Zhang, Dongdong Chen, Weiming Zhang, Lu Yuan, Dong Chen, Fang Wen, Nenghai Yu

The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance.

Decoder Object Detection +2

Semantic Image Synthesis via Diffusion Models

3 code implementations30 Jun 2022 Weilun Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Dong Chen, Lu Yuan, Houqiang Li

Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks compared with Generative Adversarial Nets (GANs).

Decoder Denoising +2

Improved Vector Quantized Diffusion Models

1 code implementation31 May 2022 Zhicong Tang, Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

When trained on ImageNet, we dramatically improve the FID score from 11. 89 to 4. 83, demonstrating the superiority of our proposed techniques.

Denoising Image Generation

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

1 code implementation27 May 2022 Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools.

Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)

Contrastive Learning Image Classification +5

Large-Scale Pre-training for Person Re-identification with Noisy Labels

2 code implementations CVPR 2022 Dengpan Fu, Dongdong Chen, Hao Yang, Jianmin Bao, Lu Yuan, Lei Zhang, Houqiang Li, Fang Wen, Dong Chen

Since theses ID labels automatically derived from tracklets inevitably contain noises, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning.

Contrastive Learning Multi-Object Tracking +3

Semi-Supervised Image-to-Image Translation using Latent Space Mapping

no code implementations29 Mar 2022 Pan Zhang, Jianmin Bao, Ting Zhang, Dong Chen, Fang Wen

Thanks to the low dimensional feature space, it is easier to find the desired mapping function, resulting in improved quality of translation results as well as the stability of the translation model.

Image-to-Image Translation Translation

Protecting Celebrities from DeepFake with Identity Consistency Transformer

1 code implementation CVPR 2022 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Ting Zhang, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

In this work we propose Identity Consistency Transformer, a novel face forgery detection method that focuses on high-level semantics, specifically identity information, and detecting a suspect face by finding identity inconsistency in inner and outer face regions.

Face Swapping

StyleSwin: Transformer-based GAN for High-resolution Image Generation

1 code implementation CVPR 2022 BoWen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo

To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.

 Ranked #1 on Image Generation on CelebA 256x256 (FID metric)

Blocking Computational Efficiency +3

General Facial Representation Learning in a Visual-Linguistic Manner

2 code implementations CVPR 2022 Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

In this paper, we study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner.

 Ranked #1 on Face Parsing on CelebAMask-HQ (using extra training data)

Face Alignment Face Parsing +1

Vector Quantized Diffusion Model for Text-to-Image Synthesis

2 code implementations CVPR 2022 Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo

Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.

 Ranked #1 on Text-to-Image Generation on Oxford 102 Flowers (using extra training data)

Denoising Text-to-Image Generation

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations CVPR 2022 Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), that by $40\times$ less data than that in previous practice, we achieve the state-of-the-art on four representative vision benchmarks.

Representation Learning Self-Supervised Image Classification +1

Dual Path Learning for Domain Adaptation of Semantic Segmentation

1 code implementation ICCV 2021 Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang

In this paper, based on the observation that domain adaptation frameworks performed in the source and target domain are almost complementary in terms of image translation and SSL, we propose a novel dual path learning (DPL) framework to alleviate visual inconsistency.

Domain Adaptation Segmentation +4

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

6 code implementations CVPR 2022 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, Baining Guo

By further pretraining on the larger dataset ImageNet-21K, we achieve 87. 5% Top-1 accuracy on ImageNet-1K and high segmentation performance on ADE20K with 55. 7 mIoU.

Image Classification Semantic Segmentation

Uformer: A General U-Shaped Transformer for Image Restoration

4 code implementations CVPR 2022 Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li

Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.

Deblurring Decoder +6

High-Fidelity and Arbitrary Face Editing

no code implementations CVPR 2021 Yue Gao, Fangyun Wei, Jianmin Bao, Shuyang Gu, Dong Chen, Fang Wen, Zhouhui Lian

However, we observe that the generator tends to find a tricky way to hide information from the original image to satisfy the constraint of cycle consistency, making it impossible to maintain the rich details (e. g., wrinkles and moles) of non-editing areas.

Attribute Vocal Bursts Intensity Prediction

Identity-Driven DeepFake Detection

no code implementations7 Dec 2020 Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Dong Chen, Fang Wen, Baining Guo

Our approach takes as input the suspect image/video as well as the target identity information (a reference image or video).

DeepFake Detection Face Swapping

Unsupervised Pre-training for Person Re-identification

1 code implementation CVPR 2021 Dengpan Fu, Dongdong Chen, Jianmin Bao, Hao Yang, Lu Yuan, Lei Zhang, Houqiang Li, Dong Chen

In this paper, we present a large scale unlabeled person re-identification (Re-ID) dataset "LUPerson" and make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.

 Ranked #1 on Person Re-Identification on Market-1501 (using extra training data)

Data Augmentation Person Re-Identification +1

Learnable Sampling 3D Convolution for Video Enhancement and Action Recognition

no code implementations22 Nov 2020 Shuyang Gu, Jianmin Bao, Dong Chen

A key challenge in video enhancement and action recognition is to fuse useful information from neighboring frames.

Action Recognition Denoising +3

GreedyFool: Distortion-Aware Sparse Adversarial Attack

1 code implementation NeurIPS 2020 Xiaoyi Dong, Dongdong Chen, Jianmin Bao, Chuan Qin, Lu Yuan, Weiming Zhang, Nenghai Yu, Dong Chen

Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels.

Adversarial Attack

Improving Person Re-identification with Iterative Impression Aggregation

no code implementations21 Sep 2020 Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li

Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with latest advanced re-ranking methods.

Person Re-Identification Re-Ranking

PriorGAN: Real Data Prior for Generative Adversarial Nets

1 code implementation30 Jun 2020 Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

To address these two issues, we propose a novel prior that captures the whole real data distribution for GANs, which are called PriorGANs.

GIQA: Generated Image Quality Assessment

1 code implementation ECCV 2020 Shuyang Gu, Jianmin Bao, Dong Chen, Fang Wen

Generative adversarial networks (GANs) have achieved impressive results today, but not all generated images are perfect.

Diversity Image Quality Assessment

Face X-ray for More General Face Forgery Detection

4 code implementations CVPR 2020 Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, Baining Guo

For this reason, face X-ray provides an effective way for detecting forgery generated by most existing face manipulation algorithms.

DeepFake Detection Face Swapping

FaceShifter: Towards High Fidelity And Occlusion Aware Face Swapping

10 code implementations31 Dec 2019 Lingzhi Li, Jianmin Bao, Hao Yang, Dong Chen, Fang Wen

We propose a novel attributes encoder for extracting multi-level target face attributes, and a new generator with carefully designed Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis.

Face Generation Face Swapping +1

Towards Open-Set Identity Preserving Face Synthesis

no code implementations CVPR 2018 Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, Gang Hua

We then recombine the identity vector and the attribute vector to synthesize a new face of the subject with the extracted attribute.

Attribute Face Generation

Cannot find the paper you are looking for? You can Submit a new open access paper.