Search Results for author: Yuhui Yuan

Found 38 papers, 24 papers with code

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

1 code implementation • 11 Aug 2024 • Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li

In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.

Denoising

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

no code implementations • 14 Jun 2024 • Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Lin Liang, Lijuan Wang, Ji Li, Yuhui Yuan

With the combination of these techniques, we deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in 10 different languages.

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

no code implementations • 12 Jun 2024 • Xinzhi Mu, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan

This task essentially requires generating coherent and consistent visual content within the confines of a font-shaped canvas, as opposed to a traditional rectangular canvas.

Text-to-Image Generation

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

no code implementations • 6 Jun 2024 • Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Ji Li, Liang Zheng

To address this limitation, we propose Step-aware Preference Optimization (SPO), a novel post-training approach that independently evaluates and adjusts the denoising performance at each step, using a step-aware preference model and a step-wise resampler to ensure accurate step-aware supervision.

Denoising

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

no code implementations • 14 Mar 2024 • Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies.

Text-to-Image Generation

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

no code implementations • 15 Dec 2023 • Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang

In this paper, we present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation.

3D Generation Image to 3D +1

COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design

no code implementations • 28 Nov 2023 • Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo

Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text.

Image Generation

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations • CVPR 2024 • Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

Mask Frozen-DETR: High Quality Instance Segmentation with One GPU

no code implementations • 7 Aug 2023 • Zhanhao Liang, Yuhui Yuan

In this paper, we aim to study how to build a strong instance segmenter with minimal training time and GPUs, as opposed to the majority of current approaches, which pursue a more accurate instance segmenter by building more advanced frameworks at the cost of longer training time and higher GPU requirements.

Ranked #3 on Instance Segmentation on COCO minival (using extra training data)

Instance Segmentation object-detection +2

DETR Doesn't Need Multi-Scale or Locality Design

1 code implementation • 3 Aug 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder

Revisiting DETR Pre-training for Object Detection

no code implementations • 2 Aug 2023 • Yan Ma, Weicong Liang, Bohan Chen, Yiduo Hao, BoJian Hou, Xiangyu Yue, Chao Zhang, Yuhui Yuan

Motivated by the remarkable achievements of DETR-based approaches on COCO object detection and segmentation benchmarks, recent endeavors have been directed towards elevating their performance through self-supervised pre-training of Transformers while preserving a frozen backbone.

Object object-detection +1

Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation

no code implementations • ICCV 2023 • Changqi Wang, Haoyu Xie, Yuhui Yuan, Chong Fu, Xiangyu Yue

To improve the robustness of representations, powerful methods introduce a pixel-wise contrastive learning approach in latent space (i.e., representation space) that aggregates the representations to their prototypes in a fully supervised manner.

Contrastive Learning Semi-Supervised Semantic Segmentation

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

DETR Does Not Need Multi-Scale or Locality Design

1 code implementation • ICCV 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations • 3 Oct 2022 • Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering Depth Estimation +6

DETRs with Hybrid Matching

8 code implementations • CVPR 2023 • Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, WeiHong Lin, Lei Sun, Chao Zhang, Han Hu

One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections.

Object Detection Pose Estimation +2

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

2 code implementations • 8 Mar 2022 • Haodi He, Yuhui Yuan, Xiangyu Yue, Han Hu

Given an input image or video, our framework first conducts multi-label classification over the complete label set, then sorts the labels and selects a small subset according to their class confidence scores.

Classification Instance Segmentation +6

HRFormer: High-Resolution Vision Transformer for Dense Prediction

2 code implementations • NeurIPS 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Pose Estimation Semantic Segmentation +1

HRFormer: High-Resolution Transformer for Dense Prediction

1 code implementation • 18 Oct 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Image Classification Multi-Person Pose Estimation +2

Conditional DETR for Fast Training Convergence

4 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang

Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.

Decoder Object +2

SegFix: Model-Agnostic Boundary Refinement for Segmentation

4 code implementations • ECCV 2020 • Yuhui Yuan, Jingyi Xie, Xilin Chen, Jingdong Wang

We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.

Segmentation

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

1 code implementation • ICCV 2019 • Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han

On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.

Human Parsing Person Re-Identification

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

11 code implementations • ECCV 2020 • Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.

Decoder Object +2

OCNet: Object Context Network for Scene Parsing

8 code implementations • 4 Sep 2018 • Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018).

Object Relation +2

Feature Incay for Representation Regularization

no code implementations • ICLR 2018 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang

Thus, we propose feature incay to also regularize representation learning, which favors feature vectors with large norm when the samples can be correctly classified.

Multi-class Classification Representation Learning

Hard-Aware Deeply Cascaded Embedding

1 code implementation • ICCV 2017 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang

This motivates us to ensemble a set of models with different complexities in a cascaded manner and mine hard examples adaptively: a sample is judged by a series of models with increasing complexity and only updates the models that consider it a hard case.

Metric Learning Triplet
