Search Results for author: Zhuofan Zong

Found 9 papers, 6 papers with code

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

2 code implementations • 4 Apr 2024 • Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.

Attribute • Image Captioning • +1

131


Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning.


RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

no code implementations • NeurIPS 2023 • Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo

Text-to-image generation has recently witnessed remarkable achievements.

Ranked #11 on Text-to-Image Generation on MS COCO

Text-to-Image Generation


Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

1 code implementation • ICCV 2023 • Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and use this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that forcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning.

Ranked #3 on 3D Object Detection on nuScenes Camera Only

3D Object Detection • Object

171

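As a rough illustration of the idea in the abstract above, the sketch below builds a pseudo feature for timestamp t-k from its neighbouring frames only, deliberately leaving out the real feature at t-k. It is not the authors' implementation: the function name `pseudo_bev`, the flat-list feature layout, and the plain averaging that stands in for a learned temporal decoder are all assumptions made for this example.

```python
def pseudo_bev(history, t, k):
    """Build a stand-in BEV feature for timestamp t - k.

    history: dict mapping timestamp -> BEV feature (flat list of floats).
    Only the frames adjacent to t - k are used; the real feature at
    t - k is intentionally ignored. Averaging is an assumption that
    replaces whatever learned module a real detector would use.
    """
    target = t - k
    neighbours = [history[s] for s in (target - 1, target + 1) if s in history]
    if not neighbours:
        raise ValueError("no adjacent frames available for timestamp %d" % target)
    size = len(neighbours[0])
    return [sum(f[i] for f in neighbours) / len(neighbours) for i in range(size)]


# Hypothetical toy history: three frames with two-element features.
history = {3: [1.0, 2.0], 4: [100.0, 100.0], 5: [3.0, 6.0]}
feature = pseudo_bev(history, t=5, k=1)  # built from frames 3 and 5 only
```

In a training setup along these lines, `feature` would be passed to a detection head whose predictions are compared against the annotations of timestamp t-k.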

DETRs with Collaborative Hybrid Assignments Training

3 code implementations • ICCV 2023 • Zhuofan Zong, Guanglu Song, Yu Liu

This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training the multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN.

Ranked #1 on Object Detection on LVIS v1.0 val (using extra training data)

Instance Segmentation • Object Detection • +1

27,716

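To make "one-to-many label assignment" concrete, here is a minimal sketch of an IoU-threshold assignment in the general style of anchor-based detectors, where several candidate boxes may be matched to the same ground-truth box. It is an illustration only, not the training code of this paper: the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_one_to_many(candidates, gt_boxes, threshold=0.5):
    """Return, per candidate, the index of its matched ground truth or -1.

    Each candidate is matched to the ground-truth box it overlaps most,
    provided the IoU reaches `threshold` (an assumed value). Several
    candidates can therefore share one ground truth, unlike a one-to-one
    matching where each ground truth receives exactly one positive.
    """
    labels = []
    for cand in candidates:
        best_idx, best_iou = -1, 0.0
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(cand, gt)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        labels.append(best_idx if best_iou >= threshold else -1)
    return labels


gt = [(0, 0, 10, 10)]
cands = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
labels = assign_one_to_many(cands, gt)  # first two candidates share the same ground truth
```

The denser positive supervision produced by such an assignment is what auxiliary heads would consume, while the main head keeps its own matching.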

Large-batch Optimization for Dense Visual Predictions

1 code implementation • 20 Oct 2022 • Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo

To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch sizes and offers several advantages over prior work.

Instance Segmentation • object-detection • +3


Self-slimmed Vision Transformer

1 code implementation • 24 Nov 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

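The phrase "dynamic token aggregation" in the abstract above can be pictured as mapping N input tokens to a smaller set of M tokens, each a softmax-weighted sum of the inputs. The sketch below shows only that arithmetic under stated assumptions: the names `softmax` and `aggregate_tokens` are hypothetical, and the `logits` matrix is supplied by hand here, whereas a real module would predict it from the tokens themselves.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_tokens(tokens, logits):
    """Reduce N tokens (N x D) to M tokens (M x D).

    logits is an M x N matrix of aggregation scores (assumed to be given;
    a learned module would produce it). Each output token is the
    softmax-weighted sum of all input tokens.
    """
    n = len(tokens)
    if any(len(row) != n for row in logits):
        raise ValueError("each logits row needs one score per input token")
    dim = len(tokens[0])
    weights = [softmax(row) for row in logits]
    return [
        [sum(w[i] * tokens[i][d] for i in range(n)) for d in range(dim)]
        for w in weights
    ]


tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]   # N = 4, D = 2
logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]       # M = 2, uniform scores
slim = aggregate_tokens(tokens, logits)                     # 2 tokens instead of 4
```

With uniform scores every output token is simply the mean of the inputs; non-uniform scores let each output token focus on a different subset, and later layers then run on M rather than N tokens.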

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

no code implementations • 23 Oct 2021 • Zhuofan Zong, Qianggang Cao, Biao Leng

Moreover, semantics from non-adjacent levels are diluted in the feature pyramid, since the local fusion operation merges only features at adjacent pyramid levels, one level after another.

object-detection • Object Detection


Self-Slimming Vision Transformer

no code implementations • 29 Sep 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

