Search Results for author: Shengju Qian

Found 11 papers, 6 papers with code

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning.
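A rough sketch of the two-stage idea described above, with a hypothetical `mllm` callable standing in for the model interface. The pipeline shape (localize the relevant region, then answer using the zoomed-in crop) follows the abstract; the prompts, parsing, and call signature are illustrative assumptions, not the paper's actual API:

```python
# Hypothetical sketch of a two-step visual chain-of-thought loop.
# `mllm` is an assumed callable (images + text -> text); the real
# Visual CoT pipeline and prompts differ in detail.
from PIL import Image

def visual_cot_answer(mllm, image: Image.Image, question: str) -> str:
    # Step 1: ask the model to localize the region needed to answer.
    loc_prompt = (
        f"Question: {question}\n"
        "Reply with the bounding box [x1, y1, x2, y2] of the most relevant region."
    )
    box_text = mllm(images=[image], prompt=loc_prompt)
    x1, y1, x2, y2 = [int(v) for v in box_text.strip("[] ").split(",")]

    # Step 2: answer with both the full image and the zoomed-in crop.
    crop = image.crop((x1, y1, x2, y2))
    ans_prompt = f"Using the zoomed-in region as evidence, answer: {question}"
    return mllm(images=[image, crop], prompt=ans_prompt)
```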

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation • 7 Dec 2023 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 70.7 in the MMBench test and 1552.5 in MME-perception.

Text Generation
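The snippet does not spell out the mechanism, but training-free interactive control of this kind is commonly achieved by biasing attention toward user-highlighted prompt tokens at inference time. A minimal sketch of that general idea (an assumption-based illustration, not the paper's exact formulation):

```python
import torch

def reweight_attention_scores(scores: torch.Tensor,
                              highlight_mask: torch.Tensor,
                              alpha: float = 2.0) -> torch.Tensor:
    """Bias pre-softmax attention logits toward highlighted key positions.

    Adding log(alpha) to a logit multiplies that key's unnormalized softmax
    weight by alpha, so highlighted prompt tokens receive more attention
    after renormalization.

    scores:         (batch, heads, q_len, k_len) pre-softmax logits
    highlight_mask: (batch, k_len) bool, True where the user highlighted
    """
    bonus = torch.log(torch.tensor(alpha)) * highlight_mask[:, None, None, :]
    return scores + bonus
```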

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

2 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training at a context length of 8192 incurs 16x the self-attention computational cost of training at 2048.

4k, Instruction Following, +2
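The 16x figure follows directly from self-attention's quadratic cost in sequence length: (8192 / 2048)^2 = 16. A one-line check:

```python
# Self-attention FLOPs grow quadratically with context length,
# which is where the quoted 16x figure comes from: (8192/2048)**2 == 16.
def attn_cost_ratio(long_ctx: int, short_ctx: int) -> float:
    return (long_ctx / short_ctx) ** 2

assert attn_cost_ratio(8192, 2048) == 16.0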

TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation

no code implementations • 15 Apr 2023 • Jingyao Li, Pengguang Chen, Shengju Qian, Jiaya Jia

However, existing models easily misidentify input pixels from unseen classes, thus confusing novel classes with semantically similar ones.

Language Modelling, Open Vocabulary Semantic Segmentation, +2
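For context, the common open-vocabulary segmentation setup (the baseline TagCLIP improves on, not its proposed fix) classifies each pixel by cosine similarity between its visual feature and CLIP text embeddings of the class names. Semantically close class names yield close text embeddings, which is exactly how the confusion described above arises. A minimal sketch under assumed shapes:

```python
import torch
import torch.nn.functional as F

def pixel_class_scores(pixel_feats: torch.Tensor,
                       text_embeds: torch.Tensor) -> torch.Tensor:
    """Score each pixel against each class-name text embedding.

    pixel_feats: (B, D, H, W) per-pixel visual features
    text_embeds: (K, D) CLIP text embeddings of K class names
    returns:     (B, K, H, W) cosine-similarity score maps
    """
    pf = F.normalize(pixel_feats, dim=1)
    te = F.normalize(text_embeds, dim=1)
    return torch.einsum("bdhw,kd->bkhw", pf, te)
```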

StraIT: Non-autoregressive Generation with Stratified Image Transformer

no code implementations • 1 Mar 2023 • Shengju Qian, Huiwen Chang, Yuanzhen Li, Zizhao Zhang, Jiaya Jia, Han Zhang

We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs).

Image Generation
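For readers unfamiliar with the AR/NAR distinction: a NAR model predicts many tokens in parallel over a few refinement steps rather than one token at a time. The sketch below shows generic confidence-based parallel decoding under assumed interfaces (a `token_logits_fn` callable and a mask token id); it illustrates the NAR family, not StraIT's stratified scheme:

```python
import torch

def nar_decode(token_logits_fn, seq_len: int,
               mask_id: int, steps: int = 8) -> torch.Tensor:
    """Generic iterative NAR decoding: all positions start masked, and each
    step commits the most confident predictions in parallel."""
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = token_logits_fn(tokens)           # (1, seq_len, vocab)
        probs, preds = logits.softmax(-1).max(-1)  # confidence per position
        still_masked = tokens.eq(mask_id)
        # Commit a growing fraction of the most confident masked positions.
        n_keep = max(1, int(still_masked.sum() * (step + 1) / steps))
        conf = probs.masked_fill(~still_masked, -1.0)
        keep = conf.topk(min(n_keep, int(still_masked.sum())), dim=-1).indices
        tokens[0, keep[0]] = preds[0, keep[0]]
    return tokens
```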

What Makes for Good Tokenizers in Vision Transformer?

no code implementations • 21 Dec 2022 • Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia

The transformer architecture, which has recently seen booming adoption in vision tasks, departs from the widespread convolutional paradigm.
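For context, the tokenizer under study is the module that turns an image into a token sequence; the standard ViT baseline is a strided convolution over non-overlapping patches. A minimal version of that common baseline (not the paper's proposed design):

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """Standard ViT tokenizer: split the image into non-overlapping patches
    with a strided convolution and project each patch to an embedding."""
    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, dim, H/p, W/p) -> (B, N_patches, dim)
        return self.proj(x).flatten(2).transpose(1, 2)

tokens = PatchTokenizer()(torch.randn(1, 3, 224, 224))  # (1, 196, 768)
```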

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

1 code implementation • 19 Dec 2021 • Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

Pre-training has set numerous states of the art in high-level computer vision, yet few attempts have been made to investigate how pre-training behaves in image-processing systems.

Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)

Denoising, Image Super-Resolution

Blending Anti-Aliasing into Vision Transformer

no code implementations • NeurIPS 2021 • Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the uncharted problem of aliasing in vision transformers and explore how to incorporate anti-aliasing properties.
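Aliasing arises when feature maps are downsampled without first removing high frequencies. The classic remedy from the CNN literature is a low-pass blur before subsampling; the depthwise version below illustrates that general anti-aliasing recipe, not necessarily the module proposed in the paper:

```python
import torch
import torch.nn.functional as F

def blur_downsample(x: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Anti-aliased downsampling: apply a binomial low-pass blur before
    subsampling so high frequencies cannot alias into the smaller map.
    x: (B, C, H, W)."""
    k1d = torch.tensor([1.0, 2.0, 1.0])
    k2d = torch.outer(k1d, k1d)
    k2d = (k2d / k2d.sum()).to(x.dtype)
    c = x.shape[1]
    kernel = k2d.expand(c, 1, 3, 3).contiguous()  # depthwise blur kernel
    return F.conv2d(x, kernel, stride=stride, padding=1, groups=c)
```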

Temporal Interlacing Network

4 code implementations • 17 Jan 2020 • Hao Shao, Shengju Qian, Yu Liu

In this way, a heavy temporal model is replaced by a simple interlacing operator.

Optical Flow Estimation, Video Understanding
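To make "interlacing operator" concrete: features are mixed across time by shifting groups of channels to neighboring frames, which costs almost nothing compared to a dedicated temporal model. The fixed-offset sketch below only illustrates the operator's shape; TIN itself learns the shift offsets and mixing weights:

```python
import torch

def temporal_interlace(x: torch.Tensor, fold_div: int = 4) -> torch.Tensor:
    """Fixed-offset illustration of interlacing features along time.
    One channel group is shifted forward in time, another backward,
    and the rest are left untouched. x: (B, T, C, H, W)."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted
    return out
```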

Make a Face: Towards Arbitrary High Fidelity Face Manipulation

no code implementations • ICCV 2019 • Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He

Recent studies have shown remarkable success in face manipulation tasks with the advance of GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity.

Clustering, Disentanglement, +1
