Search Results for author: Shengju Qian

Found 11 papers, 6 papers with code

Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

This paper presents Visual CoT, a novel pipeline that leverages the reasoning capabilities of multi-modal large language models (MLLMs) by incorporating visual Chain-of-Thought (CoT) reasoning.
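A rough sketch of the two-stage idea described above, with a hypothetical `mllm` callable standing in for the model interface. The pipeline shape (localize the relevant region, then answer using the zoomed-in crop) follows the abstract; the prompts, parsing, and call signature are illustrative assumptions, not the paper's actual API:

```python
# Hypothetical sketch of a two-step visual chain-of-thought loop.
# `mllm` is an assumed callable (images + text -> text); the real
# Visual CoT pipeline and prompts differ in detail.
from PIL import Image

def visual_cot_answer(mllm, image: Image.Image, question: str) -> str:
    # Step 1: ask the model to localize the region needed to answer.
    loc_prompt = (
        f"Question: {question}\n"
        "Reply with the bounding box [x1, y1, x2, y2] of the most relevant region."
    )
    box_text = mllm(images=[image], prompt=loc_prompt)
    x1, y1, x2, y2 = [int(v) for v in box_text.strip("[] ").split(",")]

    # Step 2: answer with both the full image and the zoomed-in crop.
    crop = image.crop((x1, y1, x2, y2))
    ans_prompt = f"Using the zoomed-in region as evidence, answer: {question}"
    return mllm(images=[image, crop], prompt=ans_prompt)
```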

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation • 7 Dec 2023 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 70.7 in the MMBench test and 1552.5 in MME-perception.

Text Generation
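The snippet does not spell out the mechanism, but training-free interactive control of this kind is commonly achieved by biasing attention toward user-highlighted prompt tokens at inference time. A minimal sketch of that general idea (an assumption-based illustration, not the paper's exact formulation):

```python
import torch

def reweight_attention_scores(scores: torch.Tensor,
                              highlight_mask: torch.Tensor,
                              alpha: float = 2.0) -> torch.Tensor:
    """Bias pre-softmax attention logits toward highlighted key positions.

    Adding log(alpha) to a logit multiplies that key's unnormalized softmax
    weight by alpha, so highlighted prompt tokens receive more attention
    after renormalization.

    scores:         (batch, heads, q_len, k_len) pre-softmax logits
    highlight_mask: (batch, k_len) bool, True where the user highlighted
    """
    bonus = torch.log(torch.tensor(alpha)) * highlight_mask[:, None, None, :]
    return scores + bonus
```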

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

2 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training at a context length of 8192 incurs 16x the self-attention computational cost of training at 2048.

4k, Instruction Following, +2
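The 16x figure follows directly from self-attention's quadratic cost in sequence length: (8192 / 2048)^2 = 16. A one-line check:

```python
# Self-attention FLOPs grow quadratically with context length,
# which is where the quoted 16x figure comes from: (8192/2048)**2 == 16.
def attn_cost_ratio(long_ctx: int, short_ctx: int) -> float:
    return (long_ctx / short_ctx) ** 2

assert attn_cost_ratio(8192, 2048) == 16.0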

TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation

no code implementations • 15 Apr 2023 • Jingyao Li, Pengguang Chen, Shengju Qian, Jiaya Jia

However, existing models easily misidentify input pixels from unseen classes, thus confusing novel classes with semantically similar ones.

Language Modelling, Open Vocabulary Semantic Segmentation, +2
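For context, the common open-vocabulary segmentation setup (the baseline TagCLIP improves on, not its proposed fix) classifies each pixel by cosine similarity between its visual feature and CLIP text embeddings of the class names. Semantically close class names yield close text embeddings, which is exactly how the confusion described above arises. A minimal sketch under assumed shapes:

```python
import torch
import torch.nn.functional as F

def pixel_class_scores(pixel_feats: torch.Tensor,
                       text_embeds: torch.Tensor) -> torch.Tensor:
    """Score each pixel against each class-name text embedding.

    pixel_feats: (B, D, H, W) per-pixel visual features
    text_embeds: (K, D) CLIP text embeddings of K class names
    returns:     (B, K, H, W) cosine-similarity score maps
    """
    pf = F.normalize(pixel_feats, dim=1)
    te = F.normalize(text_embeds, dim=1)
    return torch.einsum("bdhw,kd->bkhw", pf, te)
```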

StraIT: Non-autoregressive Generation with Stratified Image Transformer

no code implementations • 1 Mar 2023 • Shengju Qian, Huiwen Chang, Yuanzhen Li, Zizhao Zhang, Jiaya Jia, Han Zhang

We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs).

Image Generation
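For readers unfamiliar with the AR/NAR distinction: a NAR model predicts many tokens in parallel over a few refinement steps rather than one token at a time. The sketch below shows generic confidence-based parallel decoding under assumed interfaces (a `token_logits_fn` callable and a mask token id); it illustrates the NAR family, not StraIT's stratified scheme:

```python
import torch

def nar_decode(token_logits_fn, seq_len: int,
               mask_id: int, steps: int = 8) -> torch.Tensor:
    """Generic iterative NAR decoding: all positions start masked, and each
    step commits the most confident predictions in parallel."""
    tokens = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = token_logits_fn(tokens)           # (1, seq_len, vocab)
        probs, preds = logits.softmax(-1).max(-1)  # confidence per position
        still_masked = tokens.eq(mask_id)
        # Commit a growing fraction of the most confident masked positions.
        n_keep = max(1, int(still_masked.sum() * (step + 1) / steps))
        conf = probs.masked_fill(~still_masked, -1.0)
        keep = conf.topk(min(n_keep, int(still_masked.sum())), dim=-1).indices
        tokens[0, keep[0]] = preds[0, keep[0]]
    return tokens
```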

What Makes for Good Tokenizers in Vision Transformer?

no code implementations • 21 Dec 2022 • Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia

The transformer architecture, which has recently seen booming adoption in vision tasks, departs from the widespread convolutional paradigm.
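For context, the tokenizer under study is the module that turns an image into a token sequence; the standard ViT baseline is a strided convolution over non-overlapping patches. A minimal version of that common baseline (not the paper's proposed design):

```python
import torch
import torch.nn as nn

class PatchTokenizer(nn.Module):
    """Standard ViT tokenizer: split the image into non-overlapping patches
    with a strided convolution and project each patch to an embedding."""
    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, dim, H/p, W/p) -> (B, N_patches, dim)
        return self.proj(x).flatten(2).transpose(1, 2)

tokens = PatchTokenizer()(torch.randn(1, 3, 224, 224))  # (1, 196, 768)
```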

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

1 code implementation • 19 Dec 2021 • Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

Pre-training has set numerous states of the art in high-level computer vision, yet few attempts have been made to investigate how pre-training behaves in image-processing systems.

Ranked #5 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)

Denoising, Image Super-Resolution

Blending Anti-Aliasing into Vision Transformer

no code implementations • NeurIPS 2021 • Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

In this work, we analyze the uncharted problem of aliasing in vision transformers and explore how to incorporate anti-aliasing properties.
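Aliasing arises when feature maps are downsampled without first removing high frequencies. The classic remedy from the CNN literature is a low-pass blur before subsampling; the depthwise version below illustrates that general anti-aliasing recipe, not necessarily the module proposed in the paper:

```python
import torch
import torch.nn.functional as F

def blur_downsample(x: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Anti-aliased downsampling: apply a binomial low-pass blur before
    subsampling so high frequencies cannot alias into the smaller map.
    x: (B, C, H, W)."""
    k1d = torch.tensor([1.0, 2.0, 1.0])
    k2d = torch.outer(k1d, k1d)
    k2d = (k2d / k2d.sum()).to(x.dtype)
    c = x.shape[1]
    kernel = k2d.expand(c, 1, 3, 3).contiguous()  # depthwise blur kernel
    return F.conv2d(x, kernel, stride=stride, padding=1, groups=c)
```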

Temporal Interlacing Network

4 code implementations • 17 Jan 2020 • Hao Shao, Shengju Qian, Yu Liu

In this way, a heavy temporal model is replaced by a simple interlacing operator.

Optical Flow Estimation, Video Understanding
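To make "interlacing operator" concrete: features are mixed across time by shifting groups of channels to neighboring frames, which costs almost nothing compared to a dedicated temporal model. The fixed-offset sketch below only illustrates the operator's shape; TIN itself learns the shift offsets and mixing weights:

```python
import torch

def temporal_interlace(x: torch.Tensor, fold_div: int = 4) -> torch.Tensor:
    """Fixed-offset illustration of interlacing features along time.
    One channel group is shifted forward in time, another backward,
    and the rest are left untouched. x: (B, T, C, H, W)."""
    b, t, c, h, w = x.shape
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # unshifted
    return out
```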

Make a Face: Towards Arbitrary High Fidelity Face Manipulation

no code implementations • ICCV 2019 • Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He

Recent studies have shown remarkable success in face manipulation tasks with the advance of GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity.

Clustering, Disentanglement, +1
