Search Results for author: Shaoteng Liu

Found 11 papers, 7 papers with code

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Ranked #8 on Visual Question Answering on MM-Vet

Image Comprehension • Visual Dialog • +1

2,829


RL-GPT: Integrating Reinforcement Learning and Code-as-policy

no code implementations • 29 Feb 2024 • Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.

reinforcement-learning • Reinforcement Learning (RL)


Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

1 code implementation • 2 Oct 2023 • Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.

Ranked #3 on Text-based Image Editing on PIE-Bench

Image Generation • Text-based Image Editing

184


Self-supervised Learning by View Synthesis

no code implementations • 22 Apr 2023 • Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.

3D Classification • Self-Supervised Learning


Video-P2P: Video Editing with Cross-attention Control

1 code implementation • 8 Mar 2023 • Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation • Video Editing • +1

332


Generative Model Watermarking Based on Human Visual System

no code implementations • 30 Sep 2022 • Li Zhang, Yong Liu, Shaoteng Liu, Tianshu Yang, Yexin Wang, Xinpeng Zhang, Hanzhou Wu

Intellectual property protection of deep neural networks is receiving attention from more and more researchers, and the latest research applies model watermarking to generative models for image processing.


On-target Adaptation

1 code implementation • 2 Sep 2021 • Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the source domain and testing on the target domain.

Domain Adaptation

198


Multi-modal Cooking Workflow Construction for Food Recipes

no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

Understanding food recipes requires anticipating the implicit causal effects of cooking actions, such that a recipe can be converted into a graph describing its temporal workflow.

Common Sense Reasoning


GREEN: a Graph REsidual rE-ranking Network for Grading Diabetic Retinopathy

1 code implementation • 20 Jul 2020 • Shaoteng Liu, Lijun Gong, Kai Ma, Yefeng Zheng

In this paper, we propose a Graph REsidual rE-ranking Network (GREEN) to introduce a class dependency prior into the original image classification network.

Classification • General Classification • +3