Search Results for author: Shaoteng Liu

Found 11 papers, 7 papers with code

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations27 Mar 2024 Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i. e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Comprehension Visual Dialog +1

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

no code implementations29 Feb 2024 Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.

reinforcement-learning Reinforcement Learning (RL)

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

1 code implementation2 Oct 2023 Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.

Image Generation Text-based Image Editing

Self-supervised Learning by View Synthesis

no code implementations22 Apr 2023 Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.

3D Classification Self-Supervised Learning

Video-P2P: Video Editing with Cross-attention Control

1 code implementation8 Mar 2023 Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation Video Editing +1

Generative Model Watermarking Based on Human Visual System

no code implementations30 Sep 2022 Li Zhang, Yong liu, Shaoteng Liu, Tianshu Yang, Yexin Wang, Xinpeng Zhang, Hanzhou Wu

Intellectual property protection of deep neural networks is receiving attention from more and more researchers, and the latest research applies model watermarking to generative models for image processing.

On-target Adaptation

1 code implementation2 Sep 2021 Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Domain adaptation seeks to mitigate the shift between training on the \emph{source} domain and testing on the \emph{target} domain.

Domain Adaptation

Multi-modal Cooking Workflow Construction for Food Recipes

no code implementations20 Aug 2020 Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

Understanding food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe.

Common Sense Reasoning

GREEN: a Graph REsidual rE-ranking Network for Grading Diabetic Retinopathy

1 code implementation20 Jul 2020 Shaoteng Liu, Lijun Gong, Kai Ma, Yefeng Zheng

In this paper, we propose a Graph REsidual rE-ranking Network (GREEN) to introduce a class dependency prior into the original image classification network.

Classification General Classification +3

Cannot find the paper you are looking for? You can Submit a new open access paper.