2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i. e., high-resolution visual tokens, high-quality data, and VLM-guided generation.
Ranked #8 on Visual Question Answering on MM-Vet
no code implementations • 29 Feb 2024 • Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
1 code implementation • 2 Oct 2023 • Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.
Ranked #3 on Text-based Image Editing on PIE-Bench
no code implementations • 22 Apr 2023 • Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia
In each iteration, the input to VSA is one view (or multiple views) of a 3D object and the output is a synthesized image in another target pose.
1 code implementation • 8 Mar 2023 • Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.
no code implementations • 30 Sep 2022 • Li Zhang, Yong liu, Shaoteng Liu, Tianshu Yang, Yexin Wang, Xinpeng Zhang, Hanzhou Wu
Intellectual property protection of deep neural networks is receiving attention from more and more researchers, and the latest research applies model watermarking to generative models for image processing.
1 code implementation • 2 Sep 2021 • Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell
Domain adaptation seeks to mitigate the shift between training on the \emph{source} domain and testing on the \emph{target} domain.
no code implementations • 20 Aug 2020 • Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua
Understanding food recipe requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe.
1 code implementation • 20 Jul 2020 • Shaoteng Liu, Lijun Gong, Kai Ma, Yefeng Zheng
In this paper, we propose a Graph REsidual rE-ranking Network (GREEN) to introduce a class dependency prior into the original image classification network.
2 code implementations • ICLR 2021 • Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell
A model must adapt itself to generalize to new and different data during testing.
1 code implementation • CVPR 2020 • Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang
This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition.