no code implementations • 11 Mar 2025 • Yongsheng Yu, Ziyun Zeng, Haitian Zheng, Jiebo Luo
Diffusion-based generative models have revolutionized object-oriented image editing, yet their deployment in realistic object removal and insertion remains hampered by challenges such as the intricate interplay of physical effects and insufficient paired training data.
no code implementations • 13 Oct 2024 • Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo
With MMCOMPOSITION, we can quantify and explore the compositionality of the mainstream VLMs.
1 code implementation • 27 May 2024 • Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.
1 code implementation • 8 Oct 2023 • Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia
Current partially relevant video retrieval (PRVR) methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead.
1 code implementation • 2 Oct 2023 • Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan
We identify two crucial design principles: (1) Image tokens should be independent of 2D physical patch positions and instead be produced with a 1D causal dependency, exhibiting intrinsic interdependence that aligns with the left-to-right autoregressive prediction mechanism in LLMs.
1 code implementation • CVPR 2024 • Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell
Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions.
1 code implementation • 22 Aug 2023 • Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation (SR). On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation.
1 code implementation • 16 Jul 2023 • Yuying Ge, Yixiao Ge, Ziyun Zeng, Xintao Wang, Ying Shan
Research on image tokenizers has previously reached an impasse, as frameworks employing quantized visual tokens have lost prominence due to subpar performance and convergence in multimodal comprehension (compared to BLIP-2, etc.)
1 code implementation • 23 May 2023 • Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan
We argue that tuning a text encoder end-to-end, as done in previous work, is suboptimal since it may overfit in terms of styles, thereby losing its original generalization ability to capture the semantics of various language registers.
1 code implementation • 21 Nov 2022 • Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shutao Xia
To capture video semantic information for better hashing learning, we adopt an encoder-decoder structure to reconstruct the video from its temporal-masked frames.
1 code implementation • CVPR 2023 • Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge
Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.
1 code implementation • 7 Feb 2022 • Jinpeng Wang, Bin Chen, Dongliang Liao, Ziyun Zeng, Gongfu Li, Shu-Tao Xia, Jin Xu
By performing Asymmetric-Quantized Contrastive Learning (AQ-CL) across views, HCQ aligns texts and videos at coarse-grained and multiple fine-grained levels.
no code implementations • 11 Sep 2021 • Ziyun Zeng, Jinpeng Wang, Bin Chen, Tao Dai, Shu-Tao Xia, Zhi Wang
To improve fine-grained image hashing, we propose Pyramid Hybrid Pooling Quantization (PHPQ).
1 code implementation • 11 Sep 2021 • Jinpeng Wang, Ziyun Zeng, Bin Chen, Tao Dai, Shu-Tao Xia
The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems.
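The efficiency claim in the last excerpt can be illustrated with a minimal sketch. It is not the method of any paper listed here: it assumes a generic random-projection sign hash, and the names `binary_hash` and `hamming_distances`, the 128-dimensional features, and the 64-bit code length are all hypothetical choices made for illustration.

```python
import numpy as np


def binary_hash(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Take the sign of random projections and pack 8 bits into each byte."""
    bits = (features @ projection > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_distances(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distance from one packed code to every packed database code."""
    xor = np.bitwise_xor(db_codes, query_code)  # differing bits are set to 1
    return np.unpackbits(xor, axis=1).sum(axis=1)


# Synthetic data; sizes are assumptions chosen only for the illustration.
rng = np.random.default_rng(0)
dim, n_bits, n_items = 128, 64, 1000
projection = rng.standard_normal((dim, n_bits))
database = rng.standard_normal((n_items, dim)).astype(np.float32)

codes = binary_hash(database, projection)  # shape (n_items, n_bits // 8)
query_code = binary_hash(database[42][None, :], projection)
distances = hamming_distances(query_code, codes)

print("float32 bytes:", database.nbytes, "packed code bytes:", codes.nbytes)
print("distance of item 42 to itself:", distances[42])
```

Under these assumptions each item shrinks from 128 float32 values (512 bytes) to 8 bytes, and comparison reduces to XOR plus a bit count. Learned hashing and quantization methods replace the random projection with a trained encoder.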