no code implementations • 23 Jan 2024 • Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction.
Tasks: Interactive Video Object Segmentation • Semantic Segmentation (+1)
1 code implementation • 16 Jan 2024 • Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang
Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes and leaves them far from real-life applications such as guiding students in laboratory experiments and identifying their mistakes.
no code implementations • 1 Jan 2024 • Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang
Targeting these issues, we propose GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is free of inference-time finetuning while producing vivid, plausible details.
1 code implementation • 23 Dec 2023 • MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang
In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from single-view videos by training 3D Gaussians in 100 seconds and rendering at 100+ FPS.
1 code implementation • 21 Dec 2023 • Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang
For 3D GAN inversion, we introduce two methods which aim to enhance the representation of style codes and alleviate 3D inconsistencies.
no code implementations • 13 Dec 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
For geometry, we propose to constrain the optimized avatar to a plausible global shape using a template avatar.
no code implementations • 10 Dec 2023 • Zechuan Zhang, Zongxin Yang, Yi Yang
A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.
1 code implementation • NeurIPS 2023 • Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang
Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.
1 code implementation • 18 Sep 2023 • Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao
However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.
1 code implementation • ICCV 2023 • Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang
Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications.
1 code implementation • ICCV 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).
Ranked #11 on Visual Object Tracking on LaSOT
1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D$\&$3D aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that adds explicit global supervision for the 3D feature space.
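The abstract snippet above names a 3D joint contrastive learning objective but does not spell it out. As a rough illustration only (the paper's exact formulation may differ), a generic InfoNCE-style loss over matched joint features can be sketched as:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: row i of `positives` is the positive for row i
    of `anchors`; all other rows serve as negatives.

    anchors, positives: arrays of shape (N, D) of joint feature vectors.
    Returns a scalar loss (lower when matching pairs are most similar).
    """
    # Cosine similarities via l2-normalized features.
    a = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    p = positives / (np.linalg.norm(positives, axis=1, keepdims=True) + 1e-8)
    logits = a @ p.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives on the diagonal
```

With well-separated matching pairs the loss approaches zero; shuffling the positives drives it up, which is the "pull matched features together, push others apart" behavior a contrastive term supplies.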
no code implementations • ICCV 2023 • Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang
However, such SPC-based representation i) optimizes under a volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.
no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia
Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.
no code implementations • 5 Jul 2023 • Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang
MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.
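How MSDeAOT consumes masks at two feature scales is only stated, not detailed, in the snippet above. A minimal illustrative helper (an assumption about the preprocessing, not the paper's implementation) that brings a previous-frame mask down to stride-16 and stride-8 feature grids could look like:

```python
import numpy as np

def mask_to_scales(mask, strides=(16, 8)):
    """Downsample a binary object mask to coarse feature-map grids.

    mask: array of shape (H, W) with H, W divisible by each stride.
    Returns one soft mask per stride, where each cell holds the fraction
    of object pixels covered by the corresponding stride x stride block.
    """
    outs = []
    H, W = mask.shape
    for s in strides:
        # Average-pool each s x s block via a reshape trick.
        pooled = mask.reshape(H // s, s, W // s, s).mean(axis=(1, 3))
        outs.append(pooled)
    return outs
```

The coarse (stride-16) map is cheap to propagate across frames, while the stride-8 map retains finer boundaries; block-averaging preserves the overall foreground fraction at every scale.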
no code implementations • 5 Jul 2023 • Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang
The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.
1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang
In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.
no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing
However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.
1 code implementation • 17 May 2023 • Dewei Zhou, Zongxin Yang, Yi Yang
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.
Ranked #6 on Low-Light Image Enhancement on LOL
1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang
This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.
2 code implementations • 8 May 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.
1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
However, it is difficult for a single kind of modeling structure to balance the learning of short-term and long-term temporal correlations; the network may be biased toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details.
Ranked #46 on 3D Human Pose Estimation on 3DPW
no code implementations • CVPR 2023 • Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang
Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.
no code implementations • CVPR 2023 • Jiaxu Miao, Zongxin Yang, Leilei Fan, Yi Yang
In this work, we propose FedSeg, a basic federated learning approach for class-heterogeneous semantic segmentation.
2 code implementations • 18 Oct 2022 • Zongxin Yang, Yi Yang
To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.
Ranked #1 on Semi-Supervised Video Object Segmentation on VOT2020
1 code implementation • 5 Aug 2022 • Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei
In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.
1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang
While model ensembling is common, we show that combining vision models and vision-language models brings particular benefits from their complementarity and is a key factor in our superiority.
1 code implementation • 29 Mar 2022 • Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang
By contrast, pixel-level optimization is more explicit; however, it is sensitive to the visual quality of the training data and is not robust to object deformation.
2 code implementations • 22 Mar 2022 • Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang
This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
1 code implementation • CVPR 2022 • Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang
How to align the source and target domains is critical to the CDWSOD accuracy.
Ranked #1 on Weakly Supervised Object Detection on Clipart1k
2 code implementations • NeurIPS 2021 • Zongxin Yang, Yunchao Wei, Yi Yang
The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources.
Ranked #2 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)
no code implementations • 2 Jun 2021 • Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.
no code implementations • CVPR 2021 • Zongxin Yang, Xin Yu, Yi Yang
In the first step, the framework learns to segment objects from real and synthetic data in a weakly-supervised fashion, and the segmentation masks will act as a prior for pose estimation.
1 code implementation • 13 Oct 2020 • Zongxin Yang, Yunchao Wei, Yi Yang
This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.
2 code implementations • ECCV 2020 • Zongxin Yang, Yunchao Wei, Yi Yang
This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.
Ranked #8 on Video Object Segmentation on YouTube-VOS 2019
1 code implementation • ICCV 2019 • Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan
The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input.
3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang
This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level with little increase in parameters.
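The snippet above describes an l2-normalization-based, nearly parameter-free channel transformation. A minimal sketch in that spirit follows; the parameter names (`alpha`, `gamma`, `beta`) and the exact gating form are illustrative assumptions, not the paper's verified formulation:

```python
import numpy as np

def channel_gate(x, alpha, gamma, beta, eps=1e-5):
    """Illustrative l2-normalization-based channel gating unit.

    x: feature map of shape (C, H, W).
    alpha, gamma, beta: per-channel parameters of shape (C,),
    i.e., only 3C extra parameters in total.
    Returns gated features of the same shape.
    """
    # Per-channel embedding: scaled l2 norm over spatial positions.
    embedding = alpha * np.sqrt((x ** 2).sum(axis=(1, 2)) + eps)      # (C,)
    # l2-normalize the embedding across channels (channel competition).
    norm = embedding * np.sqrt(len(embedding)) / np.sqrt(
        (embedding ** 2).sum() + eps)                                 # (C,)
    # Gate in (0, 2); with gamma = beta = 0 the unit is the identity.
    gate = 1.0 + np.tanh(gamma * norm + beta)
    return x * gate[:, None, None]
```

Because the gate reduces to the identity when `gamma` and `beta` are zero, such a unit can be dropped after existing operators without disturbing a pretrained network at initialization, which is one plausible reading of "applicable at the operator level."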