Search Results for author: Zongxin Yang

Found 37 papers, 24 papers with code

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation • 16 Jan 2024 • Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes and keeps them far from real-life applications such as guiding students in laboratory experiments and identifying their mistakes.

Scheduling

GD$^2$-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

no code implementations • 1 Jan 2024 • Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang

Targeting these issues, we propose GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that requires no inference-time finetuning and produces vivid, plausible details.

Image to 3D, Novel View Synthesis, +1

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

1 code implementation • 23 Dec 2023 • MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang

In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.

Controllable 3D Face Generation with Conditional Style Code Diffusion

1 code implementation • 21 Dec 2023 • Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang

For 3D GAN inversion, we introduce two methods which aim to enhance the representation of style codes and alleviate 3D inconsistencies.

Data Augmentation, Face Generation

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

no code implementations • 10 Dec 2023 • Zechuan Zhang, Zongxin Yang, Yi Yang

A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

1 code implementation • NeurIPS 2023 • Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation • 18 Sep 2023 • Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation, Video Semantic Segmentation

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

1 code implementation • ICCV 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).

Object Representation Learning +6

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D- and 3D-aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that adds explicit global supervision to the 3D feature space.

Contrastive Learning, Human Mesh Recovery

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

no code implementations • ICCV 2023 • Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

However, such an SPC-based representation i) optimizes under the volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

no code implementations • 5 Jul 2023 • Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang

MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.

Object Segmentation +4

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

no code implementations • 5 Jul 2023 • Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang

The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.

Object Position +4

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.

Learning with noisy labels, Multi-Label Classification, +1

Shuffled Autoregression For Motion Interpolation

no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing

However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.

Motion Interpolation

Pyramid Diffusion Models For Low-light Image Enhancement

1 code implementation • 17 May 2023 • Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.

Denoising, Image Generation, +1

Segment and Track Anything

1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving, Object Tracking

Video Object Segmentation in Panoptic Wild Scenes

2 code implementations • 8 May 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang

Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.

Object Semantic Segmentation +2

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, it is difficult for a single kind of modeling structure to balance the learning of short-term and long-term temporal correlations; it may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details.

3D human pose and shape estimation

ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification

no code implementations • CVPR 2023 • Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang

Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.

Cross-Domain Few-Shot, Domain Generalization, +1

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

2 code implementations • 18 Oct 2022 • Zongxin Yang, Yi Yang

To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

Object Semantic Segmentation +2

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

1 code implementation • 5 Aug 2022 • Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei

In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.

Instance Segmentation, Semantic Segmentation, +1

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang

While model ensembling is common, we show that combining vision models and vision-language models brings particular benefits from their complementarity and is a key factor in our superiority.

Metric Learning, Retrieval

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

1 code implementation • 29 Mar 2022 • Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

By contrast, pixel-level optimization is more explicit; however, it is sensitive to the visual quality of training data and is not robust to object deformation.

Contrastive Learning, Semantic Segmentation, +3

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations • 22 Mar 2022 • Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Object Segmentation +3

Associating Objects with Transformers for Video Object Segmentation

2 code implementations • NeurIPS 2021 • Zongxin Yang, Yunchao Wei, Yi Yang

The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately in multi-object scenarios, consuming several times the computing resources.

Ranked #2 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)

Object One-shot visual object segmentation +2

DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency

no code implementations • CVPR 2021 • Zongxin Yang, Xin Yu, Yi Yang

In the first step, the framework learns to segment objects from real and synthetic data in a weakly-supervised fashion, and the segmentation masks will act as a prior for pose estimation.

Object Pose Estimation

Very Long Natural Scenery Image Prediction by Outpainting

1 code implementation • ICCV 2019 • Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan

The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input.

Image Inpainting, Image Outpainting

Gated Channel Transformation for Visual Recognition

3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

This lightweight layer incorporates a simple $\ell_2$ normalization, making our transformation unit applicable at the operator level without much increase in parameters.

General Classification, Image Classification, +5
