no code implementations • 23 Jan 2024 • Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction.
Tasks: Interactive Video Object Segmentation • Semantic Segmentation (+1)
1 code implementation • 16 Jan 2024 • Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang
Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes and leaves them far from real-life applications such as guiding students in laboratory experiments and identifying their mistakes.
no code implementations • 1 Jan 2024 • Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang
Targeting these issues, we propose GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is free of inference-time finetuning while producing vivid, plausible details.
1 code implementation • 23 Dec 2023 • MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang
In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from single-view videos by training 3D Gaussians in 100 seconds and rendering at 100+ FPS.
1 code implementation • 21 Dec 2023 • Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang
For 3D GAN inversion, we introduce two methods which aim to enhance the representation of style codes and alleviate 3D inconsistencies.
no code implementations • 13 Dec 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
For geometry, we propose to constrain the optimized avatar to a plausible global shape using a template avatar.
no code implementations • 10 Dec 2023 • Zechuan Zhang, Zongxin Yang, Yi Yang
A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.
1 code implementation • NeurIPS 2023 • Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang
Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.
1 code implementation • 18 Sep 2023 • Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao
However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.
1 code implementation • ICCV 2023 • Yuan Gan, Zongxin Yang, Xihang Yue, Lingyun Sun, Yi Yang
Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications.
1 code implementation • ICCV 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).
Ranked #11 on Visual Object Tracking on LaSOT
1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D$\&$3D aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach that adds explicit global supervision for the 3D feature space.
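The abstract snippet above names a 3D joint contrastive learning objective but does not spell it out. As a rough illustration only (the paper's exact formulation may differ), a generic InfoNCE-style loss over matched joint features can be sketched as:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: row i of `positives` is the positive for row i
    of `anchors`; all other rows serve as negatives.

    anchors, positives: arrays of shape (N, D) of joint feature vectors.
    Returns a scalar loss (lower when matching pairs are most similar).
    """
    # Cosine similarities via l2-normalized features.
    a = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    p = positives / (np.linalg.norm(positives, axis=1, keepdims=True) + 1e-8)
    logits = a @ p.T / temperature                  # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # positives on the diagonal
```

With well-separated matching pairs the loss approaches zero; shuffling the positives drives it up, which is the "pull matched features together, push others apart" behavior a contrastive term supplies.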
no code implementations • ICCV 2023 • Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang
However, such SPC-based representation i) optimizes under a volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.
no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia
Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.
no code implementations • 5 Jul 2023 • Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang
MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.
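How MSDeAOT consumes masks at two feature scales is only stated, not detailed, in the snippet above. A minimal illustrative helper (an assumption about the preprocessing, not the paper's implementation) that brings a previous-frame mask down to stride-16 and stride-8 feature grids could look like:

```python
import numpy as np

def mask_to_scales(mask, strides=(16, 8)):
    """Downsample a binary object mask to coarse feature-map grids.

    mask: array of shape (H, W) with H, W divisible by each stride.
    Returns one soft mask per stride, where each cell holds the fraction
    of object pixels covered by the corresponding stride x stride block.
    """
    outs = []
    H, W = mask.shape
    for s in strides:
        # Average-pool each s x s block via a reshape trick.
        pooled = mask.reshape(H // s, s, W // s, s).mean(axis=(1, 3))
        outs.append(pooled)
    return outs
```

The coarse (stride-16) map is cheap to propagate across frames, while the stride-8 map retains finer boundaries; block-averaging preserves the overall foreground fraction at every scale.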
no code implementations • 5 Jul 2023 • Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang
The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.
1 code implementation • 3 Jul 2023 • Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang
In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.
no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing
However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.
1 code implementation • 17 May 2023 • Dewei Zhou, Zongxin Yang, Yi Yang
Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.
Ranked #6 on Low-Light Image Enhancement on LOL
1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang
This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.
2 code implementations • 8 May 2023 • Yuanyou Xu, Zongxin Yang, Yi Yang
Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.
1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
However, it is difficult for a single kind of modeling structure to balance the learning of short-term and long-term temporal correlations; the network may be biased toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details.
Ranked #46 on 3D Human Pose Estimation on 3DPW
no code implementations • CVPR 2023 • Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang
Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.
no code implementations • CVPR 2023 • Jiaxu Miao, Zongxin Yang, Leilei Fan, Yi Yang
In this work, we propose FedSeg, a basic federated learning approach for class-heterogeneous semantic segmentation.
2 code implementations • 18 Oct 2022 • Zongxin Yang, Yi Yang
To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.
Ranked #1 on Semi-Supervised Video Object Segmentation on VOT2020
1 code implementation • 5 Aug 2022 • Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei
In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.
1 code implementation • 26 Jul 2022 • Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang
While model ensembling is common, we show that combining vision models and vision-language models brings particular benefits from their complementarity and is a key factor in our superiority.
1 code implementation • 29 Mar 2022 • Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang
By contrast, pixel-level optimization is more explicit; however, it is sensitive to the visual quality of the training data and is not robust to object deformation.
2 code implementations • 22 Mar 2022 • Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang
This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
1 code implementation • CVPR 2022 • Yunqiu Xu, Yifan Sun, Zongxin Yang, Jiaxu Miao, Yi Yang
How to align the source and target domains is critical to the CDWSOD accuracy.
Ranked #1 on Weakly Supervised Object Detection on Clipart1k
2 code implementations • NeurIPS 2021 • Zongxin Yang, Yunchao Wei, Yi Yang
The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources.
Ranked #2 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)
no code implementations • 2 Jun 2021 • Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.
no code implementations • CVPR 2021 • Zongxin Yang, Xin Yu, Yi Yang
In the first step, the framework learns to segment objects from real and synthetic data in a weakly-supervised fashion, and the segmentation masks will act as a prior for pose estimation.
1 code implementation • 13 Oct 2020 • Zongxin Yang, Yunchao Wei, Yi Yang
This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.
2 code implementations • ECCV 2020 • Zongxin Yang, Yunchao Wei, Yi Yang
This paper investigates the principles of embedding learning to tackle the challenging semi-supervised video object segmentation.
Ranked #8 on Video Object Segmentation on YouTube-VOS 2019
1 code implementation • ICCV 2019 • Zongxin Yang, Jian Dong, Ping Liu, Yi Yang, Shuicheng Yan
The second challenge is how to maintain high quality in generated results, especially for multi-step generations in which generated regions are spatially far away from the initial input.
3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang
This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level with little increase in parameters.
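The snippet above describes an l2-normalization-based, nearly parameter-free channel transformation. A minimal sketch in that spirit follows; the parameter names (`alpha`, `gamma`, `beta`) and the exact gating form are illustrative assumptions, not the paper's verified formulation:

```python
import numpy as np

def channel_gate(x, alpha, gamma, beta, eps=1e-5):
    """Illustrative l2-normalization-based channel gating unit.

    x: feature map of shape (C, H, W).
    alpha, gamma, beta: per-channel parameters of shape (C,),
    i.e., only 3C extra parameters in total.
    Returns gated features of the same shape.
    """
    # Per-channel embedding: scaled l2 norm over spatial positions.
    embedding = alpha * np.sqrt((x ** 2).sum(axis=(1, 2)) + eps)      # (C,)
    # l2-normalize the embedding across channels (channel competition).
    norm = embedding * np.sqrt(len(embedding)) / np.sqrt(
        (embedding ** 2).sum() + eps)                                 # (C,)
    # Gate in (0, 2); with gamma = beta = 0 the unit is the identity.
    gate = 1.0 + np.tanh(gamma * norm + beta)
    return x * gate[:, None, None]
```

Because the gate reduces to the identity when `gamma` and `beta` are zero, such a unit can be dropped after existing operators without disturbing a pretrained network at initialization, which is one plausible reading of "applicable at the operator level."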