Search Results for author: Yanhao Zhang

Found 21 papers, 9 papers with code

PoseAnimate: Zero-shot high fidelity pose controllable character animation

no code implementations • 21 Apr 2024 • Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity with the source image. However, existing approaches suffer from character appearance inconsistency and poor preservation of fine details.

LoopAnimate: Loopable Salient Object Animation

no code implementations • 14 Apr 2024 • Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang

The proposed LoopAnimate extends, for the first time, the single-pass generation length of UNet-based video generation models to 35 frames while maintaining high-quality video generation.

Object Video Generation

Homography Guided Temporal Fusion for Road Line and Marking Segmentation

2 code implementations • ICCV 2023 • Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li

Reliable segmentation of road lines and markings is critical to autonomous driving.

Autonomous Driving, Segmentation

Learning with Diversification from Block Sparse Signal

no code implementations • 7 Feb 2024 • Yanhao Zhang, Zhihan Zhu, Yong Xia

This paper introduces a novel prior called Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data.

Sparse Learning

Lightweight high-resolution Subject Matting in the Real World

no code implementations • 12 Dec 2023 • Peng Liu, Fanyi Wang, Jingwen Su, Yanhao Zhang, GuoJun Qi

To alleviate these issues, we propose to construct a saliency object matting dataset, HRSOM, and a lightweight network, PSUNet.

Image Matting, Object Detection, +1

BARET : Balanced Attention based Real image Editing driven by Target-text Inversion

no code implementations • 9 Dec 2023 • Yuming Qiao, Fanyi Wang, Jingwen Su, Yanhao Zhang, Yunjie Yu, Siyu Wu, Guo-Jun Qi

Image editing approaches with diffusion models have been rapidly developed, yet their applicability is subject to requirements such as specific editing types (e.g., foreground or background object editing, style transfer), multiple conditions (e.g., mask, sketch, caption), and time-consuming fine-tuning of diffusion models.

Image Reconstruction, Style Transfer

View Consistent Purification for Accurate Cross-View Localization

no code implementations • ICCV 2023 • Shan Wang, Yanhao Zhang, Akhil Perincherry, Ankit Vora, Hongdong Li

This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images.

Pose Estimation

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation • 8 Aug 2023 • Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between them, while also ensuring that the affinity across regions within each view remains consistent.

Ranked #15 on Weakly-Supervised Semantic Segmentation on COCO 2014 val

Object Localization, Weakly Supervised Semantic Segmentation, +1

TALL: Thumbnail Layout for Deepfake Video Detection

1 code implementation • ICCV 2023 • Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He

This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies.

Face Swapping

An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems

1 code implementation • 2 May 2023 • Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes

The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks.

Ranked #3 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Pseudo Label, Weakly Supervised Semantic Segmentation, +1

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation • CVPR 2023 • Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.

Contrastive Learning

GAM : Gradient Attention Module of Optimization for Point Clouds Analysis

1 code implementation • 19 Mar 2023 • Haotian Hu, Fanyi Wang, Jingwen Su, Hongtao Zhou, Yaonong Wang, Laifeng Hu, Yanhao Zhang, Zhiwang Zhang

In point cloud analysis tasks, the existing local feature aggregation descriptors (LFAD) are unable to fully utilize information in the neighborhood of central points.

Knowledge Distillation from Single to Multi Labels: an Empirical Study

1 code implementation • 15 Mar 2023 • Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li, Xiaodong Gu

Knowledge distillation (KD) has been extensively studied in single-label image classification.

Classification, Image Classification, +2

NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction

1 code implementation • 29 Sep 2022 • Ruyi Zha, Yanhao Zhang, Hongdong Li

This paper proposes a novel and fast self-supervised solution for sparse-view Cone Beam Computed Tomography (CBCT) reconstruction that requires no external training data.

Ranked #2 on Novel View Synthesis on X3D

Low-Dose X-Ray CT Reconstruction, Novel View Synthesis

Satellite Image Based Cross-view Localization for Autonomous Vehicle

no code implementations • 27 Jul 2022 • Shan Wang, Yanhao Zhang, Ankit Vora, Akhil Perincherry, Hongdong Li

This paper introduces a novel approach to cross-view localization that departs from the conventional image retrieval method.

Autonomous Vehicles, Image Retrieval, +1

RCL: Recurrent Continuous Localization for Temporal Action Detection

no code implementations • CVPR 2022 • Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan

Temporal representation is the cornerstone of modern action detection techniques.

Action Detection

Disentangled Representation Learning for Text-Video Retrieval

2 code implementations • 14 Mar 2022 • Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua

Cross-modality interaction is a critical component in Text-Video Retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance.

Ranked #10 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning, Retrieval, +1

Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

no code implementations • 9 Feb 2021 • Yanhao Zhang, Qiang Wang, Pan Pan, Yun Zheng, Cheng Da, Siyang Sun, Yinghui Xu

Nowadays, live-stream and short video shopping in E-commerce have grown exponentially.

Retrieval, Video-to-Shop

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

no code implementations • 9 Feb 2021 • Kang Zhao, Pan Pan, Yun Zheng, Yanhao Zhang, Changxu Wang, Yingya Zhang, Yinghui Xu, Rong Jin

For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods.

Graph Construction

Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance

no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Jianmin Wu, Yinghui Xu, Rong Jin

By exploiting user click data, our networks encode richer supervision more effectively and better distinguish real-shot images in terms of category and feature.

Visual Search at Alibaba

no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin

We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.

Image Retrieval
