Search Results for author: Yanhao Zhang

Found 24 papers, 10 papers with code

LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image

no code implementations 14 Aug 2024 Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding

Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms.

Autonomous Driving, Logical Reasoning +2

Zero-shot High-fidelity and Pose-controllable Character Animation

no code implementations 21 Apr 2024 Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity.

LoopAnimate: Loopable Salient Object Animation

no code implementations 14 Apr 2024 Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang

The proposed LoopAnimate for the first time extends the single-pass generation length of UNet-based video generation models to 35 frames while maintaining high-quality video generation.

Object Video Generation

Learning with Diversification from Block Sparse Signal

no code implementations 7 Feb 2024 Yanhao Zhang, Zhihan Zhu, Yong Xia

This paper introduces a novel prior called Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data.

Sparse Learning

Lightweight high-resolution Subject Matting in the Real World

no code implementations 12 Dec 2023 Peng Liu, Fanyi Wang, Jingwen Su, Yanhao Zhang, GuoJun Qi

To alleviate these issues, we propose to construct a saliency object matting dataset HRSOM and a lightweight network PSUNet.

Image Matting, object-detection +1

BARET: Balanced Attention based Real image Editing driven by Target-text Inversion

no code implementations 9 Dec 2023 Yuming Qiao, Fanyi Wang, Jingwen Su, Yanhao Zhang, Yunjie Yu, Siyu Wu, Guo-Jun Qi

Image editing approaches with diffusion models have been rapidly developed, yet their applicability is subject to requirements such as specific editing types (e.g., foreground or background object editing, style transfer), multiple conditions (e.g., mask, sketch, caption), and time-consuming fine-tuning of diffusion models.

Image Reconstruction, Style Transfer

View Consistent Purification for Accurate Cross-View Localization

no code implementations ICCV 2023 Shan Wang, Yanhao Zhang, Akhil Perincherry, Ankit Vora, Hongdong Li

This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images.

Pose Estimation

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation

1 code implementation 8 Aug 2023 Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes

Given a pair of augmented views, our approach regularizes the activation intensities between them, while also ensuring that the affinity across regions within each view remains consistent.

Object Localization, Weakly supervised Semantic Segmentation +1

TALL: Thumbnail Layout for Deepfake Video Detection

1 code implementation ICCV 2023 Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He

This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies.

Face Swapping

Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

1 code implementation CVPR 2023 Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes

Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.

Contrastive Learning

GAM: Gradient Attention Module of Optimization for Point Clouds Analysis

1 code implementation 19 Mar 2023 Haotian Hu, Fanyi Wang, Jingwen Su, Hongtao Zhou, Yaonong Wang, Laifeng Hu, Yanhao Zhang, Zhiwang Zhang

In point cloud analysis tasks, the existing local feature aggregation descriptors (LFAD) are unable to fully utilize information in the neighborhood of central points.

NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction

1 code implementation 29 Sep 2022 Ruyi Zha, Yanhao Zhang, Hongdong Li

This paper proposes a novel and fast self-supervised solution for sparse-view Cone Beam Computed Tomography (CBCT) reconstruction that requires no external training data.

Low-Dose X-Ray CT Reconstruction, Novel View Synthesis

Satellite Image Based Cross-view Localization for Autonomous Vehicle

no code implementations 27 Jul 2022 Shan Wang, Yanhao Zhang, Ankit Vora, Akhil Perincherry, Hongdong Li

This paper introduces a novel approach to cross-view localization that departs from the conventional image retrieval method.

Autonomous Vehicles, Image Retrieval +2

Disentangled Representation Learning for Text-Video Retrieval

2 code implementations 14 Mar 2022 Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua

Cross-modality interaction is a critical component in Text-Video Retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance.

Ranked #10 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Representation Learning, Retrieval +1

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

no code implementations 9 Feb 2021 Kang Zhao, Pan Pan, Yun Zheng, Yanhao Zhang, Changxu Wang, Yingya Zhang, Yinghui Xu, Rong Jin

For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods.

graph construction

Visual Search at Alibaba

no code implementations 9 Feb 2021 Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin

We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.

Image Retrieval

Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance

no code implementations 9 Feb 2021 Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Jianmin Wu, Yinghui Xu, Rong Jin

Benefiting from the exploration of user click data, our networks encode richer supervision more effectively and better distinguish real-shot images in terms of category and feature.
