no code implementations • 14 Aug 2024 • Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding
Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms.
1 code implementation • 31 May 2024 • Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li
3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction.
no code implementations • 21 Apr 2024 • Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang
Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity.
no code implementations • 14 Apr 2024 • Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang
The proposed LoopAnimate is the first to extend the single-pass generation length of UNet-based video generation models to 35 frames while maintaining high-quality video generation.
1 code implementation • ICCV 2023 • Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li
Reliable segmentation of road lines and markings is critical to autonomous driving.
no code implementations • 7 Feb 2024 • Yanhao Zhang, Zhihan Zhu, Yong Xia
This paper introduces a novel prior called Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data.
no code implementations • CVPR 2024 • Shan Wang, Chuong Nguyen, Jiawei Liu, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Kaihao Zhang, Hongdong Li
This paper presents a novel aerial-to-ground feature aggregation strategy tailored for the task of cross-view image-based geo-localization.
no code implementations • 12 Dec 2023 • Peng Liu, Fanyi Wang, Jingwen Su, Yanhao Zhang, Guo-Jun Qi
To alleviate these issues, we propose to construct a saliency object matting dataset HRSOM and a lightweight network PSUNet.
no code implementations • 9 Dec 2023 • Yuming Qiao, Fanyi Wang, Jingwen Su, Yanhao Zhang, Yunjie Yu, Siyu Wu, Guo-Jun Qi
Image editing approaches with diffusion models have developed rapidly, yet their applicability is subject to requirements such as specific editing types (e.g., foreground or background object editing, style transfer), multiple conditions (e.g., mask, sketch, caption), and time-consuming fine-tuning of diffusion models.
no code implementations • ICCV 2023 • Shan Wang, Yanhao Zhang, Akhil Perincherry, Ankit Vora, Hongdong Li
This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images.
1 code implementation • 8 Aug 2023 • Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes
Given a pair of augmented views, our approach regularizes the activation intensities between the two views, while also ensuring that the affinity across regions within each view remains consistent.
Ranked #16 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
1 code implementation • ICCV 2023 • Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He
This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies.
1 code implementation • 2 May 2023 • Weixuan Sun, Zheyuan Liu, Yanhao Zhang, Yiran Zhong, Nick Barnes
The Segment Anything Model (SAM) has demonstrated exceptional performance and versatility, making it a promising tool for various related tasks.
Ranked #3 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)
1 code implementation • CVPR 2023 • Weixuan Sun, Jiayi Zhang, Jianyuan Wang, Zheyuan Liu, Yiran Zhong, Tianpeng Feng, Yandong Guo, Yanhao Zhang, Nick Barnes
Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.
1 code implementation • 19 Mar 2023 • Haotian Hu, Fanyi Wang, Jingwen Su, Hongtao Zhou, Yaonong Wang, Laifeng Hu, Yanhao Zhang, Zhiwang Zhang
In point cloud analysis tasks, the existing local feature aggregation descriptors (LFAD) are unable to fully utilize information in the neighborhood of central points.
1 code implementation • 15 Mar 2023 • Youcai Zhang, Yuzhuo Qin, Hengwei Liu, Yanhao Zhang, Yaqian Li, Xiaodong Gu
Knowledge distillation (KD) has been extensively studied in single-label image classification.
1 code implementation • 29 Sep 2022 • Ruyi Zha, Yanhao Zhang, Hongdong Li
This paper proposes a novel and fast self-supervised solution for sparse-view Cone Beam Computed Tomography (CBCT) reconstruction that requires no external training data.
Ranked #2 on Novel View Synthesis on X3D
no code implementations • 27 Jul 2022 • Shan Wang, Yanhao Zhang, Ankit Vora, Akhil Perincherry, Hongdong Li
This paper introduces a novel approach to cross-view localization that departs from the conventional image retrieval method.
no code implementations • CVPR 2022 • Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan
Temporal representation is the cornerstone of modern action detection techniques.
2 code implementations • 14 Mar 2022 • Qiang Wang, Yanhao Zhang, Yun Zheng, Pan Pan, Xian-Sheng Hua
Cross-modality interaction is a critical component in Text-Video Retrieval (TVR), yet there has been little examination of how different influencing factors for computing interaction affect performance.
Ranked #10 on Video Retrieval on MSR-VTT-1kA (using extra training data)
no code implementations • 9 Feb 2021 • Yanhao Zhang, Qiang Wang, Pan Pan, Yun Zheng, Cheng Da, Siyang Sun, Yinghui Xu
Nowadays, live-stream and short video shopping in E-commerce have grown exponentially.
no code implementations • 9 Feb 2021 • Kang Zhao, Pan Pan, Yun Zheng, Yanhao Zhang, Changxu Wang, Yingya Zhang, Yinghui Xu, Rong Jin
For a deployed visual search system with several billion online images in total, building a billion-scale offline graph within hours is essential, yet this is almost unachievable with most existing methods.
no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Yingya Zhang, Xiaofeng Ren, Rong Jin
We hope visual search at Alibaba becomes more widely incorporated into today's commercial applications.
no code implementations • 9 Feb 2021 • Yanhao Zhang, Pan Pan, Yun Zheng, Kang Zhao, Jianmin Wu, Yinghui Xu, Rong Jin
By exploiting user click data, our networks encode richer supervision more effectively and better distinguish real-shot images in terms of category and feature.