no code implementations • 10 Nov 2023 • Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
no code implementations • ICCV 2021 • Haiping Wu, Xiaolong Wang
In this paper, we propose a novel contrastive learning method which explores the cross-video relation by using cycle-consistency for general image representation learning.
14 code implementations • ICCV 2021 • Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
Ranked #3 on Image Classification on Flowers-102 (using extra training data)
no code implementations • AAAI 2020 • Haiping Wu, Bin Xiao
In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image.
Ranked #21 on 3D Human Pose Estimation on MPI-INF-3DHP (PCK metric)
2 code implementations • ICCV 2019 • Haiping Wu, Yuntao Chen, Naiyan Wang, Zhao-Xiang Zhang
In this work, we argue that aggregating features at the full-sequence level will lead to more discriminative and robust features for video object detection.
Ranked #15 on Video Object Detection on ImageNet VID
27 code implementations • ECCV 2018 • Bin Xiao, Haiping Wu, Yichen Wei
There has been significant progress on pose estimation and increasing interest in pose tracking in recent years.
Ranked #2 on 2D Human Pose Estimation on JHMDB (2D poses only)