Search Results for author: Guan Huang

Found 49 papers, 21 papers with code

Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate

6 code implementations • 21 Apr 2018 • Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, Kun Gai

To the best of our knowledge, this is the first public dataset which contains samples with sequential dependence of click and conversion labels for CVR modeling.

Click-Through Rate Prediction Recommendation Systems

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

3 code implementations • CVPR 2020 • Junjie Huang, Zheng Zhu, Feng Guo, Guan Huang, Dalong Du

Specifically, by investigating the standard data processing in state-of-the-art approaches, mainly coordinate system transformation and keypoint format transformation (i.e., encoding and decoding), we find that the results obtained by the common flipping strategy are unaligned with the original ones at inference.

Ranked #14 on Pose Estimation on COCO test-dev

Pose Estimation

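A minimal arithmetic sketch of how such a flip misalignment can arise, assuming the common mapping that scales image x-coordinates by w_hm / w_img and a horizontal flip that sends pixel x to w - 1 - x on a grid of width w. The helper names are hypothetical and this is an illustration, not the paper's implementation. Flipping the image and then mapping to the heatmap differs from mapping first and flipping the heatmap by a constant 1 - w_hm / w_img; scaling by (w_hm - 1) / (w_img - 1) instead makes the two paths agree.

```python
def to_heatmap(x: float, w_img: int, w_hm: int, unbiased: bool) -> float:
    """Map an image-space x coordinate to heatmap space (hypothetical helper)."""
    # Size-ratio scaling (w_hm / w_img) vs. pixel-span scaling ((w_hm - 1) / (w_img - 1)).
    scale = (w_hm - 1) / (w_img - 1) if unbiased else w_hm / w_img
    return x * scale


def flip_misalignment(x: float, w_img: int, w_hm: int, unbiased: bool) -> float:
    """Difference between 'flip image, then map' and 'map, then flip heatmap'."""
    # Path A: flip the image horizontally (x -> w_img - 1 - x), then map to the heatmap.
    flip_then_map = to_heatmap(w_img - 1 - x, w_img, w_hm, unbiased)
    # Path B: map to the heatmap, then flip the heatmap (x -> w_hm - 1 - x).
    map_then_flip = (w_hm - 1) - to_heatmap(x, w_img, w_hm, unbiased)
    return flip_then_map - map_then_flip


if __name__ == "__main__":
    for unbiased in (False, True):
        err = flip_misalignment(100.0, w_img=192, w_hm=48, unbiased=unbiased)
        print(f"unbiased={unbiased}: misalignment = {err:.4f} heatmap pixels")
```

With a 192-pixel-wide input and a 48-pixel-wide heatmap, the size-ratio mapping leaves the two paths 0.75 heatmap pixels (three input pixels) apart for every x, while the pixel-span mapping agrees up to floating-point error.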

AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

2 code implementations • 17 Aug 2020 • Junjie Huang, Zheng Zhu, Guan Huang, Dalong Du

As AID successfully pushes the performance boundary of the human pose estimation problem by a considerable margin and sets a new state of the art, we hope AID will become a regular configuration for training human pose estimators.

Ranked #1 on Multi-Person Pose Estimation on COCO minival

Multi-Person Pose Estimation

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

2 code implementations • 22 Dec 2021 • JunJie Huang, Guan Huang, Zheng Zhu, Yun Ye, Dalong Du

As a fast version, BEVDet-Tiny scores 31.2% mAP and 39.2% NDS on the nuScenes val set.

Ranked #20 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

1 code implementation • 31 Mar 2022 • JunJie Huang, Guan Huang

Single-frame data contains limited information, which limits the performance of existing vision-based multi-camera 3D object detection paradigms.

Ranked #17 on 3D Object Detection on nuScenes Camera Only

3D Object Detection object-detection

BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment

1 code implementation • 30 Nov 2022 • JunJie Huang, Guan Huang

We offer an example of deployment to the TensorRT backend in branch dev2.0 and show how fast the BEVDet paradigm can be processed on it.

CAFE: Learning to Condense Dataset by Aligning Features

2 code implementations • CVPR 2022 • Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one.

Dataset Condensation

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

1 code implementation • CVPR 2022 • Yongming Rao, Wenliang Zhao, Guangyi Chen, Yansong Tang, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

In this work, we present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.

Image-text matching Instance Segmentation

EANet: Enhancing Alignment for Cross-Domain Person Re-identification

3 code implementations • 29 Dec 2018 • Houjing Huang, Wenjie Yang, Xiaotang Chen, Xin Zhao, Kaiqi Huang, Jinbin Lin, Guan Huang, Dalong Du

Person re-identification (ReID) has achieved significant improvement under the single-domain setting.

Domain Adaptation Person Re-Identification

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

1 code implementation • 19 May 2022 • Yunpeng Zhang, Zheng Zhu, Wenzhao Zheng, JunJie Huang, Guan Huang, Jie Zhou, Jiwen Lu

Specifically, BEVerse first performs shared feature extraction and lifting to generate 4D BEV representations from multi-timestamp and multi-view images.

Ranked #15 on Robust Camera Only 3D Object Detection on nuScenes-C

3D Object Detection Autonomous Driving

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation • 7 Apr 2022 • Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

1 code implementation • 15 Apr 2022 • XiaoFeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang

Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently.

MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer

1 code implementation • 6 Aug 2022 • Chaoqiang Zhao, Youmin Zhang, Matteo Poggi, Fabio Tosi, Xianda Guo, Zheng Zhu, Guan Huang, Yang Tang, Stefano Mattoccia

Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training.

Ranked #1 on Monocular Depth Estimation on KITTI

Depth Prediction Monocular Depth Estimation

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

1 code implementation • 11 Apr 2022 • Jiayu Zou, Junrui Xiao, Zheng Zhu, JunJie Huang, Guan Huang, Dalong Du, Xingang Wang

In order to reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT).

Autonomous Driving Decision Making

Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes

1 code implementation • 24 Mar 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering

Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes

1 code implementation • CVPR 2021 • Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

1 code implementation • CVPR 2023 • XiaoFeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang

To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.

Depth Estimation Motion Forecasting

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

1 code implementation • 19 Aug 2022 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints.

Depth Estimation

A Simple Baseline for Multi-Camera 3D Object Detection

1 code implementation • 22 Aug 2022 • Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Autonomous Driving Monocular 3D Object Detection

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

1 code implementation • 1 Jan 2023 • Boyu Zhang, Wenbo Xu, Zheng Zhu, Guan Huang

Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively.

Autonomous Driving

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

no code implementations • 30 Jul 2015 • Shuangyin Li, Jiefei Li, Guan Huang, Ruiyang Tan, Rong Pan

We propose a novel method to model the SSDs by a so-called Tag-Weighted Topic Model (TWTM).

Distributed Computing TAG

Attention-guided Unified Network for Panoptic Segmentation

no code implementations • CVPR 2019 • Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Ranked #24 on Panoptic Segmentation on COCO test-dev

Panoptic Segmentation Segmentation

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on the cross-view and cross-subject settings, respectively.

Ranked #1 on Action Recognition on UTD-MHAD

Action Recognition Multimodal Activity Recognition

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations • 4 Jun 2019 • Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

Exploiting Offset-guided Network for Pose Estimation and Tracking

no code implementations • 4 Jun 2019 • Rui Zhang, Zheng Zhu, Peng Li, Rui Wu, Chaoxu Guo, Guan Huang, Hailun Xia

Human pose estimation has witnessed a significant advance thanks to the development of deep learning.

Human Detection Pose Estimation

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of the multi-task network (MTN), we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is used to infer the occlusion state and make better use of the Re-ID feature.

Human Detection Multi-Person Pose Estimation

Instance Scale Normalization for image understanding

no code implementations • 20 Aug 2019 • Zewen He, He Huang, Yudong Wu, Guan Huang, Wensheng Zhang

Scale variation remains a challenging problem for object detection.

Instance Segmentation Object

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

no code implementations • 14 Oct 2019 • Junjie Huang, Zheng Zhu, Guan Huang

Human pose estimation is important for visual understanding tasks such as action recognition and human-computer interaction.

Action Recognition Multi-Person Pose Estimation

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

no code implementations • CVPR 2021 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Attribute Face Recognition

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

no code implementations • 6 Apr 2021 • Jiabin Zhang, Zheng Zhu, Jiwen Lu, JunJie Huang, Guan Huang, Jie Zhou

To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).

Human Detection Multi-Person Pose Estimation

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

no code implementations • 7 Jun 2021 • Yiding Liu, Guan Huang, Jiaxiang Liu, Weixue Lu, Suqi Cheng, Yukun Li, Daiting Shi, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin

More importantly, we present a practical system workflow for deploying the model in web-scale retrieval.

Language Modelling Retrieval

Masked Face Recognition Challenge: The WebFace260M Track Report

no code implementations • 16 Aug 2021 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jia Guo, Jiwen Lu, Dalong Du, Jie Zhou

A second phase of the challenge runs until October 1, 2021, with an ongoing leaderboard.

Face Recognition

Face-NMS: A Core-set Selection Approach for Efficient Face Recognition

no code implementations • 10 Sep 2021 • Yunze Chen, JunJie Huang, Jiagang Zhu, Zheng Zhu, Tian Yang, Guan Huang, Dalong Du

The current research on this problem mainly focuses on designing an efficient Fully-connected layer (FC) to reduce GPU memory consumption caused by a large number of identities.

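A back-of-the-envelope sketch of why a large number of identities strains GPU memory: an identity-classification FC layer stores one weight vector per identity. The 512-dimensional embedding, the bias-free layer, and float32 weights are assumptions for illustration, and the function name is hypothetical; the 2M figure is the identity count of the cleaned WebFace42M training set.

```python
def fc_weight_bytes(num_identities: int, embed_dim: int = 512, bytes_per_param: int = 4) -> int:
    """Memory for the weight matrix of an identity-classification FC layer.

    Assumes one embed_dim-sized weight vector per identity, no bias, and
    float32 storage by default; gradients and optimizer state are not counted.
    """
    return num_identities * embed_dim * bytes_per_param


if __name__ == "__main__":
    n_ids = 2_000_000  # identity count of the cleaned WebFace42M set
    gib = fc_weight_bytes(n_ids) / 2**30
    print(f"{n_ids:,} identities -> {gib:.2f} GiB of FC weights")
```

Under these assumptions, 2M identities need 4,096,000,000 bytes (about 3.8 GiB) for the FC weights alone, before gradients, optimizer state, or activations are counted.
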
Face Recognition object-detection

Hand gesture detection in tests performed by older adults

no code implementations • 27 Oct 2021 • Guan Huang, Son N. Tran, Quan Bai, Jane Alty

We have implemented a hand gesture detector to detect the gestures in the hand movement tests, and our detection mAP of 0.782 is better than the state of the art.

Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach

no code implementations • 15 Dec 2021 • Yuebing Liang, Guan Huang, Zhan Zhao

Despite some recent efforts, existing approaches to multimodal demand prediction are generally not flexible enough to account for multiplex networks with diverse spatial units and heterogeneous spatiotemporal correlations across different modes.

Management

Bike Sharing Demand Prediction based on Knowledge Sharing across Modes: A Graph-based Deep Learning Approach

no code implementations • 18 Mar 2022 • Yuebing Liang, Guan Huang, Zhan Zhao

Bike sharing is an increasingly popular part of urban transportation systems.

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

no code implementations • 21 Apr 2022 • Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively.

Face Recognition

Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline

no code implementations • ICCV 2021 • Xianda Guo, Zheng Zhu, Tian Yang, Beibei Lin, JunJie Huang, Jiankang Deng, Guan Huang, Jie Zhou, Jiwen Lu

To the best of our knowledge, this is the first large-scale dataset for gait recognition in the wild.

Gait Recognition in the Wild Neural Architecture Search

Dimension Embeddings for Monocular 3D Object Detection

no code implementations • CVPR 2022 • Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu

In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

Monocular 3D Object Detection Object

A Comprehensive Review on Deep Supervision: Theories and Applications

no code implementations • 6 Jul 2022 • Renjie Li, Xinyi Wang, Guan Huang, Wenli Yang, Kaining Zhang, Xiaotong Gu, Son N. Tran, Saurabh Garg, Jane Alty, Quan Bai

Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at the hidden layers of a neural network.

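A minimal sketch of the definition above, assuming a classifier in which selected hidden layers each feed a small auxiliary head: the training loss adds a weighted cross-entropy term per auxiliary head to the loss on the final output. The function names and the 0.3 weight are illustrative assumptions, not details taken from the review.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def deeply_supervised_loss(
    final_logits: Sequence[float],
    aux_logits: Sequence[Sequence[float]],
    label: int,
    aux_weight: float = 0.3,  # hypothetical weight applied to each auxiliary head
) -> float:
    """Main loss on the network output plus weighted losses on hidden-layer heads."""
    loss = cross_entropy(final_logits, label)
    for logits in aux_logits:  # one entry per supervised hidden layer
        loss += aux_weight * cross_entropy(logits, label)
    return loss


if __name__ == "__main__":
    final = [2.0, 0.5, -1.0]
    hidden = [[1.0, 0.8, 0.1], [0.3, 0.2, 0.1]]  # outputs of two auxiliary heads
    print(deeply_supervised_loss(final, hidden, label=0))
```

Because the auxiliary heads only contribute extra loss terms, they can be discarded after training, leaving the inference-time network unchanged.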

Hybrid CNN-Interpreter: Interpret local and global contexts for CNN-based Models

no code implementations • 31 Oct 2022 • Wenli Yang, Guan Huang, Renjie Li, Jiahao Yu, Yanyu Chen, Quan Bai, Beyong Kang

Convolutional neural network (CNN) models have improved substantially in performance across various domains, but their lack of interpretability remains a major barrier to assurance and regulation during operation, and thus to the acceptance and deployment of AI-assisted applications.

Feature Correlation

Cross-Mode Knowledge Adaptation for Bike Sharing Demand Prediction using Domain-Adversarial Graph Neural Networks

no code implementations • 16 Nov 2022 • Yuebing Liang, Guan Huang, Zhan Zhao

Existing methods for bike sharing demand prediction are mostly based on its own historical demand variation, essentially regarding it as a closed system and neglecting the interaction between different transportation modes.

Deep trip generation with graph neural networks for bike sharing system expansion

no code implementations • 20 Mar 2023 • Yuebing Liang, Fangyi Ding, Guan Huang, Zhan Zhao

For station-based BSSs, this means planning new stations based on existing ones over time, which requires prediction of the number of trips generated by these new stations across the whole system.

regression

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

no code implementations • 18 Sep 2023 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

The established world model holds immense potential for generating high-quality driving videos and driving policies for safe maneuvering.

Autonomous Driving Video Generation

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

no code implementations • 18 Jan 2024 • XiaoFeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu

World models play a crucial role in understanding and predicting the dynamics of the world, which is essential for video generation.

Video Editing Video Generation

Generative Design of Crystal Structures by Point Cloud Representations and Diffusion Model

1 code implementation • 24 Jan 2024 • Zhelin Li, Rami Mrad, Runxian Jiao, Guan Huang, Jun Shan, Shibing Chu, Yuanping Chen

Efficiently generating energetically stable crystal structures has long been a challenge in material design, primarily due to the immense number of possible arrangements of atoms in a crystal lattice.

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

no code implementations • 11 Mar 2024 • Guosheng Zhao, XiaoFeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang

DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner.

Autonomous Driving Language Modelling