Search Results for author: Guan Huang

Found 49 papers, 21 papers with code

Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate

6 code implementations 21 Apr 2018 Xiao Ma, Liqin Zhao, Guan Huang, Zhi Wang, Zelin Hu, Xiaoqiang Zhu, Kun Gai

To the best of our knowledge, this is the first public dataset which contains samples with sequential dependence of click and conversion labels for CVR modeling.

Click-Through Rate Prediction Recommendation Systems +2

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

3 code implementations CVPR 2020 Junjie Huang, Zheng Zhu, Feng Guo, Guan Huang, Dalong Du

Specifically, by investigating the standard data processing in state-of-the-art approaches mainly including coordinate system transformation and keypoint format transformation (i.e., encoding and decoding), we find that the results obtained by common flipping strategy are unaligned with the original ones in inference.

Pose Estimation

AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

2 code implementations 17 Aug 2020 Junjie Huang, Zheng Zhu, Guan Huang, Dalong Du

As AID successfully pushes the performance boundary of human pose estimation problem by considerable margin and sets a new state-of-the-art, we hope AID to be a regular configuration for training human pose estimators.

Multi-Person Pose Estimation

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

1 code implementation 31 Mar 2022 JunJie Huang, Guan Huang

Single frame data contains finite information which limits the performance of the existing vision-based multi-camera 3D object detection paradigms.

3D Object Detection object-detection

BEVPoolv2: A Cutting-edge Implementation of BEVDet Toward Deployment

1 code implementation 30 Nov 2022 JunJie Huang, Guan Huang

We offer an example of deployment to the TensorRT backend in branch dev2.0 and show how fast the BEVDet paradigm can be processed on it.

CAFE: Learning to Condense Dataset by Aligning Features

2 code implementations CVPR 2022 Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one.

Dataset Condensation

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation 7 Apr 2022 Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

1 code implementation 15 Apr 2022 XiaoFeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang

Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently.

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

1 code implementation 11 Apr 2022 Jiayu Zou, Junrui Xiao, Zheng Zhu, JunJie Huang, Guan Huang, Dalong Du, Xingang Wang

In order to reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT).

Autonomous Driving Decision Making +2

Structure-Aware Face Clustering on a Large-Scale Graph with $\bf{10^{7}}$ Nodes

1 code implementation 24 Mar 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Structure-Aware Face Clustering on a Large-Scale Graph With $10^{7}$ Nodes

1 code implementation CVPR 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

1 code implementation CVPR 2023 XiaoFeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang

To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.

Depth Estimation Motion Forecasting

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

1 code implementation 19 Aug 2022 XiaoFeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints.

Depth Estimation

A Simple Baseline for Multi-Camera 3D Object Detection

1 code implementation 22 Aug 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Autonomous Driving Monocular 3D Object Detection +2

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

1 code implementation 1 Jan 2023 Boyu Zhang, Wenbo Xu, Zheng Zhu, Guan Huang

Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively.

Autonomous Driving

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations 10 Nov 2017 Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

no code implementations 30 Jul 2015 Shuangyin Li, Jiefei Li, Guan Huang, Ruiyang Tan, Rong Pan

We propose a novel method to model the SSDs by a so-called Tag-Weighted Topic Model (TWTM).

Distributed Computing TAG +3

Attention-guided Unified Network for Panoptic Segmentation

no code implementations CVPR 2019 Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Panoptic Segmentation Segmentation

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations 14 Dec 2018 Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves the state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively.

Action Recognition Multimodal Activity Recognition +3

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations 4 Jun 2019 Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations 15 Aug 2019 Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of Re-ID feature.

Human Detection Multi-Person Pose Estimation +3

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations 26 Aug 2019 Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking +1

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

no code implementations 14 Oct 2019 Junjie Huang, Zheng Zhu, Guan Huang

Human pose estimation is of importance for visual understanding tasks such as action recognition and human-computer interaction.

Action Recognition Multi-Person Pose Estimation +1

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

no code implementations CVPR 2021 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.

 Ranked #1 on Face Verification on IJB-C (training dataset metric)

Attribute Face Recognition +1

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

no code implementations 6 Apr 2021 Jiabin Zhang, Zheng Zhu, Jiwen Lu, JunJie Huang, Guan Huang, Jie Zhou

To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).

Human Detection Multi-Person Pose Estimation

Face-NMS: A Core-set Selection Approach for Efficient Face Recognition

no code implementations 10 Sep 2021 Yunze Chen, JunJie Huang, Jiagang Zhu, Zheng Zhu, Tian Yang, Guan Huang, Dalong Du

The current research on this problem mainly focuses on designing an efficient Fully-connected layer (FC) to reduce GPU memory consumption caused by a large number of identities.

Face Recognition object-detection +1

Hand gesture detection in tests performed by older adults

no code implementations 27 Oct 2021 Guan Huang, Son N. Tran, Quan Bai, Jane Alty

We have implemented a hand gesture detector to detect the gestures in the hand movement tests, and our detection mAP is 0.782, which is better than the state-of-the-art.

Joint Demand Prediction for Multimodal Systems: A Multi-task Multi-relational Spatiotemporal Graph Neural Network Approach

no code implementations 15 Dec 2021 Yuebing Liang, Guan Huang, Zhan Zhao

Despite some recent efforts, existing approaches to multimodal demand prediction are generally not flexible enough to account for multiplex networks with diverse spatial units and heterogeneous spatiotemporal correlations across different modes.

Management

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

no code implementations 21 Apr 2022 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively.

Face Recognition

Dimension Embeddings for Monocular 3D Object Detection

no code implementations CVPR 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu

In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

Monocular 3D Object Detection Object +1

A Comprehensive Review on Deep Supervision: Theories and Applications

no code implementations 6 Jul 2022 Renjie Li, Xinyi Wang, Guan Huang, Wenli Yang, Kaining Zhang, Xiaotong Gu, Son N. Tran, Saurabh Garg, Jane Alty, Quan Bai

Deep supervision, also known as 'intermediate supervision' or 'auxiliary supervision', adds supervision at hidden layers of a neural network.

Hybrid CNN-Interpreter: Interpret local and global contexts for CNN-based Models

no code implementations 31 Oct 2022 Wenli Yang, Guan Huang, Renjie Li, Jiahao Yu, Yanyu Chen, Quan Bai, Beyong Kang

Convolutional neural network (CNN) models have seen advanced improvements in performance in various domains, but lack of interpretability is a major barrier to assurance and regulation during operation for acceptance and deployment of AI-assisted applications.

Feature Correlation

Cross-Mode Knowledge Adaptation for Bike Sharing Demand Prediction using Domain-Adversarial Graph Neural Networks

no code implementations 16 Nov 2022 Yuebing Liang, Guan Huang, Zhan Zhao

Existing methods for bike sharing demand prediction are mostly based on its own historical demand variation, essentially regarding it as a closed system and neglecting the interaction between different transportation modes.

Deep trip generation with graph neural networks for bike sharing system expansion

no code implementations 20 Mar 2023 Yuebing Liang, Fangyi Ding, Guan Huang, Zhan Zhao

For station-based BSSs, this means planning new stations based on existing ones over time, which requires prediction of the number of trips generated by these new stations across the whole system.

regression

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

no code implementations 18 Sep 2023 XiaoFeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering.

Autonomous Driving Video Generation

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

no code implementations 18 Jan 2024 XiaoFeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu

World models play a crucial role in understanding and predicting the dynamics of the world, which is essential for video generation.

Video Editing Video Generation

Generative Design of Crystal Structures by Point Cloud Representations and Diffusion Model

1 code implementation 24 Jan 2024 Zhelin Li, Rami Mrad, Runxian Jiao, Guan Huang, Jun Shan, Shibing Chu, Yuanping Chen

Efficiently generating energetically stable crystal structures has long been a challenge in material design, primarily due to the immense arrangement of atoms in a crystal lattice.

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

no code implementations 11 Mar 2024 Guosheng Zhao, XiaoFeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang

DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner.

Autonomous Driving Language Modelling +2
