Search Results for author: Zheng Zhu

Found 90 papers, 53 papers with code

DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving

no code implementations 7 May 2024 Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, Liping Jing, Yiming Nie, Bin Dai

In this paper, we address this challenge by introducing a world model-based autonomous driving 4D representation learning framework, dubbed DriveWorld, which is capable of pre-training from multi-camera driving videos in a spatio-temporal fashion.

3D Object Detection Motion Forecasting +4

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

1 code implementation 6 May 2024 Zheng Zhu, XiaoFeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving Decision Making +1

DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation

no code implementations 11 Mar 2024 Guosheng Zhao, XiaoFeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang

DriveDreamer-2 is the first world model to generate customized driving videos; it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner.

Autonomous Driving Language Modelling +2

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

no code implementations 18 Jan 2024 XiaoFeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu

World models play a crucial role in understanding and predicting the dynamics of the world, which is essential for video generation.

Video Editing Video Generation

Generative Pretraining at Scale: Transformer-Based Encoding of Transactional Behavior for Fraud Detection

no code implementations 22 Dec 2023 Ze Yu Zhao, Zheng Zhu, Guilin Li, Wenhan Wang, Bo Wang

In this work, we introduce an innovative autoregressive model leveraging Generative Pretrained Transformer (GPT) architectures, tailored for fraud detection in payment systems.

Anomaly Detection Fraud Detection

OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

1 code implementation 1 Dec 2023 Xianda Guo, Juntao Lu, Chenming Zhang, Yiqi Wang, Yiqun Duan, Tian Yang, Zheng Zhu, Long Chen

Based on OpenStereo, we conducted experiments and matched or surpassed the performance metrics reported in the original papers.

Autonomous Driving Autonomous Navigation +1

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

1 code implementation 9 Nov 2023 Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.

Autonomous Driving Common Sense Reasoning +4

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching

1 code implementation 23 Oct 2023 Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei Jiang, Yang You

Dataset distillation plays a crucial role in creating compact datasets that achieve training performance similar to that of the original large-scale ones.

Transfer Learning

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

no code implementations 18 Sep 2023 XiaoFeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering.

Autonomous Driving Video Generation

Introspective Deep Metric Learning

2 code implementations 11 Sep 2023 Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.

Image Retrieval Metric Learning

Evidential Detection and Tracking Collaboration: New Problem, Benchmark and Algorithm for Robust Anti-UAV System

1 code implementation 27 Jun 2023 Xue-Feng Zhu, Tianyang Xu, Jian Zhao, Jia-Wei Liu, Kai Wang, Gang Wang, Jianan Li, Qiang Wang, Lei Jin, Zheng Zhu, Junliang Xing, Xiao-Jun Wu

Still, previous works have simplified the anti-UAV task to a tracking problem in which prior information about the UAVs is always provided; such a scheme fails in real-world anti-UAV tasks (i.e., complex scenes, UAVs that appear and reappear unpredictably, and real-time UAV surveillance).

One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception

no code implementations 22 Jun 2023 Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, Wenjun Zeng

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC).

Depth Estimation Representation Learning

Multi-Prompt with Depth Partitioned Cross-Modal Learning

1 code implementation 10 May 2023 Yingjie Tian, Yiqi Wang, Xianda Guo, Zheng Zhu, Long Chen

In recent years, soft prompt learning methods have been proposed to fine-tune large-scale vision-language pre-trained models for various downstream tasks.

Domain Generalization

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

2 code implementations ICCV 2023 Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Towards a more comprehensive perception of a 3D scene, in this paper we propose SurroundOcc, a method that predicts 3D occupancy from multi-camera images.

3D Object Detection Autonomous Driving +2

DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

1 code implementation 15 Mar 2023 Jiayu Zou, Zheng Zhu, Yun Ye, Xingang Wang

Diffusion models naturally have the ability to denoise noisy samples toward the ideal data, which motivates us to use a diffusion model to obtain a better BEV representation.

3D Object Detection Autonomous Driving +3

A Simple Baseline for Supervised Surround-view Depth Estimation

no code implementations 14 Mar 2023 Xianda Guo, Wenjie Yuan, Yunpeng Zhang, Tian Yang, Chenming Zhang, Zheng Zhu, Long Chen

The former is achieved by the self-attention module within each view, while the latter is realized by the adjacent attention module, which computes attention across multiple cameras to exchange multi-scale representations across the surround-view feature maps.

Autonomous Driving Monocular Depth Estimation

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

1 code implementation 9 Mar 2023 Yiqun Duan, Xianda Guo, Zheng Zhu

We propose DiffusionDepth, a new approach that reformulates monocular depth estimation as a denoising diffusion process.

Decoder Denoising +1

DiM: Distilling Dataset into Generative Model

2 code implementations 8 Mar 2023 Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang, Yang You

To the best of our knowledge, we are the first to achieve higher accuracy on complex architectures than on simple ones, such as 75.1% with ResNet-18 and 72.6% with ConvNet-3 on ten images per class of CIFAR-10.

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

1 code implementation ICCV 2023 XiaoFeng Wang, Zheng Zhu, Wenbo Xu, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang

Towards a comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.

Autonomous Driving Benchmarking

DREAM: Efficient Dataset Distillation by Representative Matching

2 code implementations ICCV 2023 Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Wei Jiang, Yang You

Although there are various matching objectives, currently the strategy for selecting original images is limited to naive random sampling.

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

1 code implementation CVPR 2023 Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu

In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.

Denoising Talking Head Generation

Detachable Novel Views Synthesis of Dynamic Scenes Using Distribution-Driven Neural Radiance Fields

1 code implementation 1 Jan 2023 Boyu Zhang, Wenbo Xu, Zheng Zhu, Guan Huang

Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively.

Autonomous Driving

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

1 code implementation CVPR 2023 XiaoFeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang

To mitigate the problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.

Depth Estimation Motion Forecasting

Token-Label Alignment for Vision Transformers

1 code implementation ICCV 2023 Han Xiao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).

Image Classification Semantic Segmentation +1

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

1 code implementation ICCV 2023 Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

The pretrain-finetune paradigm in modern computer vision facilitates the success of self-supervised learning, which tends to achieve better transferability than supervised learning.

Image Classification object-detection +3

A Simple Baseline for Multi-Camera 3D Object Detection

1 code implementation 22 Aug 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Autonomous Driving Monocular 3D Object Detection +2

Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning

1 code implementation 19 Aug 2022 XiaoFeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang

In contrast, multi-frame depth estimation methods improve the depth accuracy thanks to the success of Multi-View Stereo (MVS), which directly makes use of geometric constraints.

Depth Estimation

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search

1 code implementation CVPR 2022 Han Xiao, Ziwei Wang, Zheng Zhu, Jie Zhou, Jiwen Lu

Differentiable architecture search (DARTS) acquires the optimal architectures by optimizing the architecture parameters with gradient descent, which significantly reduces the search cost.

Neural Architecture Search

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

1 code implementation 28 May 2022 Jianfei Yang, Xiangyu Peng, Kai Wang, Zheng Zhu, Jiashi Feng, Lihua Xie, Yang You

Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain.

Domain Adaptation Knowledge Distillation

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels

1 code implementation 30 Apr 2022 Kai Wang, Xiangyu Peng, Shuo Yang, Jianfei Yang, Zheng Zhu, Xinchao Wang, Yang You

This paradigm, however, is prone to significant degeneration under heavy label noise, as the number of clean samples is too small for conventional methods to behave well.

Learning with noisy labels

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

no code implementations 21 Apr 2022 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively.

Face Recognition

MVSTER: Epipolar Transformer for Efficient Multi-View Stereo

1 code implementation 15 Apr 2022 XiaoFeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang

Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently.

HFT: Lifting Perspective Representations via Hybrid Feature Transformation

1 code implementation 11 Apr 2022 Jiayu Zou, Junrui Xiao, Zheng Zhu, JunJie Huang, Guan Huang, Dalong Du, Xingang Wang

In order to reap the benefits and avoid the drawbacks of CBFT and CFFT, we propose a novel framework with a Hybrid Feature Transformation module (HFT).

Autonomous Driving Decision Making +2

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation 7 Apr 2022 Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework

1 code implementation 8 Mar 2022 Ming Wang, Beibei Lin, Xianda Guo, Lincheng Li, Zheng Zhu, Jiande Sun, Shunli Zhang, Xin Yu

ECM consists of the Spatial-Temporal feature extractor (ST), the Frame-Level feature extractor (FL) and SPB, and has two obvious advantages: First, each branch focuses on a specific representation, which can be used to improve the robustness of the network.

Gait Recognition

CAFE: Learning to Condense Dataset by Aligning Features

2 code implementations CVPR 2022 Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

Dataset condensation aims to reduce network training effort by condensing a cumbersome training set into a compact synthetic one.

Dataset Condensation

Dimension Embeddings for Monocular 3D Object Detection

no code implementations CVPR 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu

In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

Monocular 3D Object Detection Object +1

Face-NMS: A Core-set Selection Approach for Efficient Face Recognition

no code implementations 10 Sep 2021 Yunze Chen, JunJie Huang, Jiagang Zhu, Zheng Zhu, Tian Yang, Guan Huang, Dalong Du

Current research on this problem mainly focuses on designing an efficient fully-connected (FC) layer to reduce the GPU memory consumption caused by a large number of identities.

Face Recognition object-detection +1

Masked Face Recognition Challenge: The InsightFace Track Report

1 code implementation 18 Aug 2021 Jiankang Deng, Jia Guo, Xiang An, Zheng Zhu, Stefanos Zafeiriou

In this workshop, we organize the Masked Face Recognition (MFR) challenge and focus on benchmarking deep face recognition methods in the presence of facial masks.

Face Recognition

Global Filter Networks for Image Classification

4 code implementations NeurIPS 2021 Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Recent advances in self-attention and pure multi-layer perceptron (MLP) models for vision have shown great potential in achieving promising performance with fewer inductive biases.

Ranked #9 on Image Classification on Stanford Cars (using extra training data)

Classification Domain Generalization +1

Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes

1 code implementation CVPR 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

An Efficient Training Approach for Very Large Scale Face Recognition

1 code implementation CVPR 2022 Kai Wang, Shuo Wang, Panpan Zhang, Zhipeng Zhou, Zheng Zhu, Xiaobo Wang, Xiaojiang Peng, Baigui Sun, Hao Li, Yang You

This method adopts a Dynamic Class Pool (DCP) to store and update identity features dynamically, which can be regarded as a substitute for the FC layer.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

no code implementations 6 Apr 2021 Jiabin Zhang, Zheng Zhu, Jiwen Lu, JunJie Huang, Guan Huang, Jie Zhou

To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).

Human Detection Multi-Person Pose Estimation

Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes

1 code implementation 24 Mar 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

no code implementations CVPR 2021 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Attribute Face Recognition +1

Joint predictions of multi-modal ride-hailing demands: a deep multi-task multigraph learning-based approach

no code implementations 11 Nov 2020 Jintao Ke, Siyuan Feng, Zheng Zhu, Hai Yang, Jieping Ye

To address this issue, we propose a deep multi-task multi-graph learning approach, which combines two components: (1) multiple multi-graph convolutional (MGC) networks for predicting demands for different service modes, and (2) multi-task learning modules that enable knowledge sharing across multiple MGC networks.

Graph Learning Multi-Task Learning

PiaNet: A pyramid input augmented convolutional neural network for GGO detection in 3D lung CT scans

no code implementations 11 Sep 2020 Weihua Liu, Xiabi Liu, Xiongbiao Luo, Murong Wang, Guanghui Han, Xinming Zhao, Zheng Zhu

In the first stage, the feature-extraction module is embedded into a classifier network that is trained on a large data set of GGO and non-GGO patches, which are generated by performing data augmentation from a small number of annotated CT scans.

Computed Tomography (CT) Data Augmentation +1

AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation

2 code implementations 17 Aug 2020 Junjie Huang, Zheng Zhu, Guan Huang, Dalong Du

As AID pushes the performance boundary of human pose estimation by a considerable margin and sets a new state of the art, we hope AID will become a regular configuration for training human pose estimators.

Multi-Person Pose Estimation

The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

3 code implementations CVPR 2020 Junjie Huang, Zheng Zhu, Feng Guo, Guan Huang, Dalong Du

Specifically, by investigating the standard data processing in state-of-the-art approaches, mainly including coordinate system transformation and keypoint format transformation (i.e., encoding and decoding), we find that the results obtained by the common flipping strategy are not aligned with the original ones during inference.

Pose Estimation

Predicting origin-destination ride-sourcing demand with a spatio-temporal encoder-decoder residual multi-graph convolutional network

1 code implementation 17 Oct 2019 Jintao Ke, Xiaoran Qin, Hai Yang, Zhengfei Zheng, Zheng Zhu, Jieping Ye

To overcome this challenge, we propose the Spatio-Temporal Encoder-Decoder Residual Multi-Graph Convolutional network (ST-ED-RMGC), a novel deep learning model for predicting ride-sourcing demand of various OD pairs.

Decoder Management

Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation

no code implementations 14 Oct 2019 Junjie Huang, Zheng Zhu, Guan Huang

Human pose estimation is important for visual understanding tasks such as action recognition and human-computer interaction.

Action Recognition Multi-Person Pose Estimation +1

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

no code implementations 24 Sep 2019 Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang

In the field of navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera's view is an important problem.


EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras

1 code implementation 19 Sep 2019 Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma

This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.

Pose Estimation

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

no code implementations 13 Sep 2019 Zheng Zhu, Hongxuan Ma, Wei Zou

Human following on mobile robots has seen significant advances due to its potential for real-world applications.

Optical Flow Estimation Visual Tracking

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations 26 Aug 2019 Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking +1

Camera Pose Correction in SLAM Based on Bias Values of Map Points

no code implementations 24 Aug 2019 Zhaobing Kang, Wei Zou, Zheng Zhu

First, the relationship between the camera pose estimation error and the bias values of map points is derived from the objective function optimized in VSLAM.

feature selection Pose Estimation

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations 15 Aug 2019 Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is used to infer the occlusion state and make better use of Re-ID features.

Human Detection Multi-Person Pose Estimation +3

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations 4 Jun 2019 Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations 14 Dec 2018 Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on the cross-view and cross-subject settings, respectively.

Action Recognition Multimodal Activity Recognition +3

Identity-Enhanced Network for Facial Expression Recognition

no code implementations 11 Dec 2018 Yanwei Li, Xingang Wang, Shilei Zhang, Lingxi Xie, Wenqi Wu, Hongyuan Yu, Zheng Zhu

Facial expression recognition is a challenging task, arguably because of large intra-class variations and high inter-class similarities.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Attention-guided Unified Network for Panoptic Segmentation

no code implementations CVPR 2019 Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang

This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level.

Panoptic Segmentation Segmentation

Multi-hierarchical Independent Correlation Filters for Visual Tracking

1 code implementation 26 Nov 2018 Shuai Bai, Zhiqun He, Ting-Bing Xu, Zheng Zhu, Yuan Dong, Hongliang Bai

For visual tracking, most of the traditional correlation filters (CF) based methods suffer from the bottleneck of feature redundancy and lack of motion information.

Motion Estimation Visual Object Tracking +1

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

no code implementations 18 Nov 2018 Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resources.

Motion Detection Motion Detection In Non-Stationary Scenes +1

Optical Flow Based Online Moving Foreground Analysis

no code implementations 18 Nov 2018 Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Obtained by moving object detection, the foreground mask is unshaped and cannot be used directly in most subsequent processes.

Clustering Moving Object Detection +2

Distractor-aware Siamese Networks for Visual Object Tracking

1 code implementation ECCV 2018 Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, Weiming Hu

During the off-line training phase, an effective sampling strategy is introduced to control this distribution and make the model focus on the semantic distractors.

Incremental Learning Object +2

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

no code implementations 13 Jul 2018 Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resources.

Moving Object Detection object-detection +1

High Performance Visual Tracking With Siamese Region Proposal Network

5 code implementations CVPR 2018 Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, Xiaolin Hu

Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks.

Region Proposal Visual Object Tracking +2

End-to-end Video-level Representation Learning for Action Recognition

1 code implementation 11 Nov 2017 Jiagang Zhu, Wei Zou, Zheng Zhu

From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.

Action Recognition Optical Flow Estimation +2

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations 10 Nov 2017 Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

no code implementations CVPR 2018 Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan

Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.

Optical Flow Estimation

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

1 code implementation 12 Sep 2017 Jiagang Zhu, Wei Zou, Zheng Zhu

For two-stream methods in action recognition, the two streams' predictions are always fused by a weighted averaging scheme.

Action Classification Action Recognition +3

Targeted Learning with Daily EHR Data

no code implementations 27 May 2017 Oleg Sofrygin, Zheng Zhu, Julie A Schmittdiel, Alyce S. Adams, Richard W. Grant, Mark J. Van Der Laan, Romain Neugebauer

Electronic health records (EHR) data provide a cost- and time-effective opportunity to conduct cohort studies of the effects of multiple time-point interventions in the diverse patient populations found in real-world clinical settings.

Evolutionary Approaches to Optimization Problems in Chimera Topologies

no code implementations 17 Aug 2016 Roberto Santana, Zheng Zhu, Helmut G. Katzgraber

In this paper we investigate for the first time the use of Evolutionary Algorithms (EAs) on Ising spin glass instances defined on the Chimera topology.

Evolutionary Algorithms
