Search Results for author: Jiwen Lu

Found 232 papers, 144 papers with code

Deep Hashing with Active Pairwise Supervision

no code implementations ECCV 2020 Ziwei Wang, Quan Zheng, Jiwen Lu, Jie Zhou

In this paper, we propose a Deep Hashing method with Active Pairwise Supervision (DH-APS).

Deep Hashing

Rotation-robust Intersection over Union for 3D Object Detection

no code implementations ECCV 2020 Yu Zheng, Danyang Zhang, Sinan Xie, Jiwen Lu, Jie Zhou

In this paper, we propose a Rotation-robust Intersection over Union (RIoU) for 3D object detection, which aims to jointly learn the overlap of rotated bounding boxes.

3D Object Detection Object +1
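RIoU is described above as a way to learn the overlap of rotated boxes. For reference, the quantity involved is the ordinary IoU of two rotated boxes. The sketch below computes it in bird's-eye view using Sutherland-Hodgman polygon clipping and the shoelace formula. It is not the paper's RIoU formulation. The (cx, cy, w, h, theta) box format and all function names are assumptions made for illustration.

```python
import math


def box_corners(cx, cy, w, h, theta):
    """Corners of a rotated rectangle in counter-clockwise order."""
    c, s = math.cos(theta), math.sin(theta)
    local = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def polygon_area(poly):
    """Absolute polygon area via the shoelace formula."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(twice) / 2.0


def clip_polygon(subject, clipper):
    """Sutherland-Hodgman: clip `subject` against the convex CCW polygon `clipper`."""
    def inside(p, a, b):
        # p lies on the left of (or on) the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # intersection of segment p-q with the infinite line through a and b
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def rotated_iou(box_a, box_b):
    """IoU of two rotated boxes, each given as (cx, cy, w, h, theta)."""
    pa, pb = box_corners(*box_a), box_corners(*box_b)
    overlap = clip_polygon(pa, pb)
    inter = polygon_area(overlap) if len(overlap) >= 3 else 0.0
    union = polygon_area(pa) + polygon_area(pb) - inter
    return inter / union if union > 0 else 0.0


print(rotated_iou((0, 0, 2, 2, 0), (1, 0, 2, 2, 0)))
```

Shifting a 2 x 2 box by half its width gives an overlap of 2 and a union of 6, so the printed IoU is about 1/3. For 3D boxes, a common approach multiplies this bird's-eye-view overlap by the overlap along the height axis. Note that this hard clipping is not smooth in theta, which is one reason learnable variants exist.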

Temporal Coherence or Temporal Motion: Which is More Critical for Video-based Person Re-identification?

no code implementations ECCV 2020 Guangyi Chen, Yongming Rao, Jiwen Lu, Jie Zhou

Specifically, we disentangle the video representation into the temporal coherence and motion parts and randomly change the scale of the temporal motion features as the adversarial noise.

Video-Based Person Re-Identification
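The snippet above describes a two-step idea: split a video representation into a temporal-coherence part and a temporal-motion part, then randomly rescale the motion part. The sketch below illustrates that idea under a deliberately simple, hypothetical assumption: coherence is taken as the per-dimension mean over frames, and motion is the per-frame residual. The paper's actual disentanglement and scale range may differ. `disentangle`, `perturb_motion`, and the default `[0.5, 1.5]` range are illustrative names and values, not taken from the paper.

```python
import random


def disentangle(frames):
    """Hypothetical split: coherence = mean over time, motion = per-frame residual."""
    t, d = len(frames), len(frames[0])
    coherence = [sum(f[k] for f in frames) / t for k in range(d)]
    motion = [[f[k] - coherence[k] for k in range(d)] for f in frames]
    return coherence, motion


def perturb_motion(frames, low=0.5, high=1.5, rng=random):
    """Recombine both parts after scaling the motion part by one random factor."""
    coherence, motion = disentangle(frames)
    s = rng.uniform(low, high)
    return [[coherence[k] + s * m[k] for k in range(len(coherence))] for m in motion]


frames = [[1.0, 2.0], [3.0, 4.0]]  # two frames, two feature dimensions
augmented = perturb_motion(frames, rng=random.Random(0))
```

Because the residuals average to zero, this perturbation changes how strongly the frames differ from one another but leaves the temporal mean (the coherence part) unchanged.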

Structural Deep Metric Learning for Room Layout Estimation

no code implementations ECCV 2020 Wenzhao Zheng, Jiwen Lu, Jie Zhou

We employ a metric model and a layout encoder to map the RGB images and the ground-truth layouts to the embedding space, respectively, and a layout decoder to map the embeddings to the corresponding layouts, where the whole framework is trained in an end-to-end manner.

Decoder Metric Learning +1

Deep Credible Metric Learning for Unsupervised Domain Adaptation Person Re-identification

no code implementations ECCV 2020 Guangyi Chen, Yuhao Lu, Jiwen Lu, Jie Zhou

Experimental results demonstrate that our DCML method explores credible and valuable training data and improves the performance of unsupervised domain adaptation.

Metric Learning Person Re-Identification +2

EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models

no code implementations 19 Mar 2025 Yinan Liang, Ziwei Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu

While multimodal large language models demonstrate strong performance in complex reasoning tasks, they pose significant challenges related to model complexity during deployment, especially for resource-limited devices.

MM-Vet Multimodal Reasoning +2

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

no code implementations 18 Mar 2025 Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels.

Text-to-Image Generation

UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

no code implementations 13 Mar 2025 Hang Yin, Xiuwei Xu, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

Specifically, we conduct graph matching between the scene graph and goal graph at each time instant and propose different strategies to generate the long-term exploration goal according to different matching states.

Graph Matching

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

1 code implementation 14 Feb 2025 Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu

To this end, we propose text-guided pruning (TGP) and completion-based addition (CBA) to deeply fuse 3D scene representation and text features in an efficient way by gradual region pruning and target completion.

3D Object Detection 3D visual grounding +1

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

1 code implementation 6 Feb 2025 Zuyan Liu, Yuhao Dong, Jiahui Wang, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao

Our training pipeline begins with the most distinct modalities: image and text, then gradually expands the skill sets of the model using speech data that connects language and audio knowledge, and video data that connects all modalities.

cross-modal alignment Language Modeling +1

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting

1 code implementation 26 Jan 2025 Jiajun Dong, Chengkun Wang, Wenzhao Zheng, Lei Chen, Jiwen Lu, Yansong Tang

In this paper, we propose GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting as a solution.

Quantization

Preventing Local Pitfalls in Vector Quantization via Optimal Transport

1 code implementation 19 Dec 2024 Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process due to the necessity for techniques such as subtle initialization and model distillation.

Image Reconstruction Quantization
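Standard vector quantization assigns each vector greedily to its nearest code. That is the "local" behavior the title refers to, and it can leave some codes unused. As a contrast, the sketch below also shows one generic way to use optimal transport for the assignment: an entropic (Sinkhorn) transport plan with uniform marginals over vectors and codes, followed by a per-vector argmax. This is a textbook Sinkhorn sketch, not necessarily the paper's formulation. The function names, `eps`, and `iters` are assumptions.

```python
import math


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest_assign(z, codebook):
    """Standard VQ: each vector independently picks its closest code."""
    return [min(range(len(codebook)), key=lambda j: sq_dist(zi, codebook[j])) for zi in z]


def sinkhorn_assign(z, codebook, eps=0.1, iters=200):
    """Entropic OT with uniform marginals (every code receives equal mass),
    then a hard argmax over the transport plan for each vector."""
    n, m = len(z), len(codebook)
    K = [[math.exp(-sq_dist(zi, e) / eps) for e in codebook] for zi in z]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return [max(range(m), key=lambda j: plan[i][j]) for i in range(n)]


z = [[0.0], [0.2]]
codebook = [[0.1], [1.0]]
print(nearest_assign(z, codebook))   # [0, 0]: code 1 is never used
print(sinkhorn_assign(z, codebook))  # [0, 1]: the balanced plan uses both codes
```

With a small `eps`, `math.exp` can underflow for large distances. Practical implementations usually work in the log domain instead.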

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

1 code implementation 13 Dec 2024 Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings.

Autonomous Driving Prediction

Doe-1: Closed-Loop Autonomous Driving with Large World Model

1 code implementation 12 Dec 2024 Wenzhao Zheng, Zetian Xia, Yuanhui Huang, Sicheng Zuo, Jie Zhou, Jiwen Lu

In this paper, we explore a closed-loop framework for autonomous driving and propose a large Driving wOrld modEl (Doe-1) for unified perception, prediction, and planning.

Autonomous Driving Decision Making +4

Owl-1: Omni World Model for Consistent Long Video Generation

1 code implementation 12 Dec 2024 Yuanhui Huang, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Di Zhang, Jie Zhou, Jiwen Lu

As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use video generation models (VGMs) to film them into videos.

Video Generation

GPD-1: Generative Pre-training for Driving

2 code implementations 11 Dec 2024 Zixun Xie, Sicheng Zuo, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jie Zhou, Jiwen Lu, Shanghang Zhang

We represent each scene with ego, agent, and map tokens and formulate autonomous driving as a unified token generation problem.

Autonomous Driving Decision Making +4

Bridging the Divide: Reconsidering Softmax and Linear Attention

1 code implementation 9 Dec 2024 Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs.
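The cost contrast in the snippet can be seen in a small sketch. Softmax attention computes a score for every query-key pair, so its cost is O(N^2 d) for N tokens. Linear attention replaces the softmax with a kernel phi(q)^T phi(k). By associativity, sum_j phi(k_j) v_j^T is then computed once and shared by all queries, which gives O(N d^2). The feature map elu(x) + 1 used below is one common choice from the linear-attention literature; it is an assumption here and not the method proposed in the paper. The quadratic reference version exists only to confirm that the reordering does not change the result.

```python
import math


def softmax_attention(Q, K, V):
    """Softmax attention: an N x N weight matrix, so cost grows as O(N^2 d)."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[i] * V[i][c] for i in range(len(V))) / z for c in range(len(V[0]))])
    return out


def phi(x):
    """Positive feature map elu(x) + 1."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(Q, K, V):
    """sum_j phi(k_j) v_j^T and sum_j phi(k_j) are built once and reused by
    every query, so cost grows as O(N d^2) rather than O(N^2 d)."""
    fK = [phi(k) for k in K]
    d, dv = len(fK[0]), len(V[0])
    kv = [[sum(fK[j][a] * V[j][c] for j in range(len(K))) for c in range(dv)] for a in range(d)]
    k_sum = [sum(fk[a] for fk in fK) for a in range(d)]
    out = []
    for q in Q:
        fq = phi(q)
        z = sum(fq[a] * k_sum[a] for a in range(d))
        out.append([sum(fq[a] * kv[a][c] for a in range(d)) / z for c in range(dv)])
    return out


def linear_attention_quadratic(Q, K, V):
    """Same kernel, but with the N x N weights materialized (for checking)."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        z = sum(w)
        out.append([sum(w[i] * V[i][c] for i in range(len(V))) / z for c in range(len(V[0]))])
    return out


Q = [[0.1, -0.3], [0.5, 0.2], [-0.4, 0.8]]
K = [[0.3, 0.1], [-0.2, 0.4], [0.6, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = linear_attention(Q, K, V)
slow = linear_attention_quadratic(Q, K, V)
```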

Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model

1 code implementation 6 Dec 2024 Lening Wang, Wenzhao Zheng, Dalong Du, Yunpeng Zhang, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jie Zhou, Jiwen Lu, Shanghang Zhang

To address these limitations, we propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes and design a controllable generative network to achieve 4D simulation.

Autonomous Driving Scene Understanding +1

GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction

1 code implementation 5 Dec 2024 Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu

To address this, we propose a probabilistic Gaussian superposition model which interprets each Gaussian as a probability distribution of its neighborhood being occupied and conforms to probabilistic multiplication to derive the overall geometry.

3D Semantic Occupancy Prediction Autonomous Driving
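The "probabilistic multiplication" in the snippet can be read as follows. Treat each Gaussian as an independent event that occupies a point. A point is then empty only if every Gaussian leaves it empty, so p_occ(x) = 1 - prod_i (1 - alpha_i(x)). The sketch below assumes alpha_i(x) is an opacity in [0, 1] scaled by an unnormalized diagonal Gaussian. That parameterization, and the function names, are illustrative assumptions rather than the paper's exact definitions.

```python
import math


def gaussian_occupancy(x, mean, scale, opacity):
    """alpha_i(x): occupancy probability from one Gaussian with diagonal
    covariance diag(scale^2) and peak value `opacity` (assumed in [0, 1])."""
    m = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, scale))
    return opacity * math.exp(-0.5 * m)


def occupancy(x, gaussians):
    """p(x) = 1 - prod_i (1 - alpha_i(x)): x is empty only if all Gaussians leave it empty."""
    p_empty = 1.0
    for mean, scale, opacity in gaussians:
        p_empty *= 1.0 - gaussian_occupancy(x, mean, scale, opacity)
    return 1.0 - p_empty


gs = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.5), ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.5)]
print(occupancy((0.0, 0.0, 0.0), gs))  # 0.75
```

Summing the two contributions would give 1.0 at the center. The multiplicative form gives 0.75 instead and always stays in [0, 1], however many Gaussians overlap.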

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

2 code implementations 5 Dec 2024 Yuqi Wu, Wenzhao Zheng, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu

3D occupancy prediction provides a comprehensive description of the surrounding scenes and has become an essential task for 3D perception.

Prediction Scene Understanding

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

1 code implementation 20 Nov 2024 Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu

In our approach, we developed a mask generator based on the denoising UNet from a pre-trained diffusion model, leveraging its capability for precise textual control over dense pixel representations and enhancing the open-world adaptability of the generated masks.

3D geometry 3D Semantic Segmentation +3

PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views

1 code implementation 24 Oct 2024 Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu

We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views.

GlobalMamba: Global Image Serialization for Vision Mamba

1 code implementation 14 Oct 2024 Chengkun Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

In this paper, we propose a global image serialization method to transform the image into a sequence of causal tokens, which contain global information of the 2D image.

Image Classification Mamba +3

V2M: Visual 2-Dimensional Mamba for Image Representation Learning

1 code implementation 14 Oct 2024 Chengkun Wang, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences based on the state space model (SSM).

Instance Segmentation Mamba +4

Q-VLM: Post-training Quantization for Large Vision-Language Models

1 code implementation 10 Oct 2024 Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

On the contrary, we mine the cross-layer dependency that significantly influences the discretization errors of the entire vision-language model, and embed this dependency into the search for the optimal quantization strategy at low search cost.

Language Modeling Language Modelling +1

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation

no code implementations 10 Oct 2024 Hang Yin, Xiuwei Xu, Zhenyu Wu, Jie Zhou, Jiwen Lu

Existing zero-shot object navigation methods prompt the LLM with the text of spatially close objects, which lacks enough scene context for in-depth reasoning.

Object

OPONeRF: One-Point-One NeRF for Robust Neural Rendering

1 code implementation 30 Sep 2024 Yu Zheng, Yueqi Duan, Kangfu Zheng, Hongru Yan, Jiwen Lu, Jie Zhou

In this paper, we propose a One-Point-One NeRF (OPONeRF) framework for robust scene rendering.

NeRF Neural Rendering

FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

1 code implementation 26 Sep 2024 Wenliang Zhao, Minglei Shi, Xumin Yu, Jie Zhou, Jiwen Lu

By integrating FlowTurbo into different flow-based models, we obtain an acceleration ratio of 53.1% to 58.3% on class-conditional generation and 29.8% to 38.5% on text-to-image generation.

Text-to-Image Generation

DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

1 code implementation 5 Sep 2024 Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu

Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during the sampling.

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

1 code implementation 21 Aug 2024 Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

In this paper, we aim to leverage Segment Anything Model (SAM) for real-time 3D instance segmentation in an online setting.

3D Instance Segmentation Semantic Segmentation

Temporal Feature Matters: A Framework for Diffusion Model Quantization

no code implementations 28 Jul 2024 Yushi Huang, Ruihao Gong, Xianglong Liu, Jing Liu, Yuhang Li, Jiwen Lu, Dacheng Tao

However, unlike traditional models, diffusion models critically rely on the time-step for the multi-round denoising.

Denoising Image Generation +1

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

1 code implementation 25 Jul 2024 Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Yongming Rao, Ranjay Krishna, Jiwen Lu

Surrounding less important caches are then merged with these anchors, enhancing the preservation of contextual information in the KV caches while yielding an arbitrary acceleration ratio.

Instruction Following Text Generation

GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

1 code implementation 21 Jun 2024 Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang

GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations.

3D Generation

Embodied Instruction Following in Unknown Environments

no code implementations 17 Jun 2024 Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan

For the task planner, we generate the feasible step-by-step plans for human goal accomplishment according to the task completion process and the known visual clues.

Instruction Following Task Planning

FlowIE: Efficient Image Enhancement via Rectified Flow

1 code implementation CVPR 2024 Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang, Jie Zhou, Jiwen Lu

Image enhancement holds extensive applications in real-world scenarios due to complex environments and limitations of imaging devices.

Image Enhancement

OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving

1 code implementation 30 May 2024 Lening Wang, Wenzhao Zheng, Yilong Ren, Han Jiang, Zhiyong Cui, Haiyang Yu, Jiwen Lu

To address this, we propose a diffusion-based 4D occupancy generation model, OccSora, to simulate the development of the 3D world for autonomous driving.

Autonomous Driving Decision Making

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

1 code implementation 27 May 2024 Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu

To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features.

3D Semantic Occupancy Prediction Autonomous Driving +2

Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection

1 code implementation 27 May 2024 Shuai Zeng, Wenzhao Zheng, Jiwen Lu, Haibin Yan

While conventional methods focus on generating pseudo-labels for unlabeled samples as supplements for training, the structural nature of 3D point cloud data facilitates the composition of objects and backgrounds to synthesize realistic scenes.

3D Object Detection Autonomous Driving +1

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

1 code implementation 6 May 2024 Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving Decision Making +2

X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

1 code implementation CVPR 2024 Shuofeng Sun, Yongming Rao, Jiwen Lu, Haibin Yan

However, we contend that such an implicit high-dimensional structure modeling approach inadequately represents the local geometric structure of point clouds due to the absence of explicit structural information.

Segmentation

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

1 code implementation CVPR 2024 Yixuan Zhu, Ao Li, Yansong Tang, Wenliang Zhao, Jie Zhou, Jiwen Lu

The recovery of occluded human meshes presents challenges for current methods due to the difficulty in extracting effective image features under severe occlusion.

Denoising Human Mesh Recovery

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

1 code implementation CVPR 2024 Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang

Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint.

Dimensionality Reduction

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

1 code implementation 19 Mar 2024 Zuyan Liu, Yuhao Dong, Yongming Rao, Jie Zhou, Jiwen Lu

In the realm of vision-language understanding, the proficiency of models in interpreting and reasoning over visual content has become a cornerstone for numerous applications.

visual instruction following Visual Question Answering

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

1 code implementation 16 Mar 2024 Zhiheng Li, Muheng Li, Jixuan Fan, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou

The appearance embedding models the characteristics of low-resolution inputs to deal with photometric variations at different scales, and the pixel-based deformation field learns RGB differences which result from the deviations between the real-world and simulated degradations at arbitrary coordinates.

Super-Resolution

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

1 code implementation 13 Mar 2024 Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

Performing language-conditioned robotic manipulation tasks in unstructured environments is in high demand for general intelligent robots.

Simulated Gaussian Manipulation

Memory-based Adapters for Online 3D Scene Perception

no code implementations CVPR 2024 Xiuwei Xu, Chong Xia, Ziwei Wang, Linqing Zhao, Yueqi Duan, Jie Zhou, Jiwen Lu

To this end, we propose an adapter-based plug-and-play module for the backbone of 3D scene perception models, which constructs memory to cache and aggregate the extracted RGB-D features to empower offline models with temporal learning ability.

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

1 code implementation CVPR 2024 Jianjian Cao, Peng Ye, Shengze Li, Chong Yu, Yansong Tang, Jiwen Lu, Tao Chen

To this end, we propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs.

Path Choice Matters for Clear Attribution in Path Methods

1 code implementation 19 Jan 2024 Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Rigorousness and clarity are both essential for interpretations of DNNs to engender human trust.
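Path methods attribute the change f(x) - f(baseline) to input features. They do this by integrating the gradient along a path from the baseline to x. Integrated gradients uses the straight line between the two points. The toy sketch below, built on f(x1, x2) = x1 * x2 with an analytic gradient (both chosen purely for illustration), shows why the choice of path matters. Two valid paths produce different attributions, yet both sum to the same total. This illustrates the ambiguity named in the title and is not the paper's method for selecting a path.

```python
def grad_f(x):
    """Gradient of the toy function f(x1, x2) = x1 * x2."""
    return [x[1], x[0]]


def path_attribution(grad, path, steps=1000):
    """Integrate grad along a piecewise-linear path (midpoint Riemann sum);
    feature i receives the integral of grad_i along the path times the change in x_i."""
    attr = [0.0] * len(path[0])
    for start, end in zip(path[:-1], path[1:]):
        delta = [e - s for s, e in zip(start, end)]
        for k in range(steps):
            t = (k + 0.5) / steps
            g = grad([s + t * d for s, d in zip(start, delta)])
            for i in range(len(attr)):
                attr[i] += g[i] * delta[i] / steps
    return attr


baseline, x = [0.0, 0.0], [1.0, 1.0]
straight = path_attribution(grad_f, [baseline, x])                # integrated gradients: about [0.5, 0.5]
axis_first = path_attribution(grad_f, [baseline, [1.0, 0.0], x])  # x1 first, then x2: about [0.0, 1.0]
```

Both attributions add up to f(x) - f(baseline) = 1 (the completeness property), but they split that total differently between the two features.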

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

no code implementations 18 Jan 2024 Xiaofeng Wang, Zheng Zhu, Guan Huang, Boyuan Wang, Xinze Chen, Jiwen Lu

World models play a crucial role in understanding and predicting the dynamics of the world, which is essential for video generation.

Video Editing Video Generation

MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection

no code implementations CVPR 2024 Haowen Sun, Yueqi Duan, Juncheng Yan, Yifan Liu, Jiwen Lu

Nowadays leveraging 2D images and pre-trained models to guide 3D point cloud feature representation has shown a remarkable potential to boost the performance of 3D fundamental models.

Point Cloud Segmentation Scene Segmentation

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

no code implementations 12 Dec 2023 Guanxing Lu, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

Embodied Instruction Following (EIF) requires agents to complete human instructions by interacting with objects in complicated surrounding environments.

Instruction Following

Segment and Caption Anything

1 code implementation CVPR 2024 Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

1 code implementation 27 Nov 2023 Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu

In this paper, we explore a new framework of learning a world model, OccWorld, in the 3D Occupancy space to simultaneously predict the movement of the ego car and the evolution of the surrounding scenes.

Autonomous Driving

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

1 code implementation CVPR 2024 Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie Zhou, Jiwen Lu

Our SelfOcc outperforms the previous best method SceneRF by 58.7% using a single frame as input on SemanticKITTI and is the first self-supervised work that produces reasonable 3D occupancy for surround cameras on nuScenes.

Autonomous Driving Monocular Depth Estimation +1

Fast Shapley Value Estimation: A Unified Approach

1 code implementation 2 Nov 2023 Borui Zhang, Baotong Tian, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for addressing challenges posed by black-box models like deep neural networks.
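For n players, the Shapley value of player i is phi_i = sum over S in N \ {i} of |S|! (n - |S| - 1)! / n! * (v(S U {i}) - v(S)). Computing it exactly requires enumerating 2^(n-1) coalitions per player, which is why fast estimation matters for models with many input features. The sketch below shows the exact formula next to the classic Monte Carlo permutation estimator. It is a generic baseline, not the unified estimator proposed in the paper. The toy games and function names are illustrative.

```python
import itertools
import math
import random


def exact_shapley(n, value):
    """Exact Shapley values for players 0..n-1; `value` maps a frozenset to a number."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def permutation_shapley(n, value, samples=1000, rng=None):
    """Monte Carlo estimate: average each player's marginal contribution
    over uniformly random orderings of the players."""
    rng = rng or random.Random(0)
    phi = [0.0] * n
    for _ in range(samples):
        order = list(range(n))
        rng.shuffle(order)
        coalition = frozenset()
        prev = value(coalition)
        for i in order:
            coalition = coalition | {i}
            cur = value(coalition)
            phi[i] += (cur - prev) / samples
            prev = cur
    return phi


def and_game(s):
    """Payoff 1 only when players 0 and 1 are both present."""
    return 1.0 if {0, 1} <= s else 0.0


print(exact_shapley(3, and_game))        # approximately [0.5, 0.5, 0.0]
print(permutation_shapley(3, and_game))  # close to the exact values
```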

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

1 code implementation NeurIPS 2023 Yinan Liang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers contributes significantly to ecological AI.

Image Classification

Anyview: Generalizable Indoor 3D Object Detection with Variable Frames

no code implementations 9 Oct 2023 Zhenyu Wu, Xiuwei Xu, Ziwei Wang, Chong Xia, Linqing Zhao, Jiwen Lu, Haibin Yan

Existing methods only consider fixed frames of input data for a single detector, such as monocular RGB-D images or point clouds reconstructed from dense multi-view RGB-D images.

3D Object Detection Object +2

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

1 code implementation ICCV 2023 Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou

By this means, our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space.

TCOVIS: Temporally Consistent Online Video Instance Segmentation

1 code implementation ICCV 2023 Junlong Li, Bingyao Yu, Yongming Rao, Jie Zhou, Jiwen Lu

The core of our method consists of a global instance assignment strategy and a spatio-temporal enhancement module, which improve the temporal consistency of the features from two aspects.

Instance Segmentation Semantic Segmentation +1

DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

1 code implementation 18 Sep 2023 Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu

The established world model holds immense potential for the generation of high-quality driving videos, and driving policies for safe maneuvering.

Autonomous Driving Video Generation

Introspective Deep Metric Learning

2 code implementations 11 Sep 2023 Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.

Image Retrieval Metric Learning

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

1 code implementation 31 Aug 2023 Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu

To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently.

3D Semantic Occupancy Prediction Autonomous Driving +2

Embodied Task Planning with Large Language Models

1 code implementation 4 Jul 2023 Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan

Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments.

Task Planning

Towards Accurate Post-training Quantization for Diffusion Models

1 code implementation CVPR 2024 Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

On the contrary, we design group-wise quantization functions for activation discretization in different timesteps and sample the optimal timestep for informative calibration image generation, so that our quantized diffusion model can reduce the discretization errors with negligible computational overhead.

Data Free Quantization Image Generation
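Activation ranges in diffusion models can shift strongly across timesteps. A single quantizer calibrated over all timesteps therefore wastes resolution on the steps with small ranges. The sketch below illustrates why group-wise quantization functions help. It compares one min-max-calibrated 8-bit uniform quantizer shared by every timestep against one quantizer per contiguous group of timesteps. The contiguous grouping, the min-max calibration, and the synthetic activations are all hypothetical choices. The paper's grouping and its calibration-image sampling are not reproduced here.

```python
def calibrate(values, bits=8):
    """Min-max calibration of an asymmetric uniform quantizer."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point, levels


def fake_quant(x, scale, zero_point, levels):
    """Round x onto the integer grid [0, levels] and map it back to a real value."""
    q = min(max(round(x / scale) + zero_point, 0), levels)
    return (q - zero_point) * scale


def group_of(t, num_steps, num_groups):
    """Hypothetical grouping: split timesteps 0..num_steps-1 into contiguous groups."""
    return min(t * num_groups // num_steps, num_groups - 1)


# Synthetic activations: early timesteps have a much wider range than late ones.
base = [i / 10 - 1 for i in range(21)]  # 21 values in [-1, 1]
num_steps, num_groups = 10, 2
acts = {t: [v * (8.0 if t < 5 else 0.5) for v in base] for t in range(num_steps)}

# One quantizer shared by all timesteps vs. one quantizer per timestep group.
shared = calibrate([v for vals in acts.values() for v in vals])
grouped = {
    g: calibrate([v for t, vals in acts.items() if group_of(t, num_steps, num_groups) == g for v in vals])
    for g in range(num_groups)
}


def mean_sq_error(quantizer_for):
    total, count = 0.0, 0
    for t, vals in acts.items():
        scale, zp, levels = quantizer_for(t)
        for v in vals:
            total += (v - fake_quant(v, scale, zp, levels)) ** 2
            count += 1
    return total / count


shared_err = mean_sq_error(lambda t: shared)
grouped_err = mean_sq_error(lambda t: grouped[group_of(t, num_steps, num_groups)])
print(f"shared MSE={shared_err:.2e}  grouped MSE={grouped_err:.2e}")
```

The wide-range group keeps the same quantizer in both settings. The narrow-range late timesteps, however, get a step size roughly 16 times finer when they have their own quantizer, so the grouped error comes out lower.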

3D Small Object Detection with Dynamic Spatial Pruning

1 code implementation 5 May 2023 Xiuwei Xu, Zhihao Sun, Ziwei Wang, Hongmin Liu, Jie Zhou, Jiwen Lu

Specifically, we theoretically derive a dynamic spatial pruning (DSP) strategy to prune the redundant spatial representation of the 3D scene in a cascade manner according to the distribution of objects.

3D Object Detection Decoder +3

Dense Hybrid Proposal Modulation for Lane Detection

1 code implementation 28 Apr 2023 Yuejian Wu, Linqing Zhao, Jiwen Lu, Haibin Yan

In addition to the shape and location constraints, we design a quality-aware classification loss to adaptively supervise each positive proposal so that the discriminative power can be further boosted.

Lane Detection

Learning Accurate Performance Predictors for Ultrafast Automated Model Compression

1 code implementation 13 Apr 2023 Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou

On the contrary, we obtain the optimal efficient networks by directly optimizing the compression policy with an accurate performance predictor, where ultrafast automated model compression under various computational cost constraints is achieved without complex compression policy search and evaluation.

Image Classification Model Compression +3

LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images

1 code implementation 11 Apr 2023 Hui Li, Tianyang Xu, Xiao-Jun Wu, Jiwen Lu, Josef Kittler

In particular, we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm producing the learnable model.

Representation Learning

Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

no code implementations CVPR 2023 Xiuwei Xu, Ziwei Wang, Jie Zhou, Jiwen Lu

In this paper, we propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis.

Binarization Quantization

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

2 code implementations ICCV 2023 Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Towards a more comprehensive perception of a 3D scene, in this paper, we propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.

3D Object Detection Autonomous Driving +3

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

1 code implementation ICCV 2023 Xiaofeng Wang, Zheng Zhu, Wenbo Xu, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang

Towards a comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, which is the first surrounding semantic occupancy perception benchmark.

Autonomous Driving Benchmarking +1

Unleashing Text-to-Image Diffusion Models for Visual Perception

2 code implementations ICCV 2023 Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

In this paper, we propose VPD (Visual Perception with a pre-trained Diffusion model), a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.

Denoising Image Segmentation +4

Category-level Shape Estimation for Densely Cluttered Objects

no code implementations 23 Feb 2023 Zhenyu Wu, Ziwei Wang, Jiwen Lu, Haibin Yan

Then we fuse the feature maps representing the visual information of multi-view RGB images and the pixel affinity learned from the clutter point cloud, where the acquired instance segmentation masks of multi-view RGB images are projected to partition the clutter point cloud.

Instance Segmentation Object +3

AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers

1 code implementation 11 Jan 2023 Xumin Yu, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou

In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.

Denoising Inductive Bias +1

DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation

1 code implementation CVPR 2023 Shuai Shen, Wenliang Zhao, Zibin Meng, Wanhua Li, Zheng Zhu, Jie Zhou, Jiwen Lu

In this way, the proposed DiffTalk is capable of producing high-quality talking head videos in synchronization with the source audio, and more importantly, it can be naturally generalized across different identities without any further fine-tuning.

Denoising Talking Head Generation

Deep Factorized Metric Learning

1 code implementation CVPR 2023 Chengkun Wang, Wenzhao Zheng, Junlong Li, Jie Zhou, Jiwen Lu

Learning a generalizable and comprehensive similarity metric to depict the semantic discrepancies between images is the foundation of many computer vision tasks.

Image Classification Metric Learning

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion

1 code implementation CVPR 2023 Wenliang Zhao, Yongming Rao, Weikang Shi, Zuyan Liu, Jie Zhou, Jiwen Lu

Unlike previous work that relies on carefully designed network architectures and loss functions to fuse the information from the source and target faces, we reformulate the face swapping as a conditional inpainting task, performed by a powerful diffusion model guided by the desired face attributes (e.g., identity and landmarks).

Face Swapping

CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

no code implementations ICCV 2023 Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, Jiwen Lu

Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a proxy feature to build a bridge among different sub-clusters and reduce the intra-class variance.

Attribute Clustering +2

Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint

1 code implementation 18 Dec 2022 Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

Deep learning has revolutionized human society, yet the black-box nature of deep neural networks hinders further application to reliability-demanded industries.

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

1 code implementation CVPR 2023 Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

With its popularity continuing to grow around the world, fitness activity analytics has become an emerging research topic in computer vision.

Action Generation Action Recognition +2

Diffusion-SDF: Text-to-Shape via Voxelized Diffusion

1 code implementation CVPR 2023 Muheng Li, Yueqi Duan, Jie Zhou, Jiwen Lu

With the rising industrial attention to 3D virtual modeling technology, generating novel 3D content based on specified conditions (e.g., text) has become a hot issue.

Cross-Modal Adapter for Text-Video Retrieval

1 code implementation 17 Nov 2022 Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Jiwen Lu, Jie Zhou, Shiji Song, Gao Huang

However, as pre-trained models scale up, fully fine-tuning them on text-video retrieval datasets carries a high risk of overfitting.

parameter-efficient fine-tuning Retrieval +1

Planning Irregular Object Packing via Hierarchical Reinforcement Learning

no code implementations 17 Nov 2022 Sichao Huang, Ziwei Wang, Jie Zhou, Jiwen Lu

We compare our approach with existing robotic packing methods for irregular objects in a physics simulator.

Hierarchical Reinforcement Learning Object +4

Probabilistic Deep Metric Learning for Hyperspectral Image Classification

1 code implementation 15 Nov 2022 Chengkun Wang, Wenzhao Zheng, Xian Sun, Jiwen Lu, Jie Zhou

We propose to learn a global probabilistic distribution for each pixel in the patch and a probabilistic metric to model the distance between distributions.

Classification Hyperspectral Image Classification +1
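As a rough illustration of comparing per-pixel distributions instead of point embeddings, the sketch below assumes each pixel is embedded as a diagonal Gaussian and uses the closed-form 2-Wasserstein distance as the metric. The Gaussian parameterization, the choice of Wasserstein distance, and the names `DiagGaussian` and `w2_distance` are all assumptions for illustration; the paper's actual probabilistic metric may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    """Hypothetical per-pixel embedding: a mean vector and per-dimension std devs."""
    mean: List[float]
    std: List[float]


def w2_distance(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians.

    With diagonal covariances, W2^2 = ||mu_p - mu_q||^2 + ||sigma_p - sigma_q||^2.
    """
    if len(p.mean) != len(q.mean) or len(p.std) != len(q.std):
        raise ValueError("distributions must have the same dimensionality")
    if any(s < 0 for s in p.std + q.std):
        raise ValueError("standard deviations must be non-negative")
    mean_term = sum((a - b) ** 2 for a, b in zip(p.mean, q.mean))
    std_term = sum((a - b) ** 2 for a, b in zip(p.std, q.std))
    return math.sqrt(mean_term + std_term)


if __name__ == "__main__":
    pixel_a = DiagGaussian(mean=[0.0, 0.0], std=[1.0, 1.0])
    pixel_b = DiagGaussian(mean=[3.0, 4.0], std=[1.0, 1.0])
    print(w2_distance(pixel_a, pixel_b))  # 5.0
```

For diagonal covariances this reduces to the Euclidean distance between the concatenated mean and standard-deviation vectors, which keeps it cheap to evaluate for every pixel pair in a patch.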

Dynamics-aware Adversarial Attack of Adaptive Neural Networks

1 code implementation 15 Oct 2022 An Tao, Yueqi Duan, Yingqi Wang, Jiwen Lu, Jie Zhou

To address this issue, we propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.

Adversarial Attack Computational Efficiency

Token-Label Alignment for Vision Transformers

1 code implementation ICCV 2023 Han Xiao, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Data mixing strategies (e.g., CutMix) have shown the ability to greatly improve the performance of convolutional neural networks (CNNs).

Image Classification Semantic Segmentation +1

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

1 code implementation ICCV 2023 Chengkun Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

The pretrain-finetune paradigm in modern computer vision facilitates the success of self-supervised learning, which tends to achieve better transferability than supervised learning.

Image Classification object-detection +3

A Simple Baseline for Multi-Camera 3D Object Detection

1 code implementation 22 Aug 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Jie Zhou, Jiwen Lu

First, we extract multi-scale features and generate the perspective object proposals on each monocular image.

Autonomous Driving Monocular 3D Object Detection +2

Shap-CAM: Visual Explanations for Convolutional Neural Networks based on Shapley Value

no code implementations 7 Aug 2022 Quan Zheng, Ziwei Wang, Jie Zhou, Jiwen Lu

Explaining deep convolutional neural networks has recently drawn increasing attention, since it helps to understand the networks' internal operations and why they make certain decisions.

Decision Making Fairness

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

8 code implementations 28 Jul 2022 Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework.

Image Classification Object Detection +2

Learning Series-Parallel Lookup Tables for Efficient Image Super-Resolution

1 code implementation 26 Jul 2022 Cheng Ma, Jingyi Zhang, Jie Zhou, Jiwen Lu

In addition, we propose a parallel network that includes two branches of cascaded lookup tables, which process different components of the input low-resolution images.

Image Super-Resolution

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

1 code implementation 24 Jul 2022 Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, Jiwen Lu

Thus, the facial radiance field can be flexibly adjusted to a new identity with only a few reference images.

3D geometry NeRF +2

Label2Label: A Language Modeling Framework for Multi-Attribute Learning

1 code implementation 18 Jul 2022 Wanhua Li, Zhexuan Cao, Jianjiang Feng, Jie Zhou, Jiwen Lu

As each sample is annotated with multiple attribute labels, these "words" will naturally form an unordered but meaningful "sentence", which depicts the semantic information of the corresponding sample.

Attribute Clothing Attribute Recognition +5

Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition

1 code implementation 17 Jul 2022 Yansong Tang, Xingyu Liu, Xumin Yu, Danyang Zhang, Jiwen Lu, Jie Zhou

Unlike conventional adversarial learning-based approaches for unsupervised domain adaptation (UDA), we use a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.

Action Recognition Self-Supervised Learning +2

MetaAge: Meta-Learning Personalized Age Estimators

1 code implementation 12 Jul 2022 Wanhua Li, Jiwen Lu, Abudukelimu Wuerkaixi, Jianjiang Feng, Jie Zhou

Unlike most existing personalized methods that learn the parameters of a personalized estimator for each person in the training set, our method learns the mapping from identity information to age estimator parameters.

Age Estimation Meta-Learning +1

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks

1 code implementation 4 Jul 2022 Yongming Rao, Zuyan Liu, Wenliang Zhao, Jie Zhou, Jiwen Lu

We extend our method to hierarchical models including CNNs and hierarchical vision Transformers as well as more complex dense prediction tasks that require structured feature maps by formulating a more generic dynamic spatial sparsification framework with progressive sparsification and asymmetric computation for different spatial locations.

Shapley-NAS: Discovering Operation Contribution for Neural Architecture Search

1 code implementation CVPR 2022 Han Xiao, Ziwei Wang, Zheng Zhu, Jie Zhou, Jiwen Lu

Differentiable architecture search (DARTS) acquires the optimal architectures by optimizing the architecture parameters with gradient descent, which significantly reduces the search cost.

Neural Architecture Search

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

1 code implementation CVPR 2022 Ziyi Wang, Yongming Rao, Xumin Yu, Jie Zhou, Jiwen Lu

Conventional point cloud semantic segmentation methods usually employ an encoder-decoder architecture, where mid-level features are locally aggregated to extract geometric information.

Decoder Image Segmentation +3

Introspective Deep Metric Learning for Image Retrieval

2 code implementations 9 May 2022 Wenzhao Zheng, Chengkun Wang, Jie Zhou, Jiwen Lu

This paper proposes an introspective deep metric learning (IDML) framework for uncertainty-aware comparisons of images.

Image Classification Image Retrieval +2

WebFace260M: A Benchmark for Million-Scale Deep Face Recognition

no code implementations 21 Apr 2022 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Dalong Du, Jiwen Lu, Jie Zhou

For a comprehensive evaluation of face matchers, three recognition tasks are performed under standard, masked and unbiased settings, respectively.

Face Recognition

HyperDet3D: Learning a Scene-conditioned 3D Object Detector

no code implementations CVPR 2022 Yu Zheng, Yueqi Duan, Jiwen Lu, Jie Zhou, Qi Tian

A bathtub in a library, a sink in an office, a bed in a laundry room: such counter-intuitive combinations suggest that the scene provides important prior knowledge for 3D object detection, which helps eliminate ambiguous detections of similar objects.

3D Object Detection Object +1

FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment

1 code implementation CVPR 2022 Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, Jiwen Lu

Most existing action quality assessment methods rely on the deep features of an entire video to predict the score, which is less reliable due to the non-transparent inference process and poor interpretability.

Action Quality Assessment

SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation

1 code implementation 7 Apr 2022 Yi Wei, Linqing Zhao, Wenzhao Zheng, Zheng Zhu, Yongming Rao, Guan Huang, Jiwen Lu, Jie Zhou

In this paper, we propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.

Autonomous Driving Monocular Depth Estimation

Attributable Visual Similarity Learning

1 code implementation CVPR 2022 Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.

Ranked #3 on Metric Learning on CARS196 (using extra training data)

Metric Learning Semantic Similarity +1

LiDAR Distillation: Bridging the Beam-Induced Domain Gap for 3D Object Detection

1 code implementation 28 Mar 2022 Yi Wei, Zibu Wei, Yongming Rao, Jiaxin Li, Jie Zhou, Jiwen Lu

In this paper, we propose the LiDAR Distillation to bridge the domain gap induced by different LiDAR beams for 3D object detection.

3D Object Detection object-detection

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

1 code implementation CVPR 2022 Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu

The generated text prompts are paired with corresponding video clips, and together co-train the text encoder and the video encoder via a contrastive approach.

Ranked #6 on Action Segmentation on GTEA (using extra training data)

Action Segmentation Action Understanding +1

Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion

1 code implementation CVPR 2022 Tianpei Gu, Guangyi Chen, Junlong Li, Chunze Lin, Yongming Rao, Jie Zhou, Jiwen Lu

Human behavior is inherently indeterminate, which requires a pedestrian trajectory prediction system to model the multi-modality of future motion states.

Diversity Pedestrian Trajectory Prediction +2

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

2 code implementations CVPR 2022 Xiuwei Xu, Yifan Wang, Yu Zheng, Yongming Rao, Jie Zhou, Jiwen Lu

In this paper, we propose a weakly-supervised approach for 3D object detection, which makes it possible to train a strong 3D detector with position-level annotations (i.e., annotations of object centers).

3D Object Detection Diversity +4

Content-aware Warping for View Synthesis

1 code implementation 22 Jan 2022 Mantang Guo, Junhui Hou, Jing Jin, Hui Liu, Huanqiang Zeng, Jiwen Lu

To this end, we propose content-aware warping, which adaptively learns the interpolation weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network.

Novel View Synthesis

Adaptive Neighborhood Metric Learning

no code implementations 20 Jan 2022 Kun Song, Junwei Han, Gong Cheng, Jiwen Lu, Feiping Nie

In this paper, we reveal that metric learning can suffer from a serious inseparability problem without informative sample mining.

Metric Learning Triplet

Dimension Embeddings for Monocular 3D Object Detection

no code implementations CVPR 2022 Yunpeng Zhang, Wenzhao Zheng, Zheng Zhu, Guan Huang, Dalong Du, Jie Zhou, Jiwen Lu

In this paper, we propose a general method to learn appropriate embeddings for dimension estimation in monocular 3D object detection.

Monocular 3D Object Detection Object +1

Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network

1 code implementation 17 Dec 2021 An Tao, Yueqi Duan, He Wang, Ziyi Wu, Pengliang Ji, Haowen Sun, Jie Zhou, Jiwen Lu

This results in a serious lagged-gradient issue: the attack learned at the current step becomes ineffective because the architecture changes afterward.

3D Classification 3D Semantic Segmentation +2

Inconsistency-aware Uncertainty Estimation for Semi-supervised Medical Image Segmentation

1 code implementation 17 Oct 2021 Yinghuan Shi, Jian Zhang, Tong Ling, Jiwen Lu, Yefeng Zheng, Qian Yu, Lei Qi, Yang Gao

In semi-supervised medical image segmentation, most previous works draw on the common assumption that higher entropy means higher uncertainty.

Image Segmentation Segmentation +2
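The common assumption mentioned above can be made concrete with a small sketch: per-pixel Shannon entropy of a softmax prediction, used as the conventional uncertainty score. This illustrates the baseline heuristic the snippet describes, not the paper's inconsistency-aware estimator; the fixed-threshold mask and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_map(logit_map: List[List[List[float]]]) -> List[List[float]]:
    """Per-pixel entropy for an H x W grid of class-logit vectors."""
    return [[entropy(softmax(pixel)) for pixel in row] for row in logit_map]


def uncertain_mask(ent_map: List[List[float]], threshold: float) -> List[List[bool]]:
    """Hypothetical rule: flag pixels whose entropy exceeds a fixed threshold."""
    return [[e > threshold for e in row] for row in ent_map]


if __name__ == "__main__":
    # One row with a confident pixel and a maximally ambiguous two-class pixel.
    logits = [[[8.0, -8.0], [0.0, 0.0]]]
    ents = entropy_map(logits)
    print(uncertain_mask(ents, threshold=0.5))  # [[False, True]]
```

For two classes the entropy peaks at ln 2 (about 0.693) when both classes are equally likely, which is why the ambiguous pixel crosses the 0.5 threshold here.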

Structure-Preserving Image Super-Resolution

1 code implementation 26 Sep 2021 Cheng Ma, Yongming Rao, Jiwen Lu, Jie Zhou

First, we propose SPSR with gradient guidance (SPSR-G), which exploits gradient maps of images to guide the recovery in two aspects.

Image Super-Resolution SSIM

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

1 code implementation ICCV 2021 Yi Wei, Shaohui Liu, Yongming Rao, Wang Zhao, Jiwen Lu, Jie Zhou

In this work, we present a new multi-view depth estimation method that utilizes both conventional reconstruction and learning-based priors over the recently proposed neural radiance fields (NeRF).

Depth Estimation NeRF

Diverse Sample Generation: Pushing the Limit of Generative Data-free Quantization

1 code implementation 1 Sep 2021 Haotong Qin, Yifu Ding, Xiangguo Zhang, Jiakai Wang, Xianglong Liu, Jiwen Lu

We first show through theoretical analysis that the diversity of synthetic samples is crucial for data-free quantization, whereas in existing approaches the synthetic data, being completely constrained by BN statistics, experimentally exhibit severe homogenization at the distribution and sample levels.

Data Free Quantization Image Classification

Deep Relational Metric Learning

1 code implementation ICCV 2021 Wenzhao Zheng, Borui Zhang, Jiwen Lu, Jie Zhou

This paper presents a deep relational metric learning (DRML) framework for image clustering and retrieval.

Image Clustering Metric Learning +1

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

1 code implementation ICCV 2021 Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou

In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem, and we design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion.

Ranked #1 on Point Cloud Completion on ShapeNet (Chamfer Distance L2 metric)

Decoder Inductive Bias +2

Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification

1 code implementation ICCV 2021 Yongming Rao, Guangyi Chen, Jiwen Lu, Jie Zhou

Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process.

Causal Inference counterfactual +8

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection

2 code implementations ICCV 2021 Yongming Rao, Benlin Liu, Yi Wei, Jiwen Lu, Cho-Jui Hsieh, Jie Zhou

In particular, we propose to generate random layouts of a scene by making use of the objects in the synthetic CAD dataset and learn the 3D scene representation by applying object-level contrastive learning on two random scenes generated from the same set of synthetic objects.

3D Object Detection Contrastive Learning +3

Towards Interpretable Deep Metric Learning with Structural Matching

1 code implementation ICCV 2021 Wenliang Zhao, Yongming Rao, Ziyi Wang, Jiwen Lu, Jie Zhou

Our method is model-agnostic and can be applied to off-the-shelf backbone networks and metric learning methods.

Metric Learning

Person Re-identification via Attention Pyramid

1 code implementation 11 Aug 2021 Guangyi Chen, Tianpei Gu, Jiwen Lu, Jin-An Bao, Jie Zhou

Experimental results demonstrate the superiority of our method, which outperforms the state-of-the-art methods by a large margin with limited computational cost.

Person Re-Identification

Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

1 code implementation ICCV 2021 Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou

In contrast, our GMPQ searches for a mixed-quantization policy that generalizes to large-scale datasets using only a small amount of data, so the search cost is significantly reduced without performance degradation.

Quantization

Personalized Trajectory Prediction via Distribution Discrimination

1 code implementation ICCV 2021 Guangyi Chen, Junlong Li, Nuoxing Zhou, Liangliang Ren, Jiwen Lu

In this paper, we present a distribution discrimination (DisDis) method to predict personalized motion patterns by distinguishing the potential distributions.

Diversity Prediction +1

Human Trajectory Prediction via Counterfactual Analysis

1 code implementation ICCV 2021 Guangyi Chen, Junlong Li, Jiwen Lu, Jie Zhou

Most existing methods learn to predict future trajectories from behavior clues in historical trajectories and interaction clues from the environment.

Autonomous Vehicles counterfactual +2

Similarity-Aware Fusion Network for 3D Semantic Segmentation

1 code implementation 4 Jul 2021 Linqing Zhao, Jiwen Lu, Jie Zhou

To address this, we employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds and utilize them to guide the fusion of two modalities to further exploit complementary information.

Ranked #25 on Semantic Segmentation on ScanNet (test mIoU metric)

3D Semantic Segmentation

Global Filter Networks for Image Classification

4 code implementations NeurIPS 2021 Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, Jie Zhou

Recent advances in self-attention and pure multi-layer perceptron (MLP) models for vision have shown great potential for achieving promising performance with fewer inductive biases.

Ranked #1 on Image Classification on ImageNet (Hardware Burden metric)

Classification Domain Generalization +1

Pseudo Facial Generation With Extreme Poses for Face Recognition

no code implementations CVPR 2021 Guoli Wang, Jiaqi Ma, Qian Zhang, Jiwen Lu, Jie Zhou

Many existing methods address this by generating fake frontal faces from extreme-pose ones, but they struggle to preserve identity information and suffer from high computational cost and uncontrolled disturbances.

Face Recognition

Deep Compositional Metric Learning

1 code implementation CVPR 2021 Wenzhao Zheng, Chengkun Wang, Jiwen Lu, Jie Zhou

In this paper, we propose a deep compositional metric learning (DCML) framework for effective and generalizable similarity measurement between images.

Metric Learning

Self-Supervised Video Hashing via Bidirectional Transformers

1 code implementation CVPR 2021 Shuyan Li, Xiu Li, Jiwen Lu, Jie Zhou

Most existing unsupervised video hashing methods are built on unidirectional models with less reliable training objectives, which underuse the correlations among frames and the similarity structure between videos.

Decoder Retrieval +1

Structure-Aware Face Clustering on a Large-Scale Graph With 10^7 Nodes

1 code implementation CVPR 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

1 code implementation NeurIPS 2021 Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, Cho-Jui Hsieh

Based on this observation, we propose a dynamic token sparsification framework to prune redundant tokens progressively and dynamically based on the input.

Ranked #1 on Image Classification on ImageNet (Hardware Burden metric)

Blocking Efficient ViTs
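A minimal sketch of the pruning step, assuming each token already has a keep score. In DynamicViT the scores come from a learned, lightweight prediction module; here `score_fn` is a hypothetical stand-in, and training-time details such as differentiable masking are omitted. At each stage only the highest-scoring fraction of the surviving tokens is kept, so later stages process progressively fewer tokens.

```python
import math
from typing import Callable, List, Sequence, Tuple

Token = List[float]


def keep_top_tokens(tokens: Sequence[Token], scores: Sequence[float],
                    keep_ratio: float) -> Tuple[List[Token], List[int]]:
    """Keep the ceil(keep_ratio * N) highest-scoring tokens in their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    if not tokens:
        return [], []
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:n_keep])  # restore the original token order
    return [list(tokens[i]) for i in kept_idx], kept_idx


def progressive_prune(tokens: Sequence[Token], score_fn: Callable[[Token], float],
                      keep_ratios: Sequence[float]) -> List[Token]:
    """Apply keep_top_tokens stage by stage, re-scoring the surviving tokens each time."""
    current = [list(t) for t in tokens]
    for ratio in keep_ratios:
        scores = [score_fn(t) for t in current]
        current, _ = keep_top_tokens(current, scores, ratio)
    return current


def l2_norm(token: Token) -> float:
    """Hypothetical importance score: the token's L2 norm."""
    return math.sqrt(sum(x * x for x in token))


if __name__ == "__main__":
    toks = [[0.1, 0.0], [3.0, 0.0], [0.5, 0.0], [2.0, 0.0]]
    print(progressive_prune(toks, l2_norm, keep_ratios=[0.5, 0.5]))  # [[3.0, 0.0]]
```

Because pruning is input-dependent, different images keep different tokens; only the number kept per stage is fixed by `keep_ratios`.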

FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection

1 code implementation 17 May 2021 Yi Wei, Shang Su, Jiwen Lu, Jie Zhou

To tackle this problem, we propose frustum-aware geometric reasoning (FGR) to detect vehicles in point clouds without any 3D annotations.

3D Object Detection object-detection

SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation

no code implementations 6 Apr 2021 Jiabin Zhang, Zheng Zhu, Jiwen Lu, JunJie Huang, Guan Huang, Jie Zhou

To make a better trade-off between accuracy and efficiency, we propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).

Human Detection Multi-Person Pose Estimation

Meta-Mining Discriminative Samples for Kinship Verification

no code implementations CVPR 2021 Wanhua Li, Shiwei Wang, Jiwen Lu, Jianjiang Feng, Jie Zhou

Finally, the samples in the unbalanced training batch are re-weighted by the learned meta-miner to optimize the kinship models.

Kinship Verification

Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes

1 code implementation 24 Mar 2021 Shuai Shen, Wanhua Li, Zheng Zhu, Guan Huang, Dalong Du, Jiwen Lu, Jie Zhou

To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method.

Clustering Face Clustering +1

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

no code implementations CVPR 2021 Zheng Zhu, Guan Huang, Jiankang Deng, Yun Ye, JunJie Huang, Xinze Chen, Jiagang Zhu, Tian Yang, Jiwen Lu, Dalong Du, Jie Zhou

In this paper, we contribute a new million-scale face benchmark containing noisy 4M identities/260M faces (WebFace260M) and cleaned 2M identities/42M faces (WebFace42M) training data, as well as an elaborately designed time-constrained evaluation protocol.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Attribute Face Recognition +1

Separable Structure Modeling for Semi-supervised Video Object Segmentation

1 code implementation 18 Feb 2021 Wencheng Zhu, Jiahao Li, Jiwen Lu, Jie Zhou

Specifically, we first compute a pixel-wise similarity matrix by using representations of reference and target pixels and then select top-rank reference pixels for target pixel classification.

Object One-shot visual object segmentation +1
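The two steps in the snippet, a pixel-wise similarity matrix followed by selection of top-ranked reference pixels, can be sketched as below. Cosine similarity, the value of `k`, and the softmax-weighted vote over the selected reference labels are assumptions for illustration, not details taken from the paper; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def similarity_matrix(targets: Sequence[Sequence[float]],
                      refs: Sequence[Sequence[float]]) -> List[List[float]]:
    """S[i][j] = similarity between target pixel i and reference pixel j."""
    return [[cosine(t, r) for r in refs] for t in targets]


def classify_targets(targets: Sequence[Sequence[float]],
                     refs: Sequence[Sequence[float]],
                     ref_labels: Sequence[int], k: int) -> List[float]:
    """Foreground probability per target pixel from its top-k most similar reference pixels.

    ref_labels holds 0/1 foreground labels for the reference frame (assumed given).
    """
    if not 1 <= k <= len(refs):
        raise ValueError("k must be between 1 and the number of reference pixels")
    probs = []
    for row in similarity_matrix(targets, refs):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        weights = [math.exp(row[j]) for j in top]  # softmax weights over the top-k
        total = sum(weights)
        probs.append(sum(w * ref_labels[j] for w, j in zip(weights, top)) / total)
    return probs


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = [1, 1, 0, 0]  # first two reference pixels are foreground
    print(classify_targets([[1.0, 0.05], [0.05, 1.0]], refs, labels, k=2))  # [1.0, 0.0]
```

Restricting the vote to the top-k references keeps weakly similar (and often mislabeled) reference pixels from diluting each target pixel's prediction.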

Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search

no code implementations 2 Feb 2021 Cheng Ma, Jiwen Lu, Jie Zhou

As hashing becomes an increasingly appealing technique for large-scale image retrieval, multi-label hashing is also attracting more attention for the ability to exploit multi-level semantic contents.

Clustering Deep Hashing +4

SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images

no code implementations 19 Jan 2021 Lei He, Jiwen Lu, Guanghui Wang, Shiyu Song, Jie Zhou

In this paper, we first introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks through an analysis of the imaging process, then propose a Semantic Object Segmentation and Depth Estimation Network (SOSD-Net) based on the objectness assumption.

Monocular Depth Estimation Multi-Task Learning +3

Frequency-Aware Spatiotemporal Transformers for Video Inpainting Detection

no code implementations ICCV 2021 Bingyao Yu, Wanhua Li, Xiu Li, Jiwen Lu, Jie Zhou

In this paper, we propose a Frequency-Aware Spatiotemporal Transformer (FAST) for video inpainting detection, which aims to simultaneously mine the traces of video inpainting from the spatial, temporal, and frequency domains.

Decoder Video Inpainting

SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation

1 code implementation 18 Dec 2020 An Tao, Yueqi Duan, Yi Wei, Jiwen Lu, Jie Zhou

Most existing point cloud instance and semantic segmentation methods rely heavily on strong supervision signals, which require point-level labels for every point in the scene.

3D Instance Segmentation 3D Semantic Segmentation +1

PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds

1 code implementation CVPR 2021 Yi Wei, Ziyi Wang, Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose a Point-Voxel Recurrent All-Pairs Field Transforms (PV-RAFT) method to estimate scene flow from point clouds.

All Scene Flow Estimation

DSNet: A Flexible Detect-to-Summarize Network for Video Summarization

1 code implementation 1 Dec 2020 Wencheng Zhu, Jiwen Lu, Jiahao Li, Jie Zhou

In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization.

Ranked #2 on Video Summarization on TvSum (using extra training data)

regression Supervised Video Summarization

Graph-Based Social Relation Reasoning

1 code implementation ECCV 2020 Wanhua Li, Yueqi Duan, Jiwen Lu, Jianjiang Feng, Jie Zhou

Human beings are fundamentally sociable: we generally organize our social lives in terms of relations with other people.

Relation Relational Reasoning +1

Latent Fingerprint Registration via Matching Densely Sampled Points

no code implementations 12 May 2020 Shan Gu, Jianjiang Feng, Jiwen Lu, Jie Zhou

Given a pair of fingerprints to match, we bypass the minutiae extraction step and take uniformly sampled points as key points.

Clustering

Graph-based Kinship Reasoning Network

no code implementations 22 Apr 2020 Wanhua Li, Yingqiang Zhang, Kangchen Lv, Jiwen Lu, Jianjiang Feng, Jie Zhou

In this paper, we propose a graph-based kinship reasoning (GKR) network for kinship verification, which aims to effectively perform relational reasoning on the extracted features of an image pair.

Kinship Verification Relational Reasoning

Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds

1 code implementation CVPR 2020 Yongming Rao, Jiwen Lu, Jie Zhou

Based on this hypothesis, we propose to learn point cloud representation by bidirectional reasoning between the local structures at different abstraction hierarchies and the global shape without human supervision.

3D Object Classification General Classification +2

Structure-Preserving Super Resolution with Gradient Guidance

2 code implementations CVPR 2020 Cheng Ma, Yongming Rao, Yean Cheng, Ce Chen, Jiwen Lu, Jie Zhou

In this paper, we propose a structure-preserving super resolution method that alleviates the above issue while maintaining the merits of GAN-based methods in generating perceptually pleasing details.

Generative Adversarial Network Image Super-Resolution +1

Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation

1 code implementation CVPR 2020 Cheng Ma, Zhenyu Jiang, Yongming Rao, Jiwen Lu, Jie Zhou

In this paper, we propose a deep face super-resolution (FSR) method with iterative collaboration between two recurrent networks which focus on facial image recovery and landmark estimation respectively.

Super-Resolution

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

no code implementations 20 Mar 2020 Yansong Tang, Jiwen Lu, Jie Zhou

We believe the COIN dataset will promote future in-depth research on instructional video analysis in the community.

Action Detection

BiDet: An Efficient Binarized Object Detector

2 code implementations CVPR 2020 Ziwei Wang, Ziyi Wu, Jiwen Lu, Jie Zhou

Conventional network binarization methods directly quantize the weights and activations in one-stage or two-stage detectors with constrained representational capacity, so that the information redundancy in the networks causes numerous false positives and degrades the performance significantly.

Binarization Object +2
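For context on the conventional binarization that the snippet contrasts against, here is a sketch of a common scheme in the style of XNOR-Net (not BiDet's method): weights are approximated as `alpha * sign(w)` with `alpha` the mean absolute weight, activations are reduced to their signs, and the dot product of two ±1 vectors is computed by counting matching positions. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def binarize_weights(weights: Sequence[float]) -> Tuple[float, List[int]]:
    """Approximate w by alpha * sign(w), with alpha = mean(|w|); sign(0) is mapped to +1."""
    if not weights:
        raise ValueError("weights must be non-empty")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binarize_activations(xs: Sequence[float]) -> List[int]:
    """Reduce activations to +1/-1 by their sign (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in xs]


def binary_dot(alpha: float, w_signs: Sequence[int], a_signs: Sequence[int]) -> float:
    """alpha * <w_signs, a_signs>; for +/-1 vectors this equals alpha * (2 * matches - n)."""
    if len(w_signs) != len(a_signs):
        raise ValueError("length mismatch")
    matches = sum(1 for w, a in zip(w_signs, a_signs) if w == a)
    return alpha * (2 * matches - len(w_signs))


if __name__ == "__main__":
    alpha, w_bin = binarize_weights([0.5, -1.5, 1.0, -1.0])
    a_bin = binarize_activations([0.2, -0.3, 0.7, -0.4])
    print(alpha, w_bin, binary_dot(alpha, w_bin, a_bin))  # 1.0 [1, -1, 1, -1] 4.0
```

On hardware the match count is typically obtained with XNOR and popcount over packed bits; the generator expression here only mirrors that arithmetic. The limited representational capacity of such ±1 approximations is the source of the information loss the snippet describes.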

DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition

no code implementations 23 Feb 2020 Hao-Chiang Shao, Kang-Yu Liu, Chia-Wen Lin, Jiwen Lu

With their aid, DotFAN can learn a disentangled face representation and effectively generate face images of various facial attributes while preserving the identity of augmented faces.

Diversity Face Recognition

P²GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation

no code implementations 19 Dec 2019 Peiyu Yu, Yongming Rao, Jiwen Lu, Jie Zhou

Humans are able to perform fast and accurate object pose estimation even under severe occlusion by exploiting learned object model priors from everyday life.

6D Pose Estimation 6D Pose Estimation using RGB +1

Automatic Data Augmentation by Learning the Deterministic Policy

1 code implementation 18 Oct 2019 Yinghuan Shi, Tiexin Qin, Yong Liu, Jiwen Lu, Yang Gao, Dinggang Shen

By introducing a unified optimization goal, DeepAugNet aims to combine data augmentation and deep model training in an end-to-end manner, which is realized by simultaneously training a hybrid architecture of a dueling deep Q-learning algorithm and a surrogate deep model.

Data Augmentation Deep Reinforcement Learning +1

Improving Sample-based Evaluation for Generative Adversarial Networks

no code implementations ICLR 2019 Shaohui Liu*, Yi Wei*, Jiwen Lu, Jie Zhou

Unlike most existing evaluation frameworks which transfer the representation of ImageNet inception model to map images onto the feature space, our framework uses a specialized encoder to acquire fine-grained domain-specific representation.

Deep Fitting Degree Scoring Network for Monocular 3D Object Detection

no code implementations CVPR 2019 Lijie Liu, Jiwen Lu, Chunjing Xu, Qi Tian, Jie Zhou

In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to conclusively score the fitting degree between proposals and objects.

Monocular 3D Object Detection Object +2

BridgeNet: A Continuity-Aware Probabilistic Network for Age Estimation

no code implementations CVPR 2019 Wanhua Li, Jiwen Lu, Jianjiang Feng, Chunjing Xu, Jie Zhou, Qi Tian

Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process.

Age Estimation MORPH

Hardness-Aware Deep Metric Learning

2 code implementations CVPR 2019 Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, Jie Zhou

This paper presents a hardness-aware deep metric learning (HDML) framework.

Ranked #30 on Metric Learning on CUB-200-2011 (using extra training data)

Image Retrieval Metric Learning

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

no code implementations CVPR 2019 Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou

There are a substantial number of instructional videos on the Internet, which enable us to acquire knowledge for completing various tasks.

Action Detection

Graininess-Aware Deep Feature Learning for Pedestrian Detection

no code implementations ECCV 2018 Chunze Lin, Jiwen Lu, Gang Wang, Jie Zhou

In this paper, we propose a graininess-aware deep feature learning method for pedestrian detection.

Pedestrian Detection

Deep Reinforcement Learning with Iterative Shift for Visual Tracking

no code implementations ECCV 2018 Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou

Visual tracking faces the dilemma of locating a target both accurately and efficiently, and of deciding online whether and how to adapt the appearance model or even restart tracking.

Deep Reinforcement Learning Motion Estimation +5

Deep Variational Metric Learning

no code implementations ECCV 2018 Xudong Lin, Yueqi Duan, Qiyuan Dong, Jiwen Lu, Jie Zhou

Deep metric learning has been extensively explored recently, which trains a deep neural network to produce discriminative embedding features.

Metric Learning

Relaxation-Free Deep Hashing via Policy Gradient

no code implementations ECCV 2018 Xin Yuan, Liangliang Ren, Jiwen Lu, Jie Zhou

In this paper, we propose a simple yet effective relaxation-free method to learn more effective binary codes via policy gradient for scalable image search.

Deep Hashing Image Retrieval

Collaborative Deep Reinforcement Learning for Multi-Object Tracking

no code implementations ECCV 2018 Liangliang Ren, Jiwen Lu, Zifeng Wang, Qi Tian, Jie Zhou

To address this, we develop a deep prediction-decision network in our C-DRL, which simultaneously detects and predicts objects under a unified network via deep reinforcement learning.

Deep Reinforcement Learning Multi-Object Tracking +3

Dual-Agent Deep Reinforcement Learning for Deformable Face Tracking

no code implementations ECCV 2018 Minghao Guo, Jiwen Lu, Jie Zhou

In this paper, we propose a dual-agent deep reinforcement learning (DADRL) method for deformable face tracking, which generates bounding boxes and detects facial landmarks interactively from face videos.

Deep Reinforcement Learning Facial Landmark Detection +2

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition

no code implementations CVPR 2018 Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, Jie Zhou

In this paper, we propose a deep progressive reinforcement learning (DPRL) method for action recognition in skeleton-based videos, which aims to distil the most informative frames and discard ambiguous frames in sequences for recognizing actions.

Action Recognition Deep Reinforcement Learning +4

Deep Adversarial Metric Learning

no code implementations CVPR 2018 Yueqi Duan, Wenzhao Zheng, Xudong Lin, Jiwen Lu, Jie Zhou

Learning an effective distance metric between image pairs plays an important role in visual analysis, where the training procedure largely relies on hard negative samples.

Metric Learning
