Search Results for author: Di Huang

Found 89 papers, 34 papers with code

Face Aging Effect Simulation using Hidden Factor Analysis Joint Sparse Representation

no code implementations • 4 Nov 2015 • Hongyu Yang, Di Huang, Yunhong Wang, Heng Wang, Yuanyan Tang

Face aging simulation has received growing attention, yet generating convincing and natural age-progressed face images remains a challenge.

Defect detection for patterned fabric images based on GHOG and low-rank decomposition

no code implementations • 18 Feb 2017 • Chunlei Li, Guangshuai Gao, Zhoufeng Liu, Di Huang, Sheng Liu, Miao Yu

In order to accurately detect defects in patterned fabric images, a novel detection algorithm based on Gabor-HOG (GHOG) and low-rank decomposition is proposed in this paper.

Computational Efficiency Defect Detection

Receptive Field Block Net for Accurate and Fast Object Detection

7 code implementations • ECCV 2018 • Songtao Liu, Di Huang, Yunhong Wang

Current top-performing object detectors depend on deep CNN backbones, such as ResNet-101 and Inception, benefiting from their powerful feature representations but suffering from high computational costs.

object-detection Real-Time Object Detection

1,397

Feature Map Pooling for Cross-View Gait Recognition Based on Silhouette Sequence Images

no code implementations • 26 Nov 2017 • Qiang Chen, Yunhong Wang, Zheng Liu, Qingjie Liu, Di Huang

In this paper, we develop a novel convolutional neural network based approach to extract and aggregate useful information from gait silhouette sequence images instead of simply representing the gait process by averaging silhouette images.

Gait Recognition

Learning Face Age Progression: A Pyramid Architecture of GANs

1 code implementation • CVPR 2018 • Hongyu Yang, Di Huang, Yunhong Wang, Anil K. Jain

The two underlying requirements of face age progression, i.e., aging accuracy and identity permanence, are not well studied in the literature.

Generative Adversarial Network

Learning Continuous Face Age Progression: A Pyramid of GANs

no code implementations • 10 Jan 2019 • Hongyu Yang, Di Huang, Yunhong Wang, Anil K. Jain

The two underlying requirements of face age progression, i.e., aging accuracy and identity permanence, are not well studied in the literature.

Face Recognition Generative Adversarial Network +1

Adaptive NMS: Refining Pedestrian Detection in a Crowd

no code implementations • CVPR 2019 • Songtao Liu, Di Huang, Yunhong Wang

Pedestrian detection in a crowd is a very challenging issue.

Ranked #17 on Object Detection on CrowdHuman (full body)

Object Detection Pedestrian Detection

CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks

1 code implementation • 28 May 2019 • Weicheng Li, Rui Wang, Zhongzhi Luan, Di Huang, Zidong Du, Yunji Chen, Depei Qian

Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved great progress in many real-life applications.

Image Classification

Benchmarks for Graph Embedding Evaluation

1 code implementation • 19 Aug 2019 • Palash Goyal, Di Huang, Ankita Goswami, Sujit Rokka Chhetri, Arquimedes Canedo, Emilio Ferrara

We use the comparisons on our 100 benchmark graphs to define the GFS-score, which can be applied to any embedding method to quantify its performance.

Benchmarking Graph Embedding +1

Graph Representation Ensemble Learning

1 code implementation • 6 Sep 2019 • Palash Goyal, Di Huang, Sujit Rokka Chhetri, Arquimedes Canedo, Jaya Shree, Evan Patterson

In this work, we introduce the problem of graph representation ensemble learning and provide a first-of-its-kind framework to aggregate multiple graph embedding methods efficiently.

Ensemble Learning Graph Embedding +2

ArduCode: Predictive Framework for Automation Engineering

no code implementations • 6 Sep 2019 • Arquimedes Canedo, Palash Goyal, Di Huang, Amit Pandey, Gustavo Quiros

We show that machine learning can be leveraged to assist the automation engineer in classifying automation, finding similar code snippets, and reasoning about the hardware selection of sensors and actuators.

BIG-bench Machine Learning Decision Making

Synthetic vs Real: Deep Learning on Controlled Noise

no code implementations • 25 Sep 2019 • Lu Jiang, Di Huang, Weilong Yang

Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels.

Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers

no code implementations • 1 Nov 2019 • Xishan Zhang, Shaoli Liu, Rui Zhang, Chang Liu, Di Huang, Shiyi Zhou, Jiaming Guo, Yu Kang, Qi Guo, Zidong Du, Yunji Chen

Image Classification Machine Translation +2

Learning Spatial Fusion for Single-Shot Object Detection

1 code implementation • 21 Nov 2019 • Songtao Liu, Di Huang, Yunhong Wang

Pyramidal feature representation is the common practice to address the challenge of scale variation in object detection.

Ranked #136 on Object Detection on COCO test-dev

Object object-detection +1

1,042

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

2 code implementations • ICML 2020 • Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting.

Ranked #12 on Image Classification on WebVision-1000

Image Classification

32,792

Distraction-Aware Feature Learning for Human Attribute Recognition via Coarse-to-Fine Attention Mechanism

no code implementations • 26 Nov 2019 • Mingda Wu, Di Huang, Yuanfang Guo, Yunhong Wang

Recently, Human Attribute Recognition (HAR) has become a hot topic due to its scientific challenges and application potential, where localizing attributes is a crucial stage but is not yet well handled.

Attribute

A Feasible Framework for Arbitrary-Shaped Scene Text Recognition

2 code implementations • 10 Dec 2019 • Jinjin Zhang, Wei Wang, Di Huang, Qingjie Liu, Yunhong Wang

Deep learning based methods have achieved surprising progress in Scene Text Recognition (STR), one of the classic problems in computer vision.

Instance Segmentation Language Modelling +4

819

DWM: A Decomposable Winograd Method for Convolution Acceleration

no code implementations • 3 Feb 2020 • Di Huang, Xishan Zhang, Rui Zhang, Tian Zhi, Deyuan He, Jiaming Guo, Chang Liu, Qi Guo, Zidong Du, Shaoli Liu, Tianshi Chen, Yunji Chen

In this paper, we propose a novel Decomposable Winograd Method (DWM), which overcomes the limitations of the original Winograd minimal filtering algorithm and extends it to a wide range of general convolutions.

Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation

no code implementations • CVPR 2020 • Yangtao Zheng, Di Huang, Songtao Liu, Yunhong Wang

Thanks to this coarse-to-fine feature adaptation, domain knowledge in foreground regions can be effectively transferred.

object-detection Object Detection

Improving Object Detection with Selective Self-supervised Self-training

no code implementations • ECCV 2020 • Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets.

Image Classification Image Retrieval +4

Multi-Scale Positive Sample Refinement for Few-Shot Object Detection

4 code implementations • ECCV 2020 • Jiaxi Wu, Songtao Liu, Di Huang, Yunhong Wang

Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances, and is useful when manual annotation is time-consuming or data acquisition is limited.

Ranked #16 on Few-Shot Object Detection on MS-COCO (30-shot)

Few-Shot Object Detection Object +1

177

PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection

no code implementations • 18 Dec 2020 • Yanan Zhang, Di Huang, Yunhong Wang

LiDAR-based 3D object detection is an important task for autonomous driving and current approaches suffer from sparse and partial point clouds of distant and occluded objects.

Ranked #4 on 3D Object Detection on KITTI Cars Hard val

3D Object Detection Autonomous Driving +2

MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images

no code implementations • 24 Dec 2020 • Ran Qin, Qingjie Liu, Guangshuai Gao, Di Huang, Yunhong Wang

Objects in aerial images usually have arbitrary orientations and are densely located over the ground, making them extremely challenging to detect.

object-detection Object Detection In Aerial Images +2

Student-Teacher Feature Pyramid Matching for Anomaly Detection

10 code implementations • 7 Mar 2021 • Guodong Wang, Shumin Han, Errui Ding, Di Huang

Anomaly detection is a challenging task and is usually formulated as a one-class learning problem because of the unexpectedness of anomalies.

Ranked #20 on Anomaly Detection on VisA

Image Classification Unsupervised Anomaly Detection

3,121

Magnifying Subtle Facial Motions for Effective 4D Expression Recognition

no code implementations • 5 May 2021 • Qingkai Zhen, Di Huang, Yunhong Wang, Hassen Drira, Boulbaba Ben Amor, Mohamed Daoudi

In this paper, an effective pipeline for automatic 4D Facial Expression Recognition (4D FER) is proposed.

Emotion Classification Facial Expression Recognition +1

Pixel Sampling for Style Preserving Face Pose Editing

no code implementations • 14 Jun 2021 • Xiangnan Yin, Di Huang, Hongyu Yang, Zehua Fu, Yunhong Wang, Liming Chen

The existing auto-encoder based face pose editing methods primarily focus on modeling the identity preserving ability during pose synthesis, but are less able to preserve the image style properly, which refers to the color, brightness, saturation, etc.

Facial Inpainting

Weakly-Supervised Photo-realistic Texture Generation for 3D Face Reconstruction

no code implementations • 14 Jun 2021 • Xiangnan Yin, Di Huang, Zehua Fu, Yunhong Wang, Liming Chen

Missing textures in the incomplete UV map are then filled in by the UV generator.

3D Face Reconstruction Image Generation +1

Double-Dot Network for Antipodal Grasp Detection

no code implementations • 3 Aug 2021 • Yao Wang, Yangtao Zheng, Boyang Gao, Di Huang

This paper proposes a new deep learning approach to antipodal grasp detection, named Double-Dot Network (DD-Net).

object-detection Object Detection

Recurrent Graph Neural Networks for Rumor Detection in Online Forums

1 code implementation • 8 Aug 2021 • Di Huang, Jacob Bartel, John Palowitch

The widespread adoption of online social networks in daily life has created a pressing need for effectively classifying user-generated content.

Image Inpainting via Conditional Texture and Structure Dual Generation

4 code implementations • ICCV 2021 • Xiefan Guo, Hongyu Yang, Di Huang

Deep generative approaches have recently made considerable progress in image inpainting by introducing structure priors.

Image Inpainting Texture Synthesis

172

PR-GCN: A Deep Graph Convolutional Network with Point Refinement for 6D Pose Estimation

no code implementations • ICCV 2021 • Guangyuan Zhou, Huiqun Wang, Jiaxin Chen, Di Huang

This paper proposes a novel deep learning approach, namely Graph Convolutional Network with Point Refinement (PR-GCN), to simultaneously address the issues above in a unified way.

6D Pose Estimation

iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization

no code implementations • 25 Aug 2021 • Huiqun Wang, Ruijie Yang, Di Huang, Yunhong Wang

Differentiable ARchiTecture Search (DARTS) uses a continuous relaxation of the network representation and dramatically accelerates Neural Architecture Search (NAS), by almost thousands of times in terms of GPU-days.

Ranked #9 on Neural Architecture Search on CIFAR-10

Neural Architecture Search

Identity-aware Graph Memory Network for Action Detection

no code implementations • 26 Aug 2021 • Jingcheng Ni, Jie Qin, Di Huang

Action detection plays an important role in high-level video understanding and media interpretation.

Action Detection Temporal Localization +1

Boundary Guided Context Aggregation for Semantic Segmentation

1 code implementation • 27 Oct 2021 • Haoxiang Ma, Hongyu Yang, Di Huang

Recent studies on semantic segmentation have started to recognize the significance of boundary information, with most approaches treating boundaries as a supplement to semantic details.

Boundary Detection Semantic Segmentation

Segmentation-Reconstruction-Guided Facial Image De-occlusion

no code implementations • 15 Dec 2021 • Xiangnan Yin, Di Huang, Zehua Fu, Yunhong Wang, Liming Chen

The proposed model consists of a 3D face reconstruction module, a face segmentation module, and an image generation module.

3D Face Reconstruction Image Generation

UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone Imagery

no code implementations • 20 Dec 2021 • Yecheng Huang, Jiaxin Chen, Di Huang

This paper proposes a novel approach to object detection on drone imagery, namely Multi-Proxy Detection Network with Unified Foreground Packing (UFPMP-Det).

object-detection Object Detection

ACGNet: Action Complement Graph Network for Weakly-supervised Temporal Action Localization

1 code implementation • 21 Dec 2021 • Zichen Yang, Jie Qin, Di Huang

Weakly-supervised temporal action localization (WTAL) in untrimmed videos has emerged as a practical but challenging task since only video-level labels are available.

Weakly-supervised Temporal Action Localization Weakly Supervised Temporal Action Localization

ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo

1 code implementation • CVPR 2022 • Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Di Huang

The network is mainly composed of two components: a context-aware local retouching layer (LRL) and an adaptive blend pyramid layer (BPL).

4k Photo Retouching

ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations

2 code implementations • CVPR 2022 • Mingwu Zheng, Hongyu Yang, Di Huang, Liming Chen

Precise representations of 3D faces are beneficial to various computer vision and graphics applications.

Ranked #2 on Face Alignment on FaceScape

Face Alignment Face Model

153

CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

no code implementations • CVPR 2022 • Yanan Zhang, Jiaxin Chen, Di Huang

In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection.

3D Object Detection Autonomous Driving +4

Beyond 3DMM: Learning to Capture High-fidelity 3D Face Shape

no code implementations • 9 Apr 2022 • Xiangyu Zhu, Chang Yu, Di Huang, Zhen Lei, Hao Wang, Stan Z. Li

3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior.

Vocal Bursts Intensity Prediction

Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint

no code implementations • CVPR 2022 • Jiaxi Wu, Jiaxin Chen, Di Huang

Active learning is a promising way to alleviate the high annotation cost of computer vision tasks by deliberately selecting more informative samples to label.

Active Learning object-detection +1

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection

no code implementations • CVPR 2022 • Jiaxi Wu, Jiaxin Chen, Mengzhe He, Yiru Wang, Bo Li, Bingqi Ma, Weihao Gan, Wei Wu, Yali Wang, Di Huang

Specifically, TRKP adopts the teacher-student framework, where the multi-head teacher network is built to extract knowledge from labeled source domains and guide the student network to learn detectors in the unlabeled target domain.

Disentanglement Domain Adaptation +2

Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement

no code implementations • 7 May 2022 • Bing Li, Jiaxin Chen, Dongming Zhang, Xiuguo Bao, Di Huang

To address the two issues above, this paper proposes a novel framework, namely Attentive Cross-modal Interaction Network with Motion Enhancement (MEACI-Net).

Action Recognition Denoising +2

Neural Program Synthesis with Query

no code implementations • ICLR 2022 • Di Huang, Rui Zhang, Xing Hu, Xishan Zhang, Pengwei Jin, Nan Li, Zidong Du, Qi Guo, Yunji Chen

In this work, we propose a query-based framework that trains a query neural network to generate informative input-output examples automatically and interactively from a large query space.

Program Synthesis

Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles

1 code implementation • 20 Jul 2022 • Guodong Wang, Yunhong Wang, Jie Qin, Dongming Zhang, Xiuguo Bao, Di Huang

Video Anomaly Detection (VAD) is an important topic in computer vision.

Ranked #4 on Anomaly Detection on ShanghaiTech

Anomaly Detection Self-Supervised Learning +1

Motion Sensitive Contrastive Learning for Self-supervised Video Representation

no code implementations • 12 Aug 2022 • Jingcheng Ni, Nan Zhou, Jie Qin, Qian Wu, Junqi Liu, Boxun Li, Di Huang

Contrastive learning has shown great potential in video representation learning.

Contrastive Learning Representation Learning +4

STS: Surround-view Temporal Stereo for Multi-view 3D Detection

no code implementations • 22 Aug 2022 • Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang, Di Huang

Instead of using a sole monocular depth method, in this work, we propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning.

3D Object Detection Depth Estimation +4

Racial Disparities in Pulse Oximetry Cannot Be Fixed With Race-Based Correction

no code implementations • 10 Oct 2022 • Neal Patwari, Di Huang, Kiki Bonetta-Misteli

Studies have shown that pulse oximeter measurements of blood oxygenation have statistical bias that is a function of race, which results in higher rates of occult hypoxemia, i.e., missed detection of dangerously low oxygenation, in patients of color.

Reconstructing Hand-Held Objects from Monocular Video

no code implementations • 30 Nov 2022 • Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

The key idea is that the hand motion naturally provides multiple views of the object and the motion can be reliably estimated by a hand pose tracker.

Hand Pose Estimation Object

Learning Polysemantic Spoof Trace: A Multi-Modal Disentanglement Network for Face Anti-spoofing

no code implementations • 7 Dec 2022 • Kaicheng Li, Hongyu Yang, Binghui Chen, Pengyu Li, Biao Wang, Di Huang

Along with the widespread use of face recognition systems, their vulnerability has been increasingly highlighted.

Disentanglement Face Anti-Spoofing +1

Towards Scale Balanced 6-DoF Grasp Detection in Cluttered Scenes

1 code implementation • 10 Dec 2022 • Haoxiang Ma, Di Huang

Moreover, a Scale Balanced Learning (SBL) loss and an Object Balanced Sampling (OBS) strategy are designed: SBL uses a priori weights to enlarge the gradients of samples whose scales occur infrequently, while OBS captures more points on small-scale objects with the help of an auxiliary segmentation network.

Ranked #2 on Robotic Grasping on GraspNet-1Billion

Data Augmentation Robotic Grasping

Ponder: Point Cloud Pre-training via Neural Rendering

no code implementations • ICCV 2023 • Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, Wanli Ouyang

We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering.

3D Reconstruction Image Generation +2

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

no code implementations • 18 Jan 2023 • Xingyi He, Jiaming Sun, Yuang Wang, Di Huang, Hujun Bao, Xiaowei Zhou

We propose a new method for object pose estimation without CAD models.

Keypoint Detection Object

Online Symbolic Regression with Informative Query

no code implementations • 21 Feb 2023 • Pengwei Jin, Di Huang, Rui Zhang, Xing Hu, Ziyuan Nan, Zidong Du, Qi Guo, Yunji Chen

Symbolic regression, the task of extracting mathematical expressions from the observed data $\{ \mathbf{x}_i, y_i \}$, plays a crucial role in scientific discovery.

regression Symbolic Regression

RGB-D Grasp Detection via Depth Guided Learning with Cross-modal Attention

no code implementations • 28 Feb 2023 • Ran Qin, Haoxiang Ma, Boyang Gao, Di Huang

Planar grasp detection is one of the most fundamental tasks in robotic manipulation, and recent progress in consumer-grade RGB-D sensors makes it possible to deliver more comprehensive features from both the texture and shape modalities.

Denoising Diffusion Autoencoders are Unified Self-supervised Learners

1 code implementation • ICCV 2023 • Weilai Xiang, Hongyu Yang, Di Huang, Yunhong Wang

Inspired by recent advances in diffusion models, which are reminiscent of denoising autoencoders, we investigate whether they can acquire discriminative representations for classification via generative pre-training.

Contrastive Learning Denoising +3

100

OcTr: Octree-based Transformer for 3D Object Detection

no code implementations • CVPR 2023 • Chao Zhou, Yanan Zhang, Jiaxin Chen, Di Huang

A key challenge for LiDAR-based 3D object detection is to capture sufficient features from large-scale 3D scenes, especially for distant and/or occluded objects.

3D Object Detection Object +1

NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images

1 code implementation • CVPR 2023 • Mingwu Zheng, Haiyu Zhang, Hongyu Yang, Di Huang

Realistic face rendering from multi-view images is beneficial to various computer vision and graphics applications.

Neural Rendering

114

Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images

1 code implementation • CVPR 2023 • Bowei Du, Yecheng Huang, Jiaxin Chen, Di Huang

Object detection on drone images with low latency is an important but challenging task on the resource-constrained unmanned aerial vehicle (UAV) platform.

object-detection Object Detection

Align-DETR: Improving DETR with Simple IoU-aware BCE loss

1 code implementation • 15 Apr 2023 • Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang

We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.

object-detection Object Detection

FR-Net: A Light-weight FFT Residual Net For Gaze Estimation

no code implementations • 4 May 2023 • Tao Xu, Bo Wu, Ruilong Fan, Yun Zhou, Di Huang

Furthermore, our method outperforms existing lightweight methods in terms of accuracy and efficiency for the gaze estimation task.

Gaze Estimation

ANPL: Towards Natural Programming with Interactive Decomposition

1 code implementation • NeurIPS 2023 • Di Huang, Ziyuan Nan, Xing Hu, Pengwei Jin, Shaohui Peng, Yuanbo Wen, Rui Zhang, Zidong Du, Qi Guo, Yewen Pu, Yunji Chen

We deploy ANPL on the Abstraction and Reasoning Corpus (ARC), a set of unique tasks that are challenging for state-of-the-art AI systems, showing that it outperforms baseline programming systems that (a) lack the ability to decompose tasks interactively and (b) lack the guarantee that the modules can be correctly composed together.

Ranked #4 on Code Generation on HumanEval

Code Generation Program Synthesis

Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

1 code implementation • 3 Jun 2023 • Pucheng Dang, Xing Hu, Kaidi Xu, Jinhao Duan, Di Huang, Husheng Han, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

Unlearning techniques have been proposed to prevent third parties from exploiting unauthorized data; they generate unlearnable samples by adding imperceptible perturbations to data before it is publicly released.

MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators

no code implementations • 19 Jun 2023 • Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang

Generating realistic human motion from given action descriptions has seen significant advances, driven by the emerging demand for digital humans.

Unilaterally Aggregated Contrastive Learning with Hierarchical Augmentation for Anomaly Detection

no code implementations • ICCV 2023 • Guodong Wang, Yunhong Wang, Jie Qin, Dongming Zhang, Xiuguo Bao, Di Huang

Anomaly detection (AD), aiming to find samples that deviate from the training distribution, is essential in safety-critical applications.

Anomaly Detection Contrastive Learning +2

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

1 code implementation • ICCV 2023 • Nan Zhou, Jiaxin Chen, Di Huang

Furthermore, to alleviate the interference by semantic drift, we develop the semantic calibration (SC) module to align the global shape and class centers of the pretrained and downstream feature distributions.

General Knowledge Image Classification

Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning

no code implementations • 4 Sep 2023 • Shaohui Peng, Xing Hu, Qi Yi, Rui Zhang, Jiaming Guo, Di Huang, Zikang Tian, Ruizhi Chen, Zidong Du, Qi Guo, Yunji Chen, Ling Li

Large language models (LLMs) show their powerful automatic reasoning and planning capability with a wealth of semantic knowledge about the human world.

Imitation Learning Instruction Following +2

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #1 on 3D Semantic Segmentation on ScanNet++ (using extra training data)

3D Object Detection 3D Reconstruction +5

294

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

1 code implementation • 12 Oct 2023 • Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, Xiaofei He, Wanli Ouyang

In the context of autonomous driving, the significance of effective feature learning is widely acknowledged.

3D Object Detection 3D Semantic Segmentation +3

123

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

1 code implementation • 29 Oct 2023 • Srikumar Sastry, Subash Khanal, Aayush Dhakal, Di Huang, Nathan Jacobs

We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.

Contrastive Learning Cross-Modal Retrieval +4

Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence

no code implementations • 1 Dec 2023 • Yajie Liu, Pu Ge, Haoxiang Ma, Shichao Fan, Qingjie Liu, Di Huang, Yunhong Wang

Referring image segmentation (RIS) aims to segment objects in an image conditioned on free-form text descriptions.

Image Segmentation Semantic Segmentation

ImFace++: A Sophisticated Nonlinear 3D Morphable Face Model with Implicit Neural Representations

1 code implementation • 7 Dec 2023 • Mingwu Zheng, Haiyu Zhang, Hongyu Yang, Liming Chen, Di Huang

Accurate representations of 3D faces are of paramount importance in various computer vision and graphics applications.

Face Model Face Reconstruction

153

Assessing and Understanding Creativity in Large Language Models

no code implementations • 23 Jan 2024 • Yunpu Zhao, Rui Zhang, Wenyi Li, Di Huang, Jiaming Guo, Shaohui Peng, Yifan Hao, Yuanbo Wen, Xing Hu, Zidong Du, Qi Guo, Ling Li, Yunji Chen

This paper aims to establish an efficient framework for assessing the level of creativity in LLMs.

Language Modelling Large Language Model

Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning

no code implementations • 4 Feb 2024 • Haoyi Zhu, Yating Wang, Di Huang, Weicai Ye, Wanli Ouyang, Tong He

In this study, we explore the influence of different observation spaces on robot learning, focusing on three predominant modalities: RGB, RGB-D, and point cloud.

Zero-shot Generalization

FiT: Flexible Vision Transformer for Diffusion Model

2 code implementations • 19 Feb 2024 • Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

Nature is infinitely resolution-free.

Image Cropping

321

NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth Supervision for Indoor Multi-View 3D Detection

1 code implementation • 22 Feb 2024 • Chenxi Huang, Yuenan Hou, Weicai Ye, Di Huang, Xiaoshui Huang, Binbin Lin, Deng Cai, Wanli Ouyang

We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors.

Depth Estimation Depth Prediction +1

Deep Common Feature Mining for Efficient Video Semantic Segmentation

no code implementations • 5 Mar 2024 • Yaoyan Zheng, Hongyu Yang, Di Huang

Recent advancements in video semantic segmentation have made substantial progress by exploiting temporal correlations.

Semantic Segmentation Video Semantic Segmentation

Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision

no code implementations • 6 Mar 2024 • Yajie Liu, Pu Ge, Qingjie Liu, Di Huang

Recently, learning open-vocabulary semantic segmentation from text supervision has achieved promising downstream performance.

Contrastive Learning Open Vocabulary Semantic Segmentation +3

DrFER: Learning Disentangled Representations for 3D Facial Expression Recognition

no code implementations • 13 Mar 2024 • Hebeizi Li, Hongyu Yang, Di Huang

Facial Expression Recognition (FER) has consistently been a focal point in the field of facial analysis.

3D Facial Expression Recognition Disentanglement +1

Agent3D-Zero: An Agent for Zero-shot 3D Understanding

no code implementations • 18 Mar 2024 • Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang

The ability to understand and reason about the 3D real world is a crucial milestone towards artificial general intelligence.

Language Modelling Scene Understanding

Sim-to-Real Grasp Detection with Global-to-Local RGB-D Adaptation

no code implementations • 18 Mar 2024 • Haoxiang Ma, Ran Qin, Modi Shi, Boyang Gao, Di Huang

This paper focuses on the sim-to-real issue of RGB-D grasp detection and formulates it as a domain adaptation problem.

Domain Adaptation

GVGEN: Text-to-3D Generation with Volumetric Representation

no code implementations • 19 Mar 2024 • Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline.

3D Generation 3D Reconstruction +1

A Survey on Long Video Generation: Challenges, Methods, and Prospects

no code implementations • 25 Mar 2024 • Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai

Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications.

Video Generation

Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge

1 code implementation • 2 Apr 2024 • Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang

We focus on the generalization ability of the 6-DoF grasp detection method in this paper.

Robotic Grasping

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

1 code implementation • 6 Apr 2024 • Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang

Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images.

valid

iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection

no code implementations • 8 Apr 2024 • Nan Zhou, Jiaxin Chen, Di Huang

It innovatively incorporates a cross-layer dynamic connection (CDC) for input prompt tokens from adjacent layers, enabling effective sharing of task-relevant information.

Image Classification Semantic Segmentation +1

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

no code implementations • 14 Apr 2024 • Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang

Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain.

Text to 3D Text-to-Image Generation

Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction

1 code implementation • ECCV 2020 • Xiangyu Zhu, Fan Yang, Di Huang, Chang Yu, Hao Wang, Jianzhu Guo, Zhen Lei, Stan Z. Li

However, most of their training data is constructed by 3D Morphable Model, whose space spanned is only a small part of the shape space.

3D Face Reconstruction

104
