Search Results for author: Dingkang Liang

Found 29 papers, 22 papers with code

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

1 code implementation • 1 Sep 2024 • Dingyuan Zhang, Dingkang Liang, Zichang Tan, Xiaoqing Ye, Cheng Zhang, Jingdong Wang, Xiang Bai

Slow inference speed is one of the most crucial concerns for deploying multi-view 3D detectors to tasks with high real-time requirements like autonomous driving.

Autonomous Driving

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

1 code implementation • 4 Aug 2024 • Mingxin Huang, Yuliang Liu, Dingkang Liang, Lianwen Jin, Xiang Bai

To address this issue, we introduce a Complementary Image Pyramid (CIP), a simple, effective, and plug-and-play solution designed to mitigate semantic discontinuity during high-resolution image processing.

document understanding

A Unified Framework for 3D Scene Understanding

1 code implementation • 3 Jul 2024 • Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

We propose UniSeg3D, a unified 3D scene understanding framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary segmentation tasks within a single model.

Contrastive Learning, Knowledge Distillation, +4

SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection

1 code implementation • 1 Jul 2024 • Dingkang Liang, Wei Hua, Chunsheng Shi, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Specifically, we observe that objects in aerial images usually exhibit arbitrary orientations, small scales, and aggregation, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair of pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; and we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC).

Object, object-detection, +4

MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks

1 code implementation • 7 Jun 2024 • Xingkui Zhu, Yiran Guan, Dingkang Liang, Yuchao Chen, Yuliang Liu, Xiang Bai

The sparsely activated mixture of experts (MoE) model presents a promising alternative to traditional densely activated (dense) models, enhancing both quality and computational efficiency.

Computational Efficiency

Anomaly Detection by Adapting a pre-trained Vision Language Model

no code implementations • 14 Mar 2024 • Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai

Recently, large vision and language models have shown success when adapted to many downstream tasks.

Anomaly Detection, Language Modelling, +1

Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis

1 code implementation • CVPR 2024 • Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, Xiang Bai

To achieve this goal, we freeze the parameters of the default pre-trained models and then propose the Dynamic Adapter, which generates a dynamic scale for each token, considering the token significance to the downstream task.

3D Parameter-Efficient Fine-Tuning for Classification, Transfer Learning

PointMamba: A Simple State Space Model for Point Cloud Analysis

1 code implementation • 16 Feb 2024 • Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai

Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.

Mamba

A Discrepancy Aware Framework for Robust Anomaly Detection

1 code implementation • 11 Oct 2023 • Yuxuan Cai, Dingkang Liang, Dongliang Luo, Xinwei He, Xin Yang, Xiang Bai

To alleviate this issue, we present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies across different anomaly detection benchmarks.

Anomaly Detection, Decoder, +3

SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model

1 code implementation • 4 Jun 2023 • Dingyuan Zhang, Dingkang Liang, Hongcheng Yang, Zhikang Zou, Xiaoqing Ye, Zhe Liu, Xiang Bai

In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks.

3D Object Detection, Image Segmentation, +3

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai

We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities.

Contrastive Learning, Optical Character Recognition (OCR)

SOOD: Towards Semi-Supervised Oriented Object Detection

1 code implementation • CVPR 2023 • Wei Hua, Dingkang Liang, Jingyu Li, Xiaolong Liu, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Semi-Supervised Object Detection (SSOD), aiming to explore unlabeled data for boosting object detectors, has become an active task in recent years.

Object, object-detection, +4

Super-Resolution Information Enhancement For Crowd Counting

1 code implementation • 13 Mar 2023 • Jiahao Xie, Wei Xu, Dingkang Liang, Zhanyu Ma, Kongming Liang, Weidong Liu, Rui Wang, Ling Jin

As the proposed method requires SR labels, we further propose a Super-Resolution Crowd Counting dataset (SR-Crowd).

Crowd Counting, Super-Resolution

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

3 code implementations • 23 Jul 2022 • Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai

Recently, most handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism.

Decoder, Handwritten Mathematical Expression Recognition, +1

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

no code implementations • 23 Mar 2022 • Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai

Ethiopic/Amharic script is one of the oldest African writing systems, serving at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people.

Benchmarking, Scene Text Detection, +1

An End-to-End Transformer Model for Crowd Localization

1 code implementation • 26 Feb 2022 • Dingkang Liang, Wei Xu, Xiang Bai

Crowd localization, predicting head positions, is a more practical and high-level task than simply counting.

Decoder

LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

no code implementations • 3 Sep 2021 • Xinwei He, Silin Cheng, Dingkang Liang, Song Bai, Xi Wang, Yingying Zhu

To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.

3D Object Classification, 3D Object Retrieval, +3

TransCrowd: weakly-supervised crowd counting with transformers

1 code implementation • 19 Apr 2021 • Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai

Current weakly-supervised counting methods use a CNN to regress the total count of a crowd through an image-to-count paradigm.

Crowd Counting

Focal Inverse Distance Transform Maps for Crowd Localization

3 code implementations • 16 Feb 2021 • Dingkang Liang, Wei Xu, Yingying Zhu, Yu Zhou

Most regression-based methods use convolutional neural networks (CNNs) to regress a density map, which cannot accurately locate instances in extremely dense scenes for two crucial reasons: 1) the density map consists of a series of blurry Gaussian blobs, and 2) severe overlaps exist in the dense regions of the density map.

Crowd Counting, SSIM

Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting

no code implementations • 15 Dec 2020 • Wei Xu, Dingkang Liang, Yixiao Zheng, Zhanyu Ma

In this paper, we propose a simple yet efficient counting network based on point-level annotations.

Object, Object Counting

AutoScale: Learning to Scale for Crowd Counting and Localization

2 code implementations • 20 Dec 2019 • Chenfeng Xu, Dingkang Liang, Yongchao Xu, Song Bai, Wei Zhan, Xiang Bai, Masayoshi Tomizuka

A major issue is that the density map on dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding different large density values on a small set of pixels.

Crowd Counting, Model Optimization
