Search Results for author: Peng Gao

Found 137 papers, 63 papers with code

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

2 code implementations • 5 May 2021 • Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao

In our method, we divide the table content recognition task into foursub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image textrecognition algorithm.

Ranked #1 on Table Recognition on PubTabNet

Line Detection Table Recognition

38,218

Paper
Code

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

8,714

Paper
Code

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

7 code implementations • 28 Mar 2023 • Renrui Zhang, Jiaming Han, Chris Liu, Peng Gao, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Yu Qiao

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model.

Ranked #2 on Music Question Answering on MusicQA

Instruction Following Language Modelling +3

5,773

Paper
Code

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Ranked #6 on Visual Question Answering (VQA) on InfiMM-Eval

Instruction Following Optical Character Recognition (OCR) +7

5,479

Paper
Code

ImageBind-LLM: Multi-modality Instruction Tuning

2 code implementations • 7 Sep 2023 • Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder.

Instruction Following Text Generation

5,479

Paper
Code

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

2 code implementations • 12 Jan 2022 • Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60. 9% and 71. 2% top-1 accuracy respectively.

Representation Learning

2,959

Paper
Code

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

1 code implementation • 13 Nov 2023 • Ziyi Lin, Chris Liu, Renrui Zhang, Peng Gao, Longtian Qiu, Han Xiao, Han Qiu, Chen Lin, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He, Hongsheng Li, Yu Qiao

We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings.

Ranked #2 on Visual Question Answering on BenchLMM

Described Object Detection Language Modelling +4

2,463

Paper
Code

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

1 code implementation • 8 Feb 2024 • Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.

Ranked #4 on Video Question Answering on MVBench

Benchmarking Language Modelling +4

2,463

Paper
Code

Personalize Segment Anything Model with One Shot

1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models.

Ranked #1 on Personalized Segmentation on PerSeg

Personalized Segmentation Segmentation +4

1,411

Paper
Code

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

7 code implementations • 24 Jan 2022 • Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing to tackle both redundancy and dependency for efficient and effective representation learning.

Ranked #151 on Image Classification on ImageNet

Image Classification object-detection +5

775

Paper
Code

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

2 code implementations • 16 May 2023 • Siyuan Huang, Bo Zhang, Botian Shi, Peng Gao, Yikang Li, Hongsheng Li

In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that only leverages a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model.

3D Point Cloud Classification Domain Generalization +2

563

Paper
Code

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

2 code implementations • 25 Aug 2023 • Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

LWC modulates the extreme values of weights by optimizing the clipping threshold.

Common Sense Reasoning Computational Efficiency +3

547

Paper
Code

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Language Modelling Transfer Learning

463

Paper
Code

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

3 code implementations • 19 Jul 2022 • Renrui Zhang, Zhang Wei, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

On top of that, the performance of Tip-Adapter can be further boosted to be state-of-the-art on ImageNet by fine-tuning the cache model for 10$\times$ fewer epochs than existing methods, which is both effective and efficient.

Retrieval Transfer Learning

463

Paper
Code

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

3 code implementations • CVPR 2023 • Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao

Our CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge.

Few-Shot Learning Representation Learning

463

Paper
Code

ConvMAE: Masked Convolution Meets Masked Autoencoders

4 code implementations • 8 May 2022 • Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potentials of ViT, leading to state-of-the-art performances on image classification, detection and semantic segmentation.

Computational Efficiency Image Classification +2

449

Paper
Code

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

1 code implementation • 9 Mar 2023 • Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens by encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.

Contrastive Learning

449

Paper
Code

OneLLM: One Framework to Align All Modalities with Language

1 code implementation • 6 Dec 2023 • Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

In detail, we first train an image projection module to connect a vision encoder with LLM.

Ranked #73 on Visual Question Answering on MM-Vet

Question Answering Visual Question Answering

439

Paper
Code

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

2 code implementations • 14 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.

Ranked #1 on Training-free 3D Part Segmentation on ShapeNet-Part

3D Point Cloud Classification Training-free 3D Part Segmentation +1

435

Paper
Code

Starting From Non-Parametric Networks for 3D Point Cloud Analysis

1 code implementation • CVPR 2023 • Renrui Zhang, Liuhui Wang, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

435

Paper
Code

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

4 code implementations • 5 Apr 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.

Few-Shot Learning Scene Segmentation +1

435

Paper
Code

You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction

1 code implementation • 30 May 2022 • Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada

Challenging illumination conditions (low-light, under-exposure and over-exposure) in the real world not only cast an unpleasant visual appearance but also taint the computer vision tasks.

Ranked #2 on Image Enhancement on Exposure-Errors

Low-Light Image Enhancement object-detection +2

413

Paper
Code

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

2 code implementations • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Prompt Engineering Representation Learning

383

Paper
Code

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

5 code implementations • 1 Sep 2023 • Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng

We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.

Ranked #5 on 3D Question Answering (3D-QA) on 3D MM-Vet

3D Generation 3D Question Answering (3D-QA) +4

377

Paper
Code

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

1 code implementation • 15 Jun 2023 • Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo

Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning.

Hallucination Image Captioning +3

358

Paper
Code

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

1 code implementation • 7 Aug 2023 • Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach.

Hallucination Visual Reasoning

358

Paper
Code

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

1 code implementation • ICCV 2023 • Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.

Autonomous Driving Monocular 3D Object Detection +2

306

Paper
Code

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Ranked #3 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Open-Vocabulary Instance Segmentation Few-Shot Learning +6

290

Paper
Code

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D

3D Classification 3D Object Detection +11

290

Paper
Code

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

1 code implementation • 18 May 2023 • Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li

This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks.

Language Modelling Large Language Model +2

253

Paper
Code

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86. 43% accuracy on ScanObjectNN, +3. 36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +5

197

Paper
Code

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

2 code implementations • CVPR 2023 • Renrui Zhang, Liuhui Wang, Yu Qiao, Peng Gao, Hongsheng Li

Pre-training by numerous image data has become de-facto for robust 2D representations.

Ranked #2 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Point Cloud Linear Classification Few-Shot 3D Point Cloud Classification

197

Paper
Code

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of transformer named Adaptive Clustering Transformer(ACT) has been proposed to reduce the computation cost for high-resolution input.

Clustering Object +2

164

Paper
Code

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformer to objects detection and achieves comparable performance with two-stage object detection frameworks, such as Faster-RCNN.

object-detection Object Detection

164

Paper
Code

Fast Convergence of DETR with Spatially Modulated Co-Attention

1 code implementation • ICCV 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

However, DETR suffers from its slow convergence.

object-detection Object Detection

164

Paper
Code

SCOPE: Scalable Composite Optimization for Learning on Spark

1 code implementation • 30 Jan 2016 • Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

Recently, many distributed stochastic optimization~(DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods.

Stochastic Optimization

155

Paper
Code

Frozen CLIP Models are Efficient Video Learners

2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.

Ranked #26 on Action Classification on Kinetics-400 (using extra training data)

Action Classification Video Recognition

155

Paper
Code

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results

2 code implementations • 22 Dec 2021 • Liang Pan, Tong Wu, Zhongang Cai, Ziwei Liu, Xumin Yu, Yongming Rao, Jiwen Lu, Jie zhou, Mingye Xu, Xiaoyuan Luo, Kexue Fu, Peng Gao, Manning Wang, Yali Wang, Yu Qiao, Junsheng Zhou, Xin Wen, Peng Xiang, Yu-Shen Liu, Zhizhong Han, Yuanjie Yan, Junyi An, Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández, Qinlong Wang, Yang Yang

Based on the MVP dataset, this paper reports methods and results in the Multi-View Partial Point Cloud Challenge 2021 on Completion and Registration.

3D Reconstruction Point Cloud Completion +2

149

Paper
Code

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

1 code implementation • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks.

Computational Efficiency Few-Shot Learning

116

Paper
Code

Learning Where to Focus for Efficient Video Object Detection

1 code implementation • ECCV 2020 • Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.

Ranked #22 on Video Object Detection on ImageNet VID

Object object-detection +1

Paper
Code

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

1 code implementation • 13 Oct 2022 • Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.

Quantization

Paper
Code

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning Language Modelling +3

Paper
Code

Container: Context Aggregation Network

4 code implementations • 2 Jun 2021 • Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Ranked #462 on Image Classification on ImageNet

Image Classification Inductive Bias +5

Paper
Code

Container: Context Aggregation Networks

2 code implementations • NeurIPS 2021 • Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Inductive Bias Instance Segmentation +4

Paper
Code

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

1 code implementation • 4 Jan 2024 • Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making.

Data Visualization Decision Making +2

Paper
Code

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

1 code implementation • 25 May 2023 • Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao

In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation.

Ranked #1 on Referring Expression Segmentation on Referring Expressions for DAVIS 2016 & 2017

Object Referring Expression Segmentation +3

Paper
Code

Prototypical Contrast Adaptation for Domain Adaptive Semantic Segmentation

1 code implementation • 14 Jul 2022 • Zhengkai Jiang, Yuxi Li, Ceyuan Yang, Peng Gao, Yabiao Wang, Ying Tai, Chengjie Wang

Unsupervised Domain Adaptation (UDA) aims to adapt the model trained on the labeled source domain to an unlabeled target domain.

Ranked #14 on Unsupervised Domain Adaptation on SYNTHIA-to-Cityscapes

Contrastive Learning Semantic Segmentation +1

Paper
Code

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

1 code implementation • 24 Aug 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on `unseen' classes.

3D Semantic Segmentation Few-shot 3D semantic segmentation +1

Paper
Code

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

1 code implementation • Briefings in Bioinformatics 2021 • Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level.

Drug Discovery

Paper
Code

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

1 code implementation • 29 Mar 2024 • Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

Instruction Following Language Modelling +4

Paper
Code

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer

1 code implementation • CVPR 2023 • Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lu, Baochang Zhang

At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.

object-detection Object Detection +1

Paper
Code

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

1 code implementation • 2 Mar 2023 • Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li

The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR).

Data Augmentation object-detection +1

Paper
Code

RomeBERT: Robust Training of Multi-Exit BERT

1 code implementation • 24 Jan 2021 • Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang

In this paper, we leverage gradient regularized self-distillation for RObust training of Multi-Exit BERT (RomeBERT), which can effectively solve the performance imbalance problem between early and late exits.

Natural Language Understanding

Paper
Code

POS-BERT: Point Cloud One-Stage BERT Pre-Training

1 code implementation • 3 Apr 2022 • Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

We propose to use the dynamically updated momentum encoder as the tokenizer, which is updated and outputs the dynamic supervision signal along with the training process.

Contrastive Learning Language Modelling +3

Paper
Code

Recurrent Bilinear Optimization for Binary Neural Networks

2 code implementations • 4 Sep 2022 • Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo

To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.

object-detection Object Detection

Paper
Code

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.

Language Modelling Large Language Model

Paper
Code

IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

1 code implementation • 7 Oct 2022 • Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lv

This explains why existing KD methods are less effective for 1-bit detectors, caused by a significant information discrepancy between the real-valued teacher and the 1-bit student.

Knowledge Distillation object-detection +1

Paper
Code

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

1 code implementation • 8 Jul 2022 • Tong Zhang, Peng Gao, Hao Dong, Yin Zhuang, Guanqun Wang, Wei zhang, He Chen

Currently, under supervised learning, a model pretrained by a large-scale nature scene dataset and then fine-tuned on a few specific task labeling data is the paradigm that has dominated the knowledge transfer learning.

Land Cover Classification object-detection +3

Paper
Code

Resilient Binary Neural Network

1 code implementation • 2 Feb 2023 • Sheng Xu, Yanjing Li, Teli Ma, Mingbao Lin, Hao Dong, Baochang Zhang, Peng Gao, Jinhu Lv

In this paper, we introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent oscillation for better BNNs' training.

Paper
Code

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

1 code implementation • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization, text question-answering, and etc.

Question Answering Text Summarization

Paper
Code

SAQL: A Stream-based Query System for Real-Time Abnormal System Behavior Detection

1 code implementation • 25 Jun 2018 • Peng Gao, Xusheng Xiao, Ding Li, Zhichun Li, Kangkook Jee, Zhen-Yu Wu, Chung Hwan Kim, Sanjeev R. Kulkarni, Prateek Mittal

To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies.

Cryptography and Security Databases

Paper
Code

Masked AutoDecoder is Effective Multi-Task Vision Generalist

1 code implementation • 12 Mar 2024 • Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu

Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction.

Paper
Code

Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence

1 code implementation • 26 Oct 2020 • Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, Dawn Song

Log-based cyber threat hunting has emerged as an important solution to counter sophisticated attacks.

Paper
Code

A Complementary Tracking Model with Multiple Features

no code implementations • 20 Apr 2018 • Peng Gao, Yipeng Ma, Chao Li, Ke Song, Fei Wang, Liyi Xiao

Discriminative Correlation Filters based tracking algorithms exploiting conventional handcrafted features have achieved impressive results both in terms of accuracy and robustness.

Visual Tracking

Paper
Add Code

High Performance Visual Tracking with Circular and Structural Operators

no code implementations • 23 Apr 2018 • Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao, Yan Zhang

Based on the proposed circular and structural operators, a set of primal confidence score maps can be obtained by circular correlating feature maps with their corresponding structural correlation filters.

Computational Efficiency Visual Tracking +1

Paper
Add Code

A Novel Low-cost FPGA-based Real-time Object Tracking System

no code implementations • 16 Apr 2018 • Peng Gao, Ruyue Yuan, Zhicong Lin, Linsheng Zhang, Yan Zhang

In current visual object tracking system, the CPU or GPU-based visual object tracking systems have high computational cost and consume a prohibitive amount of power.

Object Visual Object Tracking

Paper
Add Code

A Novel Parallel Ray-Casting Algorithm

no code implementations • 16 Apr 2018 • Yan Zhang, Peng Gao, Xiao-Qing Li

The Ray-Casting algorithm is an important method for fast real-time surface display from 3D medical images.

Paper
Add Code

Large Margin Structured Convolution Operator for Thermal Infrared Object Tracking

no code implementations • 19 Apr 2018 • Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao

To the best of our knowledge, we are the first to incorporate the advantages of DCF and SOSVM for TIR object tracking.

Object Thermal Infrared Object Tracking

Paper
Add Code

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolute with visual features for capturing the textual and visual relationship in the early stage.

Ranked #14 on Visual Question Answering (VQA) on CLEVR

Question Answering Visual Question Answering

Paper
Add Code

FPGA-based Acceleration System for Visual Tracking

no code implementations • 12 Oct 2018 • Ke Song, Chun Yuan, Peng Gao, Yunxu Sun

In order to improve the tracking speed and reduce the overall power consumption of visual tracking, this paper proposes a real-time visual tracking algorithm based on DSST(Discriminative Scale Space Tracking) approach.

Real-Time Visual Tracking

Paper
Add Code

Efficient Multi-level Correlating for Visual Tracking

no code implementations • 13 Oct 2018 • Yipeng Ma, Chun Yuan, Peng Gao, Fei Wang

Correlation filter (CF) based tracking algorithms have demonstrated favorable performance recently.

Visual Tracking

Paper
Add Code

Siamese Attentional Keypoint Network for High Performance Visual Tracking

no code implementations • 23 Apr 2019 • Peng Gao, Ruyue Yuan, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

In this paper, we investigate the impacts of three main aspects of visual tracking, i. e., the backbone network, the attentional mechanism, and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, for efficient tracking and accurate localization.

Visual Tracking Vocal Bursts Intensity Prediction

Paper
Add Code

Learning Cascaded Siamese Networks for High Performance Visual Tracking

no code implementations • 8 May 2019 • Peng Gao, Yipeng Ma, Ruyue Yuan, Liyi Xiao, Fei Wang

In order to achieve high performance visual tracking in various negative scenarios, a novel cascaded Siamese network is proposed and developed based on two different deep learning networks: a matching subnetwork and a classification subnetwork.

Classification General Classification +2

Paper
Add Code

FPGA-based Binocular Image Feature Extraction and Matching System

no code implementations • 13 May 2019 • Qi Ni, Fei Wang, Ziwei Zhao, Peng Gao

Image feature extraction and matching is a fundamental but computation intensive task in machine vision.

Image Compression Stereo Matching +1

Paper
Add Code

Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering

no code implementations • CVPR 2019 • Peng Gao, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven C. H. Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between language and vision domains, thus significantly improves the performance of visual question answering.

Question Answering Visual Question Answering

Paper
Add Code

Joint Multi-frame Detection and Segmentation for Multi-cell Tracking

no code implementations • 26 Jun 2019 • Zibin Zhou, Fei Wang, Wenjuan Xi, Huaying Chen, Peng Gao, Chengkang He

Another UNet is utilized to acquire primary segmentation.

Cell Tracking Mitosis Detection +1

Paper
Add Code

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations • ICCV 2019 • Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +1

Paper
Add Code

Learning Reinforced Attentional Representation for End-to-End Visual Tracking

no code implementations • 27 Aug 2019 • Peng Gao, Qiquan Zhang, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge.

Visual Tracking

Paper
Add Code

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

no code implementations • 27 Aug 2019 • Zhencai Hu, Peng Gao, Fei Wang

To solve the problem of dimensional explosion in the air combat, the proposed method is implemented through feature selection, trajectory sampling, function approximation and Bellman backup operation in the air combat simulation environment.

Decision Making feature selection

Paper
Add Code

Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks

no code implementations • WS 2019 • Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie

We have experimented both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset size, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a KG onto the representations of pre-trained language models, via simply concatenation or multi-head attention.

Ranked #17 on Common Sense Reasoning on ReCoRD

Common Sense Reasoning Machine Reading Comprehension +2

Paper
Add Code

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code implementations • 3 Jan 2020 • Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

Question Answering Video Description +1

Paper
Add Code

Extreme Low-Light Imaging with Multi-granulation Cooperative Networks

no code implementations • 16 May 2020 • Keqi Wang, Peng Gao, Steven Hoi, Qian Guo, Yuhua Qian

Low-light imaging is challenging since images may appear to be dark and noised due to low signal-to-noise ratio, complex image content, and the variety in shooting scenes in extreme low-light condition.

Paper
Add Code

Character Matters: Video Story Understanding with Character-Aware Relations

no code implementations • 9 May 2020 • Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots.

Question Answering

Paper
Add Code

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual content.

Answer Generation Graph Representation Learning

Paper
Add Code

Reconstruction Regularized Deep Metric Learning for Multi-label Image Classification

no code implementations • 27 Jul 2020 • Changsheng Li, Chong Liu, Lixin Duan, Peng Gao, Kai Zheng

In this paper, we present a novel deep metric learning method to tackle the multi-label image classification problem.

General Classification Metric Learning +1

Paper
Add Code

Contrastive Visual-Linguistic Pretraining

no code implementations • 26 Jul 2020 • Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.

Contrastive Learning regression +2

Paper
Add Code

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations • 25 Jul 2020 • Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains.

Contrastive Learning Domain Adaptation

Paper
Add Code

Multi-Pass Transformer for Machine Translation

no code implementations • 23 Sep 2020 • Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux

In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.

Machine Translation Neural Architecture Search +1

Paper
Add Code

A Predictive Autoscaler for Elastic Batch Jobs

no code implementations • 10 Oct 2020 • Peng Gao

Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and higher cost than conventional online service.

Scheduling Time Series +1

Paper
Add Code

Multi-view Sensor Fusion by Integrating Model-based Estimation and Graph Learning for Collaborative Object Localization

no code implementations • 16 Nov 2020 • Peng Gao, Rui Guo, HongSheng Lu, Hao Zhang

Collaborative object localization aims to collaboratively estimate locations of objects observed from multiple views or perspectives, which is a critical ability for multi-agent systems such as connected vehicles.

Autonomous Driving Graph Learning +3

Paper
Add Code

Pre-training Entity Relation Encoder with Intra-span and Inter-span Information

no code implementations • EMNLP 2020 • Yijun Wang, Changzhi Sun, Yuanbin Wu, Junchi Yan, Peng Gao, Guotong Xie

In particular, a span encoder is trained to recover a random shuffling of tokens in a span, and a span pair encoder is trained to predict positive pairs that are from the same sentences and negative pairs that are from different sentences using contrastive loss.

Relation Relation Extraction +1

Paper
Add Code

Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions

no code implementations • 9 Dec 2020 • Jun Wang, Shaoguo Wen, Kaixing Chen, Jianghua Yu, Xin Zhou, Peng Gao, Changsheng Li, Guotong Xie

Active learning generally involves querying the most representative samples for human labeling, which has been widely studied in many fields such as image classification and object detection.

Active Learning Image Classification +5

Paper
Add Code

Learn molecular representations from large-scale unlabeled molecules for drug discovery

no code implementations • 21 Dec 2020 • Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

Here, we proposed a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that leans molecular representations from large-scale unlabeled molecules.

Drug Discovery

Paper
Add Code

A System for Efficiently Hunting for Cyber Threats in Computer Systems Using Threat Intelligence

no code implementations • 17 Jan 2021 • Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Haoyuan Liu, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, Dawn Song

Log-based cyber threat hunting has emerged as an important solution to counter sophisticated cyber attacks.

Paper
Add Code

A System for Automated Open-Source Threat Intelligence Gathering and Management

no code implementations • 19 Jan 2021 • Peng Gao, Xiaoyuan Liu, Edward Choi, Bhavna Soman, Chinmaya Mishra, Kate Farris, Dawn Song

SecurityKG collects OSCTI reports from various sources, uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors, and constructs a security knowledge graph.

Management

Paper
Add Code

Sharp upper bounds for moments of quadratic Dirichlet $L$-functions

no code implementations • 21 Jan 2021 • Peng Gao

We establish unconditional sharp upper bounds of the $k$-th moments of the family of quadratic Dirichlet $L$-functions at the central point for $0 \leq k \leq 2$.

Number Theory

Paper
Add Code

Self-supervised learning for fast and scalable time series hyper-parameter tuning

no code implementations • 10 Feb 2021 • Peiyi Zhang, Xiaodong Jiang, Ginger M Holt, Nikolay Pavlovich Laptev, Caner Komurlu, Peng Gao, Yang Yu

Hyper-parameters of time series models play an important role in time series analysis.

Self-Supervised Learning Time Series +1

Paper
Add Code

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex

no code implementations • 5 May 2021 • Yelin He, Xianbiao Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, Rong Xiao

This paper presents our solution for the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX.

Data Augmentation Scene Text Recognition

Paper
Add Code

Dual-stream Network for Visual Recognition

no code implementations • NeurIPS 2021 • Mingyuan Mao, Renrui Zhang, Honghui Zheng, Peng Gao, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han

Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images.

Image Classification Instance Segmentation +3

Paper
Add Code

Scalable Transformers for Neural Machine Translation

no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose a novel Scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters.

Machine Translation NMT +1

Paper
Add Code

Oriented Object Detection with Transformer

no code implementations • 6 Jun 2021 • Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN.

Object object-detection +2

Paper
Add Code

Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model

no code implementations • 24 Jun 2021 • Yixuan Qiao, Hao Chen, Jun Wang, Yihao Chen, Xianbin Ye, Ziliang Li, Xianbiao Qi, Peng Gao, Guotong Xie

TextVQA requires models to read and reason about text in images to answer questions about them.

Language Modelling Masked Language Modeling +2

Paper
Add Code

ClueReader: Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension

no code implementations • 2 Jul 2021 • Peng Gao, Feng Gao, Peng Wang, Jian-Cheng Ni, Fei Wang, Hamido Fujita

Multi-hop machine reading comprehension is a challenging task in natural language processing as it requires more reasoning ability across multiple documents.

Graph Attention Machine Reading Comprehension

Paper
Add Code

Dense Contrastive Visual-Linguistic Pretraining

no code implementations • 24 Sep 2021 • Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su

To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces the region regression and classification with cross-modality region contrastive learning that requires no annotations.

Contrastive Learning Data Augmentation +2

Paper
Add Code

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

Paper
Add Code

Pairwise Half-graph Discrimination: A Simple Graph-level Self-supervised Strategy for Pre-training Graph Neural Networks

no code implementations • 26 Oct 2021 • Pengyong Li, Jun Wang, Ziliang Li, Yixuan Qiao, Xianggen Liu, Fei Ma, Peng Gao, Seng Song, Guotong Xie

Self-supervised learning has gradually emerged as a powerful technique for graph representation learning.

Binary Classification Graph Classification +2

Paper
Add Code

Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation

no code implementations • 5 Nov 2021 • Peng Gao, Brian Reily, Rui Guo, HongSheng Lu, Qingzhao Zhu, Hao Zhang

In this paper, we introduce a novel approach that integrates uncertainty-aware spatiotemporal graph learning and model-based state estimation for a team of robots to collaboratively localize objects.

Graph Learning Object +1

Paper
Add Code

Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks

no code implementations • 9 Dec 2021 • Jun Wang, Zhoujing Li, Yixuan Qiao, Qiming Qin, Peng Gao, Guotong Xie

This paper presents a novel superpixel based approach combining DNN and a modified segmentation method, to detect damaged buildings from VHR imagery.

Denoising Semantic Similarity +1

Paper
Add Code

RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images

no code implementations • 7 Jan 2022 • Ziteng Cui, Yingying Zhu, Lin Gu, Guo-Jun Qi, Xiaoxiao Li, Peng Gao, Zenghui Zhang, Tatsuya Harada

Image restoration algorithms such as super resolution (SR) are indispensable pre-processing modules for object detection in degraded images.

Image Restoration Object +4

Paper
Add Code

TerViT: An Efficient Ternary Vision Transformer

no code implementations • 20 Jan 2022 • Sheng Xu, Yanjing Li, Teli Ma, Bohan Zeng, Baochang Zhang, Peng Gao, Jinhu Lv

Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.

Paper
Add Code

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations • 9 Feb 2022 • Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

Especially, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves comparable results with existing backbones when combined with our framework, and visualization of the attention maps show that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method.

Contrastive Learning Knowledge Distillation +1

Paper
Add Code

CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer

no code implementations • 2 Mar 2022 • Xianbin Ye, Ziliang Li, Fei Ma, Zongbi Yi, Pengyong Li, Jun Wang, Peng Gao, Yixuan Qiao, Guotong Xie

Anti-cancer drug discoveries have been serendipitous, we sought to present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery.

Drug Discovery Graph Learning

Paper
Add Code

Learning Decoupling Features Through Orthogonality Regularization

no code implementations • 31 Mar 2022 • Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

Bearing this in mind, a two-branch deep network (KWS branch and SV branch) with the same network structure is developed and a novel decoupling feature learning method is proposed to push up the performance of KWS and SV simultaneously where speaker-invariant keyword representations and keyword-invariant speaker representations are expected respectively.

Keyword Spotting Speaker Verification

Paper
Add Code

PASH at TREC 2021 Deep Learning Track: Generative Enhanced Model for Multi-stage Ranking

no code implementations • 18 May 2022 • Yixuan Qiao, Hao Chen, Jun Wang, Yongquan Lai, Tuozhen Liu, Xianbin Ye, Xin Tang, Rui Fang, Peng Gao, Wenfeng Xie, Guotong Xie

This paper describes the PASH participation in TREC 2021 Deep Learning Track.

General Knowledge Retrieval

Paper
Add Code

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models

no code implementations • SemEval (NAACL) 2022 • Changyu Hou, Jun Wang, Yixuan Qiao, Peng Jiang, Peng Gao, Guotong Xie, Qizhi Lin, Xiaopeng Wang, Xiandi Jiang, Benqi Wang, Qifeng Xiao

By assigning different weights to each model for different inputs, we adopted the Transformer layer to integrate the advantages of diverse models effectively.

Low Resource Named Entity Recognition named-entity-recognition +2

Paper
Add Code

Collaboration of Pre-trained Models Makes Better Few-shot Learner

no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao

In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.

Few-Shot Learning Representation Learning

Paper
Add Code

HCL: Improving Graph Representation with Hierarchical Contrastive Learning

no code implementations • 21 Oct 2022 • Jun Wang, Weixun Li, Changyu Hou, Xin Tang, Yixuan Qiao, Rui Fang, Pengyong Li, Peng Gao, Guotong Xie

Contrastive learning has emerged as a powerful tool for graph representation learning.

Contrastive Learning Graph Classification +3

Paper
Add Code

Zebra: Deeply Integrating System-Level Provenance Search and Tracking for Efficient Attack Investigation

no code implementations • 10 Nov 2022 • Xinyu Yang, Haoyuan Liu, Ziyu Wang, Peng Gao

System auditing has emerged as a key approach for monitoring system call events and investigating sophisticated attacks.

Paper
Add Code

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

Paper
Add Code

Medical Knowledge Graph QA for Drug-Drug Interaction Prediction based on Multi-hop Machine Reading Comprehension

no code implementations • 19 Dec 2022 • Peng Gao, Feng Gao, Jian-Cheng Ni, Yu Wang, Fei Wang

Drug-drug interaction prediction is a crucial issue in molecular biology.

Entity Embeddings Graph Question Answering +2

Paper
Add Code

Exploring Representation Learning for Small-Footprint Keyword Spotting

no code implementations • 20 Mar 2023 • Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

To address those challenges, we explore representation learning for KWS by self-supervised contrastive learning and self-training with pretrained model.

Contrastive Learning Representation Learning +1

Paper
Add Code

OGMN: Occlusion-guided Multi-task Network for Object Detection in UAV Images

no code implementations • 24 Apr 2023 • Xuexue Li, Wenhui Diao, Yongqiang Mao, Peng Gao, Xiuhua Mao, Xinming Li, Xian Sun

One interaction for the guide is between two task decoders to address the feature confusion problem, and an occlusion decoupling head (ODH) is proposed to replace the general detection head.

object-detection Object Detection +1

Paper
Add Code

Filter Pruning via Filters Similarity in Consecutive Layers

no code implementations • 26 Apr 2023 • Xiaorui Wang, Jun Wang, Xin Tang, Peng Gao, Rui Fang, Guotong Xie

Filter pruning is widely adopted to compress and accelerate the Convolutional Neural Networks (CNNs), but most previous works ignore the relationship between filters and channels in different layers.

Paper
Add Code

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

no code implementations • 28 Jun 2023 • Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer.

Dimensionality Reduction Speech Extraction

Paper
Add Code

SparseMAE: Sparse Training Meets Masked Autoencoders

no code implementations • ICCV 2023 • Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li

In this paper, we aim to reduce model complexity from large vision transformers pretrained by MAE with assistant of sparse training.

Paper
Add Code

Improving Compositional Text-to-image Generation with Large Vision-Language Models

no code implementations • 10 Oct 2023 • Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships.

Attribute Text-to-Image Generation

Paper
Add Code

Digital Life Project: Autonomous 3D Characters with Social Intelligence

no code implementations • 7 Dec 2023 • Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.

Motion Captioning Motion Synthesis

Paper
Add Code

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V

no code implementations • 15 Dec 2023 • Dingning Liu, Xiaomeng Dong, Renrui Zhang, Xu Luo, Peng Gao, Xiaoshui Huang, Yongshun Gong, Zhihui Wang

In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks.

3D Object Detection object-detection +1

Paper
Add Code

Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction

no code implementations • 21 Dec 2023 • Peng Gao, Ahmed Jaafar, Brian Reily, Christopher Reardon, Hao Zhang

However, visual observations of an object may not be available when it is referred to, and the number of objects and attributes may also be unbounded in open worlds.

16k Attribute +3

Paper
Add Code

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

no code implementations • 21 Dec 2023 • Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang

In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes.

Instruction Following Language Modelling +1

Paper
Add Code

Vision-Language Navigation with Embodied Intelligence: A Survey

no code implementations • 22 Feb 2024 • Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan

As a long-term vision in the field of artificial intelligence, the core goal of embodied intelligence is to improve the perception, understanding, and interaction capabilities of agents and the environment.

Vision-Language Navigation

Paper
Add Code

Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning

no code implementations • 22 Feb 2024 • Peng Gao, Tao Yu, Fei Wang, Ru-Yue Yuan

Designing distributed filtering circuits (DFCs) is complex and time-consuming, with the circuit performance relying heavily on the expertise and experience of electronics engineers.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

YOLO-TLA: An Efficient and Lightweight Small Object Detection Model based on YOLOv5

no code implementations • 22 Feb 2024 • Peng Gao, Chun-Lin Ji, Tao Yu, Ru-Yue Yuan

Additionally, we have incorporated a global attention mechanism into the backbone network.

Object object-detection +1

Paper
Add Code

In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking

no code implementations • 27 Feb 2024 • Peng Gao, Shi-Min Li, Feng Gao, Fei Wang, Ru-Yue Yuan, Hamido Fujita

Deep learning-based methods monopolize the latest research in the field of thermal infrared (TIR) object tracking.

Object Thermal Infrared Object Tracking

Paper
Add Code

Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking

no code implementations • 26 Feb 2024 • Peng Gao, Xiao Liu, Yu Wang, Ru-Yue Yuan

To expedite the search process, a random channel selection strategy is employed prior to assessing operation candidates.

Paper
Add Code

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

no code implementations • 21 Mar 2024 • Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs.

Math Mathematical Reasoning

Paper
Add Code

CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

no code implementations • 26 Mar 2024 • Yongrui Yu, HanYu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node generation and the nnU-Net model for lymph node segmentation to improve the segmentation performance of abdominal lymph nodes through synthesizing a diversity of realistic abdominal lymph node data.

Denoising Image Generation +4

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.