Search Results for author: Peng Gao

Found 146 papers, 69 papers with code

Pre-training Entity Relation Encoder with Intra-span and Inter-span Information

no code implementations EMNLP 2020 Yijun Wang, Changzhi Sun, Yuanbin Wu, Junchi Yan, Peng Gao, Guotong Xie

In particular, a span encoder is trained to recover a random shuffling of tokens in a span, and a span pair encoder is trained to predict positive pairs that are from the same sentences and negative pairs that are from different sentences using contrastive loss.

Relation Relation Extraction +1

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

1 code implementation25 May 2024 Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li

Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment.

TerDiT: Ternary Diffusion Models with Transformers

1 code implementation23 May 2024 Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li

Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs).

Image Generation Quantization

Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification

no code implementations21 May 2024 Peng Gao, Yujian Lee, HUI ZHANG, Xubo Liu, Yiyang Hu, Guquan Jing

Effectively minimizing these cross-modal discrepancies relies on obtaining representations that are guided by identity and consistent across modalities, while also filtering out representations that are irrelevant to identity.

Person Re-Identification

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

2 code implementations9 May 2024 Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

1 code implementation8 May 2024 Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

In this work, we propose FreeBind, an idea that treats multimodal representation spaces as basic units, and freely augments pre-trained unified space by integrating knowledge from extra expert spaces via "space bonds".

An Analysis of Recent Advances in Deepfake Image Detection in an Evolving Threat Landscape

1 code implementation24 Apr 2024 Sifat Muhammad Abdullah, Aravind Cheruvu, Shravya Kanchi, Taejoong Chung, Peng Gao, Murtuza Jadliwala, Bimal Viswanath

Second, the emergence of \textit{vision foundation models} -- machine learning models trained on broad data that can be easily adapted to several downstream tasks -- can be misused by attackers to craft adversarial deepfakes that can evade existing defenses.

Adversarial Attack Face Swapping

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

7 code implementations11 Apr 2024 Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.

Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want

1 code implementation29 Mar 2024 Weifeng Lin, Xinyu Wei, Ruichuan An, Peng Gao, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Hongsheng Li

In this paper, we introduce the Draw-and-Understand project: a new model, a multi-domain dataset, and a challenging benchmark for visual prompting.

Instruction Following Language Modelling +4

CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

no code implementations26 Mar 2024 Yongrui Yu, HanYu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node generation and the nnU-Net model for lymph node segmentation to improve the segmentation performance of abdominal lymph nodes through synthesizing a diversity of realistic abdominal lymph node data.

Denoising Image Generation +4

Masked AutoDecoder is Effective Multi-Task Vision Generalist

1 code implementation CVPR 2024 Han Qiu, Jiaxing Huang, Peng Gao, Lewei Lu, Xiaoqin Zhang, Shijian Lu

Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction.

Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking

no code implementations26 Feb 2024 Peng Gao, Xiao Liu, Yu Wang, Ru-Yue Yuan

To expedite the search process, a random channel selection strategy is employed prior to assessing operation candidates.

Automated Design and Optimization of Distributed Filtering Circuits via Reinforcement Learning

no code implementations22 Feb 2024 Peng Gao, Tao Yu, Fei Wang, Ru-Yue Yuan

Designing distributed filtering circuits (DFCs) is complex and time-consuming, with the circuit performance relying heavily on the expertise and experience of electronics engineers.

reinforcement-learning Reinforcement Learning (RL)

Vision-Language Navigation with Embodied Intelligence: A Survey

no code implementations22 Feb 2024 Peng Gao, Peng Wang, Feng Gao, Fei Wang, Ruyue Yuan

As a long-term vision in the field of artificial intelligence, the core goal of embodied intelligence is to improve the perception, understanding, and interaction capabilities of agents and the environment.

Vision-Language Navigation

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

2 code implementations18 Feb 2024 Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization, text question-answering, and etc.

Question Answering Text Summarization

Compositional Zero-Shot Learning for Attribute-Based Object Reference in Human-Robot Interaction

no code implementations21 Dec 2023 Peng Gao, Ahmed Jaafar, Brian Reily, Christopher Reardon, Hao Zhang

However, visual observations of an object may not be available when it is referred to, and the number of objects and attributes may also be unbounded in open worlds.

16k Attribute +3

LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding

no code implementations21 Dec 2023 Senqiao Yang, Jiaming Liu, Ray Zhang, Mingjie Pan, Zoey Guo, Xiaoqi Li, Zehui Chen, Peng Gao, Yandong Guo, Shanghang Zhang

In this paper, we introduce LiDAR-LLM, which takes raw LiDAR data as input and harnesses the remarkable reasoning capabilities of LLMs to gain a comprehensive understanding of outdoor 3D scenes.

Instruction Following Language Modelling +1

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V

no code implementations15 Dec 2023 Dingning Liu, Xiaomeng Dong, Renrui Zhang, Xu Luo, Peng Gao, Xiaoshui Huang, Yongshun Gong, Zhihui Wang

In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks.

3D Object Detection object-detection +1

Digital Life Project: Autonomous 3D Characters with Social Intelligence

no code implementations CVPR 2024 Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.

Motion Captioning Motion Synthesis

M$^{2}$Chat: Empowering VLM for Multimodal LLM Interleaved Text-Image Generation

1 code implementation29 Nov 2023 Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo

Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.

Image Generation Language Modelling +1

Improving Compositional Text-to-image Generation with Large Vision-Language Models

no code implementations10 Oct 2023 Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships.

Attribute Text-to-Image Generation

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks

1 code implementation24 Aug 2023 Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao

However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on `unseen' classes.

3D Semantic Segmentation Few-shot 3D semantic segmentation +1

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

1 code implementation7 Aug 2023 Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach.

Hallucination Visual Reasoning

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

no code implementations28 Jun 2023 Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer.

Dimensionality Reduction Speech Extraction

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

1 code implementation18 May 2023 Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li

This paper presents Instruct2Act, a framework that utilizes Large Language Models to map multi-modal instructions to sequential actions for robotic manipulation tasks.

Language Modelling Large Language Model +2

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

2 code implementations16 May 2023 Siyuan Huang, Bo Zhang, Botian Shi, Peng Gao, Yikang Li, Hongsheng Li

In this paper, different from previous 2D DG works, we focus on the 3D DG problem and propose a Single-dataset Unified Generalization (SUG) framework that only leverages a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model.

3D Point Cloud Classification Domain Generalization +2

Personalize Segment Anything Model with One Shot

1 code implementation4 May 2023 Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models.

Personalized Segmentation Segmentation +4

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations28 Apr 2023 Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Instruction Following Optical Character Recognition (OCR) +7

Filter Pruning via Filters Similarity in Consecutive Layers

no code implementations26 Apr 2023 Xiaorui Wang, Jun Wang, Xin Tang, Peng Gao, Rui Fang, Guotong Xie

Filter pruning is widely adopted to compress and accelerate the Convolutional Neural Networks (CNNs), but most previous works ignore the relationship between filters and channels in different layers.

OGMN: Occlusion-guided Multi-task Network for Object Detection in UAV Images

no code implementations24 Apr 2023 Xuexue Li, Wenhui Diao, Yongqiang Mao, Peng Gao, Xiuhua Mao, Xinming Li, Xian Sun

One interaction for the guide is between two task decoders to address the feature confusion problem, and an occlusion decoupling head (ODH) is proposed to replace the general detection head.

object-detection Object Detection +1

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer

1 code implementation CVPR 2023 Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lu, Baochang Zhang

At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.

object-detection Object Detection +1

Exploring Representation Learning for Small-Footprint Keyword Spotting

no code implementations20 Mar 2023 Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

To address those challenges, we explore representation learning for KWS by self-supervised contrastive learning and self-training with pretrained model.

Contrastive Learning Representation Learning +1

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

2 code implementations14 Mar 2023 Renrui Zhang, Liuhui Wang, Ziyu Guo, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.

Supervised Only 3D Point Cloud Classification Training-free 3D Part Segmentation +1

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

1 code implementation9 Mar 2023 Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens by encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.

Contrastive Learning Decoder

Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

3 code implementations CVPR 2023 Renrui Zhang, Xiangfei Hu, Bohao Li, Siyuan Huang, Hanqiu Deng, Hongsheng Li, Yu Qiao, Peng Gao

Our CaFo incorporates CLIP's language-contrastive knowledge, DINO's vision-contrastive knowledge, DALL-E's vision-generative knowledge, and GPT-3's language-generative knowledge.

Few-Shot Learning Representation Learning

Resilient Binary Neural Network

1 code implementation2 Feb 2023 Sheng Xu, Yanjing Li, Teli Ma, Mingbao Lin, Hao Dong, Baochang Zhang, Peng Gao, Jinhu Lv

In this paper, we introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent oscillation for better BNNs' training.

SparseMAE: Sparse Training Meets Masked Autoencoders

no code implementations ICCV 2023 Aojun Zhou, Yang Li, Zipeng Qin, Jianbo Liu, Junting Pan, Renrui Zhang, Rui Zhao, Peng Gao, Hongsheng Li

In this paper, we aim to reduce model complexity from large vision transformers pretrained by MAE with assistant of sparse training.

Starting From Non-Parametric Networks for 3D Point Cloud Analysis

1 code implementation CVPR 2023 Renrui Zhang, Liuhui Wang, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi

We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

2 code implementations ICCV 2023 Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao

In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

3D Classification 3D Object Detection +11

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations CVPR 2023 Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

Zebra: Deeply Integrating System-Level Provenance Search and Tracking for Efficient Attack Investigation

no code implementations10 Nov 2022 Xinyu Yang, Haoyuan Liu, Ziyu Wang, Peng Gao

System auditing has emerged as a key approach for monitoring system call events and investigating sophisticated attacks.

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

1 code implementation13 Oct 2022 Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, Guodong Guo

The large pre-trained vision transformers (ViTs) have demonstrated remarkable performance on various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.


IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

1 code implementation7 Oct 2022 Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lv

This explains why existing KD methods are less effective for 1-bit detectors, caused by a significant information discrepancy between the real-valued teacher and the 1-bit student.

Knowledge Distillation object-detection +1

Collaboration of Pre-trained Models Makes Better Few-shot Learner

no code implementations25 Sep 2022 Renrui Zhang, Bohao Li, Wei zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao

In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.

Few-Shot Learning Representation Learning

Recurrent Bilinear Optimization for Binary Neural Networks

2 code implementations4 Sep 2022 Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo

To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.

object-detection Object Detection

Frozen CLIP Models are Efficient Video Learners

2 code implementations6 Aug 2022 Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.

Ranked #27 on Action Classification on Kinetics-400 (using extra training data)

Action Classification Decoder +1

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

3 code implementations19 Jul 2022 Renrui Zhang, Zhang Wei, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

On top of that, the performance of Tip-Adapter can be further boosted to be state-of-the-art on ImageNet by fine-tuning the cache model for 10$\times$ fewer epochs than existing methods, which is both effective and efficient.

Retrieval Transfer Learning

Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

1 code implementation8 Jul 2022 Tong Zhang, Peng Gao, Hao Dong, Yin Zhuang, Guanqun Wang, Wei zhang, He Chen

Currently, under supervised learning, a model pretrained by a large-scale nature scene dataset and then fine-tuned on a few specific task labeling data is the paradigm that has dominated the knowledge transfer learning.

Land Cover Classification object-detection +3

You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction

1 code implementation30 May 2022 Ziteng Cui, Kunchang Li, Lin Gu, Shenghan Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada

Challenging illumination conditions (low-light, under-exposure and over-exposure) in the real world not only cast an unpleasant visual appearance but also taint the computer vision tasks.

Low-Light Image Enhancement object-detection +2

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations28 May 2022 Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86. 43% accuracy on ScanObjectNN, +3. 36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #5 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +6

ConvMAE: Masked Convolution Meets Masked Autoencoders

4 code implementations8 May 2022 Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potentials of ViT, leading to state-of-the-art performances on image classification, detection and semantic segmentation.

Computational Efficiency Image Classification +2

POS-BERT: Point Cloud One-Stage BERT Pre-Training

1 code implementation3 Apr 2022 Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

We propose to use the dynamically updated momentum encoder as the tokenizer, which is updated and outputs the dynamic supervision signal along with the training process.

Contrastive Learning Language Modelling +3

Learning Decoupling Features Through Orthogonality Regularization

no code implementations31 Mar 2022 Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

Bearing this in mind, a two-branch deep network (KWS branch and SV branch) with the same network structure is developed and a novel decoupling feature learning method is proposed to push up the performance of KWS and SV simultaneously where speaker-invariant keyword representations and keyword-invariant speaker representations are expected respectively.

Keyword Spotting Speaker Verification

CandidateDrug4Cancer: An Open Molecular Graph Learning Benchmark on Drug Discovery for Cancer

no code implementations2 Mar 2022 Xianbin Ye, Ziliang Li, Fei Ma, Zongbi Yi, Pengyong Li, Jun Wang, Peng Gao, Yixuan Qiao, Guotong Xie

Anti-cancer drug discoveries have been serendipitous, we sought to present the Open Molecular Graph Learning Benchmark, named CandidateDrug4Cancer, a challenging and realistic benchmark dataset to facilitate scalable, robust, and reproducible graph machine learning research for anti-cancer drug discovery.

Drug Discovery Graph Learning

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations9 Feb 2022 Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

Especially, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves comparable results with existing backbones when combined with our framework, and visualization of the attention maps show that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method.

Contrastive Learning Knowledge Distillation +1

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

7 code implementations24 Jan 2022 Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing to tackle both redundancy and dependency for efficient and effective representation learning.

Image Classification object-detection +5

TerViT: An Efficient Ternary Vision Transformer

no code implementations20 Jan 2022 Sheng Xu, Yanjing Li, Teli Ma, Bohan Zeng, Baochang Zhang, Peng Gao, Jinhu Lv

Vision transformers (ViTs) have demonstrated great potential in various visual tasks, but suffer from expensive computational and memory cost problems when deployed on resource-constrained devices.

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning

2 code implementations12 Jan 2022 Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao

For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60. 9% and 71. 2% top-1 accuracy respectively.

Representation Learning

RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images

no code implementations7 Jan 2022 Ziteng Cui, Yingying Zhu, Lin Gu, Guo-Jun Qi, Xiaoxiao Li, Peng Gao, Zenghui Zhang, Tatsuya Harada

Image restoration algorithms such as super resolution (SR) are indispensable pre-processing modules for object detection in degraded images.

Decoder Image Restoration +5

Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks

no code implementations9 Dec 2021 Jun Wang, Zhoujing Li, Yixuan Qiao, Qiming Qin, Peng Gao, Guotong Xie

This paper presents a novel superpixel based approach combining DNN and a modified segmentation method, to detect damaged buildings from VHR imagery.

Denoising Semantic Similarity +1

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations CVPR 2022 Renrui Zhang, Ziyu Guo, Wei zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

3D Open-Vocabulary Instance Segmentation Few-Shot Learning +7

Container: Context Aggregation Networks

2 code implementations NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Inductive Bias Instance Segmentation +4

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation29 Nov 2021 Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #5 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning Language Modelling +3

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation6 Nov 2021 Renrui Zhang, Rongyao Fang, Wei zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Language Modelling Transfer Learning

Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation

no code implementations5 Nov 2021 Peng Gao, Brian Reily, Rui Guo, HongSheng Lu, Qingzhao Zhu, Hao Zhang

In this paper, we introduce a novel approach that integrates uncertainty-aware spatiotemporal graph learning and model-based state estimation for a team of robots to collaboratively localize objects.

Graph Learning Object +1

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations13 Oct 2021 Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

2 code implementations9 Oct 2021 Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Prompt Engineering Representation Learning

Dense Contrastive Visual-Linguistic Pretraining

no code implementations24 Sep 2021 Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su

To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces the region regression and classification with cross-modality region contrastive learning that requires no annotations.

Contrastive Learning Data Augmentation +2

ClueReader: Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension

no code implementations2 Jul 2021 Peng Gao, Feng Gao, Peng Wang, Jian-Cheng Ni, Fei Wang, Hamido Fujita

Multi-hop machine reading comprehension is a challenging task in natural language processing as it requires more reasoning ability across multiple documents.

Graph Attention Machine Reading Comprehension

Oriented Object Detection with Transformer

no code implementations6 Jun 2021 Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN.

Object object-detection +2

Scalable Transformers for Neural Machine Translation

no code implementations4 Jun 2021 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose a novel Scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters.

Machine Translation NMT +1

Container: Context Aggregation Network

4 code implementations2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification Inductive Bias +5

Dual-stream Network for Visual Recognition

no code implementations NeurIPS 2021 Mingyuan Mao, Renrui Zhang, Honghui Zheng, Peng Gao, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han

Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images.

Image Classification Instance Segmentation +3

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

2 code implementations5 May 2021 Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao

In our method, we divide the table content recognition task into foursub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image textrecognition algorithm.

Line Detection Table Recognition

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

1 code implementation Briefings in Bioinformatics 2021 Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

In MPG, we proposed a powerful GNN for modelling molecular graph named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph-level.

Drug Discovery Graph Neural Network

RomeBERT: Robust Training of Multi-Exit BERT

1 code implementation24 Jan 2021 Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang

In this paper, we leverage gradient regularized self-distillation for RObust training of Multi-Exit BERT (RomeBERT), which can effectively solve the performance imbalance problem between early and late exits.

Natural Language Understanding

Sharp upper bounds for moments of quadratic Dirichlet $L$-functions

no code implementations21 Jan 2021 Peng Gao

We establish unconditional sharp upper bounds of the $k$-th moments of the family of quadratic Dirichlet $L$-functions at the central point for $0 \leq k \leq 2$.

Number Theory

A System for Automated Open-Source Threat Intelligence Gathering and Management

no code implementations19 Jan 2021 Peng Gao, Xiaoyuan Liu, Edward Choi, Bhavna Soman, Chinmaya Mishra, Kate Farris, Dawn Song

SecurityKG collects OSCTI reports from various sources, uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors, and constructs a security knowledge graph.


Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations19 Jan 2021 Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformer to objects detection and achieves comparable performance with two-stage object detection frameworks, such as Faster-RCNN.

Decoder object-detection +1

Learn molecular representations from large-scale unlabeled molecules for drug discovery

no code implementations21 Dec 2020 Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

Here, we proposed a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that leans molecular representations from large-scale unlabeled molecules.

Drug Discovery Graph Neural Network

Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions

no code implementations9 Dec 2020 Jun Wang, Shaoguo Wen, Kaixing Chen, Jianghua Yu, Xin Zhou, Peng Gao, Changsheng Li, Guotong Xie

Active learning generally involves querying the most representative samples for human labeling, which has been widely studied in many fields such as image classification and object detection.

Active Learning Image Classification +5

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation18 Nov 2020 Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of transformer named Adaptive Clustering Transformer(ACT) has been proposed to reduce the computation cost for high-resolution input.

Clustering Object +2

Multi-view Sensor Fusion by Integrating Model-based Estimation and Graph Learning for Collaborative Object Localization

no code implementations16 Nov 2020 Peng Gao, Rui Guo, HongSheng Lu, Hao Zhang

Collaborative object localization aims to collaboratively estimate locations of objects observed from multiple views or perspectives, which is a critical ability for multi-agent systems such as connected vehicles.

Autonomous Driving Graph Learning +3

Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence

1 code implementation26 Oct 2020 Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, Dawn Song

Log-based cyber threat hunting has emerged as an important solution to counter sophisticated attacks.

A Predictive Autoscaler for Elastic Batch Jobs

no code implementations10 Oct 2020 Peng Gao

Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and higher cost than conventional online service.

Scheduling Time Series +1

Multi-Pass Transformer for Machine Translation

no code implementations23 Sep 2020 Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux

In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.

Machine Translation Neural Architecture Search +1

Reconstruction Regularized Deep Metric Learning for Multi-label Image Classification

no code implementations27 Jul 2020 Changsheng Li, Chong Liu, Lixin Duan, Peng Gao, Kai Zheng

In this paper, we present a novel deep metric learning method to tackle the multi-label image classification problem.

General Classification Metric Learning +1

Contrastive Visual-Linguistic Pretraining

no code implementations26 Jul 2020 Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.

Contrastive Learning regression +2

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations25 Jul 2020 Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains.

Contrastive Learning Domain Adaptation

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations8 Jul 2020 Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual content.

Answer Generation Graph Representation Learning

Extreme Low-Light Imaging with Multi-granulation Cooperative Networks

no code implementations16 May 2020 Keqi Wang, Peng Gao, Steven Hoi, Qian Guo, Yuhua Qian

Low-light imaging is challenging since images may appear to be dark and noised due to low signal-to-noise ratio, complex image content, and the variety in shooting scenes in extreme low-light condition.

Character Matters: Video Story Understanding with Character-Aware Relations

no code implementations9 May 2020 Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots.

Question Answering

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code implementations3 Jan 2020 Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

Question Answering Video Description +1

Learning Where to Focus for Efficient Video Object Detection

1 code implementation ECCV 2020 Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.

Object object-detection +1

Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks

no code implementations WS 2019 Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie

We have experimented both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset size, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a KG onto the representations of pre-trained language models, via simply concatenation or multi-head attention.

Common Sense Reasoning Machine Reading Comprehension +2

Learning Reinforced Attentional Representation for End-to-End Visual Tracking

no code implementations27 Aug 2019 Peng Gao, Qiquan Zhang, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge.

Visual Tracking

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

no code implementations27 Aug 2019 Zhencai Hu, Peng Gao, Fei Wang

To solve the problem of dimensional explosion in the air combat, the proposed method is implemented through feature selection, trajectory sampling, function approximation and Bellman backup operation in the air combat simulation environment.

Decision Making feature selection

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations ICCV 2019 Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +1

FPGA-based Binocular Image Feature Extraction and Matching System

no code implementations13 May 2019 Qi Ni, Fei Wang, Ziwei Zhao, Peng Gao

Image feature extraction and matching is a fundamental but computation intensive task in machine vision.

Image Compression Stereo Matching +1

Learning Cascaded Siamese Networks for High Performance Visual Tracking

no code implementations8 May 2019 Peng Gao, Yipeng Ma, Ruyue Yuan, Liyi Xiao, Fei Wang

In order to achieve high performance visual tracking in various negative scenarios, a novel cascaded Siamese network is proposed and developed based on two different deep learning networks: a matching subnetwork and a classification subnetwork.

Classification General Classification +2

Siamese Attentional Keypoint Network for High Performance Visual Tracking

no code implementations23 Apr 2019 Peng Gao, Ruyue Yuan, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

In this paper, we investigate the impacts of three main aspects of visual tracking, i. e., the backbone network, the attentional mechanism, and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, for efficient tracking and accurate localization.

Visual Tracking Vocal Bursts Intensity Prediction

Efficient Multi-level Correlating for Visual Tracking

no code implementations13 Oct 2018 Yipeng Ma, Chun Yuan, Peng Gao, Fei Wang

Correlation filter (CF) based tracking algorithms have demonstrated favorable performance recently.

Visual Tracking

FPGA-based Acceleration System for Visual Tracking

no code implementations12 Oct 2018 Ke Song, Chun Yuan, Peng Gao, Yunxu Sun

In order to improve the tracking speed and reduce the overall power consumption of visual tracking, this paper proposes a real-time visual tracking algorithm based on DSST(Discriminative Scale Space Tracking) approach.

Real-Time Visual Tracking

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations ECCV 2018 Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolute with visual features for capturing the textual and visual relationship in the early stage.

Question Answering Visual Question Answering

SAQL: A Stream-based Query System for Real-Time Abnormal System Behavior Detection

1 code implementation25 Jun 2018 Peng Gao, Xusheng Xiao, Ding Li, Zhichun Li, Kangkook Jee, Zhen-Yu Wu, Chung Hwan Kim, Sanjeev R. Kulkarni, Prateek Mittal

To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies.

Cryptography and Security Databases

High Performance Visual Tracking with Circular and Structural Operators

no code implementations23 Apr 2018 Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao, Yan Zhang

Based on the proposed circular and structural operators, a set of primal confidence score maps can be obtained by circular correlating feature maps with their corresponding structural correlation filters.

Computational Efficiency Visual Tracking +1

A Complementary Tracking Model with Multiple Features

no code implementations20 Apr 2018 Peng Gao, Yipeng Ma, Chao Li, Ke Song, Fei Wang, Liyi Xiao

Discriminative Correlation Filters based tracking algorithms exploiting conventional handcrafted features have achieved impressive results both in terms of accuracy and robustness.

Visual Tracking

Large Margin Structured Convolution Operator for Thermal Infrared Object Tracking

no code implementations19 Apr 2018 Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao

To the best of our knowledge, we are the first to incorporate the advantages of DCF and SOSVM for TIR object tracking.

Object Thermal Infrared Object Tracking

A Novel Low-cost FPGA-based Real-time Object Tracking System

no code implementations16 Apr 2018 Peng Gao, Ruyue Yuan, Zhicong Lin, Linsheng Zhang, Yan Zhang

In current visual object tracking system, the CPU or GPU-based visual object tracking systems have high computational cost and consume a prohibitive amount of power.