Pre-training Entity Relation Encoder with Intra-span and Inter-span Information

no code implementations EMNLP 2020 Yijun Wang, Changzhi Sun, Yuanbin Wu, Junchi Yan, Peng Gao, Guotong Xie

In particular, a span encoder is trained to recover a random shuffling of tokens in a span, and a span pair encoder is trained to predict positive pairs that are from the same sentences and negative pairs that are from different sentences using contrastive loss.

Relation Extraction
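
The two pre-training signals described in this abstract can be sketched as data construction steps. This is a minimal illustration, not the authors' implementation: the function names `shuffle_span`, `restore_span`, and `label_span_pair` are hypothetical, and the encoders and the contrastive loss itself are omitted.

```python
import random


def shuffle_span(tokens, start, end, rng):
    """Shuffle tokens[start:end]; return the corrupted sequence and the permutation.

    order[pos] is the original in-span index of the token now at position pos,
    so a span encoder could be trained to predict `order` (assumed target format).
    """
    span = tokens[start:end]
    order = list(range(len(span)))
    rng.shuffle(order)
    shuffled = [span[i] for i in order]
    return tokens[:start] + shuffled + tokens[end:], order


def restore_span(corrupted, start, end, order):
    """Invert shuffle_span given the permutation."""
    shuffled = corrupted[start:end]
    span = [None] * len(shuffled)
    for pos, src in enumerate(order):
        span[src] = shuffled[pos]
    return corrupted[:start] + span + corrupted[end:]


def label_span_pair(sentence_id_a, sentence_id_b):
    """Positive (1) if both spans come from the same sentence, else negative (0)."""
    return 1 if sentence_id_a == sentence_id_b else 0


rng = random.Random(0)
tokens = ["the", "span", "encoder", "recovers", "order"]
corrupted, order = shuffle_span(tokens, 1, 4, rng)
recovered = restore_span(corrupted, 1, 4, order)
```

Tokens outside the span are left untouched, so only the intra-span order has to be recovered.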

Container: Context Aggregation Networks

1 code implementation NeurIPS 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Instance Segmentation Object Detection +2

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation 29 Nov 2021 Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Contrastive Learning Language Modelling +2

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation 6 Nov 2021 Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Fine-tuning Language Modelling +1
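
A residual feature adapter of the kind this abstract mentions can be sketched as a small bottleneck network whose output is blended with the original feature. This is a hedged sketch in plain Python: the two-layer ReLU bottleneck, the blending ratio `alpha`, and the names `residual_adapter`, `w_down`, and `w_up` are assumptions for illustration, not details confirmed by the listing.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vector]


def matvec(weights, vector):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_adapter(feature, w_down, w_up, alpha):
    """Blend an adapted feature with the original one.

    output = alpha * adapter(feature) + (1 - alpha) * feature
    With alpha = 0 the frozen feature passes through unchanged.
    """
    hidden = relu(matvec(w_down, feature))   # project down to the bottleneck
    adapted = relu(matvec(w_up, hidden))     # project back up to feature size
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(adapted, feature)]


identity = [[1.0, 0.0], [0.0, 1.0]]          # toy weights for illustration only
feature = [1.0, -1.0]
blended = residual_adapter(feature, identity, identity, alpha=0.5)
unchanged = residual_adapter(feature, identity, identity, alpha=0.0)
```

Only the adapter weights would be trained in such a setup; the residual term keeps the output close to the pre-trained feature when few labelled examples are available.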

Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation

no code implementations 5 Nov 2021 Peng Gao, Brian Reily, Rui Guo, HongSheng Lu, Qingzhao Zhu, Hao Zhang

In this paper, we introduce a novel approach that integrates uncertainty-aware spatiotemporal graph learning and model-based state estimation for a team of robots to collaboratively localize objects.

Graph Learning Object Localization

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

no code implementations 13 Oct 2021 Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

Region Proposal

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

1 code implementation 9 Oct 2021 Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Fine-tuning Representation Learning

Dense Contrastive Visual-Linguistic Pretraining

no code implementations 24 Sep 2021 Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su

To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces the region regression and classification with cross-modality region contrastive learning that requires no annotations.

Contrastive Learning Data Augmentation +1

Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension

no code implementations 2 Jul 2021 Feng Gao, Jian-Cheng Ni, Peng Gao, Zi-Li Zhou, Yan-Yan Li, Hamido Fujita

Multi-hop machine reading comprehension is a challenging task in natural language processing, which requires more reasoning ability and explainability.

Graph Attention Machine Reading Comprehension

Oriented Object Detection with Transformer

no code implementations 6 Jun 2021 Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN.

Object Detection Oriented Object Detection

Scalable Transformers for Neural Machine Translation

no code implementations 4 Jun 2021 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales that share parameters.

Machine Translation Translation

Container: Context Aggregation Network

2 code implementations 2 Jun 2021 Peng Gao, Jiasen Lu, Hongsheng Li, Roozbeh Mottaghi, Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations.

Image Classification Instance Segmentation +3

Dual-stream Network for Visual Recognition

no code implementations NeurIPS 2021 Mingyuan Mao, Renrui Zhang, Honghui Zheng, Peng Gao, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han

Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images.

Image Classification Instance Segmentation +2

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

1 code implementation 5 May 2021 Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao

In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image text recognition algorithm.

Line Detection Table Recognition

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

1 code implementation Briefings in Bioinformatics 2021 Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

In MPG, we proposed a powerful GNN for modelling molecular graphs, named MolGNet, and designed an effective self-supervised strategy for pre-training the model at both the node and graph levels.

Drug Discovery

RomeBERT: Robust Training of Multi-Exit BERT

1 code implementation 24 Jan 2021 Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang

In this paper, we leverage gradient regularized self-distillation for RObust training of Multi-Exit BERT (RomeBERT), which can effectively solve the performance imbalance problem between early and late exits.

Language understanding Natural Language Understanding

Sharp upper bounds for moments of quadratic Dirichlet $L$-functions

no code implementations 21 Jan 2021 Peng Gao

We establish unconditional sharp upper bounds of the $k$-th moments of the family of quadratic Dirichlet $L$-functions at the central point for $0 \leq k \leq 2$.

Number Theory

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations 19 Jan 2021 Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks, such as Faster R-CNN.

Object Detection

A System for Automated Open-Source Threat Intelligence Gathering and Management

no code implementations 19 Jan 2021 Peng Gao, Xiaoyuan Liu, Edward Choi, Bhavna Soman, Chinmaya Mishra, Kate Farris, Dawn Song

SecurityKG collects OSCTI reports from various sources, uses a combination of AI and NLP techniques to extract high-fidelity knowledge about threat behaviors, and constructs a security knowledge graph.

Learn molecular representations from large-scale unlabeled molecules for drug discovery

no code implementations 21 Dec 2020 Pengyong Li, Jun Wang, Yixuan Qiao, Hao Chen, Yihuan Yu, Xiaojun Yao, Peng Gao, Guotong Xie, Sen Song

Here, we proposed a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules.

Drug Discovery

Semi-supervised Active Learning for Instance Segmentation via Scoring Predictions

no code implementations 9 Dec 2020 Jun Wang, Shaoguo Wen, Kaixing Chen, Jianghua Yu, Xin Zhou, Peng Gao, Changsheng Li, Guotong Xie

Active learning generally involves querying the most representative samples for human labeling, which has been widely studied in many fields such as image classification and object detection.

Active Learning Image Classification +3

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation 18 Nov 2020 Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of the transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.

Object Detection

Multi-view Sensor Fusion by Integrating Model-based Estimation and Graph Learning for Collaborative Object Localization

no code implementations 16 Nov 2020 Peng Gao, Rui Guo, HongSheng Lu, Hao Zhang

Collaborative object localization aims to collaboratively estimate locations of objects observed from multiple views or perspectives, which is a critical ability for multi-agent systems such as connected vehicles.

Autonomous Driving Graph Learning +2

Enabling Efficient Cyber Threat Hunting With Cyber Threat Intelligence

no code implementations 26 Oct 2020 Peng Gao, Fei Shao, Xiaoyuan Liu, Xusheng Xiao, Zheng Qin, Fengyuan Xu, Prateek Mittal, Sanjeev R. Kulkarni, Dawn Song

Log-based cyber threat hunting has emerged as an important solution to counter sophisticated attacks.

A Predictive Autoscaler for Elastic Batch Jobs

no code implementations 10 Oct 2020 Peng Gao

Large batch jobs such as Deep Learning, HPC and Spark require far more computational resources and incur higher cost than conventional online services.

Time Series

Multi-Pass Transformer for Machine Translation

no code implementations 23 Sep 2020 Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux

In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.

Machine Translation Neural Architecture Search +1

Reconstruction Regularized Deep Metric Learning for Multi-label Image Classification

no code implementations 27 Jul 2020 Changsheng Li, Chong Liu, Lixin Duan, Peng Gao, Kai Zheng

In this paper, we present a novel deep metric learning method to tackle the multi-label image classification problem.

Classification General Classification +2

Contrastive Visual-Linguistic Pretraining

no code implementations 26 Jul 2020 Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.

Contrastive Learning Representation Learning +1

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations 25 Jul 2020 Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains.

Contrastive Learning Domain Adaptation
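
One common way to keep an update from increasing a reference loss is to project the gradient whenever it conflicts with the reference gradient. The sketch below uses that projection as a stand-in to illustrate the constraint described in this abstract; it is an assumption, not necessarily the paper's formulation, and `project_gradient` is a hypothetical name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_gradient(grad, ref_grad):
    """Return a gradient that does not conflict with ref_grad.

    A descent step along -grad changes the reference loss by roughly
    -lr * dot(grad, ref_grad). If that inner product is negative, the
    reference loss would rise, so the conflicting component is removed.
    """
    inner = dot(grad, ref_grad)
    if inner >= 0.0:
        return list(grad)                    # no conflict, keep as is
    scale = inner / dot(ref_grad, ref_grad)  # ref_grad is nonzero when inner < 0
    return [g - scale * r for g, r in zip(grad, ref_grad)]


contrastive_grad = [1.0, -2.0]   # toy gradient of the contrastive loss
supervised_grad = [0.0, 1.0]     # toy gradient of the source-domain loss
safe_grad = project_gradient(contrastive_grad, supervised_grad)
kept_grad = project_gradient([1.0, 1.0], [1.0, 0.0])
```

After projection the inner product with the reference gradient is zero, so to first order the step leaves the reference loss unchanged.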

Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers

no code implementations 8 Jul 2020 Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian

Given an input video, its associated audio, and a brief caption, the audio-visual scene-aware dialog (AVSD) task requires an agent to engage in a question-answer dialog with a human about the audio-visual content.

Graph Representation Learning

Extreme Low-Light Imaging with Multi-granulation Cooperative Networks

no code implementations 16 May 2020 Keqi Wang, Peng Gao, Steven Hoi, Qian Guo, Yuhua Qian

Low-light imaging is challenging since images may appear dark and noisy due to low signal-to-noise ratio, complex image content, and the variety of shooting scenes in extreme low-light conditions.

Character Matters: Video Story Understanding with Character-Aware Relations

no code implementations 9 May 2020 Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots.

Question Answering

Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

no code implementations 3 Jan 2020 Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.

Question Answering Video Description +1

Learning Where to Focus for Efficient Video Object Detection

1 code implementation ECCV 2020 Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan

Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.

Optical Flow Estimation Video Object Detection

Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks

no code implementations WS 2019 Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie

We experimented with both (a) improving the fine-tuning of pre-trained language models on a task with a small dataset size, by leveraging datasets of similar tasks; and (b) incorporating the distributional representations of a KG into the representations of pre-trained language models, via simple concatenation or multi-head attention.

Common Sense Reasoning Fine-tuning +2

Learning Reinforced Attentional Representation for End-to-End Visual Tracking

no code implementations 27 Aug 2019 Peng Gao, Qiquan Zhang, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge.

Fine-tuning Visual Tracking

Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming

no code implementations 27 Aug 2019 Zhencai Hu, Peng Gao, Fei Wang

To solve the problem of dimensional explosion in the air combat, the proposed method is implemented through feature selection, trajectory sampling, function approximation and Bellman backup operation in the air combat simulation environment.

Decision Making Feature Selection

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations ICCV 2019 Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +1

FPGA-based Binocular Image Feature Extraction and Matching System

no code implementations 13 May 2019 Qi Ni, Fei Wang, Ziwei Zhao, Peng Gao

Image feature extraction and matching is a fundamental but computation intensive task in machine vision.

Image Compression Stereo Matching +1

Learning Cascaded Siamese Networks for High Performance Visual Tracking

no code implementations 8 May 2019 Peng Gao, Yipeng Ma, Ruyue Yuan, Liyi Xiao, Fei Wang

In order to achieve high performance visual tracking in various negative scenarios, a novel cascaded Siamese network is proposed and developed based on two different deep learning networks: a matching subnetwork and a classification subnetwork.

Classification General Classification +1

Siamese Attentional Keypoint Network for High Performance Visual Tracking

no code implementations 23 Apr 2019 Peng Gao, Ruyue Yuan, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

In this paper, we investigate the impacts of three main aspects of visual tracking, i.e., the backbone network, the attentional mechanism, and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, for efficient tracking and accurate localization.

Visual Tracking

Efficient Multi-level Correlating for Visual Tracking

no code implementations 13 Oct 2018 Yipeng Ma, Chun Yuan, Peng Gao, Fei Wang

Correlation filter (CF) based tracking algorithms have demonstrated favorable performance recently.

Visual Tracking

FPGA-based Acceleration System for Visual Tracking

no code implementations 12 Oct 2018 Ke Song, Chun Yuan, Peng Gao, Yunxu Sun

In order to improve the tracking speed and reduce the overall power consumption of visual tracking, this paper proposes a real-time visual tracking algorithm based on the DSST (Discriminative Scale Space Tracking) approach.

Real-Time Visual Tracking

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations ECCV 2018 Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features, capturing the textual and visual relationship at an early stage.

Question Answering Visual Question Answering

SAQL: A Stream-based Query System for Real-Time Abnormal System Behavior Detection

1 code implementation 25 Jun 2018 Peng Gao, Xusheng Xiao, Ding Li, Zhichun Li, Kangkook Jee, Zhen-Yu Wu, Chung Hwan Kim, Sanjeev R. Kulkarni, Prateek Mittal

To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies.

Cryptography and Security Databases

High Performance Visual Tracking with Circular and Structural Operators

no code implementations 23 Apr 2018 Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao, Yan Zhang

Based on the proposed circular and structural operators, a set of primal confidence score maps can be obtained by circularly correlating feature maps with their corresponding structural correlation filters.

Visual Tracking

A Complementary Tracking Model with Multiple Features

no code implementations 20 Apr 2018 Peng Gao, Yipeng Ma, Chao Li, Ke Song, Fei Wang, Liyi Xiao

Discriminative Correlation Filters based tracking algorithms exploiting conventional handcrafted features have achieved impressive results both in terms of accuracy and robustness.

Visual Tracking

Large Margin Structured Convolution Operator for Thermal Infrared Object Tracking

no code implementations 19 Apr 2018 Peng Gao, Yipeng Ma, Ke Song, Chao Li, Fei Wang, Liyi Xiao

To the best of our knowledge, we are the first to incorporate the advantages of DCF and SOSVM for TIR object tracking.

Thermal Infrared Object Tracking

A Novel Parallel Ray-Casting Algorithm

no code implementations 16 Apr 2018 Yan Zhang, Peng Gao, Xiao-Qing Li

The Ray-Casting algorithm is an important method for fast real-time surface display from 3D medical images.

A Novel Low-cost FPGA-based Real-time Object Tracking System

no code implementations 16 Apr 2018 Peng Gao, Ruyue Yuan, Zhicong Lin, Linsheng Zhang, Yan Zhang

Current CPU- or GPU-based visual object tracking systems have high computational cost and consume a prohibitive amount of power.

Visual Object Tracking

SCOPE: Scalable Composite Optimization for Learning on Spark

1 code implementation 30 Jan 2016 Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods.

Stochastic Optimization

