Search Results for author: Shiji Song

Found 66 papers, 37 papers with code

Vision Transformer with Deformable Attention

2 code implementations CVPR 2022 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that lie beyond the region of interest.

Image Classification Object Detection +1

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

1 code implementation 4 Sep 2023 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that lie beyond the region of interest.

Image Classification Instance Segmentation +2

Implicit Semantic Data Augmentation for Deep Networks

1 code implementation NeurIPS 2019 Yulin Wang, Xuran Pan, Shiji Song, Hong Zhang, Cheng Wu, Gao Huang

Our work is motivated by the intriguing property that deep networks are surprisingly good at linearizing features, such that certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., adding sunglasses or changing backgrounds.

Image Augmentation

Regularizing Deep Networks with Semantic Data Augmentation

1 code implementation 21 Jul 2020 Yulin Wang, Gao Huang, Shiji Song, Xuran Pan, Yitong Xia, Cheng Wu

The proposed method is inspired by the intriguing property that deep networks are effective in learning linearized features, i.e., certain directions in the deep feature space correspond to meaningful semantic transformations, e.g., changing the background or view angle of an object.

Data Augmentation

On the Integration of Self-Attention and Convolution

2 code implementations CVPR 2022 Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, Gao Huang

In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of computations of these two paradigms are in fact done with the same operation.

Representation Learning

Agent Attention: On the Integration of Softmax and Linear Attention

2 code implementations 14 Dec 2023 Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

Specifically, the Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module.

Computational Efficiency Image Classification +4
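
The snippet above names the quadruple $(Q, A, K, V)$ but does not say how the agent tokens are used. The following is a minimal sketch under one assumed reading: the agent tokens $A$ first act as queries that aggregate information from $K$ and $V$, and the original queries $Q$ then attend to the agents. The function names are hypothetical, the scaling by $\sqrt{d}$ is the standard softmax-attention convention, and nothing here is taken from the authors' code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """Standard softmax attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def agent_attention(q, a, k, v):
    """Assumed two-step form: agents gather from (k, v), then q reads from the agents."""
    agent_values = attention(a, k, v)      # shape: n_agents x d_v
    return attention(q, a, agent_values)   # shape: n_queries x d_v
```

With n queries, n keys, and m agent tokens, both steps in this sketch build score matrices of size m x n or n x m rather than n x n, which is where a saving would come from when m is much smaller than n.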

FLatten Transformer: Vision Transformer using Focused Linear Attention

1 code implementation ICCV 2023 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks.

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

1 code implementation 7 Dec 2023 Jiayi Guo, Xingqian Xu, Yifan Pu, Zanlin Ni, Chaofei Wang, Manushree Vasu, Shiji Song, Gao Huang, Humphrey Shi

Specifically, we introduce Step-wise Variation Regularization to enforce that the ratio between the variation of an arbitrary input latent and that of the output image is constant at any diffusion training step.

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

2 code implementations NeurIPS 2021 Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, Gao Huang

Inspired by this phenomenon, we propose a Dynamic Transformer to automatically configure a proper number of tokens for each input image.

Ranked #29 on Image Classification on CIFAR-100 (using extra training data)

Computational Efficiency Image Classification

Glance and Focus Networks for Dynamic Visual Recognition

1 code implementation 9 Jan 2022 Gao Huang, Yulin Wang, Kangchen Lv, Haojun Jiang, Wenhui Huang, Pengfei Qi, Shiji Song

Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand.

Image Classification Video Recognition

3D Object Detection with Pointformer

1 code implementation CVPR 2021 Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, Gao Huang

In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively.

3D Object Detection Object +2

Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention

1 code implementation CVPR 2023 Xuran Pan, Tianzhu Ye, Zhuofan Xia, Shiji Song, Gao Huang

The self-attention mechanism has been a key factor in the recent progress of Vision Transformer (ViT), which enables adaptive feature extraction from global contexts.

feature selection Inductive Bias

Resolution Adaptive Networks for Efficient Inference

2 code implementations CVPR 2020 Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang

Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks.

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

1 code implementation CVPR 2022 Haojun Jiang, Yuanze Lin, Dongchen Han, Shiji Song, Gao Huang

Our method leverages an off-the-shelf object detector to identify visual objects from unlabeled images, and then language queries for these objects are obtained in an unsupervised fashion with a pseudo-query generation module.

Language Modelling Natural Language Queries +1

Adaptive Focus for Efficient Video Recognition

1 code implementation ICCV 2021 Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, Gao Huang

In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency.

Computational Efficiency Video Recognition

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

1 code implementation 26 Jan 2021 Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

Due to the need to store the intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from a high GPU memory footprint.

ActiveNeRF: Learning where to See with Uncertainty Estimation

1 code implementation 18 Sep 2022 Xuran Pan, Zihang Lai, Shiji Song, Gao Huang

In this paper, we present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget.

Active Learning Novel View Synthesis

Adaptive Rotated Convolution for Rotated Object Detection

1 code implementation ICCV 2023 Yifan Pu, Yiru Wang, Zhuofan Xia, Yizeng Han, Yulin Wang, Weihao Gan, Zidong Wang, Shiji Song, Gao Huang

In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images, and an efficient conditional computation mechanism is introduced to accommodate the large orientation variations of objects within an image.

Ranked #3 on Object Detection In Aerial Images on DOTA (using extra training data)

Object object-detection +2

Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

1 code implementation CVPR 2023 Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang

Recently, CLIP-guided image synthesis has shown appealing performance in adapting a pre-trained source-domain generator to an unseen target domain.

Image Generation

Domain Adaptation via Prompt Learning

1 code implementation 14 Feb 2022 Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, Gao Huang

Unsupervised domain adaptation (UDA) aims to adapt models learned from a well-annotated source domain to a target domain, where only unlabeled samples are given.

Domain Adaptation

Cross-Modal Adapter for Text-Video Retrieval

1 code implementation 17 Nov 2022 Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Jiwen Lu, Jie Zhou, Shiji Song, Gao Huang

However, as pre-trained models are scaling up, fully fine-tuning them on text-video retrieval datasets has a high risk of overfitting.

Retrieval Video Retrieval

Dynamic Perceiver for Efficient Visual Recognition

1 code implementation ICCV 2023 Yizeng Han, Dongchen Han, Zeyu Liu, Yulin Wang, Xuran Pan, Yifan Pu, Chao Deng, Junlan Feng, Shiji Song, Gao Huang

Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.

Action Recognition Classification +4

Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

2 code implementations NeurIPS 2023 Yang Yue, Rui Lu, Bingyi Kang, Shiji Song, Gao Huang

We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.

Attribute Offline RL

Latency-aware Spatial-wise Dynamic Networks

2 code implementations 12 Oct 2022 Yizeng Han, Zhihang Yuan, Yifan Pu, Chenhao Xue, Shiji Song, Guangyu Sun, Gao Huang

The latency prediction model can efficiently estimate the inference latency of dynamic networks by simultaneously considering algorithms, scheduling strategies, and hardware properties.

Image Classification Instance Segmentation +4

Latency-aware Unified Dynamic Networks for Efficient Image Recognition

1 code implementation 30 Aug 2023 Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Dynamic computation has emerged as a promising avenue to enhance the inference efficiency of deep networks.

Scheduling

Learning to Weight Samples for Dynamic Early-exiting Networks

1 code implementation 17 Sep 2022 Yizeng Han, Yifan Pu, Zihang Lai, Chaofei Wang, Shiji Song, Junfen Cao, Wenhui Huang, Chao Deng, Gao Huang

Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to training early classifiers.

Meta-Learning
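
The snippet above relies on the idea that easy samples leave an early-exiting network at a shallow classifier. As a rough illustration of that inference rule only (not of the paper's sample-weighting method), here is a sketch that stops at the first exit whose top softmax probability reaches a threshold. The function name `early_exit_predict`, the threshold value, and the confidence criterion are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted_class, exit_index) for one sample.

    exit_logits is an iterable of logit vectors ordered from the
    shallowest exit to the deepest. Passing a generator means deeper
    exits are never evaluated once a shallow exit is confident enough.
    """
    prediction = None
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        prediction = (probs.index(confidence), index)
        if confidence >= threshold:
            break  # confident enough: skip the remaining exits
    return prediction  # falls back to the deepest exit if none was confident
```

Under this rule a sample with one dominant logit at the first classifier exits at index 0, while an ambiguous sample is carried through to the last classifier.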

Efficient Knowledge Distillation from Model Checkpoints

1 code implementation 12 Oct 2022 Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang

Knowledge distillation is an effective approach to learn compact models (students) with the supervision of large and strong models (teachers).

Knowledge Distillation
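
For readers unfamiliar with the teacher-student setup mentioned above, the following sketch shows the commonly used temperature-softened distillation loss: the KL divergence from the teacher's softened distribution to the student's, scaled by the squared temperature. This is the generic formulation, not the checkpoint-based method of the listed paper; `distillation_loss` and the default temperature are hypothetical choices for illustration.

```python
import math


def softened(logits, temperature):
    """Softmax of logits divided by a temperature (higher = softer)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened outputs."""
    p_teacher = softened(teacher_logits, temperature)
    p_student = softened(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0  # terms with zero teacher probability contribute nothing
    )
    return temperature ** 2 * kl
```

The loss is zero when student and teacher logits agree and grows as the student's softened distribution drifts away from the teacher's.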

Self-Supervised Discovering of Interpretable Features for Reinforcement Learning

1 code implementation 16 Mar 2020 Wenjie Shi, Gao Huang, Shiji Song, Zhuoyuan Wang, Tingyu Lin, Cheng Wu

Deep reinforcement learning (RL) has recently led to many breakthroughs on a range of complex control tasks.

Atari Games Decision Making +2

Learning Specialized Activation Functions for Physics-informed Neural Networks

1 code implementation 8 Aug 2023 Honghui Wang, Lu Lu, Shiji Song, Gao Huang

To avoid the inefficient manual selection and to alleviate the optimization difficulty of PINNs, we introduce adaptive activation functions to search for the optimal function when solving different problems.

Probabilistic Contrastive Learning for Long-Tailed Visual Recognition

1 code implementation 11 Mar 2024 Chaoqun Du, Yulin Wang, Shiji Song, Gao Huang

To overcome this obstacle, we propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space, and samples contrastive pairs accordingly.

Long-tail Learning

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

1 code implementation NeurIPS 2019 Wenjie Shi, Shiji Song, Hui Wu, Ya-Chu Hsu, Cheng Wu, Gao Huang

To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), which is an effective approach to accelerating the solving of fixed point problems with perturbations.

reinforcement-learning Reinforcement Learning (RL)

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

no code implementations 7 Sep 2019 Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

Unlike existing policy gradient methods, which employ a single actor-critic and achieve neither satisfactory tracking control accuracy nor stable learning, our proposed algorithm achieves high-level tracking control accuracy of AUVs and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and action-value function, respectively.

Policy Gradient Methods Q-Learning

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

no code implementations 7 Sep 2019 Wenjie Shi, Shiji Song, Cheng Wu

Then, we present an off-policy actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining soft policy gradient with soft Bellman equation.

reinforcement-learning Reinforcement Learning (RL)

A selected review on reinforcement learning based control for autonomous underwater vehicles

no code implementations 27 Nov 2019 Ya-Chu Hsu, Hui Wu, Keyou You, Shiji Song

This paper provides a selected review on RL based control for AUVs with the focus on applications of RL to low-level control tasks for underwater regulation and tracking.

Robotics

Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification

no code implementations 6 Mar 2020 Kaichen Zhou, Shiji Song, Gao Huang, Wu Cheng, Quan Zhou

Specifically, the proposed algorithm can be used to estimate the upper and lower bounds of the updated classifier's coefficient matrix with a low computational complexity related to the size of the updated dataset.

Incremental Learning L2 Regularization

Meta-Semi: A Meta-learning Approach for Semi-supervised Learning

no code implementations 5 Jul 2020 Yulin Wang, Jiayi Guo, Shiji Song, Gao Huang

In this paper, we propose a novel meta-learning based SSL algorithm (Meta-Semi) that requires tuning only one additional hyper-parameter, compared with a standard supervised deep learning algorithm, to achieve competitive performance under various conditions of SSL.

Meta-Learning

Revisiting Locally Supervised Training of Deep Neural Networks

no code implementations ICLR 2021 Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm.

Robust Offline Reinforcement Learning from Low-Quality Data

no code implementations 1 Jan 2021 Wenjie Shi, Tianchi Cai, Shiji Song, Lihong Gu, Jinjie Gu, Gao Huang

We theoretically show that AdaPT produces a tight upper bound on the distributional deviation between the learned policy and the behavior policy, and this upper bound is the minimum requirement to guarantee policy improvement at each iteration.

Continuous Control Offline RL +2

A Unified Framework for Convolution-based Graph Neural Networks

no code implementations 1 Jan 2021 Xuran Pan, Shiji Song, Gao Huang

In this paper, we take a step forward to establish a unified framework for convolution-based graph neural networks, by formulating the basic graph convolution operation as an optimization problem in the graph Fourier space.

CAM-loss: Towards Learning Spatially Discriminative Feature Representations

no code implementations ICCV 2021 Chaofei Wang, Jiayu Xiao, Yizeng Han, Qisen Yang, Shiji Song, Gao Huang

The backbone of a traditional CNN classifier is generally considered a feature extractor, followed by a linear layer that performs the classification.

Few-Shot Learning Image Classification +2

Fine-Grained Few Shot Learning with Foreground Object Transformation

no code implementations 13 Sep 2021 Chaofei Wang, Shiji Song, Qisen Yang, Xiang Li, Gao Huang

As a data augmentation method, FOT can be conveniently applied to any existing few-shot learning algorithm and greatly improve its performance on FG-FSL tasks.

Data Augmentation Few-Shot Learning +2

Temporal-Spatial Causal Interpretations for Vision-Based Reinforcement Learning

no code implementations 6 Dec 2021 Wenjie Shi, Gao Huang, Shiji Song, Cheng Wu

The TSCI model builds on the formulation of temporal causality, which reflects the temporal causal relations between sequential observations and decisions of an RL agent.

Causal Discovery Decision Making +2

Learn From the Past: Experience Ensemble Knowledge Distillation

no code implementations 25 Feb 2022 Chaofei Wang, Shaowei Zhang, Shiji Song, Gao Huang

We uniformly save a moderate number of intermediate models from the training process of the teacher model, and then integrate the knowledge of these intermediate models using an ensemble technique.

Knowledge Distillation Transfer Learning

The Neural-Prediction based Acceleration Algorithm of Column Generation for Graph-Based Set Covering Problems

no code implementations 4 Jul 2022 Haofeng Yuan, Peng Jiang, Shiji Song

In this paper, we propose an improved column generation algorithm with neural prediction (CG-P) for solving graph-based set covering problems.

Combinatorial Optimization Scheduling

AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition

no code implementations 27 Sep 2022 Yulin Wang, Yang Yue, Xinhong Xu, Ali Hassani, Victor Kulikov, Nikita Orlov, Shiji Song, Humphrey Shi, Gao Huang

Recent research has revealed that reducing temporal redundancy and reducing spatial redundancy are both effective approaches towards efficient video recognition, e.g., allocating the majority of computation to a task-relevant subset of frames or the most valuable image regions of each frame.

Video Recognition

Contrastive Language-Image Pre-Training with Knowledge Graphs

no code implementations 17 Oct 2022 Xuran Pan, Tianzhu Ye, Dongchen Han, Shiji Song, Gao Huang

Recent years have witnessed the fast development of large-scale pre-training frameworks that can extract multi-modal representations in a unified form and achieve promising performance when transferred to downstream tasks.

Knowledge Graphs

Standoff Tracking Using DNN-Based MPC with Implementation on FPGA

no code implementations 21 Dec 2022 Fei Dong, Xingchen Li, Keyou You, Shiji Song

This work studies the standoff tracking problem to drive an unmanned aerial vehicle (UAV) to slide on a desired circle over a moving target at a constant height.

Model Predictive Control Trajectory Planning +1

Joint Representation Learning for Text and 3D Point Cloud

no code implementations 18 Jan 2023 Rui Huang, Xuran Pan, Henry Zheng, Haojun Jiang, Zhifeng Xie, Shiji Song, Gao Huang

During the pre-training stage, we establish the correspondence of images and point clouds based on the readily available RGB-D data and use contrastive learning to align the image and point cloud representations.

Contrastive Learning Instance Segmentation +4

Boosting Offline Reinforcement Learning with Action Preference Query

no code implementations 6 Jun 2023 Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang

In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase.

Autonomous Driving D4RL +2

Computation-efficient Deep Learning for Computer Vision: A Survey

no code implementations 27 Aug 2023 Yulin Wang, Yizeng Han, Chaofei Wang, Shiji Song, Qi Tian, Gao Huang

Over the past decade, deep learning models have exhibited considerable advancements, reaching or even exceeding human-level performance in a range of visual perception tasks.

Autonomous Vehicles Edge-computing +1

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

no code implementations 4 Sep 2023 Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song

Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the environment, yet usually suffers from the distributional shift problem.

Offline RL reinforcement-learning +1

GSVA: Generalized Segmentation via Multimodal Large Language Models

no code implementations 15 Dec 2023 Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or to identify empty targets that are absent from the image.

Generalized Referring Expression Segmentation Referring Expression +1

A Reinforcement-Learning-Based Multiple-Column Selection Strategy for Column Generation

no code implementations 21 Dec 2023 Haofeng Yuan, Lichang Fang, Shiji Song

Column generation (CG) is one of the most successful approaches for solving large-scale linear programming (LP) problems.

reinforcement-learning

LLM Agents for Psychology: A Study on Gamified Assessments

no code implementations 19 Feb 2024 Qisen Yang, Zekun Wang, Honghui Chen, Shenzhi Wang, Yifan Pu, Xin Gao, Wenhao Huang, Shiji Song, Gao Huang

Psychological measurement is essential for mental health, self-understanding, and personal development.

Deep Reinforcement Learning for Traveling Purchaser Problems

no code implementations 3 Apr 2024 Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Yuli Zhang

The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications.

Combinatorial Optimization Meta-Learning +1
