Search Results for author: Yansong Tang

Found 47 papers, 28 papers with code

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

7 code implementations 28 Jul 2022 Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework.

Image Classification Object Detection +2

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields

1 code implementation 14 Dec 2023 Chubin Zhang, Juncheng Yan, Yi Wei, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu

Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.

Autonomous Driving Depth Estimation +1

Segment and Caption Anything

1 code implementation 1 Dec 2023 Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

object-detection Object Detection +1

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

1 code implementation ICCV 2023 Ronghui Li, Junfan Zhao, Yachao Zhang, Mingyang Su, Zeping Ren, Han Zhang, Yansong Tang, Xiu Li

To address these problems, we propose FineDance, which contains 14.6 hours of music-dance paired data, with fine-grained hand motions, fine-grained genres (22 dance genres), and accurate posture.

Motion Synthesis Retrieval

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

1 code implementation NeurIPS 2023 Yinan Liang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

Due to the high price and heavy energy consumption of GPUs, deploying deep models on IoT devices such as microcontrollers makes significant contributions for ecological AI.

Image Classification

Global Spectral Filter Memory Network for Video Object Segmentation

1 code implementation 11 Oct 2022 Yong Liu, Ran Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang

Besides, we empirically find low frequency feature should be enhanced in encoder (backbone) while high frequency for decoder (segmentation head).

Attribute Object +4

ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer

2 code implementations 21 Mar 2022 Rui Yang, Hailong Ma, Jie Wu, Yansong Tang, Xuefeng Xiao, Min Zheng, Xiu Li

The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions.

Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

1 code implementation ICCV 2023 Guangyi Chen, Xiao Liu, Guangrun Wang, Kun Zhang, Philip H. S. Torr, Xiao-Ping Zhang, Yansong Tang

To bridge these gaps, in this paper, we propose Tem-Adapter, which enables the learning of temporal dynamics and complex semantics by a visual Temporal Aligner and a textual Semantic Aligner.

Question Answering Video Question Answering

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

1 code implementation NeurIPS 2023 Zhuoyan Luo, Yicheng Xiao, Yong Liu, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang

To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment.

Ranked #2 on Referring Expression Segmentation on A2D Sentences (using extra training data)

Object Referring Expression Segmentation +4

Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles

1 code implementation 2 May 2023 Yuanzheng Ma, Wangting Zhou, Rui Ma, Sihua Yang, Yansong Tang, Xun Guan

To address this challenge, we propose a novel approach that employs a super-resolution PAA method trained with forged PAA images.

Image Generation Super-Resolution +1

1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation

1 code implementation 1 Jan 2024 Zhuoyan Luo, Yicheng Xiao, Yong Liu, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang

The recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to the superior performance.

Object Referring Video Object Segmentation +3

Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition

1 code implementation 17 Jul 2022 Yansong Tang, Xingyu Liu, Xumin Yu, Danyang Zhang, Jiwen Lu, Jie Zhou

Different from the conventional adversarial learning-based approaches for UDA, we utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.

Action Recognition Self-Supervised Learning +2

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

1 code implementation 5 Mar 2024 JianJian Cao, Peng Ye, Shengze Li, Chong Yu, Yansong Tang, Jiwen Lu, Tao Chen

To this end, we propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs.

Semantic-Aware Auto-Encoders for Self-Supervised Representation Learning

1 code implementation CVPR 2022 Guangrun Wang, Yansong Tang, Liang Lin, Philip H.S. Torr

Inspired by perceptual learning that could use cross-view learning to perceive concepts and semantics, we propose a novel AE that could learn semantic-aware representation via cross-view image reconstruction.

Image Reconstruction Representation Learning +1

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

1 code implementation 23 Mar 2024 Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang

Recent Vision Transformer Compression (VTC) works mainly follow a two-stage scheme, where the importance score of each model unit is first evaluated or preset in each submodule, followed by the sparsity score evaluation according to the target sparsity constraint.

Dimensionality Reduction

Language-free Compositional Action Generation via Decoupling Refinement

1 code implementation 7 Jul 2023 Xiao Liu, Guangyi Chen, Yansong Tang, Guangrun Wang, Xiao-Ping Zhang, Ser-Nam Lim

Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation.

Action Generation

Deep Progressive Reinforcement Learning for Skeleton-Based Action Recognition

no code implementations CVPR 2018 Yansong Tang, Yi Tian, Jiwen Lu, Peiyang Li, Jie Zhou

In this paper, we propose a deep progressive reinforcement learning (DPRL) method for action recognition in skeleton-based videos, which aims to distil the most informative frames and discard ambiguous frames in sequences for recognizing actions.

Action Recognition reinforcement-learning +3

COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis

no code implementations CVPR 2019 Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou

There are substantial instructional videos on the Internet, which enables us to acquire knowledge for completing various tasks.

Action Detection

Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation

no code implementations 20 Mar 2020 Yansong Tang, Jiwen Lu, Jie Zhou

We believe the introduction of the COIN dataset will promote the future in-depth research on instructional video analysis for the community.

Action Detection

Unsupervised Embedding Learning from Uncertainty Momentum Modeling

no code implementations 19 Jul 2021 Jiahuan Zhou, Yansong Tang, Bing Su, Ying Wu

We justify that the performance limitation is caused by the gradient vanishing on these sample outliers.

FLAG3D: A 3D Fitness Activity Dataset with Language Instruction

1 code implementation CVPR 2023 Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

With the continuously thriving popularity around the world, fitness activity analytic has become an emerging research topic in computer vision.

Action Generation Action Recognition +2

Global Knowledge Calibration for Fast Open-Vocabulary Segmentation

no code implementations ICCV 2023 Kunyang Han, Yong Liu, Jun Hao Liew, Henghui Ding, Yunchao Wei, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao

Recent advancements in pre-trained vision-language models, such as CLIP, have enabled the segmentation of arbitrary concepts solely from textual inputs, a process commonly referred to as open-vocabulary semantic segmentation (OVS).

Knowledge Distillation Open Vocabulary Semantic Segmentation +4

Towards Accurate Data-free Quantization for Diffusion Models

no code implementations 30 May 2023 Changyuan Wang, Ziwei Wang, Xiuwei Xu, Yansong Tang, Jie Zhou, Jiwen Lu

On the contrary, we design group-wise quantization functions for activation discretization in different timesteps and sample the optimal timestep for informative calibration image generation, so that our quantized diffusion model can reduce the discretization errors with negligible computational overhead.

Data Free Quantization Image Generation

Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution

no code implementations 3 Jun 2023 Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun Feng, Yujiu Yang, Yansong Tang

These elaborated designs enable our model to generate portraits with robust multi-view semantic consistency, eliminating the need for optimization-based methods.

Text to 3D

Skip-Plan: Procedure Planning in Instructional Videos via Condensed Action Space Learning

1 code implementation ICCV 2023 Zhiheng Li, Wenjia Geng, Muheng Li, Lei Chen, Yansong Tang, Jiwen Lu, Jie Zhou

By this means, our model explores all sorts of reliable sub-relations within an action sequence in the condensed action space.

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search

no code implementations 8 Nov 2023 Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Yansong Tang, Wenwu Zhu

When retraining the searched architecture, we adopt a dynamic joint loss to maintain the consistency between supernet training and subnet retraining, which also provides informative objectives for each block and shortens the paths of gradient propagation.

Neural Architecture Search

Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models

no code implementations 10 Nov 2023 Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu

In this paper, we propose a novel post-training quantization method PCR (Progressive Calibration and Relaxing) for text-to-image diffusion models, which consists of a progressive calibration strategy that considers the accumulated quantization error across timesteps, and an activation relaxing strategy that improves the performance with negligible cost.

Quantization

Fine-tuning vision foundation model for crack segmentation in civil infrastructures

no code implementations 7 Dec 2023 Kang Ge, Chen Wang, Yutao Guo, Yansong Tang, Zhenzhong Hu

Two parameter-efficient fine-tuning methods, adapter and low-rank adaptation, are adopted to fine-tune the foundation model in semantic segmentation: the Segment Anything Model (SAM).

Crack Segmentation Segmentation

ThinkBot: Embodied Instruction Following with Thought Chain Reasoning

no code implementations 12 Dec 2023 Guanxing Lu, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

Embodied Instruction Following (EIF) requires agents to complete human instruction by interacting objects in complicated surrounding environments.

Instruction Following

Plan, Posture and Go: Towards Open-World Text-to-Motion Generation

no code implementations 22 Dec 2023 Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong

Some works use the CLIP model to align the motion space and the text space, aiming to enable motion generation from natural language motion descriptions.

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

no code implementations 13 Mar 2024 Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang

Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction.

Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution

no code implementations 16 Mar 2024 Zhiheng Li, Muheng Li, Jixuan Fan, Lei Chen, Yansong Tang, Jie Zhou, Jiwen Lu

Scale arbitrary super-resolution based on implicit image function gains increasing popularity since it can better represent the visual world in a continuous manner.

Super-Resolution

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling

no code implementations 28 Mar 2024 BoWen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo

To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generative modeling.
