Search Results for author: Jiahao Wang

Found 47 papers, 25 papers with code

Mamba-R: Vision Mamba ALSO Needs Registers

no code implementations • 23 May 2024 • Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba.

Semantic Segmentation

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

no code implementations • 13 May 2024 • Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

Furthermore, we propose three automatic evaluation metrics, including code pass rate, text-match ratio, and GPT-4V overall rating, for a fine-grained assessment of the output code and rendered images.

Code Generation, Descriptive

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance

no code implementations • 16 Apr 2024 • Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang

Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory character consistency, superior prompt conformity as well as high image quality.

Consistent Character Generation, Denoising, +1

Adapting LLaMA Decoder to Vision Transformer

1 code implementation • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong liu, Taiqiang Wu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, which makes network training fail.

Computational Efficiency, Decoder, +2

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

no code implementations • 3 Apr 2024 • Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs).

Knowledge Distillation

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation • 14 Mar 2024 • Guo Chen, Yifei HUANG, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Moment Retrieval, Temporal Action Localization, +1

A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics

no code implementations • 26 Feb 2024 • Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

Probabilistic approaches for handling count-valued time sequences have attracted considerable research attention because of their ability to infer explainable latent structures and to estimate uncertainties, and are thus especially suitable for dealing with noisy and incomplete count data.

Data Augmentation, Time Series

Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources

no code implementations • 15 Feb 2024 • Jiahao Wang, Hong Peng, Shengchao Chen, Sufen Ren

This approach establishes a robust model even when confronted with limited labeled data, eliminating the need for an extensive array of parameters, as required in learning from scratch.

Decision Making, Ensemble Learning

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

1 code implementation • 15 Feb 2024 • Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan

Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions.

Grounded Multimodal Named Entity Recognition, Multi-modal Named Entity Recognition, +8

A Survey on Data Selection for LLM Instruction Tuning

1 code implementation • 4 Feb 2024 • Jiahao Wang, Bolin Zhang, Qianlong Du, Jiajun Zhang, Dianhui Chu

Instruction tuning is a vital step in training large language models (LLMs), so how to enhance the effect of instruction tuning has received increased attention.

Instruction Following

LLaMA Pro: Progressive LLaMA with Block Expansion

1 code implementation • 4 Jan 2024 • Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA.

Instruction Following, Math

Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework

no code implementations • 3 Jan 2024 • Shengchao Chen, Ting Shu, Huan Zhao, Jiahao Wang, Sufen Ren, Lina Yang

Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields.

Federated Learning

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

1 code implementation • 3 Jan 2024 • Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu

Point cloud completion is an indispensable task for recovering complete point clouds from data left incomplete by occlusion, limited sensor resolution, etc.

Point Cloud Completion

Structure-Aware Sparse-View X-ray 3D Reconstruction

1 code implementation • 18 Nov 2023 • Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction.

3D Reconstruction, Low-Dose X-Ray Ct Reconstruction, +1

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation • ICCV 2023 • Jiahao Wang, Guo Chen, Yifei HUANG, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Action Understanding, Online Action Detection

Generating Images with 3D Annotations Using Diffusion Models

no code implementations • 13 Jun 2023 • Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically.

3D Pose Estimation, Style Transfer

4D Millimeter-Wave Radar in Autonomous Driving: A Survey

no code implementations • 7 Jun 2023 • Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

In an effort to bridge this gap and stimulate future research, this paper presents an exhaustive survey on the utilization of 4D mmWave radar in autonomous driving.

Autonomous Driving, Point Cloud Generation

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation • 22 May 2023 • Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Decoder, Video Understanding

Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge

1 code implementation • 20 May 2023 • Jinyuan Li, Han Li, Zhuo Pan, Di Sun, Jiahao Wang, Wenkun Zhang, Gang Pan

However, these methods either neglect the necessity of providing the model with external knowledge, or encounter issues of high redundancy in the retrieved knowledge.

 Ranked #1 on Multi-modal Named Entity Recognition on Twitter-2017 (using extra training data)

Multi-modal Named Entity Recognition, named-entity-recognition, +1

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic.

Knowledge Distillation

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Global Spectral Filter Memory Network for Video Object Segmentation

1 code implementation • 11 Oct 2022 • Yong liu, Ran Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang

Besides, we empirically find that low-frequency features should be enhanced in the encoder (backbone), while high-frequency features should be enhanced in the decoder (segmentation head).

Attribute, Decoder, +5

Towards Real-World Video Deblurring by Exploring Blur Formation Process

1 code implementation • 28 Aug 2022 • Mingdeng Cao, Zhihang Zhong, Yanbo Fan, Jiahao Wang, Yong Zhang, Jue Wang, Yujiu Yang, Yinqiang Zheng

We believe the novel realistic synthesis pipeline and the corresponding RAW video dataset can help the community easily construct customized blur datasets that substantially improve real-world video deblurring performance, instead of laboriously collecting real data pairs.


Learning Adaptive Warping for Real-World Rolling Shutter Correction

1 code implementation • CVPR 2022 • Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang

This paper proposes the first real-world rolling shutter (RS) correction dataset, BS-RSC, and a corresponding model to correct the RS frames in a distorted video.

Rolling Shutter Correction

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

no code implementations • CVPR 2022 • Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang

We conduct experiments both from a control theory lens through a phase locus verification and from a network training lens on several models, including CNNs, Transformers, MLPs, and on benchmark datasets.


SAGA: Stochastic Whole-Body Grasping with Contact

1 code implementation • 19 Dec 2021 • Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang

Given an initial pose and the generated whole-body grasping pose as the start and end of the motion respectively, we design a novel contact-aware generative motion infilling module to generate a diverse set of grasp-oriented motions.


Adder Attention for Vision Transformer

4 code implementations • NeurIPS 2021 • Han Shu, Jiahao Wang, Hanting Chen, Lin Li, Yujiu Yang, Yunhe Wang

With the new operation, vision transformers constructed using additions can also provide powerful feature representations.

Will You Ever Become Popular? Learning to Predict Virality of Dance Clips

no code implementations • 6 Nov 2021 • Jiahao Wang, Yunhong Wang, Nina Weng, Tianrui Chai, Annan Li, Faxi Zhang, Sansi Yu

Therefore, virality prediction from dance challenges is of great commercial value and has a wide range of applications, such as smart recommendation and popularity promotion.

Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

1 code implementation • 15 Aug 2021 • Jiahao Wang, Yunhong Wang, Sheng Liu, Annan Li

Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited.

Action Understanding, Fine-grained Action Recognition, +1

Quantitatively Nonblocking Supervisory Control of Discrete-Event Systems

no code implementations • 2 Aug 2021 • Renyuan Zhang, Jiahao Wang, Zenghui Wang, Kai Cai

Finally, combining with the algorithm of computing the supremal controllable sublanguage, we design algorithms to compute the maximally permissive solutions to the formulated (heterogeneously) quantitatively nonblocking supervisory control problems.
