Search Results for author: Jiahao Wang

Found 81 papers, 38 papers with code

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

no code implementations • 31 Mar 2025 • Shengqiong Wu, Weicai Ye, Jiahao Wang, Quande Liu, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Shuicheng Yan, Hao Fei, Tat-Seng Chua

To address the bottleneck of accurate user intent interpretation within the current video generation community, we present Any2Caption, a novel framework for controllable video generation under any condition.

Video Generation

Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark

1 code implementation • 10 Mar 2025 • Jiahao Wang, Xiangyu Cao, Jiaru Zhong, Yuner Zhang, Haibao Yu, Lei He, Shaobing Xu

Despite significant advancements, autonomous driving systems continue to struggle with occluded objects and long-range detection due to the inherent limitations of single-perspective sensing.

Autonomous Driving Benchmarking

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

no code implementations • 9 Mar 2025 • Xirui Hu, Jiahao Wang, Hao Chen, Weizhan Zhang, Benqi Wang, Yikun Li, Haishun Nan

We present DynamicID, a tuning-free framework supported by a dual-stage training paradigm that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability.

Contrastive Learning Facial Editing +1

FuzzyLight: A Robust Two-Stage Fuzzy Approach for Traffic Signal Control Works in Real Cities

no code implementations • 27 Jan 2025 • Mingyuan Li, Jiahao Wang, Bo Du, Jun Shen, Qiang Wu

FuzzyLight offers several key contributions: (1) It employs fuzzy logic and compressed sensing to address sensor noise and enhances the efficiency of TSP decisions.

compressed sensing Reinforcement Learning (RL) +1

LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation

no code implementations • 22 Jan 2025 • Jiahao Wang, Ning Kang, Lewei Yao, Mengzhao Chen, Chengyue Wu, Songyang Zhang, Shuchen Xue, Yong liu, Taiqiang Wu, Xihui Liu, Kaipeng Zhang, Shifeng Zhang, Wenqi Shao, Zhenguo Li, Ping Luo

(3) Hybrid knowledge distillation objective: using a pre-trained diffusion Transformer to help the training of the student linear Transformer, supervising not only the predicted noise but also the variance of the reverse diffusion process.

Knowledge Distillation Mamba +1

GenEx: Generating an Explorable World

no code implementations • 12 Dec 2024 • Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen

In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments.

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

no code implementations • 12 Dec 2024 • Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai

To this end, we extend each image into a "static" video and introduce a unified token compression strategy called Progressive Visual Token Compression (PVC), where the tokens of each frame are progressively encoded and adaptively compressed to supplement the information not extracted from previous frames.

Video Understanding

VP-MEL: Visual Prompts Guided Multimodal Entity Linking

no code implementations • 9 Dec 2024 • Hongze Mi, Jinyuan Li, Xuying Zhang, Haoran Cheng, Jiahao Wang, Di Sun, Gang Pan

Multimodal entity linking (MEL), a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB), has attracted much attention due to its wide applications in recent years.

Entity Linking Information Retrieval +1

Towards Precise Scaling Laws for Video Diffusion Transformers

no code implementations • 25 Nov 2024 • Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, Rui Chen, Victor Shea-Jay Huang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai

Achieving optimal performance of video diffusion transformers within a given data and compute budget is crucial due to their high training costs.

TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models

1 code implementation • 24 Nov 2024 • Jiahao Wang, Mingyue Cheng, Qingyang Mao, Qi Liu, Feiyang Xu, Xin Li, Enhong Chen

Despite their effectiveness, we reveal that these methods conceal three inherent bottlenecks: (1) they struggle to encode temporal and channel-specific information in a lossless manner, both of which are critical components of multivariate time series; (2) it is very difficult to align the learned representation space with the semantic space of the LLMs; (3) they require task-specific retraining, which is both computationally expensive and labor-intensive.

Problem Decomposition Time Series +2

DMQR-RAG: Diverse Multi-Query Rewriting for RAG

no code implementations • 20 Nov 2024 • Zhicong Li, Jiahao Wang, Zhishu Jiang, Hangyu Mao, Zhongxia Chen, Jiazhen Du, Yuanxing Zhang, Fuzheng Zhang, Di Zhang, Yong liu

In this paper, we introduce DMQR-RAG, a Diverse Multi-Query Rewriting framework designed to improve the performance of both document retrieval and final responses in RAG.

RAG Retrieval +1

The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods

no code implementations • 15 Nov 2024 • Yifu Tao, Miguel Ángel Muñoz-Bañón, Lintong Zhang, Jiahao Wang, Lanke Frank Tarimo Fu, Maurice Fallon

Our evaluation demonstrates a key limitation of state-of-the-art radiance field methods: we show that they tend to overfit to the training poses/images and do not generalise well to out-of-sequence poses.

3D Reconstruction Benchmarking +2

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

no code implementations • 11 Nov 2024 • Runming Yang, Taiqiang Wu, Jiahao Wang, Pengfei Hu, Ngai Wong, Yujiu Yang

Inspired by this observation, we explore the strategy that combines LoRA and KD to enhance the efficiency of knowledge transfer.

Knowledge Distillation Language Modeling +3

Research on gesture recognition method based on SEDCNN-SVM

no code implementations • 24 Oct 2024 • Mingjin Zhang, Jiahao Wang, Jianming Wang, Qi Wang

The DCNN can automatically extract and learn the feature information of sEMG through the convolution operation of the convolutional layer, so that it can capture the complex and high-level features in the data.

Classification Gesture Recognition

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

no code implementations • 24 Oct 2024 • Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang

To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing.

Transit Pulse: Utilizing Social Media as a Source for Customer Feedback and Information Extraction with Large Language Model

no code implementations • 19 Oct 2024 • Jiahao Wang, Amer Shalaby

Users of the transit system flood social networks daily with messages that contain valuable insights crucial for improving service quality.

Language Modeling Language Modelling +4

Leveraging Large Language Models for Enhancing Public Transit Services

no code implementations • 18 Oct 2024 • Jiahao Wang, Amer Shalaby

With the help of these three LLM transit applications, transit system media personnel can provide system updates more efficiently, and customers can access travel information and policy answers in a more user-friendly manner.

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

no code implementations • 10 Oct 2024 • Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang

As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models.

Video Alignment Video Generation

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization

1 code implementation • 7 Oct 2024 • Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo

In this work, we propose PrefixQuant, a novel quantization method that achieves state-of-the-art performance across various precision levels (W4A4KV4 and W4A8KV4) and granularities (dynamic and static quantization) by effectively isolating token-wise outliers.

Common Sense Reasoning Quantization

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

no code implementations • 29 Sep 2024 • Haonan Lin, Wenbin An, Jiahao Wang, Yan Chen, Feng Tian, Mengmeng Wang, Guang Dai, Qianying Wang, Jingdong Wang

Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD).

SpotActor: Training-Free Layout-Controlled Consistent Image Generation

no code implementations • 7 Sep 2024 • Jiahao Wang, Caixia Yan, Weizhan Zhang, Haonan Lin, Mengmeng Wang, Guang Dai, Tieliang Gong, Hao Sun, Jingdong Wang

For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts.

Image Generation object-detection +1

SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs

no code implementations • 21 Aug 2024 • Yuanyang Yin, Yaqi Zhao, YaJie Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang

Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities, typically comprising a Vision Encoder, an Adapter, and a Large Language Model (LLM).

Contrastive Learning Language Modeling +3

HBot: A Chatbot for Healthcare Applications in Traditional Chinese Medicine Based on Human Body 3D Visualization

no code implementations • 1 Aug 2024 • Bolin Zhang, Zhiwei Yi, Jiahao Wang, Dianbo Sui, Zhiying Tu, Dianhui Chu

However, concepts such as acupuncture points (acupoints) and meridians involved in TCM always appear in the consultation, which cannot be displayed intuitively.

Chatbot

Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

no code implementations • 27 Jul 2024 • Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

Moreover, we also construct a large-scale and high-quality dataset specialized for the inspection task.

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

1 code implementation • 10 Jul 2024 • Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Ping Luo

To the best of our knowledge, Block-AP is the first method to enable direct training of all parameters in a block-wise manner, reducing accuracy loss in low-bit scenarios by enhancing the solution space during optimization.

Quantization

Fast and Continual Knowledge Graph Embedding via Incremental LoRA

1 code implementation • 8 Jul 2024 • Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

To address this issue, we propose a fast CKGE framework, incorporating an incremental low-rank adapter mechanism to efficiently acquire new knowledge while preserving old knowledge.

Knowledge Graph Embedding Knowledge Graphs +1

Mixture-of-Subspaces in Low-Rank Adaptation

1 code implementation • 16 Jun 2024 • Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong

In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models.

Common Sense Reasoning Question Answering +3
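As background for the LoRA entry above, the following is a minimal sketch of the standard low-rank update that LoRA methods build on: the frozen weight W is adapted as W + (alpha / r) * B A. It shows plain LoRA only, not the subspace-mixing method of the listed paper. The helper names (`matmul`, `lora_weight`) and the toy matrices are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with rank at most r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy frozen weight (2 x 3) and a rank-1 adapter (hypothetical values).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
a = [[0.5, -0.5, 1.0]]           # A: r x d_in
b_zero = [[0.0], [0.0]]          # B starts at zero, so the adapter is a no-op
b_trained = [[1.0], [2.0]]       # B after some hypothetical training

w_init = lora_weight(w, b_zero, a, alpha=2.0)
w_adapted = lora_weight(w, b_trained, a, alpha=2.0)
```

Only A and B are trained, so the number of trainable parameters scales with the rank r rather than with the full size of W.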

LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model

no code implementations • 29 May 2024 • Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan

In this paper, we propose a Language Collaboration and Glyph Perception Model, termed LOGO, an innovative framework designed to enhance the performance of conventional text spotters.

Position Text Spotting

Mamba-R: Vision Mamba ALSO Needs Registers

1 code implementation • 23 May 2024 • Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba.

Mamba Semantic Segmentation

Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

no code implementations • 13 May 2024 • Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

Furthermore, we propose three automatic evaluation metrics, including code pass rate, text-match ratio, and GPT-4V overall rating, for a fine-grained assessment of the output code and rendered images.

Code Generation Descriptive

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance

no code implementations • 16 Apr 2024 • Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun

To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance.

Consistent Character Generation Denoising +1

Adapting LLaMA Decoder to Vision Transformer

1 code implementation • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong liu, Taiqiang Wu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, which makes network training fail.

Computational Efficiency Decoder +2

Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

1 code implementation • 3 Apr 2024 • Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, Ngai Wong

Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs).

Diversity Knowledge Distillation
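To make the Kullback-Leibler entry above concrete, here is a minimal sketch of the forward and reverse KL divergences between a teacher and a student distribution, the two quantities typically compared in knowledge distillation. The logits and function names are illustrative assumptions, not values or code from the listed paper.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms with p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a three-word vocabulary.
teacher = softmax([2.0, 1.0, 0.1])
student = softmax([1.5, 1.2, 0.3])

forward_kl = kl_divergence(teacher, student)  # KL(teacher || student)
reverse_kl = kl_divergence(student, teacher)  # KL(student || teacher)
```

The divergence is asymmetric, so the two directions generally give different values and penalize different kinds of mismatch between student and teacher.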

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation • 14 Mar 2024 • Guo Chen, Yifei HUANG, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Mamba Moment Retrieval +2

A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics

no code implementations • 26 Feb 2024 • Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

Probabilistic approaches for handling count-valued time sequences have attracted considerable research attention because of their ability to infer explainable latent structures and to estimate uncertainties, and thus are especially suitable for dealing with noisy and incomplete count data.

Data Augmentation Time Series

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

2 code implementations • 15 Feb 2024 • Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan

Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions.

Grounded Multimodal Named Entity Recognition Multi-modal Named Entity Recognition +8

Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources

no code implementations • 15 Feb 2024 • Jiahao Wang, Hong Peng, Shengchao Chen, Sufen Ren

This approach establishes a robust model even when confronted with limited labeled data, eliminating the need for an extensive array of parameters, as required in learning from scratch.

Decision Making Ensemble Learning

A Survey on Data Selection for LLM Instruction Tuning

1 code implementation • 4 Feb 2024 • Jiahao Wang, Bolin Zhang, Qianlong Du, Jiajun Zhang, Dianhui Chu

Instruction tuning is a vital step in training large language models (LLMs), so how to enhance the effect of instruction tuning has received increased attention.

Instruction Following Survey

LLaMA Pro: Progressive LLaMA with Block Expansion

1 code implementation • 4 Jan 2024 • Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA.

Instruction Following Math

CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

1 code implementation • 3 Jan 2024 • Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu

Point cloud completion is an indispensable task for recovering complete point clouds due to incompleteness caused by occlusion, limited sensor resolution, etc.

Point Cloud Completion

Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework

no code implementations • 3 Jan 2024 • Shengchao Chen, Ting Shu, Huan Zhao, Jiahao Wang, Sufen Ren, Lina Yang

Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields.

Federated Learning

RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation

1 code implementation • CVPR 2024 • Yi Rong, Haoran Zhou, Kang Xia, Cheng Mei, Jiahao Wang, Tong Lu

Moreover, we propose a novel paradigm, namely Kernel-to-Displacement generation, for point generation, where point cloud upsampling is reformulated as the deformation of kernel points.

point cloud upsampling

Structure-Aware Sparse-View X-ray 3D Reconstruction

2 code implementations • CVPR 2024 • Yuanhao Cai, Jiahao Wang, Alan Yuille, Zongwei Zhou, Angtian Wang

In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction.

3D Reconstruction CT Reconstruction +3

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation • ICCV 2023 • Jiahao Wang, Guo Chen, Yifei HUANG, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Action Understanding Online Action Detection

Generating Images with 3D Annotations Using Diffusion Models

no code implementations • 13 Jun 2023 • Wufei Ma, Qihao Liu, Jiahao Wang, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Guofeng Zhang, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu, Alan Yuille

With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically.

3D geometry 3D Pose Estimation +1

4D Millimeter-Wave Radar in Autonomous Driving: A Survey

no code implementations • 7 Jun 2023 • Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

In an effort to bridge this gap and stimulate future research, this paper presents an exhaustive survey on the utilization of 4D mmWave radar in autonomous driving.

Autonomous Driving Point Cloud Generation +1

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation • 22 May 2023 • Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Decoder Video Understanding

Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge

1 code implementation • 20 May 2023 • Jinyuan Li, Han Li, Zhuo Pan, Di Sun, Jiahao Wang, Wenkun Zhang, Gang Pan

However, these methods either neglect the necessity of providing the model with external knowledge, or encounter issues of high redundancy in the retrieved knowledge.

 Ranked #1 on Multi-modal Named Entity Recognition on Twitter-2017 (using extra training data)

Multi-modal Named Entity Recognition named-entity-recognition +1

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs

no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang

Specifically, we first employ the class prototypes to analyze the impact of graph structures on GNN teachers, and then design two losses to distill such information from GNNs to MLPs.

Knowledge Distillation

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Global Spectral Filter Memory Network for Video Object Segmentation

1 code implementation • 11 Oct 2022 • Yong liu, Ran Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang

Besides, we empirically find that low-frequency features should be enhanced in the encoder (backbone), while high-frequency features should be enhanced in the decoder (segmentation head).

Attribute Decoder +5

Towards Real-World Video Deblurring by Exploring Blur Formation Process

1 code implementation • 28 Aug 2022 • Mingdeng Cao, Zhihang Zhong, Yanbo Fan, Jiahao Wang, Yong Zhang, Jue Wang, Yujiu Yang, Yinqiang Zheng

We believe the novel realistic synthesis pipeline and the corresponding RAW video dataset can help the community easily construct customized blur datasets and substantially improve real-world video deblurring performance, instead of laboriously collecting real data pairs.

Deblurring Video Deblurring

Learning Adaptive Warping for Real-World Rolling Shutter Correction

1 code implementation • CVPR 2022 • Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang

This paper proposes the first real-world rolling shutter (RS) correction dataset, BS-RSC, and a corresponding model to correct the RS frames in a distorted video.

Rolling Shutter Correction

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

2 code implementations • 19 Apr 2022 • Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception.

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

no code implementations • CVPR 2022 • Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang

We conduct experiments both from a control theory lens through a phase locus verification and from a network training lens on several models, including CNNs, Transformers, MLPs, and on benchmark datasets.

Math

SAGA: Stochastic Whole-Body Grasping with Contact

1 code implementation • 19 Dec 2021 • Yan Wu, Jiahao Wang, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang

Given an initial pose and the generated whole-body grasping pose as the start and end of the motion respectively, we design a novel contact-aware generative motion infilling module to generate a diverse set of grasp-oriented motions.

Object

Adder Attention for Vision Transformer

4 code implementations • NeurIPS 2021 • Han Shu, Jiahao Wang, Hanting Chen, Lin Li, Yujiu Yang, Yunhe Wang

With the new operation, vision transformers constructed using additions can also provide powerful feature representations.

Diversity

Will You Ever Become Popular? Learning to Predict Virality of Dance Clips

no code implementations • 6 Nov 2021 • Jiahao Wang, Yunhong Wang, Nina Weng, Tianrui Chai, Annan Li, Faxi Zhang, Sansi Yu

Therefore, virality prediction from dance challenges is of great commercial value and has a wide range of applications, such as smart recommendation and popularity promotion.

Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

1 code implementation • 15 Aug 2021 • Jiahao Wang, Yunhong Wang, Sheng Liu, Annan Li

Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited.

Action Understanding Fine-grained Action Recognition +1

Quantitatively Nonblocking Supervisory Control of Discrete-Event Systems

no code implementations • 2 Aug 2021 • Renyuan Zhang, Jiahao Wang, Zenghui Wang, Kai Cai

Finally, combining with the algorithm of computing the supremal controllable sublanguage, we design algorithms to compute the maximally permissive solutions to the formulated (heterogeneously) quantitatively nonblocking supervisory control problems.
