Search Results for author: Qi Dai

Found 40 papers, 20 papers with code

MPII: Multi-Level Mutual Promotion for Inference and Interpretation

1 code implementation ACL 2022 Yan Liu, Sanyuan Chen, Yazheng Yang, Qi Dai

In this paper, we propose a multi-level Mutual Promotion mechanism for self-evolved Inference and sentence-level Interpretation (MPII).

Sentence

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

no code implementations 13 Jun 2024 Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

no code implementations 10 Jun 2024 Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang

Text-guided video prediction (TVP) involves predicting the motion of future frames from the initial frame according to an instruction, which has wide applications in virtual reality, robotics, and content creation.

Language Modelling Large Language Model +1

Effectiveness of Self-Assessment Software to Evaluate Preclinical Operative Procedures

no code implementations 8 Apr 2024 Qi Dai, Ryan Davis, Houlin Hong, Ying Gu

Class II preparation at 400 μm tolerance had the smallest mean difference of 0.41 points.

An edge detection-based deep learning approach for tear meniscus height measurement

no code implementations 23 Mar 2024 Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%.

Edge Detection Image Quality Assessment

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition

1 code implementation CVPR 2024 Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng, Yan Yan, Qi Dai, Xian-Sheng Hua

To remedy this we propose a two-fold strategy: (1) We introduce an innovative approach that encodes bone connectivity by harnessing the power of graph distances to describe the physical topology; we further incorporate action-specific topological representation via persistent homology analysis to depict systemic dynamics.

Action Recognition Skeleton Based Action Recognition

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation CVPR 2024 Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

A Survey on Video Diffusion Models

1 code implementation 16 Oct 2023 Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.

Image Generation Video Editing +2

SimDA: Simple Diffusion Adapter for Efficient Video Generation

no code implementations CVPR 2024 Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang

In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.

Transfer Learning Video Editing +2

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

1 code implementation ICCV 2023 Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, Yu-Gang Jiang

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

Action Classification Action Recognition +1

Parallel Sentence-Level Explanation Generation for Real-World Low-Resource Scenarios

no code implementations 21 Feb 2023 Yan Liu, Xiaokang Chen, Qi Dai

However, current works pursuing sentence-level explanations rely heavily on annotated training data, which limits the development of interpretability to only a few tasks.

Explanation Generation Natural Language Inference +1

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

1 code implementation ICCV 2023 Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.

Instance Segmentation Monocular Depth Estimation +1

ResFormer: Scaling ViTs with Multi-Resolution Training

1 code implementation CVPR 2023 Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, Yu-Gang Jiang

We introduce ResFormer, a framework built upon the seminal idea of multi-resolution training for improved performance on a wide spectrum of mostly unseen testing resolutions.

Action Recognition Image Classification +4

HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling

1 code implementation 30 May 2022 Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian

A key idea of efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), albeit hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.

Transfer Learning

Deeper Insights into the Robustness of ViTs towards Common Corruptions

no code implementations 26 Apr 2022 Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu-Gang Jiang

With Vision Transformers (ViTs) making great advances in a variety of computer vision tasks, recent literature has proposed various variants of vanilla ViTs to achieve better efficiency and efficacy.

Benchmarking Data Augmentation

Multi-granularity Relabeled Under-sampling Algorithm for Imbalanced Data

no code implementations 11 Jan 2022 Qi Dai, Jian-wei Liu, Yang Liu

The Tomek-Link sampling algorithm can effectively reduce the class overlap on data, remove the majority instances that are difficult to distinguish, and improve the algorithm classification accuracy.

Classification imbalanced classification

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations CVPR 2022 Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), which, using $40\times$ less data than in previous practice, achieves the state of the art on four representative vision benchmarks.

Representation Learning Self-Supervised Image Classification +1

On the Connection between Local Attention and Dynamic Depth-wise Convolution

1 code implementation ICLR 2022 Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang

Sparse connectivity: there is no connection across channels, and each position is connected to the positions within a small local window.

object-detection Object Detection +2

Calibration of Human Driving Behavior and Preference Using Naturalistic Traffic Data

no code implementations 5 May 2021 Qi Dai, Di Shen, Jinhong Wang, Suzhou Huang, Dimitar Filev

Towards this end it is necessary that we have a comprehensive modeling framework for decision-making within which human driving preferences can be inferred statistically from observed driving behaviors in realistic and naturalistic traffic settings.

Autonomous Vehicles Decision Making

Learning to Estimate Kernel Scale and Orientation of Defocus Blur with Asymmetric Coded Aperture

no code implementations 10 Mar 2021 Jisheng Li, Qi Dai, Jiangtao Wen

Consistent in-focus input imagery is an essential precondition for machine vision systems to perceive the dynamic environment.

Temporal Action Detection with Multi-level Supervision

no code implementations ICCV 2021 Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, Huijuan Xu

We extensively benchmark against the baselines for SSAD and OSAD on our created data splits in THUMOS14 and ActivityNet1.2, and demonstrate the effectiveness of the proposed UFA and IB methods.

Action Detection Semi-Supervised Action Detection

Towards a Systematic Computational Framework for Modeling Multi-Agent Decision-Making at Micro Level for Smart Vehicles in a Smart World

no code implementations 25 Sep 2020 Qi Dai, Xunnong Xu, Wen Guo, Suzhou Huang, Dimitar Filev

To demonstrate how our approach can be applied to realistic traffic settings, we conduct a simulation experiment to derive merging and yielding behaviors on a double-lane highway with an unexpected barrier.

Autonomous Vehicles Computational Efficiency +1

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

1 code implementation ICML 2020 Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang

Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.

Domain Generalization Representation Learning

Reinforcing Short-Length Hashing

no code implementations 24 Apr 2020 Xingbo Liu, Xiushan Nie, Qi Dai, Yupan Huang, Yilong Yin

Due to the compelling efficiency in retrieval and storage, similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval.

Image Retrieval Retrieval

Self-supervised Object Motion and Depth Estimation from Video

no code implementations 9 Dec 2019 Qi Dai, Vaishakh Patil, Simon Hecker, Dengxin Dai, Luc van Gool, Konrad Schindler

We present a self-supervised learning framework to estimate the individual object motion and monocular depth from video.

Depth Estimation Instance Segmentation +5

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

no code implementations 17 Sep 2019 Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, Alexander Hauptmann

By minimizing the mutual information, each column is guided to learn features with different image scales.

Crowd Counting

Learning Spatial Awareness to Improve Crowd Counting

no code implementations ICCV 2019 Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Alexander Hauptmann

Although the Maximum Excess over SubArrays (MESA) loss has been previously proposed to address the above issues by finding the rectangular subregion whose predicted density map has the maximum difference from the ground truth, it cannot be solved by gradient descent and thus can hardly be integrated into the deep learning framework.

Crowd Counting Weakly-supervised Learning

Decoupling Localization and Classification in Single Shot Temporal Action Detection

1 code implementation 16 Apr 2019 Yupan Huang, Qi Dai, Yutong Lu

Each branch produces a set of action anchor layers by applying deconvolution to the feature maps of the main stream.

Action Detection Classification +2

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations ECCV 2018 Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in the next frame in a recurrent manner.

Action Detection Region Proposal
