Search Results for author: Yu Qiao

Found 413 papers, 248 papers with code

Knowledge-based Fully Convolutional Network and Its Application in Segmentation of Lung CT Images

no code implementations • 22 May 2018 • Tao Yu, Yu Qiao, Huan Long

A variety of deep neural networks have been applied in medical image segmentation and achieve good performance.

Image Segmentation Medical Image Segmentation +2

Boosting up Scene Text Detectors with Guided CNN

no code implementations • 10 May 2018 • Xiaoyu Yue, Zhanghui Kuang, Zhaoyang Zhang, Zhenfang Chen, Pan He, Yu Qiao, Wei Zhang

Deep CNNs have achieved great success in text detection.

Text Detection

Deep Embedding Convolutional Neural Network for Synthesizing CT Image from T1-Weighted MR Image

no code implementations • 7 Sep 2017 • Lei Xiang, Qian Wang, Xiyao Jin, Dong Nie, Yu Qiao, Dinggang Shen

After repeating this embedding procedure several times in the network, we can eventually synthesize a final CT image at the end of the DECNN.

Computed Tomography (CT) Image Generation

Locally-Supervised Deep Hybrid Model for Scene Recognition

no code implementations • 27 Jan 2016 • Sheng Guo, Weilin Huang, Li-Min Wang, Yu Qiao

Secondly, we propose a new Local Convolutional Supervision (LCS) layer to enhance the local structure of the image by directly propagating the label information to the convolutional layers.

General Classification Image Classification +1

Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images

no code implementations • 1 Sep 2016 • Limin Wang, Zhe Wang, Yu Qiao, Luc van Gool

These newly designed transferring techniques exploit multi-task learning frameworks to incorporate extra knowledge from other networks and additional datasets into the training procedure of event CNNs.

Multi-Task Learning

DeepWriter: A Multi-Stream Deep CNN for Text-independent Writer Identification

no code implementations • 21 Jun 2016 • Linjie Xing, Yu Qiao

The main contributions are: 1) we design and optimize a multi-stream structure for the writer identification task; 2) we introduce data augmentation learning to enhance the performance of DeepWriter; 3) we introduce a patch scanning strategy to handle text images of different lengths.

Data Augmentation Sentence +1

Actionness Estimation Using Hybrid Fully Convolutional Networks

no code implementations • CVPR 2016 • Limin Wang, Yu Qiao, Xiaoou Tang, Luc van Gool

Actionness was introduced to quantify the likelihood of containing a generic action instance at a specific location.

Ranked #11 on Action Detection on J-HMDB

Action Detection Action Recognition +1

Text-Attentional Convolutional Neural Networks for Scene Text Detection

no code implementations • 12 Oct 2015 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

The rich supervision information enables the Text-CNN with a strong capability for discriminating ambiguous texts, and also increases its robustness against complicated background components.

Multi-Task Learning Scene Text Detection +3

Reading Scene Text in Deep Convolutional Sequences

1 code implementation • 14 Jun 2015 • Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem.

Scene Text Recognition

Better Exploiting OS-CNNs for Better Event Recognition in Images

no code implementations • 14 Oct 2015 • Limin Wang, Zhe Wang, Sheng Guo, Yu Qiao

Event recognition from still images is one of the most important problems for image understanding.

Object Object Recognition +1

Local Multi-Grouped Binary Descriptor with Ring-based Pooling Configuration and Optimization

no code implementations • 22 Sep 2015 • Yongqiang Gao, Weilin Huang, Yu Qiao

The performance of RMGD was evaluated on a number of publicly available benchmarks, where the RMGD outperforms the state-of-the-art binary descriptors significantly.

Local Color Contrastive Descriptor for Image Classification

no code implementations • 3 Aug 2015 • Sheng Guo, Weilin Huang, Yu Qiao

Our descriptor enriches local image representation with both color and contrast information.

Classification General Classification +2

Boosting Optical Character Recognition: A Super-Resolution Approach

no code implementations • 7 Jun 2015 • Chao Dong, Ximei Zhu, Yubin Deng, Chen Change Loy, Yu Qiao

Text image super-resolution is a challenging yet open research problem in the computer vision community.

Image Super-Resolution Optical Character Recognition +1

Object-Scene Convolutional Neural Networks for Event Recognition in Images

no code implementations • 2 May 2015 • Limin Wang, Zhe Wang, Wenbin Du, Yu Qiao

Meanwhile, we investigate different network architectures for OS-CNN design, and adapt the deep (AlexNet) and very-deep (GoogLeNet) networks to the task of event recognition.

Bag of Visual Words and Fusion Methods for Action Recognition: Comprehensive Study and Good Practice

no code implementations • 18 May 2014 • Xiaojiang Peng, Li-Min Wang, Xingxing Wang, Yu Qiao

Many efforts have been made in each step independently in different scenarios and their effect on action recognition is still unknown.

Action Recognition In Videos Temporal Action Localization

A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification

no code implementations • 2 Sep 2013 • Xiaojiang Peng, Qiang Peng, Yu Qiao, Junzhou Chen, Mehtab Afzal

Many efforts have been devoted to developing alternatives to traditional vector quantization in the image domain, such as sparse coding and soft-assignment.

Action Classification Dictionary Learning +2

Prostate Segmentation using 2D Bridged U-net

no code implementations • 12 Jul 2018 • Wanli Chen, Yue Zhang, Junjun He, Yu Qiao, Yi-fan Chen, Hongjian Shi, Xiaoying Tang

To address the aforementioned three problems, we propose and validate a deeper network that can fit medical image datasets that are usually small in the sample size.

Image Segmentation Medical Image Segmentation +2

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

no code implementations • 3 Oct 2018 • Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X. Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung

This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones.

Image Enhancement Image Super-Resolution

Super-Identity Convolutional Neural Network for Face Hallucination

no code implementations • ECCV 2018 • Kaipeng Zhang, Zhanpeng Zhang, Chia-Wen Cheng, Winston H. Hsu, Yu Qiao, Wei Liu, Tong Zhang

Face hallucination is a generative task that super-resolves low-resolution facial images, while human perception of faces relies heavily on identity information.

Face Generation Face Hallucination +1

Temporal Hallucinating for Action Recognition With Few Still Images

no code implementations • CVPR 2018 • Yali Wang, Lei Zhou, Yu Qiao

To mimic this capacity, we propose a novel Hybrid Video Memory (HVM) machine, which can hallucinate temporal features of still images from video memory, in order to boost action recognition with few still images.

Action Recognition In Still Images Domain Adaptation

Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

no code implementations • ECCV 2018 • Dian Shao, Yu Xiong, Yue Zhao, Qingqiu Huang, Yu Qiao, Dahua Lin

The thriving of video sharing services brings new challenges to video retrieval, e.g., the rapid growth in video duration and content diversity.

Natural Language Queries Retrieval +2

Motionlets: Mid-level 3D Parts for Human Motion Recognition

no code implementations • CVPR 2013 • Li-Min Wang, Yu Qiao, Xiaoou Tang

We postulate three key properties of motionlet for action recognition: high motion saliency, multiple scale representation, and representative-discriminative ability.

Action Recognition Temporal Action Localization

Multi-View Super Vector for Action Recognition

no code implementations • CVPR 2014 • Zhuowei Cai, Li-Min Wang, Xiaojiang Peng, Yu Qiao

A kernel average is then applied to these components to produce the recognition result.

Action Recognition Temporal Action Localization

A Key Volume Mining Deep Framework for Action Recognition

no code implementations • CVPR 2016 • Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Yu Qiao

Training with a large proportion of irrelevant volumes will hurt performance.

Action Recognition In Videos Temporal Action Localization

Latent Factor Guided Convolutional Neural Networks for Age-Invariant Face Recognition

no code implementations • CVPR 2016 • Yandong Wen, Zhifeng Li, Yu Qiao

In order to address this problem, we propose a novel deep face recognition framework to learn the age-invariant deep face features through a carefully designed CNN model.

Ranked #7 on Age-Invariant Face Recognition on CACDVS

Age-Invariant Face Recognition MORPH

Detecting Faces Using Inside Cascaded Contextual CNN

no code implementations • ICCV 2017 • Kaipeng Zhang, Zhanpeng Zhang, Hao Wang, Zhifeng Li, Yu Qiao, Wei Liu

Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild.

Face Detection

Range Loss for Deep Face Recognition With Long-Tailed Training Data

no code implementations • ICCV 2017 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Unlike these works, this paper investigates how long-tailed data impact the training of face CNNs and develops a novel loss function, called range loss, to effectively utilize the tailed data in the training process.

Face Recognition

P2SGrad: Refined Gradients for Optimizing Deep Face Models

no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Cosine-based softmax losses significantly improve the performance of deep face recognition networks.

Face Recognition

Suppressing Model Overfitting for Image Super-Resolution Networks

no code implementations • 11 Jun 2019 • Ruicheng Feng, Jinjin Gu, Yu Qiao, Chao Dong

Large deep networks have demonstrated competitive performance in single image super-resolution (SISR), with a huge volume of data involved.

Image Super-Resolution Memorization

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression

no code implementations • 8 Jul 2019 • Kai Wang, Jianfei Yang, Da Guo, Kaipeng Zhang, Xiaojiang Peng, Yu Qiao

Based on our winning solution last year, we mainly explore head features and body features with a bootstrap strategy and two novel loss functions in this paper.

regression

Product Image Recognition with Guidance Learning and Noisy Supervision

no code implementations • 26 Jul 2019 • Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao

Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites where the images are casually captured by consumers.

Learning Category Correlations for Multi-label Image Recognition with Graph Networks

no code implementations • 28 Sep 2019 • Qing Li, Xiaojiang Peng, Yu Qiao, Qiang Peng

In this paper, instead of using a pre-defined graph which is inflexible and may be sub-optimal for multi-label classification, we propose the A-GCN, which leverages the popular Graph Convolutional Networks with an Adaptive label correlation graph to model label dependencies.

Multi-Label Classification Word Embeddings

Understanding Vocabulary Growth Through An Adaptive Language Learning System

no code implementations • WS 2019 • Elma Kerz, Andreas Burgdorf, Daniel Wiechmann, Stefan Meeger, Yu Qiao, Christian Kohlschein, Tobias Meisen

Pose-Assisted Multi-Camera Collaboration for Active Object Tracking

no code implementations • 15 Jan 2020 • Jing Li, Jing Xu, Fangwei Zhong, Xiangyu Kong, Yu Qiao, Yizhou Wang

In the system, each camera is equipped with two controllers and a switcher: The vision-based controller tracks targets based on observed images.

Object Object Tracking

FD-GAN: Generative Adversarial Networks with Fusion-discriminator for Single Image Dehazing

no code implementations • 20 Jan 2020 • Yu Dong, Yihao Liu, He Zhang, Shifeng Chen, Yu Qiao

With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts.

Image Dehazing Single Image Dehazing

Progressive Object Transfer Detection

no code implementations • 12 Feb 2020 • Hao Chen, Yali Wang, Guoyou Wang, Xiang Bai, Yu Qiao

Inspired by this procedure of learning to detect, we propose a novel Progressive Object Transfer Detection (POTD) framework.

Object object-detection +1

Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units

no code implementations • 26 Feb 2020 • Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, ShiLiang Pu, Yi Niu, Fei Wu

Recurrent neural networks (RNNs) have been widely studied in sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on a gating mechanism that controls how information flows between hidden states.

Language Modelling Scene Text Recognition

TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation

no code implementations • 7 Mar 2020 • Wen Wang, Xiaojiang Peng, Yanzhou Su, Yu Qiao, Jian Cheng

Video action anticipation aims to predict future action categories from observed frames.

Action Anticipation

Text-attentional convolutional neural network for scene text detection

no code implementations • IEEE Trans. on Image Processing, 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images.

Multi-Task Learning Scene Text Detection +3

Learning to Predict Context-adaptive Convolution for Semantic Segmentation

no code implementations • ECCV 2020 • Jianbo Liu, Junjun He, Jimmy S. Ren, Yu Qiao, Hongsheng Li

Long-range contextual information is essential for achieving high-performance semantic segmentation.

Segmentation Semantic Segmentation

COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification

no code implementations • CVPR 2020 • Shijie Yu, Shihua Li, Dapeng Chen, Rui Zhao, Junjie Yan, Yu Qiao

To address the clothes changing person re-id problem, we construct a novel large-scale re-id benchmark named ClOthes ChAnging Person Set (COCAS), which provides multiple images of the same identity with different clothes.

Person Re-Identification

Understanding the Dynamics of Second Language Writing through Keystroke Logging and Complexity Contours

no code implementations • LREC 2020 • Elma Kerz, Fabio Pruneri, Daniel Wiechmann, Yu Qiao, Marcus Ströbel

The purpose of this paper is twofold: [1] to introduce, to our knowledge, the largest available resource of keystroke logging (KSL) data generated by Etherpad (https://etherpad.org/), an open-source, web-based collaborative real-time editor, that captures the dynamics of second language (L2) production and [2] to relate the behavioral data from KSL to indices of syntactic and lexical complexity of the texts produced obtained from a tool that implements a sliding window approach capturing the progression of complexity within a text.

valid

Becoming Linguistically Mature: Modeling English and German Children's Writing Development Across School Grades

no code implementations • WS 2020 • Elma Kerz, Yu Qiao, Daniel Wiechmann, Marcus Ströbel

In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks.

General Classification

Exploring Multi-Scale Feature Propagation and Communication for Image Super Resolution

no code implementations • 1 Aug 2020 • Ruicheng Feng, Weipeng Guan, Yu Qiao, Chao Dong

Multi-scale techniques have achieved great success in a wide range of computer vision tasks.

Image Super-Resolution

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

no code implementations • 15 Sep 2020 • Haisheng Su, Jing Su, Dongliang Wang, Weihao Gan, Wei Wu, Mengmeng Wang, Junjie Yan, Yu Qiao

Second, the parameter frequency distribution is further adopted to guide the student network to learn the appearance modeling process from the teacher.

Action Recognition Knowledge Distillation +1

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition

no code implementations • 27 Dec 2020 • Hengshun Zhou, Debin Meng, Yuanyuan Zhang, Xiaojiang Peng, Jun Du, Kai Wang, Yu Qiao

The audio-video based emotion recognition aims to classify a given video into basic emotions.

Ranked #1 on Facial Expression Recognition (FER) on Acted Facial Expressions In The Wild (AFEW)

Facial Expression Recognition (FER) Video Emotion Recognition

Multi-scale Information Assembly for Image Matting

no code implementations • 7 Jan 2021 • Yu Qiao, Yuhao Liu, Qiang Zhu, Xin Yang, Yuxin Wang, Qiang Zhang, Xiaopeng Wei

Image matting is a long-standing problem in computer graphics and vision, mostly identified as the accurate estimation of the foreground in input images.

Image Matting

Unsupervised Person Re-Identification with Multi-Label Learning Guided Self-Paced Clustering

no code implementations • 8 Mar 2021 • Qing Li, Xiaojiang Peng, Yu Qiao, Qi Hao

The multi-label learning module leverages a memory feature bank and assigns each image with a multi-label vector based on the similarities between the image and feature bank.

Clustering Multi-Label Learning +2

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos

no code implementations • 16 Mar 2021 • Tianyu Luan, Yali Wang, Junhao Zhang, Zhe Wang, Zhipeng Zhou, Yu Qiao

By coupling advanced 3D pose estimators and HMR in a serial or parallel manner, these two frameworks can effectively correct human mesh with guidance of a concise pose calibration module.

Ranked #4 on 3D Human Pose Estimation on Surreal

3D Human Pose Estimation Human Mesh Recovery

Smart Scribbles for Image Matting

no code implementations • 31 Mar 2021 • Xin Yang, Yu Qiao, Shaozhe Chen, Shengfeng He, BaoCai Yin, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

Image matting is an ill-posed problem that usually requires additional user input, such as trimaps or scribbles.

Image Matting

Very Lightweight Photo Retouching Network with Conditional Sequential Modulation

no code implementations • 13 Apr 2021 • Yihao Liu, Jingwen He, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, Yu Qiao

In practice, photo retouching can be accomplished by a series of image processing operations.

Image Retouching Photo Retouching

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech

no code implementations • 17 Apr 2021 • Yu Qiao, Wei Zhou, Elma Kerz, Ralf Schlüter

In recent years, automated approaches to assessing linguistic complexity in second language (L2) writing have made significant progress in gauging learner performance, predicting human ratings of the quality of learner productions, and benchmarking L2 development.

Benchmarking

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.

Image Quality Assessment Image Restoration

Neighbourhood-guided Feature Reconstruction for Occluded Person Re-Identification

no code implementations • 16 May 2021 • Shijie Yu, Dapeng Chen, Rui Zhao, Haobin Chen, Yu Qiao

Person images captured by surveillance cameras are often occluded by various obstacles, which lead to defective feature representation and harm person re-identification (Re-ID) performance.

Person Re-Identification

Multiple Domain Experts Collaborative Learning: Multi-Source Domain Generalization For Person Re-Identification

no code implementations • 26 May 2021 • Shijie Yu, Feng Zhu, Dapeng Chen, Rui Zhao, Haobin Chen, Shixiang Tang, Jinguo Zhu, Yu Qiao

In UDCL, a universal expert supervises the learning of domain experts and continuously gathers knowledge from all domain experts.

Domain Generalization Meta-Learning +1

TSI: Temporal Saliency Integration for Video Action Recognition

no code implementations • 2 Jun 2021 • Haisheng Su, Jinyuan Feng, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

Specifically, SME aims to highlight the motion-sensitive area through local-global motion modeling, where the saliency alignment and pyramidal feature difference are conducted successively between neighboring frames to capture motion dynamics with less noise caused by misaligned background.

Action Recognition Temporal Action Localization

Scalable Transformers for Neural Machine Translation

no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.

Machine Translation NMT +1

Alzheimer's Disease Detection from Spontaneous Speech through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models

no code implementations • 16 Jun 2021 • Yu Qiao, Xuefeng Yin, Daniel Wiechmann, Elma Kerz

In this paper, we combined linguistic complexity and (dis)fluency features with pretrained language models for the task of Alzheimer's disease detection of the 2021 ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech) challenge.

Alzheimer's Disease Detection

Prior-Induced Information Alignment for Image Matting

no code implementations • 28 Jun 2021 • Yuhao Liu, Jiake Xie, Yu Qiao, Yong Tang, Xin Yang

Image matting is an ill-posed problem that aims to estimate the opacity of foreground pixels in an image.

Image Matting

Blind Image Super-Resolution: A Survey and Beyond

no code implementations • 7 Jul 2021 • Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, Chao Dong

This paper serves as a systematic review on recent progress in blind image SR, and proposes a taxonomy to categorize existing methods into three different classes according to their ways of degradation modelling and the data used for solving the SR model.

Image Super-Resolution

RankSRGAN: Super Resolution Generative Adversarial Networks with Learning to Rank

no code implementations • 20 Jul 2021 • Wenlong Zhang, Yihao Liu, Chao Dong, Yu Qiao

To address the problem, we propose Super-Resolution Generative Adversarial Networks with Ranker (RankSRGAN) to optimize the generator in the direction of different perceptual metrics.

Image Super-Resolution Learning-To-Rank

Transferable Knowledge-Based Multi-Granularity Aggregation Network for Temporal Action Localization: Submission to ActivityNet Challenge 2021

no code implementations • 27 Jul 2021 • Haisheng Su, Peiqin Zhuang, Yukun Li, Dongliang Wang, Weihao Gan, Wei Wu, Yu Qiao

This technical report presents an overview of our solution submitted to the 2021 HACS Temporal Action Localization Challenge on both the Supervised Learning Track and the Weakly-Supervised Learning Track.

Transfer Learning Weakly-supervised Learning +2

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

no code implementations • 15 Sep 2021 • Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Yu Qiao

Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos.

Ranked #10 on 3D Human Pose Estimation on HumanEva-I

3D Human Pose Estimation 3D Pose Estimation

A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images

no code implementations • 26 Sep 2021 • Zijie Chen, Cheng Li, Junjun He, Jin Ye, Diping Song, Shanshan Wang, Lixu Gu, Yu Qiao

An essential step of RT planning is the accurate segmentation of various organs-at-risk (OARs) in HaN CT images.

Organ Segmentation Segmentation

Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation

no code implementations • 26 Sep 2021 • Junjun He, Jin Ye, Cheng Li, Diping Song, Wanli Chen, Shanshan Wang, Lixu Gu, Yu Qiao

Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images.

Image Segmentation Semantic Segmentation +1

Self-Slimming Vision Transformer

no code implementations • 29 Sep 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.

Knowledge Distillation

Estimating IRI based on pavement distress type, density, and severity: Insights from machine learning techniques

no code implementations • 11 Oct 2021 • Yu Qiao, Sikai Chen, Majed Alinizzi, Miltos Alamaniotis, Samuel Labi

However, it is costly to measure IRI, and for this reason, certain road classes are excluded from IRI measurements at a network level.

Automated Classification of Written Proficiency Levels on the CEFR-Scale through Complexity Contours and RNNs

no code implementations • EACL (BEA) 2021 • Elma Kerz, Daniel Wiechmann, Yu Qiao, Emma Tseng, Marcus Ströbel

The key to the present paper is the combined use of what we refer to as ‘complexity contours’, a series of measurements of indices of L2 proficiency obtained by a computational tool that implements a sliding window technique, and recurrent neural network (RNN) classifiers that adequately capture the sequential information in those contours.

A Language-Based Approach to Fake News Detection Through Interpretable Features and BRNN

no code implementations • RDSM (COLING) 2020 • Yu Qiao, Daniel Wiechmann, Elma Kerz

We demonstrate that our approach is promising as it achieves similar results on these two datasets as the best performing black box models reported in the literature.

Explainable Models Fake News Detection +1

Language that Captivates the Audience: Predicting Affective Ratings of TED Talks in a Multi-Label Classification Task

no code implementations • EACL (WASSA) 2021 • Elma Kerz, Yu Qiao, Daniel Wiechmann

The aim of the paper is twofold: (1) to automatically predict the ratings assigned by viewers to 14 categories available for TED talks in a multi-label classification task and (2) to determine what types of features drive classification accuracy for each of the categories.

Multi-Label Classification

Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features

no code implementations • 13 Nov 2021 • Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei Zhou, Elma Kerz, Ralf Schlüter

One of the key communicative competencies is the ability to maintain fluency in monologic speech and the ability to produce sophisticated language to argue a position convincingly.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

INTERN: A New Learning Paradigm Towards General Vision

no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

CPRAL: Collaborative Panoptic-Regional Active Learning for Semantic Segmentation

no code implementations • 11 Dec 2021 • Yu Qiao, Jincheng Zhu, Chengjiang Long, Zeyao Zhang, Yuxin Wang, Zhenjun Du, Xin Yang

Acquiring the most representative examples via active learning (AL) can benefit many data-dependent computer vision tasks by minimizing efforts of image-level or pixel-wise annotations.

Active Learning Semantic Segmentation

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning

no code implementations • 20 Jan 2022 • Mingye Xu, Yali Wang, Zhipeng Zhou, Hongbin Xu, Yu Qiao

To fill this gap, we propose a generic Contour-Perturbed Reconstruction Network (CP-Net), which can effectively guide self-supervised reconstruction to learn semantic content in the point cloud, and thus promote discriminative power of point cloud representation.

Point cloud reconstruction Self-Supervised Learning

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations • 9 Feb 2022 • Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

In particular, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves results comparable with existing backbones when combined with our framework. Visualizations of the attention maps show that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method.

Contrastive Learning Knowledge Distillation +1

Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination

no code implementations • 21 Feb 2022 • Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, LiMin Wang, Yu Qiao, Cairong Zhao

Flattening is essential in computer vision by converting multi-dimensional feature maps or images into one-dimensional vectors.

Image Classification Representation Learning +1

Paper
Add Code
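
Note on the Hilbert Flattening entry above: the excerpt describes flattening as converting a multi-dimensional feature map into a one-dimensional vector. As a rough illustration only, the sketch below contrasts ordinary row-major flattening with a traversal along the classic Hilbert curve, in which consecutive output positions are always adjacent grid cells. It uses the textbook index-to-coordinate mapping and is not taken from the paper; the function names are hypothetical, and the grid is assumed to be square with a power-of-two side length.

```python
def hilbert_index_to_xy(side, d):
    """Map position d along a Hilbert curve to (x, y) on a side x side grid.

    Assumption: side is a power of two. This is the textbook mapping,
    not necessarily the variant used in the listed paper.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(grid):
    """Flatten a square grid (list of rows) by walking the Hilbert curve."""
    side = len(grid)
    coords = (hilbert_index_to_xy(side, d) for d in range(side * side))
    return [grid[y][x] for x, y in coords]


def row_major_flatten(grid):
    """Ordinary flattening: concatenate the rows left to right."""
    return [value for row in grid for value in row]
```

With row-major order, the last cell of one row is followed by the first cell of the next row, which lies far away in the grid; the curve-based order avoids such jumps.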

Measuring the Impact of (Psycho-)Linguistic and Readability Features and Their Spill Over Effects on the Prediction of Eye Movement Patterns

no code implementations • ACL 2022 • Daniel Wiechmann, Yu Qiao, Elma Kerz, Justus Mattern

There is a growing interest in the combined use of NLP and machine learning methods to predict gaze patterns during naturalistic reading.

Paper
Add Code

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations • 16 Mar 2022 • Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns the universal and generalizable representation for various tasks transferring.

object-detection Object Detection +3

Paper
Add Code

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition

no code implementations • CVPR 2022 • Mingfei Han, David Junhao Zhang, Yali Wang, Rui Yan, Lina Yao, Xiaojun Chang, Yu Qiao

Learning spatial-temporal relation among multiple actors is crucial for group activity recognition.

Group Activity Recognition

Paper
Add Code

Pushing on Personality Detection from Verbal Behavior: A Transformer Meets Text Contours of Psycholinguistic Features

no code implementations • WASSA (ACL) 2022 • Elma Kerz, Yu Qiao, Sourabh Zanwar, Daniel Wiechmann

Research at the intersection of personality psychology, computer science, and linguistics has recently focused increasingly on modeling and predicting personality from language use.

Language Modelling

Paper
Add Code

Evaluating the Generalization Ability of Super-Resolution Networks

no code implementations • 14 May 2022 • Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, Chao Dong

However, research on the generalization ability of Super-Resolution (SR) networks is currently absent.

Super-Resolution

Paper
Add Code

Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot

no code implementations • 16 Jun 2022 • Li Chen, Tutian Tang, Zhitian Cai, Yang Li, Penghao Wu, Hongyang Li, Jianping Shi, Junchi Yan, Yu Qiao

Equipped with a wide span of sensors, predominant autonomous driving solutions are becoming more modular-oriented for safe system design.

Autonomous Driving

Paper
Add Code

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm

no code implementations • 12 Jul 2022 • Mingye Xu, Yali Wang, Yihao Liu, Tong He, Yu Qiao

Inspired by prompting approaches from NLP, we creatively reinterpret point cloud generation and refinement as the prompting and predicting stages, respectively.

Point Cloud Completion

Paper
Add Code

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

no code implementations • 16 Jul 2022 • Wei Wu, Junlin He, Yu Qiao, Guoheng Fu, Li Liu, Jin Yu

The in-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints.

Attribute

Paper
Add Code

GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation

no code implementations • 20 Jul 2022 • Qirui Huang, Bin Fu, Aozhong Zhang, Yu Qiao

Specifically, our current work incorporates three different stages, stylization, destylization, and font transfer, respectively, into a unified platform with a single powerful encoder network and two separate style generator networks, one for font transfer, the other for stylization and destylization.

Style Transfer Text Style Transfer

Paper
Add Code

Collaboration of Pre-trained Models Makes Better Few-shot Learner

no code implementations • 25 Sep 2022 • Renrui Zhang, Bohao Li, Wei Zhang, Hao Dong, Hongsheng Li, Peng Gao, Yu Qiao

In this paper, we propose CoMo, a Collaboration of pre-trained Models that incorporates diverse prior knowledge from various pre-training paradigms for better few-shot learning.

Few-Shot Learning Representation Learning

Paper
Add Code

Low-Resolution Action Recognition for Tiny Actions Challenge

no code implementations • 28 Sep 2022 • BoYu Chen, Yu Qiao, Yali Wang

Second, these activities are naturally distributed in a long-tailed way.

Action Recognition Super-Resolution

Paper
Add Code

SPADE: A Big Five-Mturk Dataset of Argumentative Speech Enriched with Socio-Demographics for Personality Detection

1 code implementation • LREC 2022 • Elma Kerz, Yu Qiao, Sourabh Zanwar, Daniel Wiechmann

In recent years, there has been increasing interest in automatic personality detection based on language.

Paper
Code

MANTIS at SMM4H’2022: Pre-Trained Language Models Meet a Suite of Psycholinguistic Features for the Detection of Self-Reported Chronic Stress

no code implementations • SMM4H (COLING) 2022 • Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz

This paper describes our submission to Social Media Mining for Health (SMM4H) 2022 Shared Task 8, aimed at detecting self-reported chronic stress on Twitter.

Paper
Add Code

The Best of Both Worlds: Combining Engineered Features with Transformers for Improved Mental Health Prediction from Reddit Posts

no code implementations • SMM4H (COLING) 2022 • Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz

In recent years, there has been increasing interest in the application of natural language processing and machine learning techniques to the detection of mental health conditions (MHC) based on social media data.

Paper
Add Code

Wider and Higher: Intensive Integration and Global Foreground Perception for Image Matting

no code implementations • 13 Oct 2022 • Yu Qiao, Ziqi Wei, Yuhao Liu, Yuxin Wang, Dongsheng Zhou, Qiang Zhang, Xin Yang

This paper reviews recent deep-learning-based matting research and conceives our wider and higher motivation for image matting.

Decoder Image Matting

Paper
Add Code

Hierarchical and Progressive Image Matting

no code implementations • 13 Oct 2022 • Yu Qiao, Yuhao Liu, Ziqi Wei, Yuxin Wang, Qiang Cai, Guofeng Zhang, Xin Yang

In this paper, we propose an end-to-end Hierarchical and Progressive Attention Matting Network (HAttMatting++), which can better predict the opacity of the foreground from single RGB images without additional input.

Image Matting SSIM

Paper
Add Code

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection

no code implementations • 20 Oct 2022 • Yi Liu, Xuan Zhang, Ying Li, Guixin Liang, Yabing Jiang, Lixia Qiu, Haiping Tang, Fei Xie, Wei Yao, Yi Dai, Yu Qiao, Yali Wang

For this reason, we propose to advance research areas of video understanding, with a shift from traditional action recognition to industrial anomaly analysis.

Temporal Defect Localization Video Defect Classification

Paper
Add Code

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

Paper
Add Code

Improving Training and Inference of Face Recognition Models via Random Temperature Scaling

no code implementations • 2 Dec 2022 • Lei Shang, Mouxiao Huang, Wu Shi, Yuchen Liu, Yang Liu, Fei Wang, Baigui Sun, Xuansong Xie, Yu Qiao

Intuitively, FR algorithms can benefit from both the estimation of uncertainty and the detection of out-of-distribution (OOD) samples.

Face Recognition Out of Distribution (OOD) Detection

Paper
Add Code

Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

no code implementations • 4 Dec 2022 • Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, DaCheng Tao

This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.

Ranked #1 on Common Sense Reasoning on ReCoRD

Common Sense Reasoning coreference-resolution +5

Paper
Add Code

Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features

no code implementations • 19 Dec 2022 • Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz

In recent years, there has been increased interest in building predictive models that harness natural language processing and machine learning techniques to detect emotions from various text sources, including social media posts, micro-blogs or news articles.

Emotion Recognition Transfer Learning

Paper
Add Code

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency

no code implementations • CVPR 2023 • Mingye Xu, Mutian Xu, Tong He, Wanli Ouyang, Yali Wang, Xiaoguang Han, Yu Qiao

Besides, such scenes with progressive masking ratios can also serve to self-distill their intrinsic spatial consistency, requiring to learn the consistent representations from unmasked areas.

object-detection Object Detection +2

Paper
Add Code

MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders

no code implementations • 19 Dec 2022 • Xiaofei Li, Daniel Wiechmann, Yu Qiao, Elma Kerz

In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability.

Language Modelling Lexical Simplification +4

Paper
Add Code

(Psycho-)Linguistic Features Meet Transformer Models for Improved Explainable and Controllable Text Simplification

no code implementations • 19 Dec 2022 • Yu Qiao, Xiaofei Li, Daniel Wiechmann, Elma Kerz

State-of-the-art text simplification (TS) systems adopt end-to-end neural network models to directly generate the simplified version of the input text, and usually function as a blackbox.

Text Simplification

Paper
Add Code

Exploring Hybrid and Ensemble Models for Multiclass Prediction of Mental Health Status on Social Media

no code implementations • 19 Dec 2022 • Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz

In recent years, there has been a surge of interest in research on automatic mental health detection (MHD) from social media data leveraging advances in natural language processing and machine learning techniques.

Binary Classification

Paper
Add Code

Content Rating Classification for Fan Fiction

no code implementations • 23 Dec 2022 • Yu Qiao, James Pope

The problem is to take fan fiction text and determine the appropriate content rating.

Binary Classification Classification

Paper
Add Code

Uncertainty-Estimation with Normalized Logits for Out-of-Distribution Detection

no code implementations • 15 Feb 2023 • Mouxiao Huang, Yu Qiao

However, neural networks often suffer from the overconfidence issue, making high confidence for OOD data which are never seen during training process and may be irrelevant to training data, namely in-distribution (ID) data.

Autonomous Driving Medical Diagnosis +2

Paper
Add Code

FCN+: Global Receptive Convolution Makes FCN Great Again

no code implementations • 8 Mar 2023 • Zhongying Deng, Xiaoyu Ren, Jin Ye, Junjun He, Yu Qiao

The motivation of GRC is that different channels of a convolutional filter can have different grid sampling locations across the whole input feature map.

Segmentation Semantic Segmentation

Paper
Add Code

Rethinking Range View Representation for LiDAR Segmentation

no code implementations • ICCV 2023 • Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, Ziwei Liu

We show that, for the first time, a range view method is able to surpass the point, voxel, and multi-view fusion counterparts in the competing LiDAR semantic and panoptic segmentation benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.

Ranked #4 on 3D Semantic Segmentation on SemanticKITTI

3D Semantic Segmentation Autonomous Driving +4

Paper
Add Code

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

no code implementations • 21 Mar 2023 • Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Donguk Kim, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

As ChatGPT goes viral, generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond.

Language Modelling

Paper
Add Code

Prototype Helps Federated Learning: Towards Faster Convergence

no code implementations • 22 Mar 2023 • Yu Qiao, Seong-Bae Park, Sun Moo Kang, Choong Seon Hong

In this paper, a prototype-based federated learning framework is proposed, which can achieve better inference performance with only a few changes to the last global iteration of the typical federated learning process.

Federated Learning

Paper
Add Code

MP-FedCL: Multiprototype Federated Contrastive Learning for Edge Intelligence

no code implementations • 1 Apr 2023 • Yu Qiao, Md. Shirajum Munir, Apurba Adhikary, Huy Q. Le, Avi Deb Raha, Chaoning Zhang, Choong Seon Hong

The existing single prototype-based strategy represents a class by using the mean of the feature space.

Contrastive Learning Federated Learning

Paper
Add Code
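
Note on the MP-FedCL entry above: the excerpt says the single-prototype strategy represents a class by the mean of its features. A minimal sketch of that baseline idea, assuming feature vectors are plain Python lists of equal length; the function name and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def class_prototypes(features, labels):
    """Return {label: mean feature vector} (one prototype per class)."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    # Divide each accumulated sum by the number of samples in that class.
    return {
        label: [total / counts[label] for total in vector_sum]
        for label, vector_sum in sums.items()
    }


# Toy data for illustration only.
features = [[1.0, 2.0], [3.0, 4.0], [0.0, 10.0]]
labels = ["cat", "cat", "dog"]
prototypes = class_prototypes(features, labels)
```

A single mean can blur a class whose features form several clusters, which is the limitation a multi-prototype scheme is meant to address.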

STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

no code implementations • 13 Apr 2023 • Ziyan Huang, Haoyu Wang, Zhongying Deng, Jin Ye, Yanzhou Su, Hui Sun, Junjun He, Yun Gu, Lixu Gu, Shaoting Zhang, Yu Qiao

However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions.

Image Segmentation Medical Image Segmentation +2

Paper
Add Code

One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era

no code implementations • 4 Apr 2023 • Chaoning Zhang, Chenshuang Zhang, Chenghao Li, Yu Qiao, Sheng Zheng, Sumit Kumar Dam, Mengchun Zhang, Jung Uk Kim, Seong Tae Kim, Jinwoo Choi, Gyeong-Moon Park, Sung-Ho Bae, Lik-Hang Lee, Pan Hui, In So Kweon, Choong Seon Hong

Overall, this work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges.

Paper
Add Code

Perception Imitation: Towards Synthesis-free Simulator for Autonomous Vehicles

no code implementations • 19 Apr 2023 • Xiaoliang Ju, Yiyang Sun, Yiming Hao, Yikang Li, Yu Qiao, Hongsheng Li

We propose a perception imitation method to simulate results of a certain perception model, and discuss a new heuristic route of autonomous driving simulator without data synthesis.

Autonomous Driving

Paper
Add Code

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

no code implementations • 24 Apr 2023 • Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu

To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.

Image Generation Image Manipulation +1

Paper
Add Code

Segment Anything Model (SAM) Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

no code implementations • 29 Apr 2023 • Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, Seungkyu Lee, Sung-Ho Bae, Choong Seon Hong

Meta AI Research has recently released SAM (Segment Anything Model) which is trained on a large segmentation dataset of over 1 billion masks.

Segmentation Semantic Segmentation +1

Paper
Add Code

Distilling Focal Knowledge From Imperfect Expert for 3D Object Detection

no code implementations • CVPR 2023 • Jia Zeng, Li Chen, Hanming Deng, Lewei Lu, Junchi Yan, Yu Qiao, Hongyang Li

Specifically, a set of queries are leveraged to locate the instance-level areas for masked feature generation, to intensify feature representation ability in these areas.

3D Object Detection Knowledge Distillation +2

Paper
Add Code

Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions

no code implementations • CVPR 2023 • Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, Xiaowei Hu

Inspired by this observation, we design an efficient unified framework with a two-stage training strategy to explore the weather-general and weather-specific features.

Image Restoration

Paper
Add Code

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

no code implementations • 2 Jun 2023 • Zeqiang Lai, Yuchen Duan, Jifeng Dai, Ziheng Li, Ying Fu, Hongsheng Li, Yu Qiao, Wenhai Wang

In this paper, we propose to ameliorate the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently-developed denoising diffusion generative model.

Denoising Segmentation +1

Paper
Add Code

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

no code implementations • 12 May 2023 • Chaoning Zhang, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

This is an ongoing project and we intend to update the manuscript on a regular basis.

Edge Detection Prompt Engineering

Paper
Add Code

Robustness of SAM: Segment Anything Under Corruptions and Beyond

no code implementations • 13 Jun 2023 • Yu Qiao, Chaoning Zhang, Taegoo Kang, Donghun Kim, Chenshuang Zhang, Choong Seon Hong

Following by interpreting the effects of synthetic corruption as style changes, we proceed to conduct a comprehensive evaluation for its robustness against 15 types of common corruption.

Style Transfer

Paper
Add Code

Align, Adapt and Inject: Sound-guided Unified Image Generation

no code implementations • 20 Jun 2023 • Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo

Then, we propose the audio adapter to adapt audio representation into an audio token enriched with specific semantics, which can be injected into a frozen T2I model flexibly.

Image Generation Retrieval +1

Paper
Add Code

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

no code implementations • 15 Jun 2023 • Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li

Video Question Answering (VideoQA) has been significantly advanced from the scaling of recent Large Language Models (LLMs).

Ranked #3 on Temporal/Casual QA on NExT-QA (using extra training data)

Domain Generalization Retrieval +2

Paper
Add Code

Boosting Federated Learning Convergence with Prototype Regularization

no code implementations • 20 Jul 2023 • Yu Qiao, Huy Q. Le, Choong Seon Hong

As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data.

Federated Learning

Paper
Add Code

FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning

no code implementations • 25 Jul 2023 • Huy Q. Le, Minh N. H. Nguyen, Chu Myaet Thwal, Yu Qiao, Chaoning Zhang, Choong Seon Hong

Bringing this concept into a system, we develop a distillation-based multimodal embedding knowledge transfer mechanism, namely FedMEKT, which allows the server and clients to exchange the joint knowledge of their learning models extracted from a small multimodal proxy dataset.

Federated Learning Human Activity Recognition +1

Paper
Add Code

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation

no code implementations • ICCV 2023 • Mingfei Han, Yali Wang, Zhihui Li, Lina Yao, Xiaojun Chang, Yu Qiao

To tackle this problem, we propose a concise Hybrid Temporal-scale Multimodal Learning (HTML) framework, which can effectively align lingual and visual features to discover core object semantics in the video, by learning multimodal interaction hierarchically from different temporal scales.

Ranked #6 on Referring Video Object Segmentation on Refer-YouTube-VOS (using extra training data)

Object Referring Video Object Segmentation +2

Paper
Add Code

Multi-view Spectral Polarization Propagation for Video Glass Segmentation

no code implementations • ICCV 2023 • Yu Qiao, Bo Dong, Ao Jin, Yu Fu, Seung-Hwan Baek, Felix Heide, Pieter Peers, Xiaopeng Wei, Xin Yang

In this paper, we present the first polarization-guided video glass segmentation propagation solution (PGVS-Net) that can robustly and coherently propagate glass segmentation in RGB-P video sequences.

Image Segmentation Segmentation +1

Paper
Add Code

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding

no code implementations • ICCV 2023 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

The prolific performances of Vision Transformers (ViTs) in image tasks have prompted research into adapting the image ViTs for video tasks.

Video Understanding

Paper
Add Code

Exploring Counterfactual Alignment Loss towards Human-centered AI

no code implementations • 3 Oct 2023 • Mingzhou Liu, Xinwei Sun, Ching-Wen Lee, Yu Qiao, Yizhou Wang

In particular, we utilize the counterfactual generation's ability for causal attribution to introduce a novel loss called the CounterFactual Alignment (CF-Align) loss.

Attribute counterfactual +1

Paper
Add Code

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

no code implementations • 8 Oct 2023 • Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches.

Keypoint Detection

Paper
Add Code

On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets

no code implementations • 10 Oct 2023 • Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan

Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets.

Benchmarking

Paper
Add Code

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

no code implementations • 12 Oct 2023 • Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.

Decision Making

Paper
Add Code

Unifying Image Processing as Visual Prompting Question Answering

no code implementations • 16 Oct 2023 • Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc.

Image Enhancement Image Restoration +4

Paper
Add Code

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

no code implementations • 31 Oct 2023 • Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos.

Paper
Add Code

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

no code implementations • 6 Nov 2023 • Zhiyu Zhao, Bingkun Huang, Sen Xing, Gangshan Wu, Yu Qiao, LiMin Wang

And AMD achieves 73.3% classification accuracy using the ViT-B model on the Something-Something V2 dataset, a 3.7% improvement over the original ViT-B model from VideoMAE.

Ranked #20 on Action Recognition on Something-Something V2

Action Classification Action Recognition +3

Paper
Add Code

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

no code implementations • 15 Nov 2023 • Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu

Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formula).

World Knowledge

Paper
Add Code

Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape

no code implementations • 3 Jun 2023 • Chaoning Zhang, Yu Qiao, Shehbaz Tariq, Sheng Zheng, Chenshuang Zhang, Chenghao Li, Hyundong Shin, Choong Seon Hong

Different from label-oriented recognition tasks, the SAM is trained to predict a mask for covering the object shape based on a prompt.

Image Segmentation Semantic Segmentation

Paper
Add Code

DiffusionMat: Alpha Matting as Sequential Refinement Learning

no code implementations • 22 Nov 2023 • Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo

In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.

Denoising Image Matting

Paper
Add Code

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

Paper
Add Code

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

Paper
Add Code

Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies

no code implementations • 7 Dec 2023 • Pengcheng Chen, Ziyan Huang, Zhongying Deng, Tianbin Li, Yanzhou Su, Haoyu Wang, Jin Ye, Yu Qiao, Junjun He

OpenAI's latest large vision-language model (LVLM), GPT-4V(ision), has piqued considerable interest for its potential in medical applications.

Language Modelling Prompt Engineering

Paper
Add Code

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

no code implementations • 8 Dec 2023 • Hongjie Zhang, Yi Liu, Lu Dong, Yifei HUANG, Zhen-Hua Ling, Yali Wang, LiMin Wang, Yu Qiao

While several long-form VideoQA datasets have been introduced, the length of both videos used to curate questions and sub-clips of clues leveraged to answer those questions have not yet reached the criteria for genuine long-form video understanding.

Question Answering Video Question Answering +1

Paper
Add Code

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

no code implementations • 11 Dec 2023 • Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.

SSIM

Paper
Add Code

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

no code implementations • 14 Dec 2023 • Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai

Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that indicate task completion or failure with binary values.

reinforcement-learning

Paper
Add Code

Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey

no code implementations • 15 Dec 2023 • Xu Liu, Tong Zhou, Yuanxin Wang, Yuping Wang, Qinjingwen Cao, Weizhi Du, Yonghuan Yang, Junjun He, Yu Qiao, Yiqing Shen

The advent of foundation models, which are pre-trained on vast datasets, has ushered in a new era of computer vision, characterized by their robustness and remarkable zero-shot generalization capabilities.

Image Generation Image Segmentation +2

Paper
Add Code

Critic-Guided Decision Transformer for Offline Reinforcement Learning

no code implementations • 21 Dec 2023 • Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao

Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner.

D4RL Offline RL +3

Paper
Add Code

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

no code implementations • 24 Jan 2024 • Fanghua Yu, Jinjin Gu, Zheyuan Li, JinFan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong

We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up.

Descriptive Image Restoration

Paper
Add Code

SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning

no code implementations • 24 Jan 2024 • Guoxin Chen, Kexin Tang, Chao Yang, Fuying Ye, Yu Qiao, Yiming Qian

Moreover, existing reinforcement learning (RL) based methods overlook the structured relationships, underutilizing the potential of RL in structured reasoning.

Question Answering reinforcement-learning +1

Paper
Add Code

Cross-Modal Prototype based Multimodal Federated Learning under Severely Missing Modality

no code implementations • 25 Jan 2024 • Huy Q. Le, Chu Myaet Thwal, Yu Qiao, Ye Lin Tun, Minh N. H. Nguyen, Choong Seon Hong

In this paper, we propose Multimodal Federated Cross Prototype Learning (MFCPL), a novel approach for MFL under severely missing modalities by conducting the complete prototypes to provide diverse modality knowledge in modality-shared level with the cross-modal regularization and modality-specific level with cross-modal contrastive mechanism.

Federated Learning

Paper
Add Code

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

no code implementations • 26 Jan 2024 • Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, LiMin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang

Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents.

Paper
Add Code

Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation

no code implementations • 12 Dec 2023 • Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu

Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks.

Decision Making Language Modelling +1

Paper
Add Code

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

no code implementations • 22 Feb 2024 • Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language.

Code Generation Common Sense Reasoning +2

Paper
Add Code

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

no code implementations • 25 Feb 2024 • Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI.

Ranked #76 on Visual Question Answering on MM-Vet

Code Generation Multimodal Reasoning +1

Paper
Add Code

Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning

no code implementations • 27 Feb 2024 • Zhaoxun Ju, Chao Yang, Hongbo Wang, Yu Qiao, Fuchun Sun

Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions.

Imitation Learning Quantization

Paper
Add Code

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

no code implementations • 29 Feb 2024 • BoYu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang

Finally, we blend external multimodal knowledge in the Adapt stage by inserting multimodal knowledge adaptation modules into networks.

Transfer Learning Video Recognition

Paper
Add Code

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset

no code implementations • 29 Feb 2024 • Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, JIA YU, Chaobin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan, Conghui He

To evaluate the quality and utility of the dataset, we trained 1B-parameter and 3B-parameter models using WanJuan-CC and another dataset, RefinedWeb.

Paper
Add Code

Towards Implicit Prompt For Text-To-Image Models

no code implementations • 4 Mar 2024 • Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo

We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.

Position

Paper
Add Code

Towards Robust Federated Learning via Logits Calibration on Non-IID Data

no code implementations • 5 Mar 2024 • Yu Qiao, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong

Meanwhile, the non-independent and identically distributed (non-IID) challenge of data distribution between edge devices can further degrade the performance of models.

Federated Learning Privacy Preserving

Paper
Add Code

Exploring Safety Generalization Challenges of Large Language Models via Code

no code implementations • 12 Mar 2024 • Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Yu Qiao, Wai Lam, Lizhuang Ma

The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse.

Code Completion

Paper
Add Code

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

no code implementations • 14 Mar 2024 • Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang

To bridge this gap, we introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions (AVIs), including four types of image-based AVIs, ten types of text-based AVIs, and nine types of content bias AVIs (such as gender, violence, cultural, and racial biases, among others).

Fairness Language Modelling

Paper
Add Code

Desigen: A Pipeline for Controllable Design Template Generation

no code implementations • 14 Mar 2024 • Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen

In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background.

Paper
Add Code

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations • 28 Mar 2024 • Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

Paper
Add Code

Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence

no code implementations • 28 Mar 2024 • Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng

Neural rendering techniques have significantly advanced 3D human body modeling.

Neural Rendering Quantization

Paper
Add Code

LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction

no code implementations • 1 Apr 2024 • Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

LLaMA-Excitor ensures a self-adaptive allocation of additional attention to input instructions, thus effectively preserving LLMs' pre-trained knowledge when fine-tuning LLMs on low-quality instruction-following datasets.

Image Captioning Instruction Following

Paper
Add Code

VideoDistill: Language-aware Vision Distillation for Video Question Answering

no code implementations • 1 Apr 2024 • Bo Zou, Chao Yang, Yu Qiao, Chengbin Quan, Youjian Zhao

In this paper, we are inspired by the human recognition and learning pattern and propose VideoDistill, a framework with language-aware (i.e., goal-driven) behavior in both vision perception and answer generation process.

Answer Generation Question Answering +1

Paper
Add Code

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement

no code implementations • 3 Apr 2024 • Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun

We present Dive Into the BoundarieS (DIBS), a novel pretraining framework for dense video captioning (DVC) that focuses on improving the quality of the generated event captions and their associated pseudo event boundaries from unlabeled videos.

Dense Video Captioning

Paper
Add Code

Logit Calibration and Feature Contrast for Robust Federated Learning on Non-IID Data

no code implementations • 10 Apr 2024 • Yu Qiao, Chaoning Zhang, Apurba Adhikary, Choong Seon Hong

Federated learning (FL) is a privacy-preserving distributed framework for collaborative model training on devices in edge networks.

Adversarial Robustness Federated Learning +1

Paper
Add Code

FedCCL: Federated Dual-Clustered Feature Contrast Under Domain Heterogeneity

no code implementations • 14 Apr 2024 • Yu Qiao, Huy Q. Le, Mengchun Zhang, Apurba Adhikary, Chaoning Zhang, Choong Seon Hong

First, we employ clustering on the local representations of each client, aiming to capture intra-class information based on these local clusters at a high level of granularity.

Clustering Federated Learning +1

Paper
Add Code

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

no code implementations • 24 Apr 2024 • Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation.

Paper
Add Code

Towards Real-world Video Face Restoration: A New Benchmark

no code implementations • 30 Apr 2024 • Ziyan Chen, Jingwen He, Xinqi Lin, Yu Qiao, Chao Dong

Blind face restoration (BFR) on images has progressed significantly over the last several years, while real-world video face restoration (VFR), which is more challenging because it involves more complex face motions such as changing gaze directions and facial orientations, remains unsolved.

Blind Face Restoration Image Quality Assessment +1

Paper
Add Code

A Comprehensive Study on Temporal Modeling for Online Action Detection

1 code implementation • 21 Jan 2020 • Wen Wang, Xiaojiang Peng, Yu Qiao, Jian Cheng

Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years.

Online Action Detection

Paper
Code

Fake Alignment: Are LLMs Really Aligned Well?

1 code implementation • 10 Nov 2023 • Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang

The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety.

Multiple-choice

Paper
Code

Causal Evaluation of Language Models

1 code implementation • 1 May 2024 • Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning.

Causal Discovery Causal Inference +1

Paper
Code

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

1 code implementation • 29 Jan 2024 • Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

CO2 is able to attain a high scalability even on extensive multi-node clusters constrained by very limited communication bandwidth.

Paper
Code

Range Loss for Deep Face Recognition with Long-tail

2 code implementations • 28 Nov 2016 • Xiao Zhang, Zhiyuan Fang, Yandong Wen, Zhifeng Li, Yu Qiao

Convolutional neural networks have achieved great improvement on face recognition in recent years because of their extraordinary ability to learn discriminative features of people with different identities.

Face Recognition

Paper
Code

FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German

1 code implementation • EMNLP (FEVER) 2021 • Justus Mattern, Yu Qiao, Elma Kerz, Daniel Wiechmann, Markus Strohmaier

As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an ‘infodemic’ – a flood of disinformation and spread of conspiracy theories leading to health threats and the division of society.

Fake News Detection

Paper
Code

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

1 code implementation • 1 Jun 2023 • Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li

In addition to the scene generation, the final part of DiffInDScene can be used as a post-processing module to refine the 3D reconstruction results from multi-view stereo.

3D Generation 3D Reconstruction +1

Paper
Code

Vlogger: Make Your Dream A Vlog

1 code implementation • 17 Jan 2024 • Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang

More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor.

Language Modelling Large Language Model +1

Paper
Code

Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network

1 code implementation • 31 Mar 2016 • Tong He, Weilin Huang, Yu Qiao, Jian Yao

We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization.

Scene Text Detection Text Detection

Paper
Code

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

1 code implementation • CVPR 2023 • Xin Li, Tao Ma, Yuenan Hou, Botian Shi, Yuchen Yang, Youquan Liu, Xingjiao Wu, Qin Chen, Yikang Li, Yu Qiao, Liang He

Notably, LoGoNet ranks 1st on Waymo 3D object detection leaderboard and obtains 81.02 mAPH (L2) detection performance.

3D Object Detection object-detection +1

Paper
Code

Causal Discovery via Conditional Independence Testing with Proxy Variables

1 code implementation • 9 May 2023 • Mingzhou Liu, Xinwei Sun, Yu Qiao, Yizhou Wang

Distinguishing causal connections from correlations is important in many scenarios.

Causal Discovery Causal Identification

Paper
Code

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

1 code implementation • 16 Jun 2022 • Penghao Wu, Xiaosong Jia, Li Chen, Junchi Yan, Hongyang Li, Yu Qiao

The two branches are connected so that the control branch receives corresponding guidance from the trajectory branch at each time step.

Ranked #3 on Autonomous Driving on CARLA Leaderboard

Autonomous Driving CARLA longest6 +1

Paper
Code

Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation

1 code implementation • 20 Dec 2022 • Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei LI, Yu Qiao, Jingjing Xu

To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.

Machine Translation Translation

Paper
Code

Real-time Holistic Robot Pose Estimation with Unknown States

1 code implementation • 8 Feb 2024 • Shikun Ban, Juling Fan, Wentao Zhu, Xiaoxuan Ma, Yu Qiao, Yizhou Wang

We propose an end-to-end pipeline for real-time, holistic robot pose estimation from a single RGB image, even in the absence of known robot states.

Ranked #1 on Robot Pose Estimation on DREAM-dataset

6D Pose Estimation using RGB Robot Pose Estimation

Paper
Code

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

2 code implementations • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering.

Question Answering Text Summarization

Paper
Code

Efficient Action Counting with Dynamic Queries

1 code implementation • 3 Mar 2024 • Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang

Temporal repetition counting aims to quantify the repeated action cycles within a video.

Contrastive Learning

Paper
Code

Investigate Indistinguishable Points in Semantic Segmentation of 3D Point Cloud

1 code implementation • 18 Mar 2021 • Mingye Xu, Zhipeng Zhou, Junhao Zhang, Yu Qiao

This paper investigates the indistinguishable points (difficult to predict label) in semantic segmentation for large-scale 3D point clouds.

3D Semantic Segmentation Segmentation

Paper
Code

Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

1 code implementation • CVPR 2021 • Xiao Zhang, Yixiao Ge, Yu Qiao, Hongsheng Li

Unsupervised object re-identification aims at learning discriminative representations for object retrieval without any annotations.

Clustering Pseudo Label +1

Paper
Code

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

1 code implementation • NeurIPS 2023 • Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

What we possess are numerous isolated field-specific datasets; thus, it is appealing to jointly train models across the aggregation of datasets to enhance data volume and diversity.

Instance Segmentation Semantic Segmentation +1

Paper
Code

Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation

1 code implementation • 12 Dec 2023 • Yuchen Yang, Yu Qiao, Xiao Sun

Automatic estimation of 3D human pose from monocular RGB images is a challenging and unsolved problem in computer vision.

3D Pose Estimation

Paper
Code

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

1 code implementation • 31 Mar 2024 • Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.

Language Modelling Large Language Model

Paper
Code

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation

1 code implementation • 15 Sep 2020 • Haisheng Su, Weihao Gan, Wei Wu, Yu Qiao, Junjie Yan

In this paper, we present BSN++, a new framework which exploits a complementary boundary regressor and relation modeling for temporal proposal generation.

Ranked #6 on Temporal Action Proposal Generation on ActivityNet-1.3

Relation Temporal Action Proposal Generation

Paper
Code

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

1 code implementation • 24 May 2021 • Yi Liu, LiMin Wang, Yali Wang, Xiao Ma, Yu Qiao

Temporal action localization (TAL) is an important and challenging problem in video understanding.

Fine-Grained Action Detection Temporal Localization +2

Paper
Code

POS-BERT: Point Cloud One-Stage BERT Pre-Training

1 code implementation • 3 Apr 2022 • Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

We propose to use the dynamically updated momentum encoder as the tokenizer, which is updated and outputs the dynamic supervision signal along with the training process.

Contrastive Learning Language Modelling +3

Paper
Code

Recurrent Bilinear Optimization for Binary Neural Networks

2 code implementations • 4 Sep 2022 • Sheng Xu, Yanjing Li, Tiancheng Wang, Teli Ma, Baochang Zhang, Peng Gao, Yu Qiao, Jinhu Lv, Guodong Guo

To address this issue, Recurrent Bilinear Optimization is proposed to improve the learning process of BNNs (RBONNs) by associating the intrinsic bilinear variables in the back propagation process.

object-detection Object Detection

Paper
Code

DreamDA: Generative Data Augmentation with Diffusion Models

1 code implementation • 19 Mar 2024 • Yunxiang Fu, Chaoqi Chen, Yu Qiao, Yizhou Yu

The acquisition of large-scale, high-quality data is a resource-intensive and time-consuming endeavor.

Data Augmentation

Paper
Code

MGMAE: Motion Guided Masking for Video Masked Autoencoding

1 code implementation • ICCV 2023 • Bingkun Huang, Zhiyu Zhao, Guozhen Zhang, Yu Qiao, LiMin Wang

Based on this masking volume, we can track the unmasked tokens in time and sample a set of temporally consistent cubes from videos.

Optical Flow Estimation Representation Learning

Paper
Code

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety

1 code implementation • 22 Jan 2024 • Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao

In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety.

Paper
Code

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding

1 code implementation • 20 Sep 2023 • Renqiu Xia, Bo Zhang, Haoyang Peng, Hancheng Ye, Xiangchao Yan, Peng Ye, Botian Shi, Yu Qiao, Junchi Yan

Charts are common in literature across different scientific fields, conveying rich information easily accessible to readers.

Ranked #19 on Chart Question Answering on ChartQA (using extra training data)

Chart Question Answering Language Modelling +2

Paper
Code

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

1 code implementation • 5 Oct 2023 • Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences.

Language Modelling Long Form Question Answering

Paper
Code

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

1 code implementation • 19 Feb 2024 • Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Large language models (LLMs) need to undergo safety alignment to ensure safe conversations with humans.

Language Modelling

Paper
Code

Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering

1 code implementation • 24 Jan 2018 • Zhe Wang, Xiaoyi Liu, Liangjian Chen, Li-Min Wang, Yu Qiao, Xiaohui Xie, Charless Fowlkes

Visual question answering (VQA) is of significant interest due to its potential to be a strong test of image understanding systems and to probe the connection between language and vision.

Multiple-choice POS +3

Paper
Code

PalGAN: Image Colorization with Palette Generative Adversarial Networks

1 code implementation • 20 Oct 2022 • Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao

Multimodal ambiguity and color bleeding remain challenging in colorization.

Colorization Image Colorization

Paper
Code
