Search Results for author: Li Yuan

Found 25 papers, 13 papers with code

Improving Vision Transformers by Revisiting High-frequency Components

no code implementations • 3 Apr 2022 • Jiawang Bai, Li Yuan, Shu-Tao Xia, Shuicheng Yan, Zhifeng Li, Wei Liu

Transformer models have shown promising effectiveness on a variety of vision tasks.

Masked Autoencoders for Point Cloud Self-supervised Learning

1 code implementation • 13 Mar 2022 • Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yonghong Tian, Li Yuan

Then, a standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches, aiming to reconstruct the masked point patches.
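The random masking step the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's code; `mask_point_patches` is a hypothetical helper, and the mask ratio of 0.6 is an assumption chosen for the example.

```python
import numpy as np

def mask_point_patches(patches, mask_ratio=0.6, seed=0):
    """Randomly split point patches into unmasked and masked sets.

    patches: (N, K, 3) array of N patches, each holding K xyz points.
    Returns (unmasked, masked, mask); mask[i] is True for masked patches.
    The autoencoder would see only the unmasked set and reconstruct the rest.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_masked = int(round(n * mask_ratio))
    order = rng.permutation(n)
    mask = np.zeros(n, dtype=bool)
    mask[order[:n_masked]] = True
    return patches[~mask], patches[mask], mask

# 10 patches of 32 points each, 60% masked
patches = np.random.default_rng(1).normal(size=(10, 32, 3))
visible, hidden, mask = mask_point_patches(patches, mask_ratio=0.6)
```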

Ranked #3 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Part Segmentation 3D Point Cloud Classification +1

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

1 code implementation • 28 Jan 2022 • Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information-fusion operations among tokens and channels can yield good representation power for deep recognition models.
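The "dynamic mixing" in the title can be sketched as token fusion whose weights are generated from the token contents themselves, rather than being a fixed learned matrix. This is a simplified assumption-laden sketch, not the paper's implementation; `dynamic_mix` and `proj` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_mix(tokens, proj):
    """Content-dependent token mixing (sketch).

    tokens: (N, D); proj: (D, N) learned parameter. Each token's content
    generates one row of an N x N mixing matrix, so the fusion weights
    change per input instead of being fixed as in a static MLP mixer.
    """
    weights = softmax(tokens @ proj, axis=-1)  # (N, N), rows sum to 1
    return weights @ tokens                    # dynamically blended tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
proj = rng.normal(size=(16, 8))
mixed = dynamic_mix(tokens, proj)
```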

Image Classification

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction

no code implementations • 17 Dec 2021 • Guangyan Chen, Meiling Wang, Yufeng Yue, Qingxiang Zhang, Li Yuan

Recent Transformer-based methods have achieved advanced performance in point cloud registration by exploiting the Transformer's order-invariance and dependency modeling to aggregate information.

Geometric Matching Point Cloud Registration

PnP-DETR: Towards Efficient Visual Analysis with Transformers

1 code implementation • ICCV 2021 • Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Recently, DETR pioneered the solution of vision tasks with transformers: it directly translates the image feature map into the object detection result.

Object Detection Panoptic Segmentation

VOLO: Vision Outlooker for Visual Recognition

5 code implementations • 24 Jun 2021 • Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan

Though the recently prevailing vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification, their performance is still inferior to the latest SOTA CNNs when no extra data are provided.

Image Classification Semantic Segmentation

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

1 code implementation • 23 Jun 2021 • Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng

Recognizing the importance of the positional information carried by 2D feature representations, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections, unlike recent MLP-like models that encode spatial information along the flattened spatial dimensions.
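The separate height/width encoding described above can be sketched as three parallel linear mixings whose outputs are summed. This is a simplified assumption (the actual Permute-MLP also splits channels into segments before permuting); `permute_mlp` is a hypothetical name.

```python
import numpy as np

def permute_mlp(x, w_h, w_w, w_c):
    """Encode height, width, and channel separately (simplified sketch).

    x: (H, W, C) feature map; w_h: (H, H), w_w: (W, W), w_c: (C, C)
    learned projections. Each branch mixes along exactly one dimension,
    preserving positional information along the other two.
    """
    along_h = np.einsum('hwc,hk->kwc', x, w_h)  # mix tokens along height
    along_w = np.einsum('hwc,wk->hkc', x, w_w)  # mix tokens along width
    along_c = x @ w_c                           # mix channels pointwise
    return along_h + along_w + along_c          # fuse the three branches

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))
y = permute_mlp(x,
                rng.normal(size=(4, 4)),
                rng.normal(size=(5, 5)),
                rng.normal(size=(6, 6)))
```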

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

2 code implementations • 31 Mar 2021 • Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks.
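The momentum manipulation in the title can be sketched as keeping two momentum buffers updated on alternating iterations and combining them with a positive and a negative weight, which amplifies the gradient-noise component of the update. This is a rough sketch under stated assumptions, not the paper's exact optimizer; all names and the toy hyperparameters are hypothetical.

```python
import numpy as np

def pnm_update(w, grad, m_cur, m_prev, lr=0.05, beta1=0.9, beta0=0.5):
    """One simplified positive-negative-momentum step.

    m_cur is the buffer refreshed this iteration; m_prev was refreshed
    last iteration. The direction (1 + beta0) * m_cur - beta0 * m_prev
    enlarges the stochastic noise relative to plain momentum SGD.
    """
    m_cur = beta1 * m_cur + grad
    direction = (1.0 + beta0) * m_cur - beta0 * m_prev
    return w - lr * direction, m_cur

# Minimize f(w) = 0.5 * w^2 for a few steps, alternating the two buffers
w, m_a, m_b = 1.0, 0.0, 0.0
for step in range(20):
    grad = w  # gradient of 0.5 * w^2
    if step % 2 == 0:
        w, m_a = pnm_update(w, grad, m_a, m_b)
    else:
        w, m_b = pnm_update(w, grad, m_b, m_a)
```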

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

10 code implementations • ICCV 2021 • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation that progressively structurizes the image into tokens by recursively aggregating neighboring tokens into one token, such that the local structure represented by surrounding tokens can be modeled and the token length can be reduced; and 2) an efficient backbone with a deep-narrow structure for the vision transformer, motivated by CNN architecture design and supported by empirical study.
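The T2T "soft split" described in point 1) can be sketched as gathering each k x k neighborhood of the token grid into a single wider token, which both models local structure and shrinks the token count. This is a minimal sketch of the aggregation step only (the real module interleaves it with self-attention layers); `tokens_to_token` is a hypothetical name.

```python
import numpy as np

def tokens_to_token(tokens, h, w, k=3, s=2, p=1):
    """Soft split: reshape (h*w, c) tokens into a grid, then flatten each
    k x k neighborhood (stride s, zero padding p) into one token.
    Output has fewer tokens, each k*k times wider."""
    c = tokens.shape[1]
    img = tokens.reshape(h, w, c)
    img = np.pad(img, ((p, p), (p, p), (0, 0)))
    oh = (h + 2 * p - k) // s + 1
    ow = (w + 2 * p - k) // s + 1
    out = np.empty((oh * ow, k * k * c))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * s:i * s + k, j * s:j * s + k, :]
            out[i * ow + j] = patch.reshape(-1)
    return out, oh, ow

tokens = np.random.default_rng(0).normal(size=(16 * 16, 8))  # 256 tokens
out, oh, ow = tokens_to_token(tokens, 16, 16)  # 256 -> 64 tokens, 8 -> 72 dims
```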

Image Classification Language Modelling

Fooling the primate brain with minimal, targeted image manipulation

no code implementations • 11 Nov 2020 • Li Yuan, Will Xiao, Giorgia Dellaferrera, Gabriel Kreiman, Francis E. H. Tay, Jiashi Feng, Margaret S. Livingstone

Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.

Adversarial Attack Image Manipulation

Towards Accurate Human Pose Estimation in Videos of Crowded Scenes

no code implementations • 16 Oct 2020 • Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

In this paper, we focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.

Frame Optical Flow Estimation +1

Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes

no code implementations • 16 Oct 2020 • Li Yuan, Yichen Zhou, Shuning Chang, Ziyuan Huang, Yunpeng Chen, Xuecheng Nie, Tao Wang, Jiashi Feng, Shuicheng Yan

Prior works typically fail on this problem in two respects: (1) they do not exploit information about the scene; (2) they lack training data for crowded and complex scenes.

Action Recognition Action Recognition In Videos +3

A Simple Baseline for Pose Tracking in Videos of Crowded Scenes

no code implementations • 16 Oct 2020 • Li Yuan, Shuning Chang, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Xuecheng Nie, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan

This paper presents our solution to the ACM MM challenge "Large-scale Human-centric Video Analysis in Complex Events" [lin2020human]; specifically, we focus on Track 3: Crowd Pose Tracking in Complex Events.

Multi-Object Tracking Optical Flow Estimation +1

Exploring global diverse attention via pairwise temporal relation for video summarization

no code implementations • 23 Sep 2020 • Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao

In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adopts an attention mechanism from a global perspective to consider pairwise temporal relations between video frames.
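The global diverse attention idea above can be sketched as scoring each frame by its pairwise relations to every other frame, so frames that differ from the rest of the video receive higher attention. This is an illustrative sketch under assumptions (L2-normalized features, cosine dissimilarity as the pairwise relation), not the paper's network; `global_diverse_attention` is a hypothetical name.

```python
import numpy as np

def global_diverse_attention(feats):
    """Attention over frames from global pairwise relations (sketch).

    feats: (T, D) frame features, assumed L2-normalized. Each frame's
    score sums its dissimilarity to all frames, so redundant (near-
    duplicate) frames are down-weighted and diverse frames promoted.
    """
    sim = feats @ feats.T        # pairwise cosine similarities, (T, T)
    dissim = 1.0 - sim           # pairwise temporal relations
    scores = dissim.sum(axis=1)  # global view over all pairs
    return scores / scores.sum() # normalized attention weights

# Two identical frames and one distinct frame
feats = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
weights = global_diverse_attention(feats)
```

Here the distinct third frame receives the largest weight, matching the intuition that diverse frames are the most informative for a summary.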

Frame Video Summarization

YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for Memotion Analysis

1 code implementation • SEMEVAL 2020 • Li Yuan, Jin Wang, Xue-jie Zhang

In recent years, the growing ubiquity of Internet memes on social media platforms, such as Facebook, Instagram, and Twitter, has become a topic of immense interest.

Emotion Recognition Sentiment Analysis +1

Revisiting Knowledge Distillation via Label Smoothing Regularization

2 code implementations • CVPR 2020 • Li Yuan, Francis E. H. Tay, Guilin Li, Tao Wang, Jiashi Feng

Without any extra computation cost, Tf-KD achieves up to 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization.
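The label smoothing regularization that the paper revisits replaces one-hot targets with slightly softened ones. A minimal sketch, with `smooth_targets` as a hypothetical helper name and eps = 0.1 a conventional choice:

```python
import numpy as np

def smooth_targets(labels, num_classes, eps=0.1):
    """Label smoothing: put 1 - eps on the true class and spread eps
    uniformly over the remaining num_classes - 1 classes, so every row
    is still a valid probability distribution."""
    n = len(labels)
    t = np.full((n, num_classes), eps / (num_classes - 1))
    t[np.arange(n), labels] = 1.0 - eps
    return t

targets = smooth_targets(np.array([2, 0]), num_classes=5, eps=0.1)
```

Teacher-free KD (Tf-KD) can be viewed in this light: the soft targets play a role similar to a virtual teacher's softened outputs.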

Self-Knowledge Distillation

Central Similarity Quantization for Efficient Image and Video Retrieval

1 code implementation • CVPR 2020 • Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng

In this work, we propose a new global similarity metric, termed "central similarity", with which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers, improving hash learning efficiency and retrieval accuracy.
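The pull toward a common center can be sketched as a binary cross-entropy between each relaxed hash code and its class's hash center. This is an illustrative sketch, not the paper's full objective (which also generates centers systematically, e.g. from Hadamard rows); `central_similarity_loss` and the toy centers are hypothetical.

```python
import numpy as np

def central_similarity_loss(codes, centers, labels, eps=1e-7):
    """Pull each relaxed hash code toward its class's hash center (sketch).

    codes: (B, L) values in (-1, 1); centers: (C, L) in {-1, +1};
    labels: (B,) class index per sample. BCE between the code (mapped to
    (0, 1)) and its center drives similar samples to a common center and
    dissimilar samples to different centers.
    """
    p = np.clip((codes + 1.0) / 2.0, eps, 1.0 - eps)
    t = (centers[labels] + 1.0) / 2.0
    return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

centers = np.array([[1.0, 1.0, -1.0, -1.0],
                    [-1.0, 1.0, 1.0, -1.0]])
labels = np.array([0, 1])
good = central_similarity_loss(0.99 * centers, centers, labels)  # near center
bad = central_similarity_loss(-0.99 * centers, centers, labels)  # opposite
```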

Quantization Video Retrieval

Distilling Object Detectors with Fine-grained Feature Imitation

3 code implementations • CVPR 2019 • Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng

To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.
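The fine-grained imitation idea can be sketched as matching student features to the teacher only at locations near ground-truth objects, rather than over the whole feature map. A minimal sketch under assumptions; `imitation_loss` is a hypothetical name and the mask construction is simplified (the paper derives it from anchor-box IoU with ground truth).

```python
import numpy as np

def imitation_loss(student, teacher, mask):
    """L2 feature imitation restricted to near-object locations (sketch).

    student, teacher: (H, W, C) detector feature maps; mask: (H, W) bool,
    True at locations close to ground-truth boxes. Averaging only over
    masked locations focuses distillation where detection happens.
    """
    n = max(int(mask.sum()), 1)
    diff = (student - teacher) ** 2
    return diff[mask].sum() / (2.0 * n)

rng = np.random.default_rng(0)
teacher = rng.normal(size=(6, 6, 8))
student = teacher.copy()
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True          # pretend an object sits in this region
zero_loss = imitation_loss(student, teacher, mask)
pos_loss = imitation_loss(student + 1.0, teacher, mask)
```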

Knowledge Distillation Object Detection

Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization

no code implementations • 17 Apr 2019 • Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng

The evaluator defines a learnable information preserving metric between original video and summary video and "supervises" the selector to identify the most informative frames to form the summary video.

Frame Unsupervised Video Summarization

Few-shot Adaptive Faster R-CNN

no code implementations • CVPR 2019 • Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng

To address these challenges, we first introduce a pairing mechanism over source and target features to alleviate the issue of insufficient target domain samples.

Object Detection Unsupervised Domain Adaptation

The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix

no code implementations • 11 Mar 2019 • Žiga Emeršič, Aruna Kumar S. V., B. S. Harish, Weronika Gutfeter, Jalil Nourmohammadi Khiarak, Andrzej Pacut, Earnest Hansley, Mauricio Pamplona Segundo, Sudeep Sarkar, Hyeonjung Park, Gi Pyo Nam, Ig-Jae Kim, Sagar G. Sangodkar, Ümit Kaçar, Murvet Kirci, Li Yuan, Jishou Yuan, Haonan Zhao, Fei Lu, Junying Mao, Xiaoshuang Zhang, Dogucan Yaman, Fevziye Irem Eyiokur, Kadir Bulut Özler, Hazim Kemal Ekenel, Debbrota Paul Chowdhury, Sambit Bakshi, Pankaj K. Sa, Banshidhar Majhi, Peter Peer, Vitomir Štruc

The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the performance of the technology from various viewpoints, such as generalization to unseen data characteristics; sensitivity to rotations, occlusions, and image resolution; and performance bias on sub-groups of subjects selected by demographic criteria, i.e., gender and ethnicity.

Person Recognition

Object Relation Detection Based on One-shot Learning

no code implementations • 16 Jul 2018 • Li Zhou, Jian Zhao, Jianshu Li, Li Yuan, Jiashi Feng

Detecting the relations among objects, such as "cat on sofa" and "person ride horse", is a crucial task in image understanding, and beneficial to bridging the semantic gap between images and natural language.

One-Shot Learning
