Search Results for author: Dongliang He

Found 44 papers, 24 papers with code

Vision Transformer with Attention Map Hallucination and FFN Compaction

no code implementations • 19 Jun 2023 • Haiyang Xu, Zhichao Zhou, Dongliang He, Fu Li, Jingdong Wang

Vision Transformer(ViT) is now dominating many vision tasks.

Paper
Add Code

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations • CVPR 2023 • Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

Paper
Add Code

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

no code implementations • CVPR 2023 • Hao Tang, Songhua Liu, Tianwei Lin, Shaoli Huang, Fu Li, Dongliang He, Xinchao Wang

On the other hand, different from the vanilla version, we adopt a learnable scaling operation on content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring the stylization quality.

Meta-Learning Style Transfer

Paper
Add Code

DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

1 code implementation • CVPR 2023 • Yueming Lyu, Tianwei Lin, Fu Li, Dongliang He, Jing Dong, Tieniu Tan

Our key idea is to investigate and identify a space, namely delta image and text space that has well-aligned distribution between CLIP visual feature differences of two images and CLIP textual embedding differences of source and target texts.

Image Manipulation

Paper
Code

LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution

1 code implementation • ICCV 2023 • Lin Zhang, Xin Li, Dongliang He, Errui Ding, Zhaoxiang Zhang

To this end, we construct a large-scale, multi-reference super-resolution dataset, named LMR.

feature selection Image Super-Resolution +1

Paper
Code

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

no code implementations • 3 Dec 2022 • Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong liu

Then, in \textbf{AdaCM}, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair.

4k Style Transfer

Paper
Add Code

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

1 code implementation • 22 Nov 2022 • Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage.

Talking Face Generation

820

Paper
Code

RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

no code implementations • 8 Nov 2022 • Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhaoxiang Zhang

While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well reconstructed SR image should enable better SR reconstruction for its similar LR images when it is referred to as.

feature selection Image Super-Resolution

Paper
Add Code

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

no code implementations • 11 Oct 2022 • Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang

In order to guide the encoder to fully excavate spatial-temporal features, two separate decoders are used for two pretext tasks of disentangled appearance and motion prediction.

motion prediction

Paper
Add Code

Effective Invertible Arbitrary Image Rescaling

no code implementations • 26 Sep 2022 • Zhihong Pan, Baopu Li, Dongliang He, Wenhao Wu, Errui Ding

To increase its real world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones where images are resized to different scales along horizontal and vertical directions.

Image Super-Resolution

Paper
Add Code

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

3 code implementations • 23 Aug 2022 • Ren Yang, Radu Timofte, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu Li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei LI, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, Marco Buzzelli, Simone Bianco, Raimondo Schettini, Dafeng Zhang, Feiyu Huang, Shizhuo Liu, Xiaobing Wang, Zhezhu Jin, Bingchen Li, Xin Li, Mingxi Li, Ding Liu, Wenbin Zou, Peijie Dong, Tian Ye, Yunchen Zhang, Ming Tan, Xin Niu, Mustafa Ayazoglu, Marcos Conde, Ui-Jin Choi, Zhuang Jia, Tianyu Xu, Yijian Zhang, Mao Ye, Dengyan Luo, Xiaofeng Pan, Liuhan Peng

The homepage of this challenge is at https://github. com/RenYang-home/AIM22_CompressSR.

Super-Resolution

526

Paper
Code

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

no code implementations • 21 Aug 2022 • Haoran Wang, Dongliang He, Wenhao Wu, Boyang xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.

Clustering Contrastive Learning +4

Paper
Add Code

Boosting Video-Text Retrieval with Explicit High-Level Semantics

no code implementations • 8 Aug 2022 • Haoran Wang, Di Xu, Dongliang He, Fu Li, Zhong Ji, Jungong Han, Errui Ding

Video-text retrieval (VTR) is an attractive yet challenging task for multi-modal understanding, which aims to search for relevant video (text) given a query (video).

Retrieval Text Retrieval +3

Paper
Add Code

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations • 21 Jul 2022 • Boyang xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Ranked #4 on Action Recognition on ActivityNet

Action Recognition Video Classification +1

Paper
Add Code

Neural Color Operators for Sequential Image Retouching

2 code implementations • 17 Jul 2022 • Yili Wang, Xin Li, Kun Xu, Dongliang He, Qi Zhang, Fu Li, Errui Ding

The neural color operator mimics the behavior of traditional color operators and learns pixelwise color transformation while its strength is controlled by a scalar.

Image Enhancement Image Retouching

Paper
Code

DALG: Deep Attentive Local and Global Modeling for Image Retrieval

no code implementations • 1 Jul 2022 • Yuxin Song, Ruolin Zhu, Min Yang, Dongliang He

Deeply learned representations have achieved superior image retrieval performance in a retrieve-then-rerank manner.

Image Retrieval Representation Learning +1

Paper
Add Code

Adversarial Dual-Student with Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation

1 code implementation • 5 Mar 2022 • Cong Cao, Tianwei Lin, Dongliang He, Fu Li, Huanjing Yue, Jingyu Yang, Errui Ding

The perturbations for unlabeled data enable the consistency training loss, which benefits semi-supervised semantic segmentation.

Data Augmentation Pseudo Label +2

Paper
Code

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

no code implementations • CVPR 2022 • Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

Deep learning based single image super-resolution models have been widely studied and superb results are achieved in upscaling low-resolution images with fixed scale factor and downscaling degradation kernel.

Image Super-Resolution

Paper
Add Code

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model

1 code implementation • CVPR 2022 • Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc van Gool, Errui Ding

We propose a novel framework, i. e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations.

Image Manipulation Language Modelling

Paper
Code

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

2 code implementations • ICCV 2021 • Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Ruifeng Deng, Xin Li, Errui Ding, Hao Wang

Neural painting refers to the procedure of producing a series of strokes for a given image and non-photo-realistically recreating it using neural networks.

Ranked #1 on Object Detection on A2D

Object Detection Reinforcement Learning (RL) +1

478

Paper
Code

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer

3 code implementations • ICCV 2021 • Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Meiling Wang, Xin Li, Zhengxing Sun, Qian Li, Errui Ding

Finally, the content feature is normalized so that they demonstrate the same local feature statistics as the calculated per-point weighted style feature statistics.

Style Transfer Video Style Transfer

197

Paper
Code

DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

5 code implementations • ICCV 2021 • Min Yang, Dongliang He, Miao Fan, Baorong Shi, Xuetong Xue, Fu Li, Errui Ding, Jizhou Huang

Components orthogonal to the global image representation are then extracted from the local information.

Image Retrieval Representation Learning +1

126

Paper
Code

Color2Embed: Fast Exemplar-Based Image Colorization using Color Embeddings

3 code implementations • 15 Jun 2021 • Hengyuan Zhao, Wenhao Wu, Yihao Liu, Dongliang He

In this paper, we present a fast exemplar-based image colorization approach using color embeddings named Color2Embed.

Colorization Image Colorization +1

Paper
Code

ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency

no code implementations • ICCV 2021 • Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding

Specifically, we propose two tasks to learn the appearance and speed consistency, respectively.

Action Recognition Representation Learning +2

Paper
Add Code

DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning

1 code implementation • 25 May 2021 • Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, YingYing Li, Mingde Yao, ZiChao Dong, Yifeng Shi

Long-range and short-range temporal modeling are two complementary and crucial aspects of video recognition.

Ranked #6 on Action Recognition on ActivityNet

Action Recognition Long-range modeling +2

Paper
Code

Image Inpainting by End-to-End Cascaded Refinement with Mask Awareness

1 code implementation • 28 Apr 2021 • Manyu Zhu, Dongliang He, Xin Li, Chao Li, Fu Li, Xiao Liu, Errui Ding, Zhaoxiang Zhang

Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial.

Ranked #4 on Image Inpainting on CelebA-HQ

Image Inpainting valid

Paper
Code

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

1 code implementation • 21 Apr 2021 • Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng Li, Thomas Tanay, Fenglong Song, Wentao Chao, Qiang Guo, Yan Liu, Jiang Li, Xiaochao Qu, Dewang Hou, Jiayu Yang, Lyn Jiang, Di You, Zhenyu Zhang, Chong Mou, Iaroslav Koshelev, Pavel Ostyakov, Andrey Somov, Jia Hao, Xueyi Zou, Shijie Zhao, Xiaopeng Sun, Yiting Liao, Yuanzhi Zhang, Qing Wang, Gen Zhan, Mengxi Guo, Junlin Li, Ming Lu, Zhan Ma, Pablo Navarrete Michelini, Hai Wang, Yiyun Chen, Jingyu Guo, Liliang Zhang, Wenming Yang, Sijung Kim, Syehoon Oh, Yucong Wang, Minjie Cai, Wei Hao, Kangdi Shi, Liangyan Li, Jun Chen, Wei Gao, Wang Liu, XiaoYu Zhang, Linjie Zhou, Sixin Lin, Ru Wang

This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results.

Paper
Code

Learning Semantic Person Image Generation by Region-Adaptive Normalization

1 code implementation • CVPR 2021 • Zhengyao Lv, Xiaoming Li, Xin Li, Fu Li, Tianwei Lin, Dongliang He, WangMeng Zuo

In the first stage, we predict the target semantic parsing maps to eliminate the difficulties of pose transfer and further benefit the latter translation of per-region appearance style.

Pose Transfer Semantic Parsing +1

Paper
Code

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

2 code implementations • CVPR 2021 • Tianwei Lin, Zhuoqi Ma, Fu Li, Dongliang He, Xin Li, Errui Ding, Nannan Wang, Jie Li, Xinbo Gao

Inspired by the common painting process of drawing a draft and revising the details, we introduce a novel feed-forward method named Laplacian Pyramid Network (LapStyle).

Style Transfer

7,679

Paper
Code

Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones

2 code implementations • 10 Mar 2021 • Cheng Cui, Ruoyu Guo, Yuning Du, Dongliang He, Fu Li, Zewu Wu, Qiwen Liu, Shilei Wen, Jizhou Huang, Xiaoguang Hu, dianhai yu, Errui Ding, Yanjun Ma

Recently, research efforts have been concentrated on revealing how pre-trained model makes a difference in neural network performance.

Knowledge Distillation object-detection +3

5,257

Paper
Code

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

3 code implementations • 13 Dec 2020 • Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Existing state-of-the-art methods have achieved excellent accuracy regardless of the complexity meanwhile efficient spatiotemporal modeling solutions are slightly inferior in performance.

Ranked #33 on Action Recognition on Something-Something V1

Action Classification Action Recognition +2

140

Paper
Code

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

1 code implementation • 27 Oct 2020 • Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.

Ranked #11 on Self-Supervised Action Recognition on UCF101

Representation Learning Retrieval +2

Paper
Code

HS-ResNet: Hierarchical-Split Block on Convolutional Neural Network

2 code implementations • 15 Oct 2020 • Pengcheng Yuan, Shufei Lin, Cheng Cui, Yuning Du, Ruoyu Guo, Dongliang He, Errui Ding, Shumin Han

Moreover, Hierarchical-Split block is very flexible and efficient, which provides a large space of potential network architectures for different applications.

Image Classification Image Segmentation +5

5,257

Paper
Code

NTIRE 2020 Challenge on Video Quality Mapping: Methods and Results

no code implementations • 5 May 2020 • Dario Fuoli, Zhiwu Huang, Martin Danelljan, Radu Timofte, Hua Wang, Longcun Jin, Dewei Su, Jing Liu, Jaehoon Lee, Michal Kudelski, Lukasz Bala, Dmitry Hrybov, Marcin Mozejko, Muchen Li, Si-Yao Li, Bo Pang, Cewu Lu, Chao Li, Dongliang He, Fu Li, Shilei Wen

For track 2, some existing methods are evaluated, showing promising solutions to the weakly-supervised video quality mapping problem.

Paper
Add Code

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

no code implementations • 3 May 2020 • Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, Jiji C. V

This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results.

Image Super-Resolution

Paper
Add Code

Dynamic Inference: A New Approach Toward Efficient Video Action Recognition

no code implementations • 9 Feb 2020 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Yi Yang, Shilei Wen

In a nutshell, we treat input frames and network depth of the computational graph as a 2-dimensional grid, and several checkpoints are placed on this grid in advance with a prediction module.

Action Recognition In Videos Temporal Action Localization

Paper
Add Code

Multi-Label Classification with Label Graph Superimposing

2 code implementations • 21 Nov 2019 • Ya Wang, Dongliang He, Fu Li, Xiang Long, Zhichao Zhou, Jinwen Ma, Shilei Wen

In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects.

Ranked #28 on Multi-Label Classification on MS-COCO

Attribute Classification +3

461

Paper
Code

TruNet: Short Videos Generation from Long Videos via Story-Preserving Truncation

no code implementations • 14 Oct 2019 • Fan Yang, Xiao Liu, Dongliang He, Chuang Gan, Jian Wang, Chao Li, Fu Li, Shilei Wen

In this work, we introduce a new problem, named as {\em story-preserving long video truncation}, that requires an algorithm to automatically truncate a long-duration video into multiple short and attractive sub-videos with each one containing an unbroken story.

Highlight Detection Video Summarization

Paper
Add Code

Deep Concept-wise Temporal Convolutional Networks for Action Localization

2 code implementations • 26 Aug 2019 • Xin Li, Tianwei Lin, Xiao Liu, Chuang Gan, WangMeng Zuo, Chao Li, Xiang Long, Dongliang He, Fu Li, Shilei Wen

In this paper, we empirically find that stacking more conventional temporal convolution layers actually deteriorates action classification performance, possibly ascribing to that all channels of 1D feature map, which generally are highly abstract and can be regarded as latent concepts, are excessively recombined in temporal convolution.

Action Classification Action Localization

6,869

Paper
Code

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition

no code implementations • ICCV 2019 • Wenhao Wu, Dongliang He, Xiao Tan, Shifeng Chen, Shilei Wen

Video Recognition has drawn great research interest and great progress has been made.

Ranked #7 on Action Recognition on ActivityNet

Action Recognition General Classification +5

Paper
Add Code

Adapting Image Super-Resolution State-of-the-arts and Learning Multi-model Ensemble for Video Super-Resolution

no code implementations • 7 May 2019 • Chao Li, Dongliang He, Xiao Liu, Yukang Ding, Shilei Wen

Recently, image super-resolution has been widely studied and achieved significant progress by leveraging the power of deep convolutional neural networks.

Image Super-Resolution Video Super-Resolution

Paper
Add Code

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

1 code implementation • 21 Jan 2019 • Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao Liu, Shilei Wen

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos.

Decision Making Multi-Task Learning +3

Paper
Code

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen

In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.

Action Recognition Temporal Action Localization

334

Paper
Code

Exploiting Spatial-Temporal Modelling and Multi-Modal Fusion for Human Action Recognition

no code implementations • 27 Jun 2018 • Dongliang He, Fu Li, Qijie Zhao, Xiang Long, Yi Fu, Shilei Wen

In this challenge, we propose spatial-temporal network (StNet) for better joint spatial-temporal modelling and comprehensively video understanding.

Action Recognition Temporal Action Localization +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.