Search Results for author: Jianlong Fu

Found 87 papers, 47 papers with code

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

no code implementations • 19 Mar 2024 • Wenjing Wang, Huan Yang, Jianlong Fu, Jiaying Liu

This prior serves as the bridge between normal and low-light images.

Decoder

Paper
Add Code

Learning Position-Aware Implicit Neural Network for Real-World Face Inpainting

no code implementations • 19 Jan 2024 • Bo Zhao, Huan Yang, Jianlong Fu

Face inpainting requires the model to have a precise global understanding of the facial position structure.

Decoder Facial Inpainting +1

Paper
Add Code

ViCo: Engaging Video Comment Generation with Human Preference Rewards

no code implementations • 22 Aug 2023 • Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.

Caption Generation Comment Generation +1

Paper
Add Code

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

no code implementations • ICCV 2023 • Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun

Due to the data absence, the textual description of the target domain and the vision-language models, e. g., CLIP, are utilized to effectively guide the generator.

Paper
Add Code

MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text

no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo

In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.

Video Generation

Paper
Add Code

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

no code implementations • ICCV 2023 • Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai

To this end, we raise a question: ``How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?".

Hallucination In-Context Learning

Paper
Add Code

Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning

no code implementations • 20 Jun 2023 • Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, Nicholas Jing Yuan, Jian Yin, Hongyang Chao, Qi Zhang

The proposed framework consists of a large language model (LLM), a diffusion-based image generator, and a series of visual rewards by design.

Attribute Image Generation +3

Paper
Add Code

MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

no code implementations • 12 Jun 2023 • Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu

To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.

Retrieval

Paper
Add Code

Transferring Foundation Models for Generalizable Robotic Manipulation

no code implementations • 9 Jun 2023 • Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, LiMin Wang

In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models, to condition robot manipulation tasks.

Imitation Learning Object +1

Paper
Add Code

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

no code implementations • 30 May 2023 • Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, LiMin Wang, Jianlong Fu

We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks.

Robot Manipulation

Paper
Add Code

Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

no code implementations • 24 May 2023 • Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu

Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks.

Efficient Exploration Image Super-Resolution

Paper
Add Code

Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

1 code implementation • 18 May 2023 • Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu

Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.

Ranked #1 on Text-to-Video Generation on WebVid

Text-to-Image Generation Text-to-Video Generation +2

Paper
Code

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.

Video Generation

Paper
Add Code

Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution

1 code implementation • ICCV 2023 • Zixi Tuo, Huan Yang, Jianlong Fu, Yujie Dun, Xueming Qian

Existing real-world video super-resolution (VSR) methods focus on designing a general degradation pipeline for open-domain videos while ignoring data intrinsic characteristics which strongly limit their performance when applying to some specific domains (eg., animation videos).

valid Video Super-Resolution

Paper
Code

Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation

no code implementations • 16 Mar 2023 • Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu

Language-guided image generation has achieved great success nowadays by using diffusion models.

Conditional Image Generation Text-to-Image Generation

Paper
Add Code

Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution

1 code implementation • 27 Dec 2022 • Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu

Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.

Ranked #2 on Video Super-Resolution on REDS4- 4x upscaling

Video Enhancement Video Super-Resolution

146

Paper
Code

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

1 code implementation • CVPR 2023 • Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i. e., MM-Diffusion), with two-coupled denoising autoencoders.

Denoising FAD +1

339

Paper
Code

Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

no code implementations • 22 Nov 2022 • Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu

Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.

3D Human Pose Estimation 3D Pose Estimation

Paper
Add Code

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation • 12 Oct 2022 • Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Contrastive Learning Question Answering +3

439

Paper
Code

Fine-Grained Image Style Transfer with Visual Transformers

1 code implementation • 11 Oct 2022 • Jianbo Wang, Huan Yang, Jianlong Fu, Toshihiko Yamasaki, Baining Guo

Such a design usually destroys the spatial information of the input images and fails to transfer fine-grained style patterns into style transfer results.

Style Transfer

Paper
Code

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo

and 2) how to mitigate the impact of these factors?

Ranked #2 on Video Retrieval on MSR-VTT-1kA (using extra training data)

Retrieval Text Retrieval +1

439

Paper
Code

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

1 code implementation • 7 Sep 2022 • Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, including CLIP and StyleGAN.

Image Generation

Paper
Code

4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement

no code implementations • 5 Sep 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

In particular, we first introduce a lightweight context encoder and a parameter encoder to learn a context map for the pixel-level category and a group of image-adaptive coefficients, respectively.

Ranked #7 on Image Enhancement on MIT-Adobe 5k (SSIM on proRGB metric)

Image Enhancement

Paper
Add Code

Language-Guided Face Animation by Recurrent StyleGAN-based Generator

1 code implementation • 11 Aug 2022 • Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo

Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.

Image Manipulation

Paper
Code

Exploring Anchor-based Detection for Ego4D Natural Language Query

no code implementations • 10 Aug 2022 • Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu

In this paper we provide the technique report of Ego4D natural language query challenge in CVPR 2022.

Video Understanding

Paper
Add Code

GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training

1 code implementation • 8 Aug 2022 • Jaeseok Byun, Taebaek Hwang, Jianlong Fu, Taesup Moon

In contrast to the mainstream VLP methods, we highlight that two routinely applied steps during pre-training have crucial impact on the performance of the pre-trained model: in-batch hard negative sampling for image-text matching (ITM) and assigning the large masking probability for the masked language modeling (MLM).

Image-text matching Language Modelling +2

Paper
Code

Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

1 code implementation • 5 Aug 2022 • Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu

First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.

Ranked #3 on Video Super-Resolution on REDS4- 4x upscaling

Video Enhancement Video Super-Resolution

146

Paper
Code

Expanding Language-Image Pretrained Models for General Video Recognition

2 code implementations • 4 Aug 2022 • Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling

Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.

Ranked #8 on Zero-Shot Action Recognition on Kinetics

Action Classification Action Recognition +3

933

Paper
Code

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

It achieves a top-1 accuracy of 84. 8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4. 2 times fewer parameters.

Ranked #135 on Image Classification on ImageNet

Image Classification Knowledge Distillation

29,974

Paper
Code

TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation

no code implementations • 19 Jul 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

In particular, we formulate the warped features with inconsistent motions as query tokens, and formulate relevant regions in a motion trajectory from two original consecutive frames into keys and values.

Video Frame Interpolation

Paper
Add Code

Degradation-Guided Meta-Restoration Network for Blind Super-Resolution

no code implementations • 3 Jul 2022 • Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu

The extractor estimates the degradations in LR inputs and guides the meta-restoration modules to predict restoration parameters for different degradations on-the-fly.

Blind Super-Resolution Image Restoration +1

Paper
Add Code

MiniViT: Compressing Vision Transformers with Weight Multiplexing

2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.

Ranked #212 on Image Classification on ImageNet (using extra training data)

Image Classification

1,574

Paper
Code

Learning Trajectory-Aware Transformer for Video Super-Resolution

1 code implementation • CVPR 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

Existing approaches usually align and aggregate video frames from limited adjacent frames (e. g., 5 or 7 frames), which prevents these approaches from satisfactory results.

Ranked #4 on Video Super-Resolution on UDM10 - 4x upscaling

Video Super-Resolution

190

Paper
Code

Searching the Search Space of Vision Transformer

2 code implementations • NeurIPS 2021 • Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling

Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and thus been attracting fast-growing efforts on manually designing more effective architectures.

Neural Architecture Search object-detection +4

1,574

Paper
Code

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Ranked #16 on Video Retrieval on MSR-VTT

Retrieval Super-Resolution +4

439

Paper
Code

Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

no code implementations • NeurIPS 2021 • Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu

Given a sequence of style tokens, the TokenGAN is able to control the image synthesis by assigning the styles to the content tokens by attention mechanism with a Transformer.

Image Generation

Paper
Add Code

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

1 code implementation • 19 Oct 2021 • Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.

Paper
Code

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Paper
Add Code

Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

1 code implementation • ICCV 2021 • Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

And the reference space is optimized to capture deep image priors that are useful for quality assessment.

Image Quality Assessment Image Restoration +1

Paper
Code

Domain-Aware Universal Style Transfer

1 code implementation • ICCV 2021 • Kibeom Hong, Seogkyu Jeon, Huan Yang, Jianlong Fu, Hyeran Byun

To this end, we design a novel domainness indicator that captures the domainness value from the texture and structural features of reference images.

Style Transfer

101

Paper
Code

Reference-based Defect Detection Network

no code implementations • 10 Aug 2021 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao

To solve the partial visual confusion issue, we propose to leverage the carried context information of context reference, which is the concentric bigger box of each region proposal, to perform more accurate region classification and regression.

Defect Detection object-detection +2

Paper
Add Code

Rethinking and Improving Relative Position Encoding for Vision Transformer

1 code implementation • ICCV 2021 • Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao

We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).

Ranked #140 on Object Detection on COCO minival

Image Classification Object Detection +1

1,574

Paper
Code

AutoFormer: Searching Transformers for Visual Recognition

2 code implementations • ICCV 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling

Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.

Ranked #1 on Fine-Grained Image Classification on Oxford 102 Flowers (Top 1 Accuracy metric)

AutoML Fine-Grained Image Classification

1,574

Paper
Code

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering Relation +5

Paper
Add Code

Probing Inter-modality: Visual Parsing with Self-Attention for Vision-and-Language Pre-training

no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.

Question Answering Relation +3

Paper
Add Code

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

1 code implementation • CVPR 2021 • Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, Huchuan Lu

Object tracking has achieved significant progress over the past few years.

Neural Architecture Search Object +1

383

Paper
Code

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations • CVPR 2021 • Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

Ranked #5 on Visual Entailment on SNLI-VE val

Representation Learning Retrieval +3

206

Paper
Code

3D Human Body Reshaping with Anthropometric Modeling

1 code implementation • 5 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao

First, we calculate full-body anthropometric parameters from limited user inputs by imputation technique, and thus essential anthropometric parameters for 3D body reshaping can be obtained.

feature selection Imputation +1

334

Paper
Code

Aggregated Contextual Transformations for High-Resolution Image Inpainting

2 code implementations • 3 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.

Ranked #9 on Image Inpainting on Places2

Image Inpainting Texture Synthesis +1

4,327

Paper
Code

One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking

1 code implementation • CVPR 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling

In this paper, we propose a one-shot neural ensemble architecture search (NEAS) solution that addresses the two challenges.

Neural Architecture Search

Paper
Code

Learning Spatio-Temporal Transformer for Visual Tracking

1 code implementation • ICCV 2021 • Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu

In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component.

Ranked #19 on Visual Object Tracking on TrackingNet

Decoder Visual Object Tracking +1

617

Paper
Code

Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language

1 code implementation • 4 Dec 2020 • Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo

It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.

934

Paper
Code

Learning Semantic-aware Normalization for Generative Adversarial Networks

1 code implementation • NeurIPS 2020 • Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha

Such a model disentangles latent factors according to the semantic of feature channels by channel-/group- wise fusion of latent codes and feature channels.

Image Inpainting Unconditional Image Generation

Paper
Code

Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

2 code implementations • NeurIPS 2020 • Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu

The experiments on ImageNet verify such path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boosting the training of subnetworks.

Neural Architecture Search object-detection +1

1,574

Paper
Code

Revisiting Anchor Mechanisms for Temporal Action Localization

1 code implementation • 22 Aug 2020 • Le Yang, Houwen Peng, Dingwen Zhang, Jianlong Fu, Junwei Han

To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points.

Temporal Action Localization

Paper
Code

Learning Joint Spatial-Temporal Transformations for Video Inpainting

2 code implementations • ECCV 2020 • Yanhong Zeng, Jianlong Fu, Hongyang Chao

In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.

Ranked #5 on Seeing Beyond the Visible on KITTI360-EX

Seeing Beyond the Visible Video Inpainting

435

Paper
Code

Ocean: Object-aware Anchor-free Tracking

4 code implementations • ECCV 2020 • Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, Weiming Hu

In this paper, we propose a novel object-aware anchor-free network to address this issue.

Ranked #2 on Visual Object Tracking on VOT2019

Object Visual Object Tracking

749

Paper
Code

Cyclic Differentiable Architecture Search

3 code implementations • 18 Jun 2020 • Hongyuan Yu, Houwen Peng, Yan Huang, Jianlong Fu, Hao Du, Liang Wang, Haibin Ling

First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized.

Ranked #17 on Neural Architecture Search on NAS-Bench-201, CIFAR-10

Neural Architecture Search

1,574

Paper
Code

Learning Texture Transformer Network for Image Super-Resolution

1 code implementation • CVPR 2020 • Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo

In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively.

Hard Attention Image Generation +2

751

Paper
Code

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

no code implementations • 3 May 2020 • Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, Jiji C. V

This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results.

Image Super-Resolution

Paper
Add Code

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

1 code implementation • 2 Apr 2020 • Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu

We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs instead of using region-based image features as the most recent vision and language tasks.

Image-text matching Language Modelling +7

439

Paper
Code

Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language

3 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo

We address the problem of retrieving a specific moment from an untrimmed video by a query sentence.

Sentence

934

Paper
Code

Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization

2 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo

In this report, we introduce the Winner method for HACS Temporal Action Localization Challenge 2019.

Temporal Action Localization

934

Paper
Code

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences

no code implementations • 24 Nov 2019 • Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou

A storyboard is a sequence of images to illustrate a story containing multiple sentences, which has been a key process to create different story products.

Paper
Add Code

Learning Deep Bilinear Transformation for Fine-grained Image Representation

1 code implementation • NeurIPS 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which restricts this powerful transformation to be used in deep neural networks.

Fine-Grained Image Recognition

105

Paper
Code

Learning Rich Image Region Representation for Visual Question Answering

no code implementations • 29 Oct 2019 • Bei Liu, Zhicheng Huang, Zhaoyang Zeng, Zheyu Chen, Jianlong Fu

We propose to boost VQA by leveraging more powerful feature extractors by improving the representation ability of both visual and text features and the ensemble of models.

Language Modelling Question Answering +1

Paper
Add Code

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in 360{\deg} domain.

Object object-detection +1

Paper
Add Code

WSOD2: Learning Bottom-up and Top-down Objectness Distillation forWeakly-supervised Object Detection

no code implementations • ICCV 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study on weakly-supervised object detection (WSOD)which plays a vital role in relieving human involvement fromobject-level annotations.

Ranked #6 on Weakly Supervised Object Detection on PASCAL VOC 2007

Object object-detection +2

Paper
Add Code

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

1 code implementation • 11 Sep 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study on weakly-supervised object detection (WSOD) which plays a vital role in relieving human involvement from object-level annotations.

Object object-detection +3

Paper
Code

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.

Crowd Counting Density Estimation

Paper
Add Code

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations • 11 Jul 2019 • Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves the state-of-the-art performance on the dense-captioning events in video task with 9. 91 METEOR score on the challenge testing set.

Dense Captioning Dense Video Captioning

Paper
Add Code

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots

no code implementations • 3 Jun 2019 • Shizhe Chen, Qin Jin, Jianlong Fu

However, a picture tells a thousand words, which makes multi-lingual sentences pivoted by the same image noisy as mutual translations and thus hinders the translation model learning.

Machine Translation Sentence +2

Paper
Add Code

Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting

2 code implementations • CVPR 2019 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured.

Decoder Image Inpainting +1

350

Paper
Code

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition

1 code implementation • CVPR 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

Learning subtle yet discriminative features (e. g., beak and eyes for a bird) plays a significant role in fine-grained image recognition.

Ranked #1 on Fine-Grained Image Classification on iNaturalist

Fine-Grained Image Classification Fine-Grained Image Recognition

219

Paper
Code

Deep Attention Neural Tensor Network for Visual Question Answering

no code implementations • ECCV 2018 • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei

First, we model one of the pairwise interaction (e. g., image and question) by bilinear features, which is further encoded with the third dimension (e. g., answer) to be a triplet by bilinear tensor product.

Deep Attention Question Answering +1

Paper
Add Code

Image Inspired Poetry Generation in XiaoIce

no code implementations • 9 Aug 2018 • Wen-Feng Cheng, Chao-Chung Wu, Ruihua Song, Jianlong Fu, Xing Xie, Jian-Yun Nie

This is one of the few attempts to generate poetry from images.

Paper
Add Code

DA-GAN: Instance-Level Image Translation by Deep Attention Generative Adversarial Networks

no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instances.

Data Augmentation Deep Attention +2

Paper
Add Code

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

3 code implementations • 23 Apr 2018 • Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa

Extensive experiments are conducted with 8K images, among which 1. 5K image are randomly picked for evaluation.

282

Paper
Code

DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials)

no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei

Specifically, we jointly learn a deep attention encoder, and the instancelevel correspondences could be consequently discovered through attending on the learned instance pairs.

Data Augmentation Deep Attention +2

Paper
Add Code

Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

no code implementations • EMNLP 2018 • Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo

Most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer.

Attribute Image Captioning +2

Paper
Add Code

Self-view Grounding Given a Narrated 360° Video

1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Sentence Visual Grounding

Paper
Code

Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition

3 code implementations • ICCV 2017 • Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo

Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourages MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutual reinforced way.

Ranked #22 on Fine-Grained Image Classification on CUB-200-2011

Clustering Fine-Grained Image Classification +3

Paper
Code

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

no code implementations • CVPR 2017 • Jianlong Fu, Heliang Zheng, Tao Mei

The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN).

Fine-Grained Image Classification Fine-Grained Image Recognition +1

Paper
Add Code

Multi-Level Attention Networks for Visual Question Answering

no code implementations • CVPR 2017 • Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

To solve the challenges, we propose a multi-level attention network for visual question answering that can simultaneously reduce the semantic gap by semantic attention and benefit fine-grained spatial inference by visual attention.

Question Answering Visual Question Answering

Paper
Add Code

Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

1 code implementation • ICCV 2017 • Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun

The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain.

Image Captioning Sentence +1

148

Paper
Code

Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network

no code implementations • 2 Jun 2016 • Yu Liu, Jianlong Fu, Tao Mei, Chang Wen Chen

Second, by using sGRU as basic units, the BMRNN is trained to align the local storylines into the global sequential timeline.

Video Captioning Visual Storytelling

Paper
Add Code

Relaxing From Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging

no code implementations • ICCV 2015 • Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui

The development of deep learning has empowered machines with comparable capability of recognizing limited image categories to human beings.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.