no code implementations • 27 May 2024 • Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.
no code implementations • CVPR 2024 • Wenjing Wang, Huan Yang, Jianlong Fu, Jiaying Liu
This prior serves as the bridge between normal and low-light images.
no code implementations • 19 Jan 2024 • Bo Zhao, Huan Yang, Jianlong Fu
Face inpainting requires the model to have a precise global understanding of the facial position structure.
no code implementations • 22 Aug 2023 • Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu
Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.
no code implementations • ICCV 2023 • Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun
In the absence of target-domain data, the textual description of the target domain and vision-language models, e.g., CLIP, are utilized to effectively guide the generator.
no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo
In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.
no code implementations • ICCV 2023 • Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai
To this end, we raise a question: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?"
no code implementations • 20 Jun 2023 • Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, Nicholas Jing Yuan, Jian Yin, Hongyang Chao, Qi Zhang
The proposed framework consists of a large language model (LLM), a diffusion-based image generator, and a series of visual rewards by design.
no code implementations • 12 Jun 2023 • Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu
To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.
no code implementations • 9 Jun 2023 • Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, LiMin Wang
In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models to condition robot manipulation tasks.
no code implementations • 30 May 2023 • Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, LiMin Wang, Jianlong Fu
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks.
no code implementations • 24 May 2023 • Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu
Diffusion models, a powerful class of generative models, have produced impressive results on image super-resolution (SR) tasks.
1 code implementation • 18 May 2023 • Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu
Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.
Ranked #1 on Text-to-Video Generation on WebVid
no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan
In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.
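To make the "Diffusion over Diffusion" idea concrete, here is a minimal sketch of the coarse-to-fine loop: a global model first produces sparse keyframes over the whole video, then a local model recursively fills the gaps between neighbors. `global_diffusion` and `local_diffusion` are hypothetical stand-ins, not the paper's actual API.

```python
from typing import Callable, List


def generate_long_video(
    prompt: str,
    global_diffusion: Callable,  # prompt -> sparse keyframes over the full video
    local_diffusion: Callable,   # (prompt, left, right) -> frames between two keyframes
    depth: int = 2,              # number of recursive refinement passes
) -> List:
    # Coarse pass: a short "storyline" of keyframes spanning the whole video.
    frames = global_diffusion(prompt)
    # Fine passes: each pass interpolates new frames between every neighboring pair.
    for _ in range(depth):
        filled = []
        for left, right in zip(frames[:-1], frames[1:]):
            filled.append(left)
            filled.extend(local_diffusion(prompt, left, right))
        filled.append(frames[-1])
        frames = filled
    return frames
```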
1 code implementation • ICCV 2023 • Zixi Tuo, Huan Yang, Jianlong Fu, Yujie Dun, Xueming Qian
Existing real-world video super-resolution (VSR) methods focus on designing a general degradation pipeline for open-domain videos while ignoring data-intrinsic characteristics, which strongly limits their performance when applied to specific domains (e.g., animation videos).
no code implementations • 16 Mar 2023 • Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu
Language-guided image generation has achieved great success in recent years through the use of diffusion models.
1 code implementation • 27 Dec 2022 • Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Ranked #4 on Video Super-Resolution on REDS4 - 4x upscaling
1 code implementation • CVPR 2023 • Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo
To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders.
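As a loose illustration of what "coupled" denoising means, the toy module below lets the audio and video branches see each other's noisy state at every step; the actual model uses two U-Nets bridged by cross-modal attention, so the linear layers here are mere placeholders.

```python
import torch
import torch.nn as nn


class CoupledDenoisers(nn.Module):
    """Toy coupled denoisers: each branch denoises its own modality
    conditioned on the other modality's current (noisy) state."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.video_net = nn.Linear(2 * dim, dim)  # placeholder for a video U-Net
        self.audio_net = nn.Linear(2 * dim, dim)  # placeholder for an audio U-Net

    def forward(self, video_noisy: torch.Tensor, audio_noisy: torch.Tensor):
        # Each branch concatenates its own input with cross-modal context.
        v = self.video_net(torch.cat([video_noisy, audio_noisy], dim=-1))
        a = self.audio_net(torch.cat([audio_noisy, video_noisy], dim=-1))
        return v, a  # per-modality denoising predictions
```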
no code implementations • 22 Nov 2022 • Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu
Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
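One plausible way to express such a two-point depth relationship is a pairwise ranking objective; the sketch below uses PyTorch's `MarginRankingLoss` on illustrative tensors and is not claimed to be the paper's exact formulation.

```python
import torch
import torch.nn as nn

rank_loss = nn.MarginRankingLoss(margin=0.1)

# Predicted depths at two sampled points per example (illustrative values).
depth_a = torch.tensor([2.3, 0.7], requires_grad=True)
depth_b = torch.tensor([1.1, 1.9], requires_grad=True)
# +1: point A should be farther than point B; -1: the opposite.
order = torch.tensor([1.0, -1.0])

loss = rank_loss(depth_a, depth_b, order)  # hinge on the signed depth gap
loss.backward()
```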
1 code implementation • 12 Oct 2022 • Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu
Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.
Ranked #2 on Video Retrieval on QuerYD (using extra training data)
1 code implementation • 11 Oct 2022 • Jianbo Wang, Huan Yang, Jianlong Fu, Toshihiko Yamasaki, Baining Guo
Such a design usually destroys the spatial information of the input images and fails to transfer fine-grained style patterns into style transfer results.
1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo
and 2) how to mitigate the impact of these factors?
Ranked #2 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • 7 Sep 2022 • Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu
To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, including CLIP and StyleGAN.
no code implementations • 5 Sep 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
In particular, we first introduce a lightweight context encoder and a parameter encoder to learn a context map for the pixel-level category and a group of image-adaptive coefficients, respectively.
Ranked #7 on Image Enhancement on MIT-Adobe 5k (SSIM on proRGB metric)
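The blending step can be pictured as a per-pixel weighted sum: the context map softly assigns each pixel to one of K categories, and K enhanced candidates are mixed accordingly. The shapes and the number of categories below are assumptions for illustration.

```python
import torch

B, K, H, W = 1, 8, 256, 256
context_map = torch.randn(B, K, H, W).softmax(dim=1)  # per-pixel category weights
candidates = torch.randn(B, K, 3, H, W)               # K enhanced versions of the image

# Weighted sum over the K candidates at every pixel -> final enhanced image.
enhanced = (context_map.unsqueeze(2) * candidates).sum(dim=1)  # (B, 3, H, W)
```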
1 code implementation • 11 Aug 2022 • Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo
Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.
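A minimal sketch of a recurrent motion generator of this kind, assuming the sentence embedding is rolled forward in time by a GRU and each per-frame latent would then be decoded by a pretrained generator (not shown); all names and shapes are illustrative.

```python
import torch
import torch.nn as nn


class MotionGenerator(nn.Module):
    def __init__(self, text_dim: int = 512, latent_dim: int = 512):
        super().__init__()
        self.gru = nn.GRU(text_dim, latent_dim, batch_first=True)

    def forward(self, text_emb: torch.Tensor, num_frames: int) -> torch.Tensor:
        # Repeat the sentence embedding per time step and unroll the GRU.
        steps = text_emb.unsqueeze(1).repeat(1, num_frames, 1)  # (B, T, text_dim)
        latents, _ = self.gru(steps)                            # (B, T, latent_dim)
        return latents  # one latent per frame, to be fed to a pretrained StyleGAN
```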
no code implementations • 10 Aug 2022 • Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu
In this paper, we present the technical report for the Ego4D natural language query challenge at CVPR 2022.
1 code implementation • 8 Aug 2022 • Jaeseok Byun, Taebaek Hwang, Jianlong Fu, Taesup Moon
In contrast to the mainstream VLP methods, we highlight that two routinely applied steps during pre-training have a crucial impact on the performance of the pre-trained model: in-batch hard negative sampling for image-text matching (ITM) and assigning a large masking probability for masked language modeling (MLM).
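In-batch hard negative sampling for ITM can be sketched in a few lines: score every image-text pair in the batch, mask out the positives, and take the most similar remaining caption per image. The encoder outputs here are random stand-ins.

```python
import torch
import torch.nn.functional as F

B, D = 8, 128
img = F.normalize(torch.randn(B, D), dim=-1)  # image embeddings (stand-in)
txt = F.normalize(torch.randn(B, D), dim=-1)  # text embeddings (stand-in)

sim = img @ txt.T                   # (B, B) pairwise similarities
sim.fill_diagonal_(float("-inf"))   # exclude each image's own caption
hard_idx = sim.argmax(dim=-1)       # hardest non-matching caption per image
hard_negatives = txt[hard_idx]      # used as ITM negatives
```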
1 code implementation • 5 Aug 2022 • Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu
First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
Ranked #5 on Video Super-Resolution on REDS4 - 4x upscaling
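The patch-to-spectrum step might look like the following, assuming 8x8 patches and a single grayscale frame (illustrative choices, not necessarily the paper's settings).

```python
import numpy as np
from scipy.fft import dctn

frame = np.random.rand(256, 256).astype(np.float32)  # stand-in video frame
p = 8

# Split into non-overlapping 8x8 patches: (32, 32, 8, 8).
patches = frame.reshape(256 // p, p, 256 // p, p).transpose(0, 2, 1, 3)
# 2D DCT per patch; each of the 64 coefficients is one frequency band.
spectra = dctn(patches, axes=(-2, -1), norm="ortho")
# Rearrange into 64 frequency-band maps of size 32x32.
freq_maps = spectra.reshape(32, 32, p * p).transpose(2, 0, 1)
```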
2 code implementations • 4 Aug 2022 • Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #9 on Zero-Shot Action Recognition on Kinetics
2 code implementations • 21 Jul 2022 • Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters.
Ranked #135 on Image Classification on ImageNet
no code implementations • 19 Jul 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
In particular, we formulate the warped features with inconsistent motions as query tokens, and formulate relevant regions in a motion trajectory from two original consecutive frames into keys and values.
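That formulation is ordinary cross-attention with a particular choice of queries, keys, and values; a bare-bones sketch with assumed shapes:

```python
import torch

B, Nq, Nkv, D = 1, 64, 128, 32
queries = torch.randn(B, Nq, D)  # warped, motion-inconsistent features
keys = torch.randn(B, Nkv, D)    # trajectory-aligned regions from both frames
values = torch.randn(B, Nkv, D)

attn = (queries @ keys.transpose(-2, -1)) / D ** 0.5  # scaled dot product
refined = attn.softmax(dim=-1) @ values               # (B, Nq, D) consistent features
```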
no code implementations • 3 Jul 2022 • Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu
The extractor estimates the degradations in LR inputs and guides the meta-restoration modules to predict restoration parameters for different degradations on-the-fly.
2 code implementations • CVPR 2022 • Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan
The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks.
Ranked #213 on Image Classification on ImageNet
1 code implementation • CVPR 2022 • Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
Existing approaches usually align and aggregate video frames from limited adjacent frames (e.g., 5 or 7 frames), which prevents them from achieving satisfactory results.
Ranked #4 on Video Super-Resolution on UDM10 - 4x upscaling
2 code implementations • NeurIPS 2021 • Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling
The Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection, and has thus attracted fast-growing efforts in manually designing more effective architectures.
1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.
Ranked #16 on Video Retrieval on MSR-VTT
no code implementations • NeurIPS 2021 • Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu
Given a sequence of style tokens, the TokenGAN is able to control the image synthesis by assigning the styles to the content tokens via an attention mechanism with a Transformer.
1 code implementation • 19 Oct 2021 • Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu
In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.
no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.
1 code implementation • ICCV 2021 • Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
And the reference space is optimized to capture deep image priors that are useful for quality assessment.
1 code implementation • ICCV 2021 • Kibeom Hong, Seogkyu Jeon, Huan Yang, Jianlong Fu, Hyeran Byun
To this end, we design a novel domainness indicator that captures the domainness value from the texture and structural features of reference images.
no code implementations • 10 Aug 2021 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao
To solve the partial visual confusion issue, we propose to leverage the context information carried by a context reference, i.e., the larger concentric box around each region proposal, to perform more accurate region classification and regression.
1 code implementation • ICCV 2021 • Kan Wu, Houwen Peng, Minghao Chen, Jianlong Fu, Hongyang Chao
We then propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
Ranked #152 on Object Detection on COCO minival
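A toy version of a 2D relative position bias: bucket the (dy, dx) offset between every pair of grid positions and add a learned per-bucket bias to the attention logits. The bucketing here is a plain clip, whereas iRPE studies several more refined mappings.

```python
import torch

H = W = 4  # toy feature-map size
coords = torch.stack(
    torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij"), dim=-1
).reshape(-1, 2)                                  # (N, 2) grid positions
rel = coords[:, None, :] - coords[None, :, :]     # (N, N, 2) pairwise offsets
rel = rel.clamp(-2, 2) + 2                        # clip offsets into buckets [0, 4]
bucket = rel[..., 0] * 5 + rel[..., 1]            # flatten (dy, dx) to a bucket id

bias_table = torch.nn.Parameter(torch.zeros(25))  # one learned bias per bucket
attn_bias = bias_table[bucket]                    # (N, N), added to attention logits
```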
2 code implementations • ICCV 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.
Ranked #1 on Fine-Grained Image Classification on Oxford 102 Flowers (Top 1 Accuracy metric)
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
1 code implementation • CVPR 2021 • Bin Yan, Houwen Peng, Kan Wu, Dong Wang, Jianlong Fu, Huchuan Lu
Object tracking has achieved significant progress over the past few years.
3 code implementations • CVPR 2021 • Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu
As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.
Ranked #5 on Visual Entailment on SNLI-VE val
1 code implementation • 5 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao
First, we calculate full-body anthropometric parameters from limited user inputs via an imputation technique, so that the essential anthropometric parameters for 3D body reshaping can be obtained.
2 code implementations • 3 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
Ranked #9 on Image Inpainting on Places2
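The tailored discriminator task can be sketched as per-pixel mask prediction: instead of emitting one real/fake score, the discriminator outputs a map trained to locate the filled region. This simplified version targets a binary mask, whereas the paper predicts a softened one.

```python
import torch
import torch.nn.functional as F

B, H, W = 2, 64, 64
d_logits = torch.randn(B, 1, H, W, requires_grad=True)  # discriminator map output
mask = (torch.rand(B, 1, H, W) > 0.5).float()           # 1 = inpainted pixel

# Train D to segment which pixels were synthesized by the generator.
d_loss = F.binary_cross_entropy_with_logits(d_logits, mask)
d_loss.backward()
```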
1 code implementation • CVPR 2021 • Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling
In this paper, we propose a one-shot neural ensemble architecture search (NEAS) solution that addresses the two challenges.
1 code implementation • ICCV 2021 • Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu
In this paper, we present a new tracking architecture with an encoder-decoder transformer as the key component.
Ranked #21 on Visual Object Tracking on TrackingNet
1 code implementation • 4 Dec 2020 • Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, Jiebo Luo
It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.
1 code implementation • NeurIPS 2020 • Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha
Such a model disentangles latent factors according to the semantics of feature channels via channel-/group-wise fusion of latent codes and feature channels.
2 code implementations • NeurIPS 2020 • Houwen Peng, Hao Du, Hongyuan Yu, Qi Li, Jing Liao, Jianlong Fu
The experiments on ImageNet verify that such a path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boost the training of subnetworks.
1 code implementation • 22 Aug 2020 • Le Yang, Houwen Peng, Dingwen Zhang, Jianlong Fu, Junwei Han
To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
2 code implementations • ECCV 2020 • Yanhong Zeng, Jianlong Fu, Hongyang Chao
In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting.
Ranked #5 on Seeing Beyond the Visible on KITTI360-EX
3 code implementations • 18 Jun 2020 • Hongyuan Yu, Houwen Peng, Yan Huang, Jianlong Fu, Hao Du, Liang Wang, Haibin Ling
First, the search network generates an initial architecture for evaluation, and the weights of the evaluation network are optimized.
Ranked #18 on Neural Architecture Search on NAS-Bench-201, CIFAR-10
4 code implementations • ECCV 2020 • Zhipeng Zhang, Houwen Peng, Jianlong Fu, Bing Li, Weiming Hu
In this paper, we propose a novel object-aware anchor-free network to address this issue.
Ranked #2 on Visual Object Tracking on VOT2019
1 code implementation • CVPR 2020 • Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo
In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively.
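The core texture transfer can be sketched as hard attention: each LR query patch selects its most similar Ref key patch and copies the corresponding Ref texture feature, keeping the similarity as a soft fusion weight. Feature projections and multi-level fusion are omitted.

```python
import torch
import torch.nn.functional as F

Nq, Nk, D = 64, 256, 32
q = torch.randn(Nq, D)  # features of (upsampled) LR patches -> queries
k = torch.randn(Nk, D)  # features of degraded Ref patches   -> keys
v = torch.randn(Nk, D)  # features of original Ref patches   -> values

sim = F.normalize(q, dim=-1) @ F.normalize(k, dim=-1).T  # cosine relevance
idx = sim.argmax(dim=-1)             # hard attention: best Ref patch per query
transferred = v[idx]                 # (Nq, D) transferred texture features
confidence = sim.max(dim=-1).values  # soft weight for fusing textures back in
```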
no code implementations • 3 May 2020 • Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He, Wenhao Wu, Yukang Ding, Chao Li, Fu Li, Shilei Wen, Jianwei Li, Fuzhi Yang, Huan Yang, Jianlong Fu, Byung-Hoon Kim, JaeHyun Baek, Jong Chul Ye, Yuchen Fan, Thomas S. Huang, Junyeop Lee, Bokyeung Lee, Jungki Min, Gwantae Kim, Kanghyu Lee, Jaihyun Park, Mykola Mykhailych, Haoyu Zhong, Yukai Shi, Xiaojun Yang, Zhijing Yang, Liang Lin, Tongtong Zhao, Jinjia Peng, Huibing Wang, Zhi Jin, Jiahao Wu, Yifu Chen, Chenming Shang, Huanrong Zhang, Jeongki Min, Hrishikesh P. S, Densen Puthussery, Jiji C. V
This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results.
1 code implementation • 2 Apr 2020 • Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu
We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs, instead of using region-based image features as most recent vision-and-language methods do.
2 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, Jiebo Luo
In this report, we introduce the winning method for the HACS Temporal Action Localization Challenge 2019.
3 code implementations • 8 Dec 2019 • Songyang Zhang, Houwen Peng, Jianlong Fu, Jiebo Luo
We address the problem of retrieving a specific moment from an untrimmed video by a query sentence.
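One natural structure for modeling that temporal context is a 2D map in which cell (i, j) holds the candidate moment spanning clips i through j, so overlapping moments are scored jointly; a simple pooling-based sketch, with the scoring network over the map left out.

```python
import torch

N, D = 16, 32                   # number of clips, feature dim
clip_feats = torch.randn(N, D)  # per-clip features (stand-in)

moment_map = torch.zeros(N, N, D)
for i in range(N):
    for j in range(i, N):
        # Candidate moment = clips i..j, pooled into one feature.
        moment_map[i, j] = clip_feats[i : j + 1].mean(dim=0)
# A network over `moment_map` then scores every candidate against the query.
```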
no code implementations • 24 Nov 2019 • Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
A storyboard is a sequence of images to illustrate a story containing multiple sentences, which has been a key process to create different story products.
1 code implementation • NeurIPS 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which restricts this powerful transformation to be used in deep neural networks.
no code implementations • 29 Oct 2019 • Bei Liu, Zhicheng Huang, Zhaoyang Zeng, Zheyu Chen, Jianlong Fu
We propose to boost VQA by leveraging more powerful feature extractors, improving the representation ability of both visual and text features, and ensembling models.
no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu
In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in the 360° domain.
no code implementations • ICCV 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang
We study weakly-supervised object detection (WSOD), which plays a vital role in relieving human involvement from object-level annotations.
Ranked #12 on Weakly Supervised Object Detection on PASCAL VOC 2007
1 code implementation • 11 Sep 2019 • Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang
We study weakly-supervised object detection (WSOD), which plays a vital role in relieving human involvement from object-level annotations.
no code implementations • ICCV 2019 • Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai
Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels.
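The "integral" here is just a sum over the density map, so a count for the whole image or for any sub-region falls out directly:

```python
import numpy as np

density = np.abs(np.random.randn(128, 128)) * 0.01  # stand-in predicted density map
total_count = density.sum()                         # whole-image person count
region_count = density[32:96, 32:96].sum()          # count inside one region
```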
no code implementations • 11 Jul 2019 • Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann
The overall system achieves state-of-the-art performance on the dense-captioning events in video task, with a 9.91 METEOR score on the challenge testing set.
no code implementations • 3 Jun 2019 • Shizhe Chen, Qin Jin, Jianlong Fu
However, a picture tells a thousand words: multi-lingual sentences pivoted by the same image are noisy as mutual translations, which hinders the learning of the translation model.
2 code implementations • CVPR 2019 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured.
1 code implementation • CVPR 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition.
Ranked #1 on Fine-Grained Image Classification on iNaturalist
Fine-Grained Image Classification • Fine-Grained Image Recognition
no code implementations • ECCV 2018 • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei
First, we model one of the pairwise interactions (e.g., image and question) with bilinear features, which are further encoded with the third dimension (e.g., answer) to form a triplet via a bilinear tensor product.
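A small einsum sketch of the triplet interaction: image and question are fused by a bilinear map, and the joint feature is then scored against the answer. The dimensions and the full-rank tensor are illustrative; in practice such a tensor is typically factorized into a low-rank form.

```python
import torch

Di, Dq, Da, Dj = 32, 32, 32, 64
W1 = torch.randn(Di, Dq, Dj)  # image-question bilinear tensor (full-rank, toy)
W2 = torch.randn(Dj, Da)      # joint-feature/answer interaction

img, qst, ans = torch.randn(Di), torch.randn(Dq), torch.randn(Da)
joint = torch.einsum("i,q,iqj->j", img, qst, W1)  # bilinear image-question feature
score = joint @ W2 @ ans                          # scalar score for the triplet
```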
no code implementations • 9 Aug 2018 • Wen-Feng Cheng, Chao-Chung Wu, Ruihua Song, Jianlong Fu, Xing Xie, Jian-Yun Nie
This is one of the few attempts to generate poetry from images.
no code implementations • CVPR 2018 • Shuang Ma, Jianlong Fu, Chang Wen Chen, Tao Mei
Specifically, we jointly learn a deep attention encoder, and the instance-level correspondences could be consequently discovered through attending on the learned instances.
3 code implementations • 23 Apr 2018 • Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa
Extensive experiments are conducted with 8K images, among which 1.5K images are randomly picked for evaluation.
no code implementations • EMNLP 2018 • Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
Most existing approaches adopt the pipeline of representing an image via pre-trained CNNs, and then using the uninterpretable CNN features in conjunction with the question to predict the answer.
1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun
The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.
3 code implementations • ICCV 2017 • Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo
Two losses are proposed to guide the multi-task learning of channel grouping and part classification, which encourages MA-CNN to generate more discriminative parts from feature channels and learn better fine-grained features from parts in a mutually reinforced way.
Ranked #23 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • CVPR 2017 • Jianlong Fu, Heliang Zheng, Tao Mei
The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN).
Fine-Grained Image Classification • Fine-Grained Image Recognition +1
no code implementations • CVPR 2017 • Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui
To address these challenges, we propose a multi-level attention network for visual question answering that can simultaneously reduce the semantic gap via semantic attention and benefit fine-grained spatial inference via visual attention.
1 code implementation • ICCV 2017 • Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, Jianlong Fu, Min Sun
The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain.
no code implementations • 2 Jun 2016 • Yu Liu, Jianlong Fu, Tao Mei, Chang Wen Chen
Second, by using sGRU as basic units, the BMRNN is trained to align the local storylines into the global sequential timeline.
no code implementations • ICCV 2015 • Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, Yong Rui
The development of deep learning has empowered machines with comparable capability of recognizing limited image categories to human beings.