no code implementations • Findings (EMNLP) 2021 • Kezhen Chen, Qiuyuan Huang, Daniel McDuff, Xiang Gao, Hamid Palangi, JianFeng Wang, Kenneth Forbus, Jianfeng Gao
Based on these annotations, we define two different tasks for the NICE dataset.
no code implementations • 21 Feb 2023 • Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, JianFeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
3D photography renders a static image into a video with appealing 3D visual effects.
1 code implementation • 31 Jan 2023 • JianFeng Wang, Xiaolin Hu, Thomas Lukasiewicz
In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match.
1 code implementation • 21 Dec 2022 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.
Ranked #3 on Instance Segmentation on ADE20K val (using extra training data)
1 code implementation • 1 Dec 2022 • Jialian Wu, JianFeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang
Specifically, GRiT consists of a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate open-set object descriptions.
Ranked #1 on Dense Captioning on Visual Genome
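The three-stage design described in the GRiT snippet above can be sketched as a simple pipeline. This is a hedged illustration only: the callables (`visual_encoder`, `object_extractor`, `text_decoder`) are hypothetical stand-ins, not the paper's actual modules.

```python
# Sketch of the encoder -> extractor -> decoder flow described for GRiT.
# All component names are illustrative placeholders.
def grit_pipeline(image, visual_encoder, object_extractor, text_decoder):
    feats = visual_encoder(image)        # visual encoder: image features
    regions = object_extractor(feats)    # foreground extractor: localized objects
    # text decoder: one open-set description per localized region
    return [(region, text_decoder(feats, region)) for region in regions]
```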
no code implementations • 23 Nov 2022 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
Human evaluation on PaintSkill shows that ReCo is +19.28% and +17.21% more accurate in generating images with correct object count and spatial relationship than the T2I model.
1 code implementation • 21 Nov 2022 • Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu
The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.
1 code implementation • 17 Oct 2022 • Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang
While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.
1 code implementation • 20 Jul 2022 • Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan
In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.
Ranked #1 on Image Outpainting on LHQC
1 code implementation • 3 Jul 2022 • JianFeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou
Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data.
1 code implementation • CVPR 2022 • JianFeng Wang, Thomas Lukasiewicz
Second, they are only partially based on Bayesian deep learning, as their overall architectures are not designed under the Bayesian framework.
1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann Lecun, Nanyun Peng, Jianfeng Gao, Lijuan Wang
Vision-language (VL) pre-training has recently received considerable attention.
Ranked #1 on Phrase Grounding on Flickr30k Entities Dev
no code implementations • 7 Jun 2022 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
We design a detection hub to dynamically adapt queries on category embedding based on the different distributions of datasets.
2 code implementations • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.
Ranked #1 on Image Captioning on nocaps-XD out-of-domain
no code implementations • 10 Mar 2022 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu
Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.
Ranked #1 on Human-Object Interaction Detection on HICO
1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting the regional features.
no code implementations • CVPR 2022 • Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang
In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.
Ranked #3 on Image Captioning on nocaps-XD entire (using extra training data)
1 code implementation • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang
On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms the state of the art in both grounding and captioning evaluations.
1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.
Ranked #1 on Action Recognition In Videos on Kinetics-600
no code implementations • 19 Nov 2021 • JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang
In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.
1 code implementation • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.
Ranked #15 on Cross-Modal Retrieval on COCO 2014
no code implementations • 18 Sep 2021 • Yuedong Chen, Junjia Huang, JianFeng Wang, Xiaohua Xie
Motion deblurring has witnessed rapid development in recent years, and most of the recent methods address it by using deep learning techniques, with the help of different kinds of prior knowledge.
1 code implementation • 10 Sep 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang
To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.
Ranked #10 on Visual Question Answering on OK-VQA
no code implementations • arXiv 2021 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Zicheng Liu, Jenq-Neng Hwang
This paper revisits human-object interaction (HOI) recognition at image level without using supervisions of object location and human pose.
1 code implementation • CVPR 2021 • JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu
Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes.
Ranked #15 on Long-tail Learning on Places-LT
6 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
Ranked #4 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
1 code implementation • 5 Jun 2021 • JianFeng Wang, Xiaolin Hu
The critical element of RCNN is the recurrent convolutional layer (RCL), which incorporates recurrent connections between neurons in the standard convolutional layer.
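The recurrent connections described above can be sketched with a toy 1-D unrolling: the same input is re-injected through a feed-forward kernel while the previous state feeds back through a recurrent kernel. This is a minimal numpy illustration, not the paper's implementation.

```python
import numpy as np

def rcl_forward(x, w_ff, w_rec, steps=3):
    """Toy unrolled 1-D recurrent convolutional layer (RCL) sketch."""
    conv = lambda s, w: np.convolve(s, w, mode="same")
    state = np.maximum(conv(x, w_ff), 0.0)  # t = 0: feed-forward path only
    for _ in range(steps):
        # feed-forward response plus recurrent response, then ReLU
        state = np.maximum(conv(x, w_ff) + conv(state, w_rec), 0.0)
    return state
```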
no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.
1 code implementation • CVPR 2021 • Yuanyi Zhong, JianFeng Wang, Lijuan Wang, Jian Peng, Yu-Xiong Wang, Lei Zhang
This paper presents a detection-aware pre-training (DAP) approach, which leverages only weakly-labeled classification-style datasets (e.g., ImageNet) for pre-training, but is specifically tailored to benefit object detection tasks.
1 code implementation • 22 Mar 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
no code implementations • 18 Feb 2021 • JianFeng Wang, Xingyu Lei, Mei Lu
This matrix can be interpreted as the opposite of the adjacency matrix, which is instead constructed from the distance matrix of a graph by keeping in each row and each column only the distances equal to 1.
Combinatorics 05C50
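The construction mentioned in the sentence above, recovering the adjacency matrix by keeping only the distance-1 entries of a graph's distance matrix, is easy to state in code. This is a generic sketch, not the paper's notation.

```python
import numpy as np

def adjacency_from_distance(D):
    """Keep only the entries equal to 1 in a graph distance matrix,
    which recovers the ordinary adjacency matrix."""
    return (np.asarray(D) == 1).astype(int)
```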
1 code implementation • 12 Jan 2021 • Zheng Ge, JianFeng Wang, Xin Huang, Songtao Liu, Osamu Yoshie
A joint loss is then defined as the weighted summation of cls and reg losses as the assigning indicator.
1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
This paper is concerned with self-supervised learning for small models.
no code implementations • 1 Jan 2021 • JianFeng Wang, Thomas Lukasiewicz, Zhongchao shi
Learning discriminative node features is the key to further improve the performance of graph-based face clustering.
no code implementations • 24 Dec 2020 • JianFeng Wang, Jing Wang, Maurizio Brunetti
The Hoffman program with respect to any real or complex square matrix $M$ associated to a graph $G$ stems from A. J. Hoffman's pioneering work on the limit points for the spectral radius of adjacency matrices of graphs less than $\sqrt{2+\sqrt{5}}$.
Combinatorics 05C50
no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by $95\%$, compared to a baseline model.
1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo
Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline.
1 code implementation • CVPR 2021 • JianFeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng
Mainstream object detectors based on the fully convolutional network have achieved impressive performance.
1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang
Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.
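The grid-cell idea behind IoUHash can be illustrated by quantizing box corners to cells, so boxes hashed to the same key have nearby corners. The cell size and grouping helper are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import defaultdict

def iou_hash(box, cell=16):
    """Hash a box (x1, y1, x2, y2) by quantizing each corner coordinate.

    Boxes sharing a hash have all four coordinates in the same cells,
    so their overlap is bounded from below (the IoUHash intuition).
    """
    return tuple(int(v // cell) for v in box)

def group_boxes(boxes, cell=16):
    """Bucket boxes by their hash key."""
    buckets = defaultdict(list)
    for b in boxes:
        buckets[iou_hash(b, cell)].append(b)
    return buckets
```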
no code implementations • 20 May 2019 • Jianfeng Wang, Rong Xiao, Yandong Guo, Lei Zhang
In this paper, we study the problem of object counting with incomplete annotations.
no code implementations • 18 Apr 2018 • Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian
A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extreme large scale variations.
1 code implementation • NeurIPS 2017 • Jianfeng Wang, Xiaolin Hu
Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN.
1 code implementation • 20 Nov 2017 • Jianfeng Wang, Ye Yuan, Gang Yu
The performance of face detection has been largely improved with the development of convolutional neural network.
Ranked #1 on Occluded Face Detection on MAFA
no code implementations • 5 Jan 2015 • Jianfeng Wang, Shuicheng Yan, Yi Yang, Mohan S. Kankanhalli, Shipeng Li, Jingdong Wang
We study how to learn multiple dictionaries from a dataset, and approximate any data point by the sum of the codewords each chosen from the corresponding dictionary.
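The approximation described above, a data point encoded as the sum of one codeword per dictionary, can be sketched with a greedy residual encoder. This greedy selection is an illustrative simplification, not the paper's optimization procedure.

```python
import numpy as np

def encode(x, dictionaries):
    """Greedily pick one codeword per dictionary so their sum approximates x,
    fitting each codeword to the residual left by the previous ones."""
    residual = np.asarray(x, dtype=float)
    chosen = []
    for D in dictionaries:  # D has shape (num_codewords, dim)
        idx = int(np.argmin(((residual - D) ** 2).sum(axis=1)))
        chosen.append(idx)
        residual = residual - D[idx]
    return chosen, residual
```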
no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li
In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.