Search Results for author: JianFeng Wang

Found 70 papers, 40 papers with code

NICE: Neural Image Commenting with Empathy

no code implementations • Findings (EMNLP) 2021 • Kezhen Chen, Qiuyuan Huang, Daniel McDuff, Xiang Gao, Hamid Palangi, JianFeng Wang, Kenneth Forbus, Jianfeng Gao

Based on these annotations, we define two different tasks for the NICE dataset.


List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

1 code implementation • 25 Apr 2024 • An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, JianFeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.

Ranked #48 on Visual Question Answering on MM-Vet

Visual Grounding Visual Question Answering +1


Exploring Probabilistic Models for Semi-supervised Learning

no code implementations • 5 Apr 2024 • JianFeng Wang

This thesis studies advanced probabilistic models, including both their theoretical foundations and practical applications, for different semi-supervised learning (SSL) tasks.

Autonomous Driving


Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

no code implementations • 19 Mar 2024 • JieLin Qiu, William Han, Winfred Wang, Zhengyuan Yang, Linjie Li, JianFeng Wang, Christos Faloutsos, Lei LI, Lijuan Wang

Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments.

Dense Captioning Image Captioning +3


Bring Metric Functions into Diffusion Models

no code implementations • 4 Jan 2024 • Jie An, Zhengyuan Yang, JianFeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo

The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function.

Denoising


COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

no code implementations • 1 Jan 2024 • Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, JianFeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

COSMO, our unified framework, merges unimodal and multimodal elements, enhancing model performance for tasks involving textual and visual data while notably reducing learnable parameters.

Language Modelling Reading Comprehension +1


InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

no code implementations • 21 Dec 2023 • Bingbing Wen, Zhengyuan Yang, JianFeng Wang, Zhe Gan, Bill Howe, Lijuan Wang

In this paper, we build a visual dialogue dataset, named InfoVisDial, which provides rich informative answers in each round even with external knowledge related to the visual content.


How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads

1 code implementation • 15 Dec 2023 • Artjom Joosen, Ahmed Hassan, Martin Asenov, Rajkarn Singh, Luke Darlow, JianFeng Wang, Adam Barker

The first trace is derived from Huawei's internal workloads and contains detailed per-second statistics for 200 functions running across multiple Huawei cloud data centers.

Scheduling Time Series Prediction


Interfacing Foundation Models' Embeddings

1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

The proposed interface is adaptive to new tasks and new models.

Decoder Image Segmentation +3


Segment and Caption Anything

1 code implementation • 1 Dec 2023 • Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

157


MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

no code implementations • 29 Nov 2023 • Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, JianFeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD).

In-Context Learning Text Generation


GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, JianFeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

We first benchmark MM-Navigator on our collected iOS screen dataset.

Action Localization

110


MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations • 30 Oct 2023 • Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding


DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

1 code implementation • 23 Oct 2023 • Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Lijuan Wang

For DEsignBench benchmarking, we perform human evaluations on generated images in DEsignBench gallery, against the criteria of image-text alignment, visual aesthetic, and design creativity.

Benchmarking Image Generation


Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

no code implementations • 12 Oct 2023 • Zhengyuan Yang, JianFeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.


OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations • 11 Oct 2023 • Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation


The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

1 code implementation • 29 Sep 2023 • Zhengyuan Yang, Linjie Li, Kevin Lin, JianFeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

182


NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

1 code implementation • 5 Aug 2023 • JianFeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz

This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost.

Segmentation Self-Driving Cars +3


MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation • 4 Aug 2023 • Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math Zero-Shot Visual Question Answering

185


Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

no code implementations • 27 Jul 2023 • Xin Yuan, Linjie Li, JianFeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang

In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.

Denoising


Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

4 code implementations • 26 Jun 2023 • Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang

To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.

Ranked #4 on Visual Question Answering (VQA) on HallusionBench

Hallucination Visual Question Answering

220


MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

1 code implementation • 7 Jun 2023 • JieLin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Ding Zhao, Bo Li, Lijuan Wang

To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset.

Text Summarization Video Summarization


Segment Everything Everywhere All at Once

2 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Decoder Image Segmentation +5

13,707


NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.

Video Generation


MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Ranked #25 on Visual Question Answering on MM-Vet

Multimodal Reasoning Visual Question Answering

909


Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

no code implementations • 21 Feb 2023 • Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, JianFeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

3D photography renders a static image into a video with appealing 3D visual effects.

Ranked #1 on Image Outpainting on MSCOCO

Image Outpainting Monocular Depth Estimation


NP-Match: Towards a New Probabilistic Model for Semi-Supervised Learning

1 code implementation • 31 Jan 2023 • JianFeng Wang, Xiaolin Hu, Thomas Lukasiewicz

In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match.

Classification Semi-Supervised Image Classification

127


Generalized Decoding for Pixel, Image, and Language

1 code implementation • CVPR 2023 • Xueyan Zou, Zi-Yi Dou, Jianwei Yang, Zhe Gan, Linjie Li, Chunyuan Li, Xiyang Dai, Harkirat Behl, JianFeng Wang, Lu Yuan, Nanyun Peng, Lijuan Wang, Yong Jae Lee, Jianfeng Gao

We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly.

Ranked #4 on Instance Segmentation on ADE20K val (using extra training data)

Decoder Image Segmentation +4

1,257


GRiT: A Generative Region-to-text Transformer for Object Understanding

1 code implementation • 1 Dec 2022 • Jialian Wu, JianFeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang

Specifically, GRiT consists of a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate open-set object descriptions.

Ranked #2 on Dense Captioning on Visual Genome

Decoder Dense Captioning +4

277


ReCo: Region-Controlled Text-to-Image Generation

no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19.28% and +17.21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Ranked #2 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Position


Exploring Discrete Diffusion Models for Image Captioning

1 code implementation • 21 Nov 2022 • Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation


Prompting GPT-3 To Be Reliable

1 code implementation • 17 Oct 2022 • Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang

While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.

Fairness Language Modelling


NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

1 code implementation • 20 Jul 2022 • Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Ranked #1 on Image Outpainting on LHQC

Image Outpainting Text-to-Image Generation +1

2,795


NP-Match: When Neural Processes meet Semi-Supervised Learning

1 code implementation • 3 Jul 2022 • JianFeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou

Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data.

Ranked #2 on Semi-Supervised Image Classification on CIFAR-10, 40 Labels

Semi-Supervised Image Classification

127


Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation

1 code implementation • CVPR 2022 • JianFeng Wang, Thomas Lukasiewicz

Secondly, in fact, they are only partially based on Bayesian deep learning, as their overall architectures are not designed under the Bayesian framework.

Image Segmentation Semantic Segmentation +2


Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann Lecun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

Vision-language (VL) pre-training has recently received considerable attention.

Ranked #1 on Phrase Grounding on Flickr30k Entities Dev

Described Object Detection Image Captioning +5

124


Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

no code implementations • CVPR 2023 • Lingchen Meng, Xiyang Dai, Yinpeng Chen, Pengchuan Zhang, Dongdong Chen, Mengchen Liu, JianFeng Wang, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

Detection Hub further achieves SoTA performance on the UODB benchmark with a wide variety of datasets.

Object object-detection +1


GIT: A Generative Image-to-text Transformer for Vision and Language

1 code implementation • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Ranked #1 on Image Captioning on nocaps-XD near-domain

Decoder Image Captioning +8

527


The Overlooked Classifier in Human-Object Interaction Recognition

no code implementations • 10 Mar 2022 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.

Classification Human-Object Interaction Detection +4


The Overlooked Classifier in Human-Object Interaction Recognition

no code implementations • arXiv 2021 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Lin Liang, Jenq-Neng Hwang, Zicheng Liu

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image.

Ranked #1 on Human-Object Interaction Detection on HICO

Classification Human-Object Interaction Detection +4


Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features.

Caption Generation Image Captioning


Scaling Up Vision-Language Pre-training for Image Captioning

no code implementations • CVPR 2022 • Xiaowei Hu, Zhe Gan, JianFeng Wang, Zhengyuan Yang, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning.

Ranked #3 on Image Captioning on nocaps-XD entire (using extra training data)

Attribute Image Captioning


UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

1 code implementation • 23 Nov 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Faisal Ahmed, Zicheng Liu, Yumao Lu, Lijuan Wang

On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms state of the art in both grounding and captioning evaluations.

Image Captioning Language Modelling +5


Florence: A New Foundation Model for Computer Vision

1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Ranked #1 on Action Recognition In Videos on Kinetics-600

Action Classification Action Recognition In Videos +12

370


UFO: A UniFied TransfOrmer for Vision-Language Representation Learning

no code implementations • 19 Nov 2021 • JianFeng Wang, Xiaowei Hu, Zhe Gan, Zhengyuan Yang, Xiyang Dai, Zicheng Liu, Yumao Lu, Lijuan Wang

In this paper, we propose a single UniFied transfOrmer (UFO), which is capable of processing either unimodal inputs (e.g., image or language) or multimodal inputs (e.g., the concatenation of the image and the question), for vision-language (VL) representation learning.

Image Captioning Image-text matching +9


An Empirical Study of Training End-to-End Vision-and-Language Transformers

2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.

Ranked #19 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Cross-Modal Retrieval Decoder +2

354


Edge Prior Augmented Networks for Motion Deblurring on Naturally Blurry Images

no code implementations • 18 Sep 2021 • Yuedong Chen, Junjia Huang, JianFeng Wang, Xiaohua Xie

Motion deblurring has witnessed rapid development in recent years, and most of the recent methods address it by using deep learning techniques, with the help of different kinds of prior knowledge.

Deblurring


An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

1 code implementation • 10 Sep 2021 • Zhengyuan Yang, Zhe Gan, JianFeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA.

Ranked #20 on Visual Question Answering (VQA) on OK-VQA (using extra training data)

Image Captioning Question Answering +2


Is Object Detection Necessary for Human-Object Interaction Recognition?

no code implementations • arXiv 2021 • Ying Jin, Yinpeng Chen, Lijuan Wang, JianFeng Wang, Pei Yu, Zicheng Liu, Jenq-Neng Hwang

This paper revisits human-object interaction (HOI) recognition at image level without using supervision of object location and human pose.

Human-Object Interaction Detection Object +2


RSG: A Simple but Effective Module for Learning Imbalanced Datasets

1 code implementation • CVPR 2021 • JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu

Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes.

Ranked #17 on Long-tail Learning on Places-LT

Long-tail Learning

122


End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Ranked #6 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Instance Segmentation object-detection +4

887


Convolutional Neural Networks with Gated Recurrent Connections

1 code implementation • 5 Jun 2021 • JianFeng Wang, Xiaolin Hu

The critical element of RCNN is the recurrent convolutional layer (RCL), which incorporates recurrent connections between neurons in the standard convolutional layer.

object-detection Object Detection +2

125


Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning Knowledge Distillation +2


DAP: Detection-Aware Pre-training with Weak Supervision

1 code implementation • CVPR 2021 • Yuanyi Zhong, JianFeng Wang, Lijuan Wang, Jian Peng, Yu-Xiong Wang, Lei Zhang

This paper presents a detection-aware pre-training (DAP) approach, which leverages only weakly-labeled classification-style datasets (e.g., ImageNet) for pre-training, but is specifically tailored to benefit object detection tasks.

Classification General Classification +4


Adversarial Feature Augmentation and Normalization for Visual Recognition

1 code implementation • 22 Mar 2021 • Tianlong Chen, Yu Cheng, Zhe Gan, JianFeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.

Classification Data Augmentation +2


On graphs with exactly one anti-adjacency eigenvalue and beyond

no code implementations • 18 Feb 2021 • JianFeng Wang, Xingyu Lei, Mei Lu

This matrix can be interpreted as the opposite of the adjacency matrix, which is instead constructed from the distance matrix of a graph by keeping in each row and each column only the distances equal to 1.

Combinatorics 05C50


LLA: Loss-aware Label Assignment for Dense Pedestrian Detection

1 code implementation • 12 Jan 2021 • Zheng Ge, JianFeng Wang, Xin Huang, Songtao Liu, Osamu Yoshie

A joint loss is then defined as the weighted summation of cls and reg losses as the assigning indicator.

object-detection Object Detection +1


SEED: Self-supervised Distillation For Visual Representation

1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu

This paper is concerned with self-supervised learning for small models.

Knowledge Distillation Self-Supervised Learning +1


Orthogonal Subspace Decomposition: A New Perspective of Learning Discriminative Features for Face Clustering

no code implementations • 1 Jan 2021 • JianFeng Wang, Thomas Lukasiewicz, Zhongchao Shi

Learning discriminative node features is the key to further improve the performance of graph-based face clustering.

Clustering Face Clustering


The Hoffman program of graphs: old and new

no code implementations • 24 Dec 2020 • JianFeng Wang, Jing Wang, Maurizio Brunetti

The Hoffman program with respect to any real or complex square matrix $M$ associated to a graph $G$ stems from A. J. Hoffman's pioneering work on the limit points for the spectral radius of adjacency matrices of graphs less than $\sqrt{2+\sqrt{5}}$.

Combinatorics 05C50


MiniVLM: A Smaller and Faster Vision-Language Model

no code implementations • 13 Dec 2020 • JianFeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

We design a Two-stage Efficient feature Extractor (TEE), inspired by the one-stage EfficientDet network, to significantly reduce the time cost of visual feature extraction by $95\%$, compared to a baseline model.

Language Modelling


TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

1 code implementation • CVPR 2021 • Zhengyuan Yang, Yijuan Lu, JianFeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline.

Caption Generation Language Modelling +5


End-to-End Object Detection with Fully Convolutional Network

1 code implementation • CVPR 2021 • JianFeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng

Mainstream object detectors based on the fully convolutional network have achieved impressive performance.

object-detection Object Detection

491


Hashing-based Non-Maximum Suppression for Crowded Object Detection

1 code implementation • 22 May 2020 • Jianfeng Wang, Xi Yin, Lijuan Wang, Lei Zhang

Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound.

object-detection Object Detection +1


Learning to Count Objects with Few Exemplar Annotations

no code implementations • 20 May 2019 • Jianfeng Wang, Rong Xiao, Yandong Guo, Lei Zhang

In this paper, we study the problem of object counting with incomplete annotations.

Object Object Counting +2


SFace: An Efficient Network for Face Detection in Large Scale Variations

no code implementations • 18 Apr 2018 • Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian

A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extremely large scale variations.

4k Face Detection +1


Gated Recurrent Convolution Neural Network for OCR

1 code implementation • NeurIPS 2017 • Jianfeng Wang, Xiaolin Hu

Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN.

General Classification Image Classification +2


Face Attention Network: An Effective Face Detector for the Occluded Faces

1 code implementation • 20 Nov 2017 • Jianfeng Wang, Ye Yuan, Gang Yu

The performance of face detection has been largely improved with the development of convolutional neural networks.

Ranked #1 on Occluded Face Detection on MAFA

Data Augmentation Occluded Face Detection

314


Group $K$-Means

no code implementations • 5 Jan 2015 • Jianfeng Wang, Shuicheng Yan, Yi Yang, Mohan S. Kankanhalli, Shipeng Li, Jingdong Wang

We study how to learn multiple dictionaries from a dataset, and approximate any data point by the sum of the codewords each chosen from the corresponding dictionary.


Optimized Cartesian $K$-Means

no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.

Quantization

