Search Results for author: Jingdong Wang

Found 190 papers, 91 papers with code

CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding

no code implementations • 22 Apr 2024 • Guibiao Liao, Jiankun Li, Zhenyu Bao, Xiaoqing Ye, Jingdong Wang, Qing Li, Kanglin Liu

Additionally, to address the semantic ambiguity caused by using view-inconsistent 2D CLIP semantics to supervise Gaussians, we introduce a 3D Coherent Self-training (3DCS) strategy that draws on the multi-view consistency provided by the 3D model.

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

1 code implementation • 1 Apr 2024 • Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang

However, existing diffusion-based methods suffer from uncontrollable clothing identity and inefficient training: they struggle to maintain the identity even with full-parameter training, which hinders widespread application.

Virtual Try-on

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

no code implementations • 28 Mar 2024 • Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong Liu, Jingdong Wang

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context.

Denoising Face Generation

Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection

no code implementations • 26 Mar 2024 • Jiacheng Zhang, Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Additionally, we present a DepthGradient Projection (DGP) module to mitigate optimization conflicts caused by noisy depth supervision of pseudo-labels, effectively decoupling the depth gradient and removing conflicting gradients.

Monocular 3D Object Detection object-detection +1

TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization

no code implementations • 22 Mar 2024 • Jinbo Wu, Xing Liu, Chenming Wu, Xiaobo Gao, Jialun Liu, Xinqi Liu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang

We propose an optimal viewpoint selection strategy that finds the smallest set of viewpoints covering all the faces of a mesh.

Denoising Texture Synthesis

Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection

1 code implementation • ICCV 2023 • Jiaming Li, Xiangru Lin, Wei Zhang, Xiao Tan, YingYing Li, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

To tackle the confirmation bias from incorrect pseudo labels of minority classes, the class-rebalancing sampling module resamples unlabeled data following the guidance of the gradient-based reweighting module.

object-detection Object Detection +1

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

no code implementations • 15 Mar 2024 • Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model.

Generalizable Novel View Synthesis Novel View Synthesis

Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval

1 code implementation • 8 Mar 2024 • Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang

To achieve this, we propose L2RM, a general framework based on Optimal Transport (OT) that learns to rematch mismatched pairs.

Cross-Modal Retrieval Retrieval +2

VRP-SAM: SAM with Visual Reference Prompt

1 code implementation • 27 Feb 2024 • Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li

In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model.

Meta-Learning Segmentation

GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos

no code implementations • 26 Feb 2024 • Xinqi Liu, Chenming Wu, Jialun Liu, Xing Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang

In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA).

Novel View Synthesis Pose Estimation

Collaborative Position Reasoning Network for Referring Image Segmentation

no code implementations • 22 Jan 2024 • JianJian Cao, Beiya Dai, Yulin Li, Xiameng Qin, Jingdong Wang

Holi integrates features of the two modalities by a cross-modal attention mechanism, which suppresses irrelevant redundancy under the guidance of positioning information from RoCo.

Image Segmentation Position +2

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

no code implementations • 22 Jan 2024 • Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu

In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability.

Action Recognition Temporal Action Localization

MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation • 8 Jan 2024 • Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Object object-detection +1

Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection

no code implementations • 27 Dec 2023 • Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Yao Zhao, Jingdong Wang

In this paper, we study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods, e.g., GANs and diffusion models.

Attribute Synthetic Image Detection

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation

no code implementations • 27 Dec 2023 • Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Xiaojun Chang, Jingdong Wang

Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to build in practice.

Cross-Modal Retrieval Memorization +2

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

343

GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization

no code implementations • 8 Dec 2023 • Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang

This paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization.

Inverse Rendering

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

2 code implementations • 6 Dec 2023 • Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem.

Autonomous Driving

360

Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

no code implementations • 4 Dec 2023 • Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang

To realize this, we innovatively blend video models with Large Language Models (LLMs) to devise Action-conditioned Prompts.

Action Recognition Descriptive +1

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

1 code implementation • 30 Nov 2023 • Zipeng Qi, Guoxi Huang, Zebin Huang, Qin Guo, Jinwen Chen, Junyu Han, Jian Wang, Gang Zhang, Lufei Liu, Errui Ding, Jingdong Wang

The LRDiff framework constructs an image-rendering process with multiple layers, each of which applies the vision guidance to instructively estimate the denoising direction for a single object.

Denoising Image Generation

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

2 code implementations • 27 Nov 2023 • Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang

Our study centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks: Firstly, we explore the potential of its generated rich textual descriptions across various categories to enhance recognition performance without any training.

Zero-Shot Learning

834

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

no code implementations • 20 Nov 2023 • Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, Junwei Han

Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular.

Instance Segmentation Scene Understanding +2

Disentangled Representation Learning with Transmitted Information Bottleneck

no code implementations • 3 Nov 2023 • Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang, Qinghua Zheng

Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models.

Disentanglement Variational Inference

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

1 code implementation • NeurIPS 2023 • Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang

To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image.

2D Pose Estimation Attribute +3

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

1 code implementation • NeurIPS 2023 • Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert).

Ranked #6 on 3D Object Detection on nuScenes Camera Only

3D Object Detection object-detection

1,068

Accelerating Vision Transformers Based on Heterogeneous Attention Patterns

no code implementations • 11 Oct 2023 • Deli Yu, Teng Xi, Jianwei Li, Baopu Li, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

On one hand, different images share more similar attention patterns in early layers than later layers, indicating that the dynamic query-by-key self-attention matrix may be replaced with a static self-attention matrix in early layers.

Dimensionality Reduction

Forward Flow for Novel View Synthesis of Dynamic Scenes

no code implementations • ICCV 2023 • Xiang Guo, Jiadai Sun, Yuchao Dai, GuanYing Chen, Xiaoqing Ye, Xiao Tan, Errui Ding, Yumeng Zhang, Jingdong Wang

This paper proposes a neural radiance field (NeRF) approach for novel view synthesis of dynamic scenes using forward warping.

Novel View Synthesis

GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction

no code implementations • 26 Sep 2023 • Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yuechen Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang

In this representation, the vertexes and edges of the grid store the localization and adjacency information of the table.

PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

no code implementations • 20 Sep 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Jingdong Wang

Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID).

Denoising Pedestrian Detection +2

Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation

no code implementations • 18 Sep 2023 • Huan Liu, Zichang Tan, Qiang Chen, Yunchao Wei, Yao Zhao, Jingdong Wang

Moreover, to address the semantic conflicts between image and frequency domains, the forgery-aware mutual module is developed to further enable the effective interaction of disparate image and frequency features, resulting in aligned and comprehensive visual forgery representations.

Misinformation

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

1 code implementation • ICCV 2023 • Zhiyin Shao, Xinyu Zhang, Changxing Ding, Jian Wang, Jingdong Wang

In this way, the pre-training task and the T2I-ReID task are made consistent with each other on both data and training levels.

Person Re-Identification

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

no code implementations • 1 Sep 2023 • Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang

In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion.

Text-to-Image Generation Text-to-Video Generation +1

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

no code implementations • 20 Aug 2023 • Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls.

Layout-to-Image Generation

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

1 code implementation • ICCV 2023 • Jiazheng Xing, Mengmeng Wang, Yudi Ruan, Bofan Chen, Yaowei Guo, Boyu Mu, Guang Dai, Jingdong Wang, Yong Liu

Class prototype construction and matching are core aspects of few-shot action recognition.

Feature Correlation Few-Shot action recognition +1

Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

2 code implementations • ICCV 2023 • Huan Liu, Qiang Chen, Zichang Tan, Jiang-Jiang Liu, Jian Wang, Xiangbo Su, Xiaolong Li, Kun Yao, Junyu Han, Errui Ding, Yao Zhao, Jingdong Wang

State-of-the-art solutions adopt the DETR-like framework and mainly develop complex decoders, e.g., regarding pose estimation as keypoint box detection and combining it with human detection in ED-Pose, or hierarchically predicting with a pose decoder and a joint (keypoint) decoder in PETR.

Human Detection Multi-Person Pose Estimation

Multimodal Adaptation of CLIP for Few-Shot Action Recognition

no code implementations • 3 Aug 2023 • Jiazheng Xing, Mengmeng Wang, Xiaojun Hou, Guang Dai, Jingdong Wang, Yong Liu

The adapters we design can combine information from video-text multimodal sources for task-oriented spatiotemporal modeling, which is fast, efficient, and has low training costs.

Few-Shot action recognition Few Shot Action Recognition

Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

no code implementations • 3 Aug 2023 • Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang

Our BGA-MNER consists of image2text and text2image generation with respect to entity-salient content in two modalities.

named-entity-recognition Named Entity Recognition +2

Enhancing Your Trained DETRs with Box Refinement

1 code implementation • 21 Jul 2023 • Yiqun Chen, Qiang Chen, Peize Sun, Shoufa Chen, Jingdong Wang, Jian Cheng

We hope our work will bring the attention of the detection community to the localization bottleneck of current DETR-like models and highlight the potential of the RefineBox framework.

CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation

1 code implementation • ICCV 2023 • Lizhao Liu, Zhuangwei Zhuang, Shangxin Huang, Xunlong Xiao, Tianhang Xiang, Cen Chen, Jingdong Wang, Mingkui Tan

CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction to learn effectively from the very limited labeled points and the massive unlabeled points, respectively.

Representation Learning Scene Understanding +2

What Can Simple Arithmetic Operations Do for Temporal Modeling?

2 code implementations • ICCV 2023 • Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

Ranked #4 on Action Recognition on Something-Something V1

Action Classification Action Recognition +1

Semi-DETR: Semi-Supervised Object Detection with Detection Transformers

3 code implementations • CVPR 2023 • Jiacheng Zhang, Xiangru Lin, Wei Zhang, Kuo Wang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang, Guanbin Li

Specifically, we propose a Stage-wise Hybrid Matching strategy that combines the one-to-many assignment and one-to-one assignment strategies to improve the training efficiency of the first stage and thus provide high-quality pseudo labels for the training of the second stage.

Ranked #1 on Semi-Supervised Object Detection on COCO 5% labeled data

Object object-detection +3

12,055

Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation

no code implementations • 29 Jun 2023 • Zhongwei Qiu, Qiansheng Yang, Jian Wang, Xiyu Wang, Chang Xu, Dongmei Fu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

One of the mainstream schemes for 2D human pose estimation (HPE) is learning keypoints heatmaps by a neural network.

2D Human Pose Estimation Denoising +1

Vision Transformer with Attention Map Hallucination and FFN Compaction

no code implementations • 19 Jun 2023 • Haiyang Xu, Zhichao Zhou, Dongliang He, Fu Li, Jingdong Wang

Vision Transformer (ViT) is now dominating many vision tasks.

Dimensionality Reduction Hallucination

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai

It is hoped that this competition will attract many researchers in the fields of CV and NLP, and bring some new thoughts to the field of Document AI.

Document AI Entity Linking +1

Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes

1 code implementation • 17 May 2023 • Jiang-Tian Zhai, Ze Feng, Jinhao Du, Yongqiang Mao, Jiang-Jiang Liu, Zichang Tan, Yifu Zhang, Xiaoqing Ye, Jingdong Wang

Modern autonomous driving systems are typically divided into three main tasks: perception, prediction, and planning.

Ranked #1 on Trajectory Planning on nuScenes

Autonomous Driving Trajectory Planning

147

Multi-Modal 3D Object Detection by Box Matching

1 code implementation • 12 May 2023 • Zhe Liu, Xiaoqing Ye, Zhikang Zou, Xinwei He, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

Extensive experiments on the nuScenes dataset demonstrate that our method is much more stable in dealing with challenging cases such as asynchronous sensors, misaligned sensor placement, and degenerated camera images than existing fusion methods.

Ranked #47 on 3D Object Detection on nuScenes

3D Object Detection Autonomous Driving +2

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator

no code implementations • CVPR 2023 • Jiazhi Guan, Zhanwang Zhang, Hang Zhou, Tianshu Hu, Kaisiyuan Wang, Dongliang He, Haocheng Feng, Jingtuo Liu, Errui Ding, Ziwei Liu, Jingdong Wang

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability.

Exploring Effective Factors for Improving Visual In-Context Learning

1 code implementation • 10 Apr 2023 • Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li

By doing this, the model can leverage the diverse knowledge stored in different parts of the model to improve its performance on new tasks.

In-Context Learning Meta-Learning +1

Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models

no code implementations • ICCV 2023 • Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, JiangJiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

A contrastive loss is employed to align such augmented text and image representations on downstream tasks.

Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection

1 code implementation • CVPR 2023 • Chang Liu, Weiming Zhang, Xiangru Lin, Wei Zhang, Xiao Tan, Junyu Han, Xiaomao Li, Errui Ding, Jingdong Wang

It employs a "divide-and-conquer" strategy and separately exploits positives for the classification and localization task, which is more robust to the assignment ambiguity.

Ranked #1 on Semi-Supervised Object Detection on COCO 10% labeled data (detector metric)

Dense Object Detection Object +3

12,055

ByteTrackV2: 2D and 3D Multi-Object Tracking by Associating Every Detection Box

no code implementations • 27 Mar 2023 • Yifu Zhang, Xinggang Wang, Xiaoqing Ye, Wei Zhang, Jincheng Lu, Xiao Tan, Errui Ding, Peize Sun, Jingdong Wang

We propose a hierarchical data association strategy to mine the true objects in low-score detection boxes, which alleviates the problems of missed objects and fragmented trajectories.

3D Multi-Object Tracking motion prediction +1

CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

2 code implementations • CVPR 2023 • Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai

In this paper, we address the problem of detecting 3D objects from multi-view images.

Ranked #8 on 3D Object Detection on nuScenes Camera Only

3D Object Detection object-detection +1

534

PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

no code implementations • CVPR 2023 • Zhongwei Qiu, Yang Qiansheng, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Chang Xu, Dongmei Fu, Jingdong Wang

To handle the variances of objects as time proceeds, a novel scheme of progressive decoding is used to update pose and shape queries at each frame.

Ranked #23 on 3D Human Pose Estimation on 3DPW

3D human pose and shape estimation

Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement

1 code implementation • ICCV 2023 • Jiaxiang Tang, Hang Zhou, Xiaokang Chen, Tianshu Hu, Errui Ding, Jingdong Wang, Gang Zeng

Neural Radiance Fields (NeRF) have constituted a remarkable breakthrough in image-based 3D reconstruction.

3D Reconstruction

847

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

1 code implementation • 1 Mar 2023 • Yuechen Yu, Yulin Li, Chengquan Zhang, Xiaoqiang Zhang, Zengyuan Guo, Xiameng Qin, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Compared to the masked multi-modal modeling methods for document image understanding that rely on both the image and text modalities, StrucTexTv2 models image-only input and potentially deals with more application scenarios free from OCR pre-processing.

Ranked #1 on Table Recognition on WTW

Document Image Classification Language Modelling +3

480

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning

1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang

The study is mainly motivated by the observation that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.

Contrastive Learning Object +1

Graph Contrastive Learning for Skeleton-based Action Recognition

1 code implementation • 26 Jan 2023 • Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences.

Ranked #9 on Skeleton Based Action Recognition on NTU RGB+D

Action Recognition Contrastive Learning +2

UATVR: Uncertainty-Adaptive Text-Video Retrieval

1 code implementation • ICCV 2023 • Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Weiping Wang, Xiangbo Shu, Xiangyang Ji, Jingdong Wang

In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation.

Retrieval Semantic correspondence +1

CFCG: Semi-Supervised Semantic Segmentation via Cross-Fusion and Contour Guidance Supervision

no code implementations • ICCV 2023 • Shuo Li, Yue He, Weiming Zhang, Wei Zhang, Xiao Tan, Junyu Han, Errui Ding, Jingdong Wang

Current state-of-the-art semi-supervised semantic segmentation (SSSS) methods typically adopt pseudo labeling and consistency regularization between multiple learners with different perturbations.

Semi-Supervised Semantic Segmentation

σ-Adaptive Decoupled Prototype for Few-Shot Object Detection

no code implementations • ICCV 2023 • Jinhao Du, Shan Zhang, Qiang Chen, Haifeng Le, Yanpeng Sun, Yao Ni, Jian Wang, Bin He, Jingdong Wang

To provide precise information for the query image, the prototype is decoupled into task-specific ones, which provide tailored guidance for 'where to look' and 'what to look for', respectively.

Few-Shot Object Detection Meta-Learning +3

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations • CVPR 2023 • Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Ranked #1 on Zero-Shot Action Recognition on ActivityNet

Action Classification Action Recognition +3

202

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations • CVPR 2023 • Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Ranked #7 on Video Retrieval on VATEX

Data Augmentation Retrieval +2

202

Augmentation Matters: A Simple-yet-Effective Approach to Semi-supervised Semantic Segmentation

1 code implementation • CVPR 2023 • Zhen Zhao, Lihe Yang, Sifan Long, Jimin Pi, Luping Zhou, Jingdong Wang

Differently, in this work, we follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost the SSS performance.

Semi-Supervised Semantic Segmentation

Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers

no code implementations • 9 Dec 2022 • Yasheng Sun, Hang Zhou, Kaisiyuan Wang, Qianyi Wu, Zhibin Hong, Jingtuo Liu, Errui Ding, Jingdong Wang, Ziwei Liu, Hideki Koike

This requires masking a large percentage of the original image and seamlessly inpainting it with the aid of audio and reference frames.

Cyclically Disentangled Feature Translation for Face Anti-spoofing

1 code implementation • 7 Dec 2022 • Haixiao Yue, Keyao Wang, Guosheng Zhang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

We further extend CDFTN for multi-target domain adaptation by leveraging data from more unlabeled target domains.

Disentanglement Domain Adaptation +3

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

1 code implementation • 22 Nov 2022 • Jiaxiang Tang, Kaisiyuan Wang, Hang Zhou, Xiaokang Chen, Dongliang He, Tianshu Hu, Jingtuo Liu, Gang Zeng, Jingdong Wang

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage.

Talking Face Generation

817

Instance-specific and Model-adaptive Supervision for Semi-supervised Semantic Segmentation

1 code implementation • CVPR 2023 • Zhen Zhao, Sifan Long, Jimin Pi, Jingdong Wang, Luping Zhou

Relying on the model's performance, iMAS employs a class-weighted symmetric intersection-over-union to evaluate quantitative hardness of each unlabeled instance and supervises the training on unlabeled data in a model-adaptive manner.

Segmentation Semi-Supervised Semantic Segmentation

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

1 code implementation • CVPR 2023 • Sifan Long, Zhen Zhao, Jimin Pi, Shengsheng Wang, Jingdong Wang

In this paper, we emphasize the cruciality of diverse global semantics and propose an efficient token decoupling and merging method that can jointly consider the token importance and diversity for token pruning.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Computational Efficiency Efficient ViTs

CAE v2: Context Autoencoder with CLIP Target

no code implementations • 17 Nov 2022 • Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

That is to say, the smaller the model, the lower the mask ratio needs to be.

Semantic Segmentation

Paper
Add Code

Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

no code implementations • arXiv 2022 • Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO.

Ranked #8 on Object Detection on COCO test-dev

Object object-detection +1

Paper
Add Code

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

1 code implementation • 13 Oct 2022 • Jian Wang, Chenhui Gou, Qiman Wu, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

Recently, transformer-based networks have shown impressive results in semantic segmentation.

Ranked #2 on Real-Time Semantic Segmentation on CamVid

Real-Time Semantic Segmentation Segmentation

8,248

Paper
Code

It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training

no code implementations • 11 Oct 2022 • Yuxin Song, Min Yang, Wenhao Wu, Dongliang He, Fu Li, Jingdong Wang

In order to guide the encoder to fully excavate spatial-temporal features, two separate decoders are used for two pretext tasks of disentangled appearance and motion prediction.

motion prediction

Paper
Add Code

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

no code implementations • 27 Sep 2022 • Zhiliang Xu, Hang Zhou, Zhibin Hong, Ziwei Liu, Jiaming Liu, Zhizhi Guo, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping, thus the generator's advantage can be adopted for optimizing identity similarity.

Face Swapping

Paper
Add Code

NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

no code implementations • 24 Sep 2022 • Jiankai Sun, Yan Xu, Mingyu Ding, Hongwei Yi, Chen Wang, Jingdong Wang, Liangjun Zhang, Mac Schwager

Using current NeRF training tools, a robot can train a NeRF environment model in real-time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks.

Object Localization Robot Navigation

Paper
Add Code

TRUST: An Accurate and End-to-End Table structure Recognizer Using Splitting-based Transformers

no code implementations • 31 Aug 2022 • Zengyuan Guo, Yuechen Yu, Pengyuan Lv, Chengquan Zhang, Haojie Li, Zhihui Wang, Kun Yao, Jingtuo Liu, Jingdong Wang

The Vertex-based Merging Module is capable of aggregating local contextual information between adjacent basic grids, providing the ability to merge basic grids that belong to the same spanning cell accurately.

Ranked #5 on Table Recognition on PubTabNet

Table Recognition

Paper
Add Code

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

no code implementations • 21 Aug 2022 • Haoran Wang, Dongliang He, Wenhao Wu, Boyang xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting.

Clustering Contrastive Learning +4

Paper
Add Code

Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention

no code implementations • 2 Aug 2022 • Fanqi Meng, Xuesong Wang, Jingdong Wang, Peifang Wang

The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report (i.e., suggestion or explanation) is also considered, thereby improving the performance of the classification.

Paper
Add Code

Rating the Crisis of Online Public Opinion Using a Multi-Level Index System

no code implementations • 29 Jul 2022 • Fanqi Meng, Xixi Xiao, Jingdong Wang

We propose a method to rate the crisis of online public opinion based on a multi-level index system to evaluate the impact of events objectively.

Blocking

Paper
Add Code

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

2 code implementations • ICCV 2023 • Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.

Data Augmentation Object +2

12,048

Paper
Code
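
The one-to-one assignment mentioned in the Group DETR entry above (one ground-truth object matched to exactly one prediction) can be illustrated with a small sketch. This is not the paper's implementation: the cost values are made up, the function name `one_to_one_assignment` is hypothetical, and a brute-force search stands in for the Hungarian algorithm that detection code bases normally use.

```python
from itertools import permutations


def one_to_one_assignment(cost):
    """Brute-force minimum-cost one-to-one matching (illustrative only).

    cost[i][j] is an assumed matching cost between ground-truth object i
    and prediction j. Requires a non-empty matrix with at least as many
    predictions (columns) as ground-truth objects (rows).
    Returns (assignment, total), where assignment[i] is the index of the
    prediction matched to ground-truth object i.
    """
    num_gt = len(cost)
    num_pred = len(cost[0])
    best_total = float("inf")
    best = None
    # Try every way of giving each ground-truth object a distinct prediction.
    for perm in permutations(range(num_pred), num_gt):
        total = sum(cost[i][perm[i]] for i in range(num_gt))
        if total < best_total:
            best_total = total
            best = perm
    return list(best), best_total


# Hypothetical costs: 2 ground-truth objects, 3 predictions.
cost = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.7],
]
assignment, total = one_to_one_assignment(cost)
print(assignment, total)
```

Brute force is exponential in the number of objects, so it is only suitable for toy inputs like this one; the unmatched prediction (index 2 here) would be treated as background.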

Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption

no code implementations • 21 Jul 2022 • Jiazhi Guan, Hang Zhou, Mingming Gong, Errui Ding, Jingdong Wang, Youjian Zhao

Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator and create a wide range of pseudo-fake videos for training.

DeepFake Detection Face Swapping

Paper
Add Code

UFO: Unified Feature Optimization

1 code implementation • 21 Jul 2022 • Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang, Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

UFO aims to benefit each single task with a large-scale pretraining on all tasks.

Face Recognition Multi-Task Learning +4

Paper
Code

Action Quality Assessment with Temporal Parsing Transformer

1 code implementation • 19 Jul 2022 • Yang Bai, Desen Zhou, Songyang Zhang, Jian Wang, Errui Ding, Yu Guan, Yang Long, Jingdong Wang

Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences.

Action Quality Assessment Action Understanding +1

Paper
Code

Conditional DETR V2: Efficient Detection Transformer with Box Queries

no code implementations • 18 Jul 2022 • Xiaokang Chen, Fangyun Wei, Gang Zeng, Jingdong Wang

Inspired by Conditional DETR, an improved DETR with fast training convergence that presented box queries (originally called spatial queries) for internal decoder layers, we reformulate the object query into the format of the box query, which is a composition of the embeddings of the reference point and the transformation of the box with respect to the reference point.

Object object-detection +1

Paper
Add Code

Towards Lightweight Super-Resolution with Dual Regression Learning

2 code implementations • 16 Jul 2022 • Yong Guo, Jingdong Wang, Qi Chen, JieZhang Cao, Zeshuai Deng, Yanwu Xu, Jian Chen, Mingkui Tan

Nevertheless, it is hard for existing model compression methods to accurately identify the redundant components due to the extremely large SR mapping space.

Image Super-Resolution Model Compression +1

434

Paper
Code

Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network

no code implementations • 12 Jul 2022 • Bo Ju, Zhikang Zou, Xiaoqing Ye, Minyue Jiang, Xiao Tan, Errui Ding, Jingdong Wang

In this work, we propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models with the guidance of rich context painting, with no extra computation cost during inference.

3D Object Detection Autonomous Driving +1

Paper
Add Code

Delving into Sequential Patches for Deepfake Detection

no code implementations • 6 Jul 2022 • Jiazhi Guan, Hang Zhou, Zhibin Hong, Errui Ding, Jingdong Wang, Chengbin Quan, Youjian Zhao

Recent advances in face forgery techniques produce nearly visually untraceable deepfake videos, which could be leveraged with malicious intentions.

DeepFake Detection Face Swapping

Paper
Add Code

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning

1 code implementation • 13 Jun 2022 • Yanpeng Sun, Qiang Chen, Xiangyu He, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jian Cheng, Zechao Li, Jingdong Wang

In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of parameters in the backbone.

Ranked #8 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)

Few-Shot Semantic Segmentation

Paper
Code

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

no code implementations • 1 Jun 2022 • Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Specifically, we transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder using a proposed masked image-language modeling scheme.

Ranked #2 on Optical Character Recognition (OCR) on Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study

Language Modelling Optical Character Recognition (OCR) +1

Paper
Add Code

Few-Shot Font Generation by Learning Fine-Grained Local Styles

2 code implementations • CVPR 2022 • Licheng Tang, Yiyang Cai, Jiaming Liu, Zhibin Hong, Mingming Gong, Minhu Fan, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Instead of explicitly disentangling global or component-wise modeling, the cross-attention mechanism can attend to the right local styles in the reference glyphs and aggregate the reference styles into a fine-grained style representation for the given content glyphs.

Font Generation

Paper
Code

Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

no code implementations • 8 May 2022 • Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, Jingdong Wang

The outcome of the competition was ranked leaderboards of algorithms in each track based on recall at a query throughput threshold.

Paper
Add Code
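
The ranking rule described in the entry above (recall at a query throughput threshold) can be sketched as follows. The algorithm names, operating points, and threshold value are invented for illustration and are not results from the challenge; the helper names are hypothetical.

```python
def best_recall_at_threshold(operating_points, min_qps):
    """Return the best recall among operating points meeting the throughput bar.

    operating_points is a list of (queries_per_second, recall) pairs.
    Returns None when no operating point reaches min_qps.
    """
    eligible = [recall for qps, recall in operating_points if qps >= min_qps]
    return max(eligible) if eligible else None


def leaderboard(results, min_qps):
    """Rank algorithms by their best recall at or above min_qps."""
    scored = []
    for name, points in results.items():
        recall = best_recall_at_threshold(points, min_qps)
        if recall is not None:  # algorithms that never reach the bar are unranked
            scored.append((name, recall))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical measurements, not real challenge data.
results = {
    "algo_a": [(12000, 0.91), (8000, 0.97)],
    "algo_b": [(15000, 0.93), (10500, 0.95)],
    "algo_c": [(9000, 0.99)],
}
board = leaderboard(results, min_qps=10000)
print(board)
```

Note that `algo_c` has the highest raw recall but is excluded because it never reaches the assumed throughput threshold, which is the point of fixing throughput before comparing recall.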

Few-Shot Head Swapping in the Wild

no code implementations • CVPR 2022 • Changyong Shu, Hemao Wu, Hang Zhou, Jiaming Liu, Zhibin Hong, Changxing Ding, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Particularly, seamless blending is achieved with the help of a Semantic-Guided Color Reference Creation procedure and a Blending UNet.

Face Swapping

Paper
Add Code

Human-Object Interaction Detection via Disentangled Transformer

no code implementations • CVPR 2022 • Desen Zhou, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, Jingdong Wang

To associate the predictions of disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then utilize it as input feature of each disentangled decoder.

Human-Object Interaction Detection Object

Paper
Add Code

GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation

no code implementations • 16 Apr 2022 • Shi Gong, Xiaoqing Ye, Xiao Tan, Jingdong Wang, Errui Ding, Yu Zhou, Xiang Bai

Birds-eye-view (BEV) semantic segmentation is critical for autonomous driving for its powerful spatial representation ability.

Autonomous Driving Image Segmentation +2

Paper
Add Code

Implicit Sample Extension for Unsupervised Person Re-Identification

1 code implementation • CVPR 2022 • Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang

Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.

Clustering Unsupervised Person Re-Identification

5,253

Paper
Code

DaViT: Dual Attention Vision Transformers

3 code implementations • 7 Apr 2022 • Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Ranked #1 on Instance Segmentation on Object Detection on COCO minival

Computational Efficiency Image Classification +4

29,735

Paper
Code

MixFormer: Mixing Features across Windows and Dimensions

3 code implementations • CVPR 2022 • Qiang Chen, Qiman Wu, Jian Wang, Qinghao Hu, Tao Hu, Errui Ding, Jian Cheng, Jingdong Wang

We propose MixFormer to find a solution.

Image Classification

5,253

Paper
Code

ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval

no code implementations • CVPR 2022 • Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Visual appearance is considered to be the most important cue to understand images for cross-modal retrieval, while sometimes the scene text appearing in images can provide valuable information to understand the visual semantics.

Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Contrastive Learning Cross-Modal Retrieval +1

Paper
Add Code

Efficient Video Segmentation Models with Per-frame Inference

no code implementations • 24 Feb 2022 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

To this end, we perform inference at each frame.

Image Matting Instance Segmentation +6

Paper
Add Code

Context Autoencoder for Self-Supervised Representation Learning

6 code implementations • 7 Feb 2022 • Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

Pretraining comprises two tasks: masked representation prediction, which predicts the representations of the masked patches, and masked patch reconstruction, which reconstructs the masked patches.

Ranked #14 on Self-Supervised Image Classification on ImageNet (finetuned)

Instance Segmentation object-detection +5

3,082

Paper
Code

Expressive Talking Head Generation With Granular Audio-Visual Control

no code implementations • CVPR 2022 • Borong Liang, Yan Pan, Zhizhi Guo, Hang Zhou, Zhibin Hong, Xiaoguang Han, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang

Generating expressive talking heads is essential for creating virtual humans.

Talking Head Generation

Paper
Add Code

HRFormer: High-Resolution Vision Transformer for Dense Predict

2 code implementations • NeurIPS 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Pose Estimation Semantic Segmentation +1

475

Paper
Code

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

1 code implementation • NeurIPS 2021 • Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang

It stores the centroid points of the posting lists in the memory and the large posting lists in the disk.

4,695

Paper
Code
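
The memory/disk split described in the SPANN entry above (centroids kept in memory, posting lists kept on disk) can be sketched roughly as below. This is a simplified illustration, not SPANN itself: a plain dictionary stands in for on-disk posting lists, the centroids are given rather than learned, and the names `build_index`, `search`, and `n_probe` are assumptions made for this example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(points, centroids):
    """Assign each point id to the posting list of its nearest centroid."""
    posting_lists = {c: [] for c in range(len(centroids))}
    for pid, point in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))
        posting_lists[nearest].append(pid)
    return posting_lists


def search(query, points, centroids, posting_lists, n_probe=1, k=1):
    """Scan only the posting lists of the n_probe centroids closest to the query."""
    # Step 1: cheap in-memory comparison against the centroids.
    probe = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))[:n_probe]
    # Step 2: load the selected posting lists (a disk read in a real system).
    candidates = [pid for c in probe for pid in posting_lists[c]]
    # Step 3: rank the candidates by exact distance.
    candidates.sort(key=lambda pid: dist(query, points[pid]))
    return candidates[:k]


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
posting_lists = build_index(points, centroids)
result = search((4.9, 5.1), points, centroids, posting_lists, n_probe=1, k=1)
print(result)
```

The design choice being illustrated is that only the small centroid table needs to fit in memory; the bulk of the vectors is touched only for the few posting lists a query actually probes.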

Whole Brain Segmentation with Full Volume Neural Network

1 code implementation • 29 Oct 2021 • Yeshu Li, Jonathan Cui, Yilun Sheng, Xiao Liang, Jingdong Wang, Eric I-Chao Chang, Yan Xu

To address these issues, we propose to adopt a full volume framework, which feeds the full volume brain image into the segmentation network and directly outputs the segmentation result for the whole brain volume.

Brain Segmentation Representation Learning +1

Paper
Code

HRFormer: High-Resolution Transformer for Dense Prediction

1 code implementation • 18 Oct 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

Ranked #3 on Pose Estimation on AIC

Image Classification Multi-Person Pose Estimation +2

475

Paper
Code

Realistic Image Synthesis with Configurable 3D Scene Layouts

no code implementations • 23 Aug 2021 • Jaebong Jeong, Janghun Jo, Jingdong Wang, Sunghyun Cho, Jaesik Park

Our approach takes a 3D scene with semantic class labels as input and trains a 3D scene painting network that synthesizes color values for the input 3D scene.

Image Generation

Paper
Add Code

Conditional DETR for Fast Training Convergence

3 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang

Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.

Object object-detection +1

124,889

Paper
Code

Content-Aware Convolutional Neural Networks

1 code implementation • 30 Jun 2021 • Yong Guo, Yaofo Chen, Mingkui Tan, Kui Jia, Jian Chen, Jingdong Wang

In practice, the convolutional operation on some of the windows (e.g., smooth windows that contain very similar pixels) can be very redundant and may introduce noise into the computation.

Paper
Code

Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning

no code implementations • 13 Jun 2021 • Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, Jingdong Wang

Cross-modal correlation provides an inherent supervision for video unsupervised representation learning.

Contrastive Learning Representation Learning

Paper
Add Code

On the Connection between Local Attention and Dynamic Depth-wise Convolution

1 code implementation • ICLR 2022 • Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang

Sparse connectivity: there is no connection across channels, and each position is connected to the positions within a small local window.

object-detection Object Detection +2

179

Paper
Code
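
The sparse connectivity pattern described in the entry above (no connection across channels, each position connected only to a small local window) is the pattern of a depth-wise convolution. A minimal 1-D sketch follows; the function name, the zero-padding at the borders, and the example weights are assumptions for illustration rather than details taken from the paper.

```python
def depthwise_local_1d(x, weights):
    """Apply a separate local window filter to each channel.

    x is a list of channels, each a list of values.
    weights holds one odd-length window of weights per channel.
    Positions outside the sequence are treated as zero (assumed padding).
    """
    out = []
    for channel, window in zip(x, weights):  # channels never mix
        radius = len(window) // 2
        row = []
        for i in range(len(channel)):
            acc = 0.0
            for k, w in enumerate(window):
                j = i + k - radius  # neighbour inside the local window
                if 0 <= j < len(channel):
                    acc += w * channel[j]
            row.append(acc)
        out.append(row)
    return out


x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
weights = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
y = depthwise_local_1d(x, weights)
print(y)
```

Here the weights are fixed per channel; making them depend on the input at each position would move the sketch toward the dynamic, attention-like variant the title refers to.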

Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

3 code implementations • CVPR 2021 • Xiaokang Chen, Yuhui Yuan, Gang Zeng, Jingdong Wang

Our approach imposes the consistency on two segmentation networks perturbed with different initialization for the same input image.

Ranked #2 on Semi-Supervised Semantic Segmentation on WoodScape

Segmentation Semi-Supervised Semantic Segmentation

475

Paper
Code

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search

1 code implementation • NeurIPS 2021 • Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang

It stores the centroid points of the posting lists in the memory and the large posting lists in the disk.

4,695

Paper
Code

Lite-HRNet: A Lightweight High-Resolution Network

15 code implementations • CVPR 2021 • Changqian Yu, Bin Xiao, Changxin Gao, Lu Yuan, Lei Zhang, Nong Sang, Jingdong Wang

We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.

Ranked #33 on Pose Estimation on COCO test-dev (using extra training data)

Pose Estimation Real-Time Semantic Segmentation +1

12,048

Paper
Code

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

2 code implementations • CVPR 2021 • Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang

Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.

Keypoint Detection

4,986

Paper
Code

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks.

Image Segmentation Neural Architecture Search +2

Paper
Code

Boosting Adversarial Transferability through Enhanced Momentum

1 code implementation • 19 Mar 2021 • Xiaosen Wang, Jiadong Lin, Han Hu, Jingdong Wang, Kun He

Various momentum iterative gradient-based methods have been shown to be effective in improving adversarial transferability.

Adversarial Attack

136

Paper
Code

Admix: Enhancing the Transferability of Adversarial Attacks

2 code implementations • ICCV 2021 • Xiaosen Wang, Xuanran He, Jingdong Wang, Kun He

We investigate in this direction and observe that existing transformations are all applied on a single image, which might limit the adversarial transferability.

136

Paper
Code

Consistent Instance Classification for Unsupervised Representation Learning

no code implementations • 1 Jan 2021 • Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang

The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.

Classification General Classification +1

Paper
Add Code
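
The objective described in the ConIC entry above (a classification loss plus a consistency loss that penalizes feature dissimilarity between augmented views of the same instance) can be sketched as follows. The exact losses used in the paper are not given here, so this example assumes softmax cross-entropy for classification and one minus cosine similarity for consistency; the function names and the `weight` parameter are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def cosine_dissimilarity(u, v):
    """One minus the cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def conic_style_loss(logits_a, logits_b, feat_a, feat_b, instance_id, weight=1.0):
    """Instance classification on both views plus a weighted consistency term."""
    classification = cross_entropy(logits_a, instance_id) + cross_entropy(logits_b, instance_id)
    consistency = cosine_dissimilarity(feat_a, feat_b)
    return classification + weight * consistency


# Two augmented views of instance 0 with identical features: no consistency penalty.
same = conic_style_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0], instance_id=0)
# Orthogonal features: the consistency term adds its maximum for non-negative features.
different = conic_style_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], instance_id=0)
print(same, different)
```

With equal logits the classification part is the same in both calls, so the gap between the two values comes entirely from the consistency term.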

Improving Person Re-identification with Iterative Impression Aggregation

no code implementations • 21 Sep 2020 • Dengpan Fu, Bo Xin, Jingdong Wang, Dong-Dong Chen, Jianmin Bao, Gang Hua, Houqiang Li

Not only does such a simple method improve the performance of the baseline models, it also achieves comparable performance with the latest advanced re-ranking methods.

Person Re-Identification Re-Ranking

Paper
Add Code

Informative Dropout for Robust Representation Learning: A Shape-bias Perspective

1 code implementation • ICML 2020 • Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, Jingdong Wang

Specifically, we discriminate texture from shape based on local self-information in an image, and adopt a Dropout-like algorithm to decorrelate the model output from the local texture.

Domain Generalization Representation Learning

125

Paper
Code

Distillation Guided Residual Learning for Binary Convolutional Neural Networks

1 code implementation • 10 Jul 2020 • Jianming Ye, Shiliang Zhang, Jingdong Wang

We observe that this performance gap leads to substantial residuals between intermediate feature maps of BCNN and FCNN.

Paper
Code

SegFix: Model-Agnostic Boundary Refinement for Segmentation

4 code implementations • ECCV 2020 • Yuhui Yuan, Jingyi Xie, Xilin Chen, Jingdong Wang

We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.

Segmentation

1,173

Paper
Code

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

1 code implementation • ECCV 2020 • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.

Instance Segmentation Object +5

Paper
Code

Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

1 code implementation • 28 Jun 2020 • Ke Sun, Zigang Geng, Depu Meng, Bin Xiao, Dong Liu, Zhao-Xiang Zhang, Jingdong Wang

The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping.

Keypoint Detection Multi-Person Pose Estimation +1

141

Paper
Code

Weakly-Supervised Action Localization by Generative Attention Modeling

1 code implementation • CVPR 2020 • Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang

By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.

Ranked #8 on Weakly Supervised Action Localization on ActivityNet-1.2

Weakly Supervised Action Localization Weakly-supervised Temporal Action Localization +1

136

Paper
Code

Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution

3 code implementations • CVPR 2020 • Yong Guo, Jian Chen, Jingdong Wang, Qi Chen, JieZhang Cao, Zeshuai Deng, Yanwu Xu, Mingkui Tan

Extensive experiments with paired training data and unpaired real-world data demonstrate our superiority over existing methods.

Image Super-Resolution regression

434

Paper
Code

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Ranked #2 on Video Semantic Segmentation on CamVid

Knowledge Distillation Optical Flow Estimation +4

292

Paper
Code

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

11 code implementations • ECCV 2020 • Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.

Ranked #5 on Semantic Segmentation on LIP val

Object Segmentation +1

8,248

Paper
Code

Cross View Fusion for 3D Human Pose Estimation

1 code implementation • ICCV 2019 • Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wen-Jun Zeng

It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses.

Ranked #6 on 3D Human Pose Estimation on Total Capture

2D Pose Estimation 3D Human Pose Estimation

533

Paper
Code

Global-Local Temporal Representations For Video Person Re-Identification

no code implementations • ICCV 2019 • Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang

The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noises in video sequences.

Metric Learning Re-Ranking +1

Paper
Add Code

HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation

20 code implementations • CVPR 2020 • Bowen Cheng, Bin Xiao, Jingdong Wang, Honghui Shi, Thomas S. Huang, Lei Zhang

HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.

Ranked #2 on Pose Estimation on UAV-Human

2D Human Pose Estimation Multi-Person Pose Estimation +2

12,048

Paper
Code

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations • 20 Aug 2019 • Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation Face Alignment +7

27,765

Paper
Code

Interlaced Sparse Self-Attention for Semantic Segmentation

6 code implementations • 29 Jul 2019 • Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

There are two successive attention modules each estimating a sparse affinity matrix.

Segmentation Semantic Segmentation

8,248

Paper
Code

MMDetection: Open MMLab Detection Toolbox and Benchmark

144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In this paper, we introduce the various features of this toolbox.

Benchmarking Instance Segmentation +2

27,765

Paper
Code

Group Re-Identification with Multi-grained Matching and Integration

no code implementations • 17 May 2019 • Weiyao Lin, Yuxi Li, Hao Xiao, John See, Junni Zou, Hongkai Xiong, Jingdong Wang, Tao Mei

The task of re-identifying groups of people under different camera views is an important yet less-studied problem. Group re-identification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single object Re-ID problems such as viewpoint and human pose variations, but it also suffers from changes in group layout and group membership.

Paper
Add Code

High-Resolution Representations for Labeling Pixels and Regions

39 code implementations • 9 Apr 2019 • Ke Sun, Yang Zhao, Borui Jiang, Tianheng Cheng, Bin Xiao, Dong Liu, Yadong Mu, Xinggang Wang, Wenyu Liu, Jingdong Wang

The proposed approach achieves superior results to existing single-model networks on COCO object detection.

Ranked #7 on Semantic Segmentation on LIP val

Face Alignment Facial Landmark Detection +5

12,048

Paper
Code

Structured Knowledge Distillation for Dense Prediction

1 code implementation • CVPR 2019 • Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen

Here we propose to distill structured knowledge from large networks to compact networks, taking into account the fact that dense prediction is a structured prediction problem.

Depth Estimation General Classification +7

687

Paper
Code

Deep High-Resolution Representation Learning for Human Pose Estimation

39 code implementations • CVPR 2019 • Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang

We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.

Ranked #1 on Pose Estimation on BRACE

2D Human Pose Estimation Instance Segmentation +6

27,765

Paper
Code

Collaborative Quantization for Cross-Modal Similarity Search

no code implementations • CVPR 2016 • Ting Zhang, Jingdong Wang

Cross-modal similarity search is a problem about designing a search system supporting querying across content modalities, e.g., using an image to search for texts or using a text to search for images.

Quantization

Paper
Add Code

Supervised Quantization for Similarity Search

no code implementations • CVPR 2016 • Xiaojuan Wang, Ting Zhang, Guo-Jun Q, Jinhui Tang, Jingdong Wang

In this paper, we address the problem of searching for semantically similar images from a large database.

feature selection General Classification +1

Paper
Add Code

Deep Triplet Quantization

1 code implementation • 1 Feb 2019 • Bin Liu, Yue Cao, Mingsheng Long, Jian-Min Wang, Jingdong Wang

We propose Deep Triplet Quantization (DTQ), a novel approach to learning deep quantization models from the similarity triplets.

Ranked #1 on Image Retrieval on NUS-WIDE

Deep Hashing Image Retrieval +1

551

Paper
Code

Disparity-preserved Deep Cross-platform Association for Cross-platform Video Recommendation

no code implementations • 1 Jan 2019 • Shengze Yu, Xin Wang, Wenwu Zhu, Peng Cui, Jingdong Wang

However, there remain two unsolved challenges: i) there exist inconsistencies in cross-platform association due to platform-specific disparity, and ii) data from distinct platforms may have different semantic granularities.

Paper
Add Code

Weakly Supervised Dense Event Captioning in Videos

no code implementations • NeurIPS 2018 • Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang

Dense event captioning aims to detect and describe all events of interest contained in a video.

Sentence

Paper
Add Code

Accelerating Deep Neural Networks with Spatial Bottleneck Modules

no code implementations • 7 Sep 2018 • Junran Peng, Lingxi Xie, Zhao-Xiang Zhang, Tieniu Tan, Jingdong Wang

This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks.

Paper
Add Code

OCNet: Object Context Network for Scene Parsing

8 code implementations • 4 Sep 2018 • Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018).

Ranked #9 on Semantic Segmentation on Trans10K

Object Relation +2

7,398

Paper
Code

IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks

3 code implementations • 1 Jun 2018 • Ke Sun, Mingjie Li, Dong Liu, Jingdong Wang

In this paper, we are interested in building lightweight and efficient convolutional neural networks.

Image Classification object-detection +1

2,917

Paper
Code

Interleaved Structured Sparse Convolutional Neural Networks

no code implementations • CVPR 2018 • Guotian Xie, Jingdong Wang, Ting Zhang, Jian-Huang Lai, Richang Hong, Guo-Jun Qi

In this paper, we study the problem of designing efficient convolutional neural network architectures with the interest in eliminating the redundancy in convolution kernels.

Paper
Add Code

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

1 code implementation • CVPR 2018 • Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang

Inspired by the traditional image segmentation method of seeded region growing, we propose to train a semantic segmentation network starting from the discriminative regions and to progressively increase the pixel-level supervision using seeded region growing.

Ranked #38 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Image Segmentation Segmentation +2

249

Paper
Code

Part-Aligned Bilinear Representations for Person Re-identification

no code implementations • ECCV 2018 • Yumin Suh, Jingdong Wang, Siyu Tang, Tao Mei, Kyoung Mu Lee

We propose a novel network that learns a part-aligned representation for person re-identification.

Ranked #4 on Person Re-Identification on UAV-Human

2D Human Pose Estimation Person Re-Identification +1

Paper
Add Code

IGCV2: Interleaved Structured Sparse Convolutional Neural Networks

2 code implementations • 17 Apr 2018 • Guotian Xie, Jingdong Wang, Ting Zhang, Jian-Huang Lai, Richang Hong, Guo-Jun Qi

In this paper, we study the problem of designing efficient convolutional neural network architectures with the interest in eliminating the redundancy in convolution kernels.

190

Paper
Code

Object Detection in Videos by High Quality Object Linking

no code implementations • 30 Jan 2018 • Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wen-Jun Zeng, Jingdong Wang

In particular, our method improves results by 8.8% over the static image detector for fast moving objects.

General Classification Object +3

Paper
Add Code

LVreID: Person Re-Identification with Long Sequence Videos

no code implementations • 20 Dec 2017 • Jianing Li, Shiliang Zhang, Jingdong Wang, Wen Gao, Qi Tian

This paper mainly establishes a large-scale Long sequence Video database for person re-IDentification (LVreID).

Person Re-Identification

Paper
Add Code

Composite Quantization

1 code implementation • 4 Dec 2017 • Jingdong Wang, Ting Zhang

We introduce a composite quantization framework.

Quantization

Paper
Code

S4Net: Single Stage Salient-Instance Segmentation

1 code implementation • CVPR 2019 • Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu

Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch.

Instance Segmentation Segmentation +1

Paper
Code

Global versus Localized Generative Adversarial Nets

2 code implementations • CVPR 2018 • Guo-Jun Qi, Liheng Zhang, Hao Hu, Marzieh Edraki, Jingdong Wang, Xian-Sheng Hua

In this paper, we present a novel localized Generative Adversarial Net (GAN) to learn on the manifold of real data.

General Classification

Paper
Code

Interleaved Group Convolutions

no code implementations • ICCV 2017 • Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.

Paper
Add Code

Ensemble Diffusion for Retrieval

no code implementations • ICCV 2017 • Song Bai, Zhichao Zhou, Jingdong Wang, Xiang Bai, Longin Jan Latecki, Qi Tian

This has stimulated great research interest in considering similarity fusion within the framework of the diffusion process (i.e., fusion with diffusion) for robust retrieval.

3D Shape Classification 3D Shape Retrieval +2

Paper
Add Code

Human Pose Estimation using Global and Local Normalization

no code implementations • ICCV 2017 • Ke Sun, Cuiling Lan, Junliang Xing, Wen-Jun Zeng, Dong Liu, Jingdong Wang

We present a two-stage normalization scheme, human body normalization and limb normalization, to make the distribution of the relative joint locations compact, resulting in easier learning of convolutional spatial models and more accurate pose estimation.

Pose Estimation

Paper
Add Code

Rethink ReLU to Training Better CNNs

no code implementations • 19 Sep 2017 • Gangming Zhao, Zhao-Xiang Zhang, He Guan, Peng Tang, Jingdong Wang

Most convolutional neural networks share the same characteristic: each convolutional layer is followed by a nonlinear activation layer, of which the Rectified Linear Unit (ReLU) is the most widely used.

Paper
Add Code

Deeply-Learned Part-Aligned Representations for Person Re-Identification

1 code implementation • ICCV 2017 • Liming Zhao, Xi Li, Jingdong Wang, Yueting Zhuang

In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras.

Ranked #106 on Person Re-Identification on Market-1501

Person Re-Identification

Paper
Code

Orthogonal and Idempotent Transformations for Learning Deep Neural Networks

no code implementations • 19 Jul 2017 • Jingdong Wang, Yajie Xing, Kexin Zhang, Cha Zhang

Identity transformations, used as skip-connections in residual networks, directly connect convolutional layers close to the input and those close to the output in deep neural networks, improving information flow and thus easing the training.

Paper
Add Code

Interleaved Group Convolutions for Deep Neural Networks

2 code implementations • 10 Jul 2017 • Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.

190

Paper
Code

Learning Correspondence Structures for Person Re-identification

no code implementations • 20 Mar 2017 • Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu

We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair.

Patch Matching Person Re-Identification

Paper
Add Code

Deep Convolutional Neural Networks with Merge-and-Run Mappings

4 code implementations • 23 Nov 2016 • Liming Zhao, Jingdong Wang, Xi Li, Zhuowen Tu, Wen-Jun Zeng

A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow.

228

Paper
Code

Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of Neurons

no code implementations • 21 Jul 2016 • Lingxi Xie, Qi Tian, John Flynn, Jingdong Wang, Alan Yuille

For this, we consider the neurons in the hidden layer as neural words, and construct a set of geometric neural phrases on top of them.

Image Classification

Paper
Add Code

A Survey on Learning to Hash

no code implementations • 1 Jun 2016 • Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of learning-to-hash algorithms, categorize them according to how they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations.

Quantization

Paper
Add Code

Deeply-Fused Nets

2 code implementations • 25 May 2016 • Jingdong Wang, Zhen Wei, Ting Zhang, Wen-Jun Zeng

Second, in our suggested fused net formed by one deep and one shallow base network, the flows of information from the earlier intermediate layer of the deep base network to the output and from the input to the later intermediate layer of the deep base network are both improved.

228

Paper
Code

DisturbLabel: Regularizing CNN on the Loss Layer

2 code implementations • CVPR 2016 • Lingxi Xie, Jingdong Wang, Zhen Wei, Meng Wang, Qi Tian

For a long time, we have been combating over-fitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc.

Data Augmentation

Paper
Code

InterActive: Inter-Layer Activeness Propagation

no code implementations • CVPR 2016 • Lingxi Xie, Liang Zheng, Jingdong Wang, Alan Yuille, Qi Tian

An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network.

Descriptive General Classification

Paper
Add Code

Good Practice in CNN Feature Transfer

no code implementations • 1 Apr 2016 • Liang Zheng, Yali Zhao, Shengjin Wang, Jingdong Wang, Qi Tian

The objective of this paper is the effective transfer of the Convolutional Neural Network (CNN) feature in image search and classification.

General Classification Image Retrieval

Paper
Add Code

A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes

no code implementations • 16 Feb 2016 • Weiyao Lin, Yang Mi, Weiyue Wang, Jianxin Wu, Jingdong Wang, Tao Mei

These semantic regions can be used to recognize pre-defined activities in crowd scenes.

Clustering Optical Flow Estimation +1

Paper
Add Code

RIDE: Reversal Invariant Descriptor Enhancement

no code implementations • ICCV 2015 • Lingxi Xie, Jingdong Wang, Weiyao Lin, Bo Zhang, Qi Tian

In many fine-grained object recognition datasets, image orientation (left/right) might vary from sample to sample.

Object Recognition

Paper
Add Code

Scalable Person Re-Identification: A Benchmark

no code implementations • ICCV 2015 • Liang Zheng, Liyue Shen, Lu Tian, Shengjin Wang, Jingdong Wang, Qi Tian

As a minor contribution, inspired by recent advances in large-scale image search, this paper proposes an unsupervised Bag-of-Words descriptor.

Ranked #90 on Person Re-Identification on DukeMTMC-reID

Image Retrieval Person Re-Identification

Paper
Add Code

DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

no code implementations • 19 Oct 2015 • Xi Li, Liming Zhao, Lina Wei, Ming-Hsuan Yang, Fei Wu, Yueting Zhuang, Haibin Ling, Jingdong Wang

A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner.

Image Segmentation Multi-Task Learning +6

Paper
Add Code

Co-Saliency Detection via Looking Deep and Wide

no code implementations • CVPR 2015 • Dingwen Zhang, Junwei Han, Chao Li, Jingdong Wang

In the proposed framework, wide and deep information is explored for the object proposal windows extracted in each image, and the co-saliency scores are calculated by integrating the intra-image contrast and intra-group consistency via a principled Bayesian formulation.

Co-Salient Object Detection Image Retrieval +1

Paper
Add Code

Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification

no code implementations • CVPR 2015 • Dapeng Chen, Zejian Yuan, Gang Hua, Nanning Zheng, Jingdong Wang

We follow the learning-to-rank methodology and learn a similarity function to maximize the difference between the similarity scores of matched and unmatched images for the same person.

Learning-To-Rank Patch Matching +1

Paper
Add Code

Sparse Composite Quantization

no code implementations • CVPR 2015 • Ting Zhang, Guo-Jun Qi, Jinhui Tang, Jingdong Wang

The benefit is that the distance evaluation between the query and the dictionary element (a sparse vector) is accelerated using efficient sparse vector operations, and thus the cost of distance table computation is greatly reduced.

Quantization Retrieval

Paper
Add Code

Person Re-identification with Correspondence Structure Learning

1 code implementation • ICCV 2015 • Yang Shen, Weiyao Lin, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang

This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification.

Patch Matching Person Re-Identification

Paper
Code

Group $K$-Means

no code implementations • 5 Jan 2015 • Jianfeng Wang, Shuicheng Yan, Yi Yang, Mohan S. Kankanhalli, Shipeng Li, Jingdong Wang

We study how to learn multiple dictionaries from a dataset, and approximate any data point by the sum of codewords, each chosen from its corresponding dictionary.

Paper
Add Code

Salient Object Detection: A Discriminative Regional Feature Integration Approach

no code implementations • CVPR 2013 • Huaizu Jiang, Zejian Yuan, Ming-Ming Cheng, Yihong Gong, Nanning Zheng, Jingdong Wang

Our method, which is based on multi-level image segmentation, utilizes a supervised learning approach to map the regional feature vector to a saliency score.

Image Segmentation Object +4

Paper
Add Code

Deep Regression for Face Alignment

no code implementations • 18 Sep 2014 • Baoguang Shi, Xiang Bai, Wenyu Liu, Jingdong Wang

In this paper, we present a deep regression approach for face alignment.

Face Alignment regression

Paper
Add Code

Hashing for Similarity Search: A Survey

no code implementations • 13 Aug 2014 • Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest.

Paper
Add Code

Low-rank SIFT: An Affine Invariant Feature for Place Recognition

no code implementations • 7 Aug 2014 • Chao Yang, Shengnan Caih, Jingdong Wang, Long Quan

As an extension of SIFT, our method seeks to add a prior to solve the ill-posed affine parameter estimation problem and normalizes them directly, and is applicable to objects with regular structures.

feature selection Translation

Paper
Add Code

Inner Product Similarity Search using Compositional Codes

no code implementations • 19 Jun 2014 • Chao Du, Jingdong Wang

This paper addresses the nearest neighbor search problem under inner product similarity and introduces a compact code-based approach.

Paper
Add Code

Orientational Pyramid Matching for Recognizing Indoor Scenes

no code implementations • CVPR 2014 • Lingxi Xie, Jingdong Wang, Baining Guo, Bo Zhang, Qi Tian

The novelty is that OPM uses the 3D orientations to form the pyramid and produce the pooling regions, unlike SPM, which uses the spatial positions to form the pyramid.

General Classification Scene Classification +1

Paper
Add Code

Optimized Cartesian $K$-Means

no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.

Quantization

Paper
Add Code

Fast Neighborhood Graph Search using Cartesian Concatenation

no code implementations • 11 Dec 2013 • Jingdong Wang, Jing Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo

This structure augments the neighborhood graph with a bridge graph.

Paper
Add Code

Fast Approximate $K$-Means via Cluster Closures

no code implementations • 11 Dec 2013 • Jingdong Wang, Jing Wang, Qifa Ke, Gang Zeng, Shipeng Li

Traditional $k$-means is an iterative algorithm: in each iteration, new cluster centers are computed and each data point is re-assigned to its nearest center.

Clustering Image Retrieval +1

Paper
Add Code
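
The traditional $k$-means iteration described in the entry above can be illustrated with a minimal sketch. This is plain Lloyd-style $k$-means, not the cluster-closure approximation the paper proposes; the function name `kmeans`, its parameters, and the random initialization from the data are assumptions made for illustration only.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means on a list of equal-length tuples (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initialize centers from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point is re-assigned to its nearest center.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assign
```

Calling `kmeans(points, 2)` on two well-separated groups of 2D points returns one center per group together with a per-point cluster index. Every iteration compares each point against every center, which is the cost that approximate variants aim to reduce.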

Scalable $k$-NN graph construction

no code implementations • 30 Jul 2013 • Jingdong Wang, Jing Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li

The $k$-NN graph has played a central role in increasingly popular data-driven techniques for various learning and vision tasks; yet, finding an efficient and effective way to construct $k$-NN graphs remains a challenge, especially for large-scale high-dimensional data.

graph construction

Paper
Add Code

Hybrid Affinity Propagation

no code implementations • 30 Jul 2013 • Jingdong Wang, Hao Xu, Xian-Sheng Hua, Shipeng Li

We formulate this problem as finding a few image exemplars to represent the image set semantically and visually, and solve it in a hybrid way by exploiting both visual and textual information associated with images.

Paper
Add Code

Supervised Kernel Descriptors for Visual Recognition

no code implementations • CVPR 2013 • Peng Wang, Jingdong Wang, Gang Zeng, Weiwei Xu, Hongbin Zha, Shipeng Li

In visual recognition tasks, the design of low level image feature representation is fundamental.

General Classification Image Classification

Paper
Add Code
