Search Results for author: Jianfeng Dong

Found 33 papers, 23 papers with code

Representation Alignment Contrastive Regularization for Multi-Object Tracking

1 code implementation • 3 Apr 2024 • Zhonglin Liu, ShuJie Chen, Jianfeng Dong, Xun Wang, Di Zhou

Achieving high performance in multi-object tracking relies heavily on modeling spatio-temporal relationships during the data association stage.

Multi-Object Tracking • Object

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

1 code implementation • 15 Dec 2023 • Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval.

Image Retrieval • Retrieval • +1

CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer

no code implementations • 14 Dec 2023 • Yabing Wang, Fan Wang, Jianfeng Dong, Hao Luo

Cross-lingual cross-modal retrieval, which aims to align vision and a target language (V-T) without using any annotated V-T data pairs, has garnered increasing attention recently.

Cross-Lingual Transfer • Cross-Modal Retrieval • +4

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

1 code implementation • 6 Nov 2023 • Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

In this manner, our framework is able to learn the unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases.

Action Understanding • Representation Learning • +1

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

1 code implementation • 13 Sep 2023 • Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang

We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature.

Disentanglement

Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

no code implementations • 11 Sep 2023 • Yabing Wang, Shuhui Wang, Hao Luo, Jianfeng Dong, Fan Wang, Meng Han, Xun Wang, Meng Wang

Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR.

Cross-Lingual Transfer • Cross-Modal Retrieval • +2

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

1 code implementation • 17 May 2023 • Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang, Jixiang Zhu, Baolong Liu

As the attribute-specific similarity typically corresponds to the specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner.

Attribute • Contrastive Learning • +2

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

no code implementations • 6 May 2023 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Zichuan Xu, Haozhao Wang, Xing Di, Weining Lu, Yu Cheng

This paper addresses temporal sentence grounding (TSG).

Sentence • Temporal Sentence Grounding

Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval

1 code implementation • ICCV 2023 • Jianfeng Dong, Minsong Zhang, Zheng Zhang, Xianke Chen, Daizong Liu, Xiaoye Qu, Xun Wang, Baolong Liu

During the knowledge distillation, an inheritance student branch is devised to absorb the knowledge from the teacher model.

Knowledge Distillation • Language Modelling • +3

Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning

1 code implementation • 5 Dec 2022 • Jianfeng Dong, Shengkai Sun, Zhonglin Liu, ShuJie Chen, Baolong Liu, Xun Wang

This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework.

Action Recognition • Representation Learning • +2

Partially Relevant Video Retrieval

1 code implementation • 26 Aug 2022 • Jianfeng Dong, Xianke Chen, Minsong Zhang, Xun Yang, ShuJie Chen, Xirong Li, Xun Wang

To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR).

Ranked #1 on Partially Relevant Video Retrieval on TVR

Moment Retrieval • Multiple Instance Learning • +5

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

1 code implementation • 26 Aug 2022 • Yabing Wang, Jianfeng Dong, Tianxiang Liang, Minsong Zhang, Rui Cai, Xun Wang

In this paper, we propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages.

Cross-Modal Retrieval • Machine Translation • +3

Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval

1 code implementation • 23 Jan 2022 • Jianfeng Dong, Yabing Wang, Xianke Chen, Xiaoye Qu, Xirong Li, Yuan He, Xun Wang

In this work, we concentrate on video representation learning, an essential component for text-to-video retrieval.

Representation Learning • Retrieval • +5

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

1 code implementation • 3 Dec 2021 • Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Jianfeng Dong, Xirong Li

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.

Ranked #1 on Ad-hoc video search on TRECVID-AVS20 (V3C1) (using extra training data)

Ad-hoc video search • feature selection • +3

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations • EMNLP 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of the bottom-up model is inferior to that of its top-down counterpart, as it fails to exploit segment-level interaction.

Sentence

Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

1 code implementation • 6 Apr 2021 • Jianfeng Dong, Zhe Ma, Xiaofeng Mao, Xun Yang, Yuan He, Richang Hong, Shouling Ji

In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute between fashion items.

Attribute

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation • CVPR 2021 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence • Temporal Sentence Grounding

Hierarchical Similarity Learning for Language-based Product Image Retrieval

1 code implementation • 18 Feb 2021 • Zhe Ma, Fenghao Liu, Jianfeng Dong, Xiaoye Qu, Yuan He, Shouling Ji

In this paper, we focus on the cross-modal similarity measurement, and propose a novel Hierarchical Similarity Learning (HSL) network.

Image Retrieval • Retrieval • +1

Progressive Localization Networks for Language-based Moment Localization

no code implementations • 2 Feb 2021 • Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations • COLING 2020 • Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

Dual Encoding for Video Retrieval by Text

1 code implementation • 10 Sep 2020 • Jianfeng Dong, Xirong Li, Chaoxi Xu, Xun Yang, Gang Yang, Xun Wang, Meng Wang

In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.

Ranked #3 on Ad-hoc video search on TRECVID-AVS16 (IACC.3) (using extra training data)

Ad-hoc video search • Retrieval • +2

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations • 6 Aug 2020 • Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation • 4 Aug 2020 • Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.

Graph Attention • Sentence

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval

no code implementations • 6 Jul 2020 • Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua

To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method by jointly learning the linguistic structure of queries and the temporal representation of videos.

Retrieval • Video Retrieval

Feature Re-Learning with Data Augmentation for Video Relevance Prediction

1 code implementation • 8 Apr 2020 • Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li

Predicting the relevance between two given videos with respect to their visual content is a key component for content-based video recommendation and retrieval.

Data Augmentation • Retrieval

Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network

1 code implementation • 7 Feb 2020 • Zhe Ma, Jianfeng Dong, Yao Zhang, Zhongzi Long, Yuan He, Hui Xue, Shouling Ji

This paper strives to learn fine-grained fashion similarity.

Attribute

Dual Encoding for Zero-Example Video Retrieval

1 code implementation • CVPR 2019 • Jianfeng Dong, Xirong Li, Chaoxi Xu, Shouling Ji, Yuan He, Gang Yang, Xun Wang

This paper attacks the challenging problem of zero-example video retrieval.

Ad-hoc video search • Retrieval • +1

Exploring Human-like Attention Supervision in Visual Question Answering

no code implementations • 19 Sep 2017 • Tingting Qiao, Jianfeng Dong, Duanqing Xu

Since there is a lack of human attention data, we first propose a Human Attention Network (HAN) to generate human-like attention maps, training on a recently released dataset called Human ATtention Dataset (VQA-HAT).

Question Answering • Visual Question Answering

Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild

1 code implementation • 5 Sep 2017 • Jianfeng Dong, Xirong Li, Duanqing Xu

To quantify the current progress, we propose a simple text2image method, representing a novel test query by a set of images selected from a large-scale query log.

Image Retrieval • Retrieval

Predicting Visual Features from Text for Image and Video Caption Retrieval

1 code implementation • 5 Sep 2017 • Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

Retrieval • Sentence • +1

Fluency-Guided Cross-Lingual Image Captioning

1 code implementation • 15 Aug 2017 • Weiyu Lan, Xirong Li, Jianfeng Dong

The framework comprises a module to automatically estimate the fluency of the sentences and another module to utilize the estimated fluency scores to effectively train an image captioning model for the target language.

Image Captioning

Learning Deep Representations Using Convolutional Auto-encoders with Symmetric Skip Connections

1 code implementation • 28 Nov 2016 • Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this paper, we investigate convolutional denoising auto-encoders to show that unsupervised pre-training can still improve the performance of high-level image related tasks such as image classification and semantic segmentation.

Denoising • General Classification • +4

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction

no code implementations • 23 Apr 2016 • Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find the sentence best describing the content of an image or video.

Sentence
