Search Results for author: Jianfeng Dong

Found 33 papers, 23 papers with code

Representation Alignment Contrastive Regularization for Multi-Object Tracking

1 code implementation 3 Apr 2024 Zhonglin Liu, ShuJie Chen, Jianfeng Dong, Xun Wang, Di Zhou

Achieving high performance in multi-object tracking algorithms heavily relies on modeling spatio-temporal relationships during the data association stage.

Multi-Object Tracking Object

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

1 code implementation 15 Dec 2023 Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval.

Image Retrieval Retrieval +1

CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer

no code implementations 14 Dec 2023 Yabing Wang, Fan Wang, Jianfeng Dong, Hao Luo

Cross-lingual cross-modal retrieval has garnered increasing attention recently, which aims to achieve the alignment between vision and target language (V-T) without using any annotated V-T data pairs.

Cross-Lingual Transfer Cross-Modal Retrieval +4

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

1 code implementation 6 Nov 2023 Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

In this manner, our framework is able to learn the unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases.

Action Understanding Representation Learning +1

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

1 code implementation 13 Sep 2023 Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang

We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature.

Disentanglement

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

1 code implementation 17 May 2023 Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang, Jixiang Zhu, Baolong Liu

As the attribute-specific similarity typically corresponds to the specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner.

Attribute Contrastive Learning +2

Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning

1 code implementation 5 Dec 2022 Jianfeng Dong, Shengkai Sun, Zhonglin Liu, ShuJie Chen, Baolong Liu, Xun Wang

This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework.

Action Recognition Representation Learning +2

Partially Relevant Video Retrieval

1 code implementation 26 Aug 2022 Jianfeng Dong, Xianke Chen, Minsong Zhang, Xun Yang, ShuJie Chen, Xirong Li, Xun Wang

To fill the gap, we propose in this paper a novel T2VR subtask termed Partially Relevant Video Retrieval (PRVR).

Moment Retrieval Multiple Instance Learning +5

Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning

1 code implementation 26 Aug 2022 Yabing Wang, Jianfeng Dong, Tianxiang Liang, Minsong Zhang, Rui Cai, Xun Wang

In this paper, we propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages.

Cross-Modal Retrieval Machine Translation +3

Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval

1 code implementation 3 Dec 2021 Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Jianfeng Dong, Xirong Li

In this paper we revisit feature fusion, an old-fashioned topic, in the new context of text-to-video retrieval.

Ranked #1 on Ad-hoc video search on TRECVID-AVS20 (V3C1) (using extra training data)

Ad-hoc video search feature selection +3

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations EMNLP 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of the bottom-up model is inferior to the top-down counterpart as it fails to exploit the segment-level interaction.

Sentence

Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

1 code implementation 6 Apr 2021 Jianfeng Dong, Zhe Ma, Xiaofeng Mao, Xun Yang, Yuan He, Richang Hong, Shouling Ji

In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute between fashion items.

Attribute

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation CVPR 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding

Hierarchical Similarity Learning for Language-based Product Image Retrieval

1 code implementation 18 Feb 2021 Zhe Ma, Fenghao Liu, Jianfeng Dong, Xiaoye Qu, Yuan He, Shouling Ji

In this paper, we focus on the cross-modal similarity measurement, and propose a novel Hierarchical Similarity Learning (HSL) network.

Image Retrieval Retrieval +1

Progressive Localization Networks for Language-based Moment Localization

no code implementations 2 Feb 2021 Qi Zheng, Jianfeng Dong, Xiaoye Qu, Xun Yang, Yabing Wang, Pan Zhou, Baolong Liu, Xun Wang

The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments.

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations COLING 2020 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

Dual Encoding for Video Retrieval by Text

1 code implementation 10 Sep 2020 Jianfeng Dong, Xirong Li, Chaoxi Xu, Xun Yang, Gang Yang, Xun Wang, Meng Wang

In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.

Ranked #3 on Ad-hoc video search on TRECVID-AVS16 (IACC.3) (using extra training data)

Ad-hoc video search Retrieval +2

Fine-grained Iterative Attention Network for Temporal Language Localization in Videos

no code implementations 6 Aug 2020 Xiaoye Qu, Pengwei Tang, Zhikang Zhou, Yu Cheng, Jianfeng Dong, Pan Zhou

In this paper, we propose a Fine-grained Iterative Attention Network (FIAN) that consists of an iterative attention module for bilateral query-video information extraction.

Sentence

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation 4 Aug 2020 Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative messages passing over a joint graph.

Graph Attention Sentence

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval

no code implementations 6 Jul 2020 Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua

To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method by jointly learning the linguistic structure of queries and the temporal representation of videos.

Retrieval Video Retrieval

Feature Re-Learning with Data Augmentation for Video Relevance Prediction

1 code implementation 8 Apr 2020 Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li

Predicting the relevance between two given videos with respect to their visual content is a key component for content-based video recommendation and retrieval.

Data Augmentation Retrieval

Exploring Human-like Attention Supervision in Visual Question Answering

no code implementations 19 Sep 2017 Tingting Qiao, Jianfeng Dong, Duanqing Xu

Since there is a lack of human attention data, we first propose a Human Attention Network (HAN) to generate human-like attention maps, training on a recently released dataset called Human ATtention Dataset (VQA-HAT).

Question Answering Visual Question Answering

Cross-Media Similarity Evaluation for Web Image Retrieval in the Wild

1 code implementation 5 Sep 2017 Jianfeng Dong, Xirong Li, Duanqing Xu

To quantify the current progress, we propose a simple text2image method, representing a novel test query by a set of images selected from a large-scale query log.

Image Retrieval Retrieval

Predicting Visual Features from Text for Image and Video Caption Retrieval

1 code implementation 5 Sep 2017 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video.

Retrieval Sentence +1

Fluency-Guided Cross-Lingual Image Captioning

1 code implementation 15 Aug 2017 Weiyu Lan, Xirong Li, Jianfeng Dong

The framework comprises a module to automatically estimate the fluency of the sentences and another module to utilize the estimated fluency scores to effectively train an image captioning model for the target language.

Image Captioning

Learning Deep Representations Using Convolutional Auto-encoders with Symmetric Skip Connections

1 code implementation 28 Nov 2016 Jianfeng Dong, Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang

In this paper, we investigate convolutional denoising auto-encoders to show that unsupervised pre-training can still improve the performance of high-level image-related tasks such as image classification and semantic segmentation.

Denoising General Classification +4

Word2VisualVec: Image and Video to Sentence Matching by Visual Feature Prediction

no code implementations 23 Apr 2016 Jianfeng Dong, Xirong Li, Cees G. M. Snoek

This paper strives to find the sentence best describing the content of an image or video.

Sentence
