Search Results for author: Luo Zhong

Found 6 papers, 2 papers with code

Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering

1 code implementation • 22 Oct 2021 • Zhongwei Xie, Ling Liu, Yanzhao Wu, Luo Zhong, Lin Li

This paper introduces a two-phase deep feature engineering framework for efficient learning of semantics-enhanced joint embedding, which clearly separates the deep feature engineering in data preprocessing from the training of the text-image joint embedding model.

Cross-Modal Retrieval, Feature Engineering +1

Visual-aware Attention Dual-stream Decoder for Video Captioning

no code implementations • 16 Oct 2021 • Zhixin Sun, Xian Zhong, Shuqin Chen, Lin Li, Luo Zhong

Video captioning is a challenging task that involves capturing different visual parts and describing them in sentences, since it requires both visual and linguistic coherence.

Video Captioning, Video Description

Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images

no code implementations • 9 Aug 2021 • Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong

This paper presents a three-tier modality alignment approach to learning text-image joint embedding, named JEMA, for cross-modal retrieval of cooking recipes and food images.

Cross-Modal Retrieval, Retrieval +1

Learning TFIDF Enhanced Joint Embedding for Recipe-Image Cross-Modal Retrieval Service

1 code implementation • 2 Aug 2021 • Zhongwei Xie, Ling Liu, Yanzhao Wu, Lin Li, Luo Zhong

We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance cross-modal retrieval services.

Cross-Modal Retrieval, Retrieval

Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning

no code implementations • 2 Aug 2021 • Zhongwei Xie, Ling Liu, Lin Li, Luo Zhong

This paper introduces a two-phase deep feature calibration framework for efficient learning of semantics-enhanced text-image cross-modal joint embedding, which clearly separates the deep feature calibration in data preprocessing from the training of the joint embedding model.

Feature Engineering

Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings

no code implementations • 4 Oct 2018 • Zhongwei Xie, Lin Li, Xian Zhong, Luo Zhong

In this paper, we propose an end-to-end neural network framework for image-to-video person re-identification by leveraging cross-modal embeddings learned from extra information. Concretely, cross-modal embeddings from image captioning and video captioning models are reused to help project the learned features into a coordinated space, where similarity can be computed directly.

Image Captioning, Image-To-Video Person Re-Identification +2
