no code implementations • 18 Sep 2023 • Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang
Composed image retrieval is an image retrieval task in which the user provides a reference image as a starting point together with a text specifying how to shift from that starting point to the desired target image.
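As a minimal sketch of this setup, the snippet below fuses a reference-image embedding with a modification-text embedding and ranks a gallery by cosine similarity; the additive fusion and the function names here are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def composed_retrieval(ref_img_emb, text_emb, gallery_embs):
    """Toy composed image retrieval: fuse a reference-image embedding with a
    modification-text embedding, then rank gallery images by similarity.
    The additive fusion is a placeholder, not the paper's approach."""
    query = F.normalize(ref_img_emb + text_emb, dim=-1)   # fused query
    gallery = F.normalize(gallery_embs, dim=-1)
    scores = gallery @ query                              # cosine similarities
    return scores.argsort(descending=True)                # ranked gallery indices

# usage with random stand-in embeddings
ranking = composed_retrieval(torch.randn(512), torch.randn(512), torch.randn(1000, 512))
print(ranking[:5])
```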
no code implementations • 1 Jun 2023 • Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, DaCheng Tao, Tat-Jen Cham
In this work, we propose Cocktail, a pipeline to mix various modalities into one embedding, amalgamated with a generalized ControlNet (gControlNet), a controllable normalisation (ControlNorm), and a spatial guidance sampling method, to actualize multi-modal and spatially-refined control for text-conditional diffusion models.
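To illustrate the general idea of mixing several condition signals into one control embedding, here is a hypothetical mixer with learned per-modality weights; it is a stand-in sketch, not the paper's gControlNet/ControlNorm design.

```python
import torch
import torch.nn as nn

class ModalityMixer(nn.Module):
    """Illustrative mixer that fuses several condition feature maps (e.g.
    sketch, pose, segmentation) into one control embedding with learned
    softmax weights. A simplification of the paper's actual modules."""
    def __init__(self, num_modalities, channels):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_modalities))
        self.norm = nn.GroupNorm(8, channels)

    def forward(self, feats):                     # feats: (M, B, C, H, W)
        w = torch.softmax(self.weights, dim=0)    # per-modality mixing weights
        mixed = (w.view(-1, 1, 1, 1, 1) * feats).sum(dim=0)
        return self.norm(mixed)                   # one fused control embedding

mixer = ModalityMixer(num_modalities=3, channels=64)
out = mixer(torch.randn(3, 2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```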
no code implementations • 10 May 2023 • Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, DaCheng Tao
To this end, we propose to generate images conditioned on the compositions of multimodal control signals, where modalities are imperfectly complementary, i.e., composed multimodal conditional image synthesis (CMCIS).
1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao
Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.
no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao
Automated machine learning (AutoML) seeks to build ML models with minimal human effort.
1 code implementation • 5 Feb 2023 • Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang, Chaoyue Wang
The proposed model incorporates a Bias Elimination Cycle that consists of both a forward path and an inverted path, each featuring a Structural Consistency Cycle to ensure the preservation of image content during the editing process.
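The following sketch shows how a forward/inverted editing cycle with a content-preservation term could look; `edit_fwd` and `edit_inv` are assumed callables, and the L1 reconstruction penalty is an illustrative choice rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(edit_fwd, edit_inv, image, instruction):
    """Hypothetical forward/inverted editing cycle: apply the edit, undo it
    with the inverse path, and penalize deviation from the original image so
    that content is preserved during editing."""
    edited = edit_fwd(image, instruction)       # forward path: apply the edit
    restored = edit_inv(edited, instruction)    # inverted path: undo the edit
    return F.l1_loss(restored, image)           # structural consistency term

# usage with identity stand-ins for the two paths
img = torch.rand(1, 3, 64, 64)
loss = cycle_consistency_loss(lambda x, t: x, lambda x, t: x, img, "add a hat")
print(loss.item())  # 0.0 for the identity stand-ins
```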
1 code implementation • CVPR 2023 • Heng Zhang, Daqing Liu, Qi Zheng, Bing Su
Specifically, we enforce the embeddings of the frame sequence of interest to approximate a goal-oriented stochastic process, i.e., Brownian bridge, in the latent space via a process-based contrastive loss.
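A Brownian bridge pinned at endpoints z_0 and z_T has mean (1 - t/T) z_0 + (t/T) z_T and variance t(T - t)/T at step t. The sketch below scores how well intermediate frame embeddings follow this prior; it illustrates the goal-oriented bridge, not the paper's full contrastive formulation.

```python
import torch

def brownian_bridge_nll(z):
    """Minimal Brownian-bridge objective: given frame embeddings z[0..T],
    intermediate frames should stay close to the bridge mean
    (1 - t/T) * z[0] + (t/T) * z[T], with variance t * (T - t) / T."""
    T = z.shape[0] - 1
    t = torch.arange(1, T, dtype=z.dtype)            # interior time steps
    alpha = (t / T).unsqueeze(-1)
    mu = (1 - alpha) * z[0] + alpha * z[T]           # bridge mean per step
    var = (t * (T - t) / T).unsqueeze(-1)            # bridge variance per step
    return (((z[1:T] - mu) ** 2) / var).mean()       # scaled squared deviation

z = torch.randn(10, 128)  # 10 frame embeddings of dim 128
print(brownian_bridge_nll(z).item())
```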
no code implementations • ICCV 2023 • Heng Zhang, Daqing Liu, Zezhong Lv, Bing Su, DaCheng Tao
Paired video and language data are naturally temporally concurrent, which requires simultaneously modeling the temporal dynamics within each modality and the temporal alignment across modalities.
1 code implementation • 21 Nov 2022 • Qi Zheng, Chaoyue Wang, Daqing Liu, Dadong Wang, DaCheng Tao
For each positive pair, we regard the images from different graphs as negative samples and derive a multi-positive version of contrastive learning.
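A multi-positive contrastive loss treats all samples sharing a graph as positives for each other and everything else as negatives. The sketch below assumes a SupCon-style formulation, which may differ from the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def multi_positive_nce(embs, graph_ids, tau=0.1):
    """Multi-positive contrastive loss: images sharing a scene graph are
    positives for one another; images from different graphs are negatives."""
    z = F.normalize(embs, dim=-1)
    sim = z @ z.t() / tau                                    # pairwise similarities
    n = z.shape[0]
    self_mask = torch.eye(n, dtype=torch.bool)
    pos = (graph_ids.unsqueeze(0) == graph_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float('-inf')),
                                     dim=1, keepdim=True)
    return -(log_prob * pos.float()).sum(1).div(pos.sum(1).clamp(min=1)).mean()

loss = multi_positive_nce(torch.randn(8, 256), torch.tensor([0, 0, 1, 1, 2, 2, 3, 3]))
print(loss.item())
```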
1 code implementation • 21 Jun 2022 • Gang Li, Heliang Zheng, Daqing Liu, Chaoyue Wang, Bing Su, Changwen Zheng
In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy.
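A toy version of semantic-guided masking is sketched below: whole semantic parts are dropped until roughly the target mask ratio is reached. How patches are assigned to parts (e.g. from attention maps) is paper-specific and assumed given here.

```python
import torch

def semantic_guided_mask(part_ids, mask_ratio=0.75):
    """Toy semantic-guided masking for MAE pre-training: instead of masking
    patches uniformly at random, drop whole semantic parts until roughly
    `mask_ratio` of patches are hidden."""
    num_patches = part_ids.numel()
    parts = part_ids.unique()
    parts = parts[torch.randperm(parts.numel())]      # random part order
    mask = torch.zeros(num_patches, dtype=torch.bool)
    for p in parts:                                   # mask part by part
        mask |= (part_ids == p)
        if mask.float().mean() >= mask_ratio:
            break
    return mask                                       # True = masked patch

part_ids = torch.randint(0, 6, (196,))                # 14x14 patches, 6 fake parts
mask = semantic_guided_mask(part_ids)
print(mask.float().mean())
```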
1 code implementation • 14 Jun 2022 • Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang
For another, we devise a Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.
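The sketch below shows one plausible reading of fusion-free conditioning: language tokens are concatenated to visual tokens at an intermediate layer and then processed by the same transformer blocks. Layer count and fusion depth are placeholder choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LanguageConditionedViT(nn.Module):
    """Illustrative fusion-free design: language tokens join the visual
    tokens mid-stream and pass through the same uni-modal blocks, so no
    external fusion module is needed."""
    def __init__(self, dim=256, depth=6, fuse_at=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(depth)])
        self.fuse_at = fuse_at

    def forward(self, vis_tokens, lang_tokens):
        x = vis_tokens
        for i, blk in enumerate(self.blocks):
            if i == self.fuse_at:                 # inject language mid-stream
                x = torch.cat([x, lang_tokens], dim=1)
            x = blk(x)                            # uni-modal blocks reused
        return x

model = LanguageConditionedViT()
out = model(torch.randn(2, 196, 256), torch.randn(2, 12, 256))
print(out.shape)  # torch.Size([2, 208, 256])
```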
1 code implementation • CVPR 2022 • Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, DaCheng Tao
Compared to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel-patch level and the object-patch level respectively, the proposed focal attention predicts the current patch token by focusing only on its highly related tokens, as specified by the spatial layout, thereby achieving disambiguation during training.
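A minimal sketch of layout-restricted attention follows: each token attends only to the tokens its layout mask marks as related. The masking scheme shown is illustrative; the paper derives its mask from the spatial layout of objects.

```python
import torch
import torch.nn.functional as F

def focal_attention(q, k, v, layout_mask):
    """Layout-restricted attention: scores for unrelated tokens are set to
    -inf before softmax, so each patch attends only to its related tokens."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~layout_mask, float('-inf'))  # drop unrelated
    return F.softmax(scores, dim=-1) @ v

n, d = 16, 32
mask = torch.rand(n, n) > 0.5
mask |= torch.eye(n, dtype=torch.bool)        # always attend to self
out = focal_attention(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d), mask)
print(out.shape)  # torch.Size([16, 32])
```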
1 code implementation • 6 Jan 2022 • Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang
In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly while the decoder can be executed in parallel.
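One simple way to picture parallel bidirectional decoding is to stack a caption with its reversal into one batch so a single shared decoder processes both directions at once; the toy below shows only this batching idea, while the paper's interaction between the two streams is more involved.

```python
import torch

def bidirectional_batch(tokens):
    """Stack a caption and its reversal so one shared decoder can process
    the left-to-right and right-to-left streams in parallel."""
    reversed_tokens = torch.flip(tokens, dims=[1])    # right-to-left stream
    return torch.cat([tokens, reversed_tokens], dim=0)

caps = torch.tensor([[2, 5, 9, 7, 3]])                # one tokenized caption
print(bidirectional_batch(caps))
# tensor([[2, 5, 9, 7, 3],
#         [3, 7, 9, 5, 2]])
```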
1 code implementation • 17 Jul 2020 • Ganchao Tan, Daqing Liu, Meng Wang, Zheng-Jun Zha
However, existing visual reasoning methods designed for visual question answering are not appropriate for video captioning, which requires more complex visual reasoning over both space and time as well as dynamic module composition along the generation process.
1 code implementation • CVPR 2020 • Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
Improving the grounding accuracy while retaining the captioning quality would require word-region alignment as strong supervision, which is expensive to collect.
no code implementations • 9 Jun 2019 • Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun
In this paper, we alleviate the missing-annotation problem and enable joint reasoning by leveraging the language scene graph, which covers both the labeled referent and unlabeled contexts (other objects, attributes, and relationships).
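For concreteness, a language scene graph for the expression "the black dog on the left of the tree" (the running example from the grounding entry below) might be represented as follows; the dictionary layout is an illustrative assumption.

```python
# Toy language scene graph: nodes carry objects and attributes, edges carry
# relationships. Only "dog" is annotated as the referent; the rest is
# unlabeled context that joint reasoning can still exploit.
scene_graph = {
    "nodes": [
        {"id": 0, "object": "dog", "attributes": ["black"], "referent": True},
        {"id": 1, "object": "tree", "attributes": [], "referent": False},
    ],
    "edges": [{"subject": 0, "relation": "left of", "object": 1}],
}
print([n["object"] for n in scene_graph["nodes"] if not n["referent"]])  # context
```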
1 code implementation • 6 Jun 2019 • Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu
With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.
no code implementations • 5 Jun 2019 • Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang
Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space.
no code implementations • ICCV 2019 • Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha
In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction as needed.
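The bottom-up accumulation can be pictured as a recursion over the dependency tree: each node scores its own word and adds its children's scores. In the real NMTree each node is a neural module computing visual attention; the `word_score` function below is only a stand-in.

```python
def bottom_up_score(node, word_score):
    """Bottom-up accumulation along a dependency tree: a node's grounding
    score is its own word score plus the scores of its subtrees."""
    return word_score(node["word"]) + sum(
        bottom_up_score(child, word_score) for child in node.get("children", []))

# toy dependency tree for "black dog on the left"
tree = {"word": "dog",
        "children": [{"word": "black"},
                     {"word": "left", "children": [{"word": "the"}]}]}
print(bottom_up_score(tree, lambda w: len(w) * 0.1))  # fake per-word scores
```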
1 code implementation • 16 Aug 2018 • Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu
To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.
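Sequence-level captioning objectives are typically optimized with policy gradients over a whole sampled caption rather than per word. The snippet below is a minimal self-critical REINFORCE sketch under that assumption; CAVP's context-aware policy network itself is not reproduced here.

```python
import torch

def sequence_level_loss(log_probs, reward, baseline):
    """Self-critical policy-gradient sketch for sequence-level captioning:
    weight the sampled caption's log-probability by the advantage of its
    sequence reward (e.g. CIDEr) over a baseline."""
    advantage = reward - baseline                # sequence-level advantage
    return -(advantage * log_probs.sum())        # REINFORCE objective

log_probs = torch.log(torch.rand(12))            # per-token log-probs of a sample
print(sequence_level_loss(log_probs, reward=0.9, baseline=0.7).item())
```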