Controllable Image Captioning

Generate image captions conditioned on control signals.

Most implemented papers

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

aimagelab/show-control-and-tell CVPR 2019

Current captioning approaches describe images with black-box architectures whose behavior is hard to control or explain from the outside.

Length-Controllable Image Captioning

bearcatt/LaBERT ECCV 2020

We verify the merit of the proposed length-level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoders, and our proposed non-autoregressive model, to show its generalization ability.
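The idea of a length-level embedding can be illustrated with a minimal, dependency-free sketch. It assumes that a target caption length is first mapped to a discrete level and that the level's vector is added to every token embedding, the same way positional embeddings are added. The bucket boundaries in `LENGTH_LEVELS` and the names `length_to_level`, `LengthLevelEmbedding` and `add_length_signal` are hypothetical choices for illustration, not the LaBERT implementation.

```python
import random

# Hypothetical length buckets (an assumption, not taken from the paper):
# each (low, high) range of target caption lengths maps to one level id.
LENGTH_LEVELS = [(1, 9), (10, 14), (15, 19), (20, 25)]


def length_to_level(target_len: int) -> int:
    """Return the index of the bucket that contains target_len."""
    for level, (low, high) in enumerate(LENGTH_LEVELS):
        if low <= target_len <= high:
            return level
    raise ValueError(f"length {target_len} is outside all levels")


class LengthLevelEmbedding:
    """Toy lookup table holding one vector per length level.

    In a real model these vectors would be trained parameters; here they
    are small random values drawn from a seeded generator.
    """

    def __init__(self, num_levels: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_levels)]

    def __call__(self, level: int) -> list:
        return self.table[level]


def add_length_signal(token_embs, level, length_emb):
    """Add the same length-level vector to every token embedding."""
    vec = length_emb(level)
    return [[t + v for t, v in zip(tok, vec)] for tok in token_embs]
```

For example, `add_length_signal(tokens, length_to_level(12), LengthLevelEmbedding(len(LENGTH_LEVELS), dim=4))` shifts every token vector by the level-1 embedding. Because the signal is a single added vector rather than a change to the decoder, the same trick can be attached to autoregressive and non-autoregressive decoders alike, which is consistent with the excerpt's claim of testing it on three different models.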

Language-Driven Region Pointer Advancement for Controllable Image Captioning

AnnikaLindh/Controllable_Region_Pointer_Advancement COLING 2020

A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer.
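To make the notion of region-pointer advancement concrete, here is a toy sketch. It assumes the control signal is an ordered list of image regions and that a predicate decides, after each generated token, whether the pointer moves to the next region. The function `ground_tokens`, the predicate `toy_rule` and the region names are hypothetical; the hard-coded noun rule merely stands in for the language-driven decision studied in the paper and does not reproduce its method.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def ground_tokens(
    tokens: Iterable[str],
    regions: Sequence[str],
    should_advance: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Pair each generated token with the region the pointer is on.

    The pointer starts at regions[0]; after a token for which
    should_advance(token) is True, it moves to the next region and
    stays on the last region once the list is exhausted.
    """
    if not regions:
        raise ValueError("at least one region is required")
    pointer = 0
    grounded = []
    for tok in tokens:
        grounded.append((tok, regions[pointer]))
        if should_advance(tok) and pointer < len(regions) - 1:
            pointer += 1
    return grounded


# Hypothetical advancement rule: a fixed set of nouns marks the end of the
# phrase describing the current region. A trained model would replace this.
TOY_NOUNS = {"dog", "frisbee", "park"}


def toy_rule(token: str) -> bool:
    return token in TOY_NOUNS
```

Running `ground_tokens("a dog catches a frisbee in the park".split(), ["region_dog", "region_frisbee", "region_park"], toy_rule)` keeps "a dog" on the first region and moves to the second region right after "dog". Even this toy shows why the timing decision matters: with a naive rule, a token such as "catches" ends up attached to the frisbee region.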

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

mad-red/VSR-guided-CIC CVPR 2021

However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity.

CLID: Controlled-Length Image Descriptions with Limited Data

eladhi/clid 27 Nov 2022

Controllable image captioning models generate human-like image descriptions, enabling some kind of control over the generated captions.

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

ttengwang/caption-anything 4 May 2023

Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style.