ESP (Evaluation for Styled Prompt)

Introduced by Yu et al. in Multimodal Knowledge Alignment with Reinforcement Learning

ESP dataset (Evaluation for Styled Prompt dataset) is a benchmark for zero-shot domain-conditional caption generation. ESP is a new dataset focusing on providing multiple styled text targets for the same image. It comprises 4.8k captions from 1k images in the COCO Captions test set. We collect five text domains with everyday usage: blog, social media, instruction, story, and news.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


License


  • Unknown

Modalities


Languages