SciCap: Generating Captions for Scientific Figures

Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientific figures. To this end, we introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing (including figure-type classification, sub-figure identification, text normalization, and caption text selection), SCICAP contained more than two million figures extracted from over 290,000 papers. We then established baseline models that caption graph plots, the dominant (19.2%) figure type. The experimental results showed both opportunities and steep challenges of generating captions for scientific figures.

Findings (EMNLP) 2021 · PDF · Abstract

Datasets


Introduced in the Paper: SCICAP

Used in the Paper: FigureQA

Results from the Paper


Task: Image Captioning · Dataset: SCICAP · Metric: BLEU-4

Model | BLEU-4 | Global Rank
CNN+LSTM (Vision only, First sentence) | 0.0219 | #1
CNN+LSTM (Text only, First sentence) | 0.0213 | #2
CNN+LSTM (Text only, Single-Sent Caption) | 0.0212 | #3
CNN+LSTM (Vision only, Single-Sent Caption) | 0.0207 | #4
CNN+LSTM (Vision + Text, First sentence) | 0.0205 | #5
CNN+LSTM (Vision + Text, Single-Sent Caption) | 0.0202 | #6
CNN+LSTM (Vision only, Caption w/ <=100 words) | 0.0172 | #7
CNN+LSTM (Vision + Text, Caption w/ <=100 words) | 0.0168 | #8
CNN+LSTM (Text only, Caption w/ <=100 words) | 0.0165 | #9

Methods


No methods listed for this paper.