Search Results for author: Josiah Wang

Found 21 papers, 5 papers with code

MultiSubs: A Large-scale Multimodal and Multilingual Dataset

1 code implementation LREC 2022 Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia

The dataset will benefit research on visual grounding of words, especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.

Multimodal Lexical Translation · Multimodal Text Prediction +2

Transformer-based Cascaded Multimodal Speech Translation

no code implementations EMNLP (IWSLT) 2019 Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia

Upon conducting extensive experiments, we found that (i) the explored visual integration schemes often harm the translation performance for the transformer and additive deliberation, but considerably improve the cascade deliberation; and (ii) the transformer and cascade deliberation integrate the visual modality better than the additive deliberation, as shown by the incongruence analysis.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3
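
The incongruence analysis mentioned in this entry can be pictured as a small evaluation harness: translate once with the matched visual features and once with deliberately shuffled ones, then compare corpus-level scores. The sketch below is a minimal illustration, assuming a hypothetical `model.translate(src, feats)` interface and a corpus-level `metric` function; neither is the paper's actual code.

```python
import random

def incongruence_gap(model, dataset, metric):
    """Score a multimodal translation model with matched ('congruent') vs.
    shuffled ('incongruent') visual features. A large gap suggests the model
    actually exploits the visual modality; a tiny gap suggests it ignores it."""
    sources, feats, refs = zip(*dataset)          # dataset: [(src, visual_feats, ref), ...]
    congruent = [model.translate(s, f) for s, f in zip(sources, feats)]
    mismatched = list(feats)
    random.shuffle(mismatched)                    # break the image-sentence pairing
    incongruent = [model.translate(s, f) for s, f in zip(sources, mismatched)]
    return metric(congruent, refs) - metric(incongruent, refs)
```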

Imperial College London Submission to VATEX Video Captioning Task

no code implementations 16 Oct 2019 Ozan Caglayan, Zixiu Wu, Pranava Madhyastha, Josiah Wang, Lucia Specia

This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge, where we first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a transformer model, which generate captions from the I3D action features.

Video Captioning
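
To make the recurrent baseline concrete, here is a minimal sketch of a GRU caption decoder conditioned on I3D action features, written in PyTorch. The layer sizes, the mean-pooling of segment features, and the initialisation of the hidden state from the video feature are illustrative assumptions, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class GRUCaptioner(nn.Module):
    """Minimal GRU decoder conditioned on mean-pooled I3D video features."""

    def __init__(self, vocab_size, feat_dim=1024, emb_dim=256, hid_dim=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hid_dim)       # video feature -> initial hidden state
        self.embed = nn.Embedding(vocab_size, emb_dim)   # token embeddings
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)        # hidden state -> vocabulary logits

    def forward(self, feats, tokens):
        # feats: (B, T, feat_dim) I3D segment features; tokens: (B, L) caption token ids
        h0 = torch.tanh(self.init_h(feats.mean(dim=1))).unsqueeze(0)  # (1, B, hid_dim)
        out, _ = self.gru(self.embed(tokens), h0)                     # (B, L, hid_dim)
        return self.out(out)                                          # (B, L, vocab_size)
```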

Phrase Localization Without Paired Training Examples

1 code implementation ICCV 2019 Josiah Wang, Lucia Specia

Localizing phrases in images is an important part of image understanding and can be useful in many applications that require mappings between textual and visual information.

Semantic Similarity · Semantic Textual Similarity

Predicting Actions to Help Predict Translations

no code implementations 5 Aug 2019 Zixiu Wu, Julia Ive, Josiah Wang, Pranava Madhyastha, Lucia Specia

The question we ask is whether visual features can support the translation process; in particular, since this dataset is extracted from videos, we focus on the translation of actions, which we believe are poorly captured in the static image-text datasets currently used for multimodal translation.

Translation

VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions

no code implementations ACL 2019 Pranava Madhyastha, Josiah Wang, Lucia Specia

It estimates the faithfulness of a generated caption with respect to the content of the actual image, based on the semantic similarity between labels of objects depicted in images and words in the description.

Semantic Similarity · Semantic Textual Similarity
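
VIFIDEL scores a caption against the labels of objects detected in the image; the paper grounds this in word mover's distance over embeddings. The sketch below substitutes a much simpler greedy cosine matching to illustrate the idea, with `embed` standing in for any word-embedding lookup and the toy vectors being invented for the example.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def visual_fidelity(object_labels, caption_words, embed):
    """For each detected object label, take its best similarity to any word in
    the description, then average. A greedy stand-in for the WMD-based metric."""
    scores = [max((cos(embed[o], embed[w]) for w in caption_words if w in embed),
                  default=0.0)
              for o in object_labels if o in embed]
    return sum(scores) / len(scores) if scores else 0.0

toy = {"dog": np.array([1.0, 0.2]), "puppy": np.array([0.9, 0.3]),
       "car": np.array([0.1, 1.0])}
print(visual_fidelity(["dog"], ["a", "puppy", "runs"], toy))   # high: ~0.99
print(visual_fidelity(["dog"], ["a", "car", "parks"], toy))    # low:  ~0.29
```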

End-to-end Image Captioning Exploits Distributional Similarity in Multimodal Space

1 code implementation WS 2018 Pranava Swaroop Madhyastha, Josiah Wang, Lucia Specia

We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn 'distributional similarity' in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.

Image Captioning · Text Generation
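
The distributional-similarity hypothesis can be caricatured as pure retrieval: caption a test image with the caption of its nearest training image in a shared feature space. The sketch below illustrates that intuition only; it is not the authors' evaluation procedure, which probes trained end-to-end systems.

```python
import numpy as np

def retrieve_caption(test_feat, train_feats, train_captions):
    """Caption a test image with the caption of its nearest training image
    in the shared multimodal space (cosine similarity)."""
    t = test_feat / np.linalg.norm(test_feat)                          # (d,)
    T = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)  # (N, d)
    return train_captions[int(np.argmax(T @ t))]                      # nearest neighbour
```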

End-to-end Image Captioning Exploits Multimodal Distributional Similarity

no code implementations11 Sep 2018 Pranava Madhyastha, Josiah Wang, Lucia Specia

We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn 'distributional similarity' in a multimodal feature space by mapping a test image to similar training images in this space and generating a caption from the same space.

Image Captioning · Text Generation

Defoiling Foiled Image Captions

1 code implementation NAACL 2018 Pranava Madhyastha, Josiah Wang, Lucia Specia

We address the task of detecting foiled image captions, i.e. identifying whether a caption contains a word that has been deliberately replaced by a semantically similar word, thus rendering it inaccurate with respect to the image being described.

Descriptive · Image Captioning +1
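
One way to picture the foil-detection task: a caption word that no detected object supports is a candidate foil. The paper trains classifiers on multimodal features; the heuristic below is only an illustrative stand-in, with `embed` a hypothetical word-embedding lookup and the threshold chosen arbitrarily.

```python
import numpy as np

def suspected_foils(caption_words, object_labels, embed, threshold=0.4):
    """Flag caption words that are poorly supported by every detected object.
    Illustrative heuristic only; the paper trains a classifier instead."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    flags = {}
    for w in caption_words:
        if w not in embed:
            continue                              # skip out-of-vocabulary words
        support = max((cos(embed[w], embed[o]) for o in object_labels if o in embed),
                      default=0.0)
        if support < threshold:
            flags[w] = support                    # weakly grounded -> candidate foil
    return flags
```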

Object Counts! Bringing Explicit Detections Back into Image Captioning

no code implementations NAACL 2018 Josiah Wang, Pranava Madhyastha, Lucia Specia

The use of explicit object detectors as an intermediate step to image captioning, which used to constitute an essential stage in early work, is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding.

Image Captioning · Language Modelling +1
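
The idea of bringing explicit detections back can be made concrete as a bag-of-objects count vector that conditions the caption generator instead of (or alongside) a mid-level CNN embedding. A minimal sketch, with `category_index` an assumed label-to-index mapping.

```python
import numpy as np

def bag_of_objects(detected_labels, category_index):
    """Explicit detections -> count vector over object categories; this vector
    can condition the caption language model in place of an image embedding."""
    vec = np.zeros(len(category_index), dtype=np.float32)
    for label in detected_labels:           # e.g. ["dog", "dog", "frisbee"]
        vec[category_index[label]] += 1.0   # object counts, not just presence
    return vec

idx = {"dog": 0, "frisbee": 1, "person": 2}
print(bag_of_objects(["dog", "dog", "frisbee"], idx))   # [2. 1. 0.]
```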

Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection

no code implementations9 Jan 2018 Yu-Xing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen

This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations.

Object · object-detection +3
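
A simple way to realize the classifier-to-detector transfer is to fit a linear map between classifier and detector weights on the source categories (those with box annotations) and apply it to target categories with image-level labels only. The ridge-regression sketch below is one plausible instantiation under that assumption, not the paper's exact model, which also exploits visual and semantic similarity between categories.

```python
import numpy as np

def classifier_to_detector(W_cls_src, W_det_src, W_cls_tgt, lam=1.0):
    """Learn a linear map M from classifier weights to detector weights on
    source categories, then predict detector weights for target categories.
    W_cls_src, W_det_src: (n_src, d); W_cls_tgt: (n_tgt, d)."""
    d = W_cls_src.shape[1]
    # ridge regression: M = argmin ||W_cls_src @ M - W_det_src||^2 + lam * ||M||^2
    M = np.linalg.solve(W_cls_src.T @ W_cls_src + lam * np.eye(d),
                        W_cls_src.T @ W_det_src)
    return W_cls_tgt @ M   # predicted detector weights for target categories
```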

What is image captioning made of?

1 code implementation ICLR 2018 Pranava Madhyastha, Josiah Wang, Lucia Specia

We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn ‘distributional similarity’ in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.

Image Captioning · Text Generation

Large Scale Semi-Supervised Object Detection Using Visual and Semantic Knowledge Transfer

no code implementations CVPR 2016 Yu-Xing Tang, Josiah Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen

This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations.

Object · object-detection +3

Cross-validating Image Description Datasets and Evaluation Metrics

no code implementations LREC 2016 Josiah Wang, Robert Gaizauskas

The task of automatically generating sentential descriptions of image content has become increasingly popular in recent years, resulting in the development of large-scale image description datasets and the proposal of various metrics for evaluating image description generation systems.

Sentence
