Search Results for author: Stéphane Dupont

Found 22 papers, 6 papers with code

Visually Grounded Word Embeddings and Richer Visual Features for Improving Multimodal Neural Machine Translation

no code implementations 4 Jul 2017 Jean-Benoit Delbrouck, Stéphane Dupont, Omar Seddati

In Multimodal Neural Machine Translation (MNMT), a neural model generates a translated sentence that describes an image, given the image itself and one source description in English.

Dense Captioning Machine Translation +5

Modulating and attending the source image during encoding improves Multimodal Translation

1 code implementation 9 Dec 2017 Jean-Benoit Delbrouck, Stéphane Dupont

We propose a new and fully end-to-end approach for multimodal translation where the source text encoder modulates the entire visual input processing using conditional batch normalization, in order to compute the most informative image features for our task.

Translation

Proceedings of eNTERFACE 2015 Workshop on Intelligent Interfaces

no code implementations 19 Jan 2018 Matei Mancas, Christian Frisson, Joëlle Tilmanne, Nicolas D'Alessandro, Petr Barborka, Furkan Bayansar, Francisco Bernard, Rebecca Fiebrink, Alexis Heloir, Edgar Hemery, Sohaib Laraba, Alexis Moinet, Fabrizio Nunnari, Thierry Ravet, Loïc Reboursière, Alvaro Sarasua, Mickaël Tits, Noé Tits, François Zajéga, Paolo Alborno, Ksenia Kolykhalova, Emma Frid, Damiano Malafronte, Lisanne Huis in't Veld, Hüseyin Cakmak, Kevin El Haddad, Nicolas Riche, Julien Leroy, Pierre Marighetto, Bekir Berker Türker, Hossein Khaki, Roberto Pulisci, Emer Gilmartin, Fasih Haider, Kübra Cengiz, Martin Sulir, Ilaria Torre, Shabbir Marzban, Ramazan Yazıcı, Furkan Burak Bâgcı, Vedat Gazi Kılı, Hilal Sezer, Sena Büsra Yenge, Charles-Alexandre Delestage, Sylvie Leleu-Merviel, Muriel Meyer-Chemenska, Daniel Schmitt, Willy Yvart, Stéphane Dupont, Ozan Can Altiok, Aysegül Bumin, Ceren Dikmen, Ivan Giangreco, Silvan Heller, Emre Külah, Gueorgui Pironkov, Luca Rossetto, Yusuf Sahillioglu, Heiko Schuldt, Omar Seddati, Yusuf Setinkaya, Metin Sezgin, Claudiu Tanase, Emre Toyan, Sean Wood, Doguhan Yeke, François Rocca, Pierre-Henri De Deken, Alessandra Bandrabur, Fabien Grisard, Axel Jean-Caurant, Vincent Courboulay, Radhwan Ben Madhkour, Ambroise Moreau

The 11th Summer Workshop on Multimodal Interfaces eNTERFACE 2015 was hosted by the Numediart Institute of Creative Technologies of the University of Mons from August 10th to September 2015.

UMONS Submission for WMT18 Multimodal Translation Task

1 code implementation 15 Oct 2018 Jean-Benoit Delbrouck, Stéphane Dupont

This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18).

Image Captioning Multimodal Machine Translation +1

Bringing back simplicity and lightliness into neural image captioning

no code implementations 15 Oct 2018 Jean-Benoit Delbrouck, Stéphane Dupont

So far, the goal has been to maximize scores on automated metrics, and to do so, one has to come up with a plurality of new modules and techniques.

Caption Generation Image Captioning +2

Object-oriented Targets for Visual Navigation using Rich Semantic Representations

no code implementations 22 Nov 2018 Jean-Benoit Delbrouck, Stéphane Dupont

When searching for an object, humans navigate through a scene using semantic information and spatial relationships.

Navigate Object +1

Adversarial reconstruction for Multi-modal Machine Translation

no code implementations 7 Oct 2019 Jean-Benoit Delbrouck, Stéphane Dupont

Even with the growing interest in problems at the intersection of Computer Vision and Natural Language, grounding (i.e., identifying) the components of a structured description in an image remains a challenging task.

Machine Translation Translation

Modulated Self-attention Convolutional Network for VQA

no code implementations 8 Oct 2019 Jean-Benoit Delbrouck, Antoine Maiorca, Nathan Hubens, Stéphane Dupont

As new datasets for real-world visual reasoning and compositional question answering emerge, it may become necessary to treat visual feature extraction as an end-to-end process during training.

Question Answering Visual Question Answering +1

AVECL-UMONS database for audio-visual event classification and localization

no code implementations 2 Oct 2020 Mathilde Brousmiche, Stéphane Dupont, Jean Rouat

We introduce the AVECL-UMons dataset for audio-visual event classification and localization in the context of office environments.

General Classification

Improved Soccer Action Spotting using both Audio and Video Streams

no code implementations 9 Nov 2020 Bastien Vanderplaetse, Stéphane Dupont

Action spotting and classification are the tasks of finding the temporal anchors of events in a video and determining which events they are.

Action Classification Action Spotting +2

Analysis of Co-Laughter Gesture Relationship on RGB videos in Dyadic Conversation Context

no code implementations 20 May 2022 Hugo Bohy, Ahmad Hammoudeh, Antoine Maiorca, Stéphane Dupont, Thierry Dutoit

Laughter is not just an audio signal, but an intrinsic relationship of multimodal non-verbal communication: in addition to audio, it includes facial expressions and body movements.

Motion Synthesis

Transformers and CNNs both Beat Humans on SBIR

no code implementations 14 Sep 2022 Omar Seddati, Stéphane Dupont, Saïd Mahmoudi, Thierry Dutoit

Sketch-based image retrieval (SBIR) is the task of retrieving natural images (photos) that match the semantics and the spatial configuration of hand-drawn sketch queries.

Retrieval Sketch-Based Image Retrieval

A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation

no code implementations 30 May 2023 Omar Seddati, Nathan Hubens, Stéphane Dupont, Thierry Dutoit

Then, we introduce a Relative Triplet Loss (RTL), an adapted triplet loss to overcome those limitations through loss weighting based on anchor similarity.

Data Augmentation Knowledge Distillation +2

Deep learning in medical image registration: introduction and survey

no code implementations 1 Sep 2023 Ahmad Hammoudeh, Stéphane Dupont

Image registration (IR) is a process that deforms images to align them with respect to a reference space, making it easier for medical practitioners to examine various medical images in a standardized reference frame, such as having the same rotation and scale.

Image Registration Medical Image Registration

Analysis of Co-Laughter Gesture Relationship on RGB Videos in Dyadic Conversation Context

no code implementations SmiLa (LREC) 2022 Hugo Bohy, Ahmad Hammoudeh, Antoine Maiorca, Stéphane Dupont, Thierry Dutoit

Laughter is not just an audio signal, but an intrinsic relationship of multimodal non-verbal communication: in addition to audio, it includes facial expressions and body movements.

Motion Synthesis
