Search Results for author: Rita Cucchiara

Found 123 papers, 67 papers with code

TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes

1 code implementation 30 Oct 2024 Alessandro D'Amelio, Giuseppe Cartella, Vittorio Cuculo, Manuele Lucchi, Marcella Cornia, Rita Cucchiara, Giuseppe Boccignone

Attention guides our gaze to fixate the proper location of the scene and holds it in that location for the deserved amount of time given current processing demands, before shifting to the next one.

Point Processes

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments

1 code implementation 23 Oct 2024 Luca Barsellotti, Roberto Bigazzi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

In the last years, the research interest in visual navigation towards objects in indoor environments has grown significantly.

Object Visual Navigation

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

1 code implementation 9 Oct 2024 Sara Sarto, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Metrics can indeed play a key role in the fine-tuning stage of captioning models, ultimately enhancing the quality of the generated captions.

Caption Generation Contrastive Learning

Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection

no code implementations 16 Sep 2024 Federico Betti, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe

Diffusion models have significantly advanced generative AI, but they encounter difficulties when generating complex combinations of multiple objects.

Hallucination

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction

no code implementations 9 Sep 2024 Davide Di Nucci, Alessandro Simoni, Matteo Tomei, Luca Ciuffreda, Roberto Vezzani, Rita Cucchiara

The three-dimensional representation of objects or scenes starting from a set of images has been a widely discussed topic for years and has gained additional attention after the diffusion of NeRF-based approaches.

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

no code implementations 29 Aug 2024 Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level.

Image Captioning Specificity

μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context

2 code implementations 28 Aug 2024 Fabio Quattrini, Carmine Zaccagnino, Silvia Cascianelli, Laura Righi, Rita Cucchiara

Regesta are catalogs of summaries of other documents and, in some cases, are the only source of information about the content of such full-length documents.

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

1 code implementation 28 Aug 2024 Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

To overcome this limitation, we propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting to improve the perceptual and semantical coherence of the generated panorama images.

Diversity Text-to-Image Generation

Alfie: Democratising RGBA Image Generation With No $$$

1 code implementation 27 Aug 2024 Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling.

Image Generation Image Matting +2

Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization

1 code implementation 26 Aug 2024 Nicholas Moratelli, Davide Caffagni, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

However, when attempting to optimize modern and higher-quality metrics like CLIP-Score and PAC-Score, this training method often encounters instability and fails to acquire the genuine descriptive capabilities needed to produce fluent and informative captions.

Descriptive Image Captioning

Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities

1 code implementation 29 Jul 2024 Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

To sustain the training of our model, we generate a comprehensive dataset that focuses on images generated by diffusion models and encompasses a collection of 9.2 million images produced by using four different generators.

Contrastive Learning DeepFake Detection +2

BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues

1 code implementation 29 Jul 2024 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Effectively aligning with human judgment when evaluating machine-generated image captions represents a complex yet intriguing challenge.

Image Captioning

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

1 code implementation 8 Jul 2024 Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance.

Contrastive Learning Data Augmentation +2

Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes

no code implementations 31 May 2024 Riccardo Benaglia, Angelo Porrello, Pietro Buzzega, Simone Calderara, Rita Cucchiara

Trajectory forecasting is crucial for video surveillance analytics, as it enables the anticipation of future movements for a set of agents, e.g., basketball players engaged in intricate interactions with long-term intentions.

Trajectory Forecasting

Sharing Key Semantics in Transformer Makes Efficient Image Restoration

no code implementations 30 May 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Ming-Hsuan Yang, Nicu Sebe

Additionally, for IR, it is commonly noted that small segments of a degraded image, particularly those closely aligned semantically, provide particularly relevant information to aid in the restoration process, as they contribute essential contextual cues crucial for accurate reconstruction.

Image Restoration

A Second-Order Perspective on Model Compositionality and Incremental Learning

no code implementations 25 May 2024 Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega, Monica Millunzi, Simone Calderara, Rita Cucchiara

The fine-tuning of deep pre-trained models has revealed compositional properties, with multiple specialized modules that can be arbitrarily composed into a single, multi-task model.

Incremental Learning

Towards Retrieval-Augmented Architectures for Image Captioning

no code implementations 21 May 2024 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Alessandro Nicolosi, Rita Cucchiara

The objective of image captioning models is to bridge the gap between the visual and linguistic modalities by generating natural language descriptions that accurately reflect the content of input images.

Image Captioning Language Modelling +1

Binarizing Documents by Leveraging both Space and Frequency

1 code implementation 26 Apr 2024 Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Document Image Binarization is a well-known problem in Document Analysis and Computer Vision, although it is far from being solved.

Binarization

AIGeN: An Adversarial Approach for Instruction Generation in VLN

no code implementations 15 Apr 2024 Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi, Rita Cucchiara

VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal.

Decoder Vision and Language Navigation

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

2 code implementations 21 Mar 2024 Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body.

Denoising Virtual Try-on

Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images

1 code implementation 13 Mar 2024 Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara

Creating high-quality and realistic images is now possible thanks to the impressive advancements in image generation.

Fake Image Detection Image Generation +1

Mapping High-level Semantic Regions in Indoor Environments without Object Recognition

no code implementations 11 Mar 2024 Roberto Bigazzi, Lorenzo Baraldi, Shreyas Kousik, Rita Cucchiara, Marco Pavone

Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments.

Graph Generation Language Modelling +3

Trends, Applications, and Challenges in Human Attention Modelling

1 code implementation 28 Feb 2024 Giuseppe Cartella, Marcella Cornia, Vittorio Cuculo, Alessandro D'Amelio, Dario Zanca, Giuseppe Boccignone, Rita Cucchiara

Human attention modelling has proven, in recent years, to be particularly useful not only for understanding the cognitive processes underlying visual exploration, but also for providing support to artificial intelligence models that aim to solve problems in various domains, including image and video processing, vision-and-language applications, and language modelling.

Language Modelling

VATr++: Choose Your Words Wisely for Handwritten Text Generation

no code implementations 16 Feb 2024 Bram Vanherle, Vittorio Pippi, Silvia Cascianelli, Nick Michiels, Frank Van Reeth, Rita Cucchiara

Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models.

Benchmarking Text Generation

Key-Graph Transformer for Image Restoration

no code implementations 4 Feb 2024 Bin Ren, Yawei Li, Jingyun Liang, Rakesh Ranjan, Mengyuan Liu, Rita Cucchiara, Luc van Gool, Nicu Sebe

While it is crucial to capture global information for effective image restoration (IR), integrating such cues into transformer-based methods becomes computationally expensive, especially with high input resolution.

Graph Attention Image Restoration

DistFormer: Enhancing Local and Global Features for Monocular Per-Object Distance Estimation

no code implementations 6 Jan 2024 Aniello Panariello, Gianluca Mancusi, Fedy Haj Ali, Angelo Porrello, Simone Calderara, Rita Cucchiara

Existing approaches rely on two scales: local information (i.e., the bounding box proportions) or global information, which encodes the semantics of the scene as well as the spatial relations with neighboring objects.

Autonomous Driving Decoder +1

Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models

1 code implementation 27 Nov 2023 Samuele Poppi, Tobia Poppi, Federico Cocchi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator.

Cross-Modal Retrieval Image Retrieval +6

HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

1 code implementation 31 Oct 2023 Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Rita Cucchiara

Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG.

Image Generation Perceptual Distance +1

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

2 code implementations 11 Sep 2023 Giuseppe Cartella, Alberto Baldrati, Davide Morelli, Marcella Cornia, Marco Bertini, Rita Cucchiara

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements.

Contrastive Learning Domain Generalization +2

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

1 code implementation ICCV 2023 Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions.

Decoder Image Captioning

TrackFlow: Multi-Object Tracking with Normalizing Flows

no code implementations ICCV 2023 Gianluca Mancusi, Aniello Panariello, Angelo Porrello, Matteo Fabbri, Simone Calderara, Rita Cucchiara

The field of multi-object tracking has recently seen a renewed interest in the good old schema of tracking-by-detection, as its simplicity and strong priors spare it from the complex design and painful babysitting of tracking-by-attention approaches.

Multi-Object Tracking Object

Volumetric Fast Fourier Convolution for Detecting Ink on the Carbonized Herculaneum Papyri

1 code implementation 9 Aug 2023 Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Recent advancements in Digital Document Restoration (DDR) have led to significant breakthroughs in analyzing highly damaged written artifacts.

CarPatch: A Synthetic Benchmark for Radiance Field Evaluation on Vehicle Components

no code implementations 24 Jul 2023 Davide Di Nucci, Alessandro Simoni, Matteo Tomei, Luca Ciuffreda, Roberto Vezzani, Rita Cucchiara

Neural Radiance Fields (NeRFs) have gained widespread recognition as a highly effective technique for representing 3D reconstructions of objects and scenes derived from sets of images.

Semantic Segmentation

Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation

no code implementations 18 Jul 2023 Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe

Research in Image Generation has recently made significant progress, particularly boosted by the introduction of Vision-Language models which are able to produce high-quality visual content based on textual inputs.

Image Generation Question Answering +1

Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

1 code implementation 12 Jun 2023 Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Andrea Pilzer, Rita Cucchiara

The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of visual tasks such as image classification.

Image Classification

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

1 code implementation 22 May 2023 Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

In this context, image-based virtual try-on, which consists in generating a novel image of a target model wearing a given in-shop garment, has yet to capitalize on the potential of these powerful generative solutions.

Virtual Try-on

How to Choose Pretrained Handwriting Recognition Models for Single Writer Fine-Tuning

no code implementations 4 May 2023 Vittorio Pippi, Silvia Cascianelli, Christopher Kermorvant, Rita Cucchiara

Recent advancements in Deep Learning-based Handwritten Text Recognition (HTR) have led to models with remarkable performance on both modern and historical manuscripts in large benchmark datasets.

Handwriting Recognition Handwritten Text Recognition +2

Multi-Class Unlearning for Image Classification via Weight Filtering

no code implementations 4 Apr 2023 Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network.

Classification Image Classification +1

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

1 code implementation ICCV 2023 Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara

Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner.

Multimodal fashion image editing

Evaluating Synthetic Pre-Training for Handwriting Processing Tasks

no code implementations 4 Apr 2023 Vittorio Pippi, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

In this work, we explore massive pre-training on synthetic word images for enhancing the performance on four benchmark downstream handwriting analysis tasks.

Retrieval

Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images

2 code implementations 2 Apr 2023 Roberto Amoroso, Davide Morelli, Marcella Cornia, Lorenzo Baraldi, Alberto del Bimbo, Rita Cucchiara

Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.

DeepFake Detection Face Swapping +2

Handwritten Text Generation from Visual Archetypes

1 code implementation CVPR 2023 Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially in the case of unseen styles and new words, and even more when these latter contain characters that are rarely encountered during training.

Text Generation

Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

1 code implementation CVPR 2023 Sara Sarto, Manuele Barraco, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The CLIP model has been recently proven to be very effective for a variety of cross-modal tasks, including the evaluation of captions generated from vision-and-language architectures.

Contrastive Learning Image Captioning +1

One Transformer for All Time Series: Representing and Training with Time-Dependent Heterogeneous Tabular Data

1 code implementation 13 Feb 2023 Simone Luetto, Fabrizio Garuti, Enver Sangineto, Lorenzo Forni, Rita Cucchiara

There is a recent growing interest in applying Deep Learning techniques to tabular data, in order to replicate the success of other Artificial Intelligence areas in this structured domain.

Time Series Time Series Analysis

Input Perturbation Reduces Exposure Bias in Diffusion Models

1 code implementation 27 Jan 2023 Mang Ning, Enver Sangineto, Angelo Porrello, Simone Calderara, Rita Cucchiara

Denoising Diffusion Probabilistic Models have shown an impressive generation quality, although their long sampling chain leads to high computational costs.

Denoising Image Generation +1

Embodied Agents for Efficient Exploration and Smart Scene Description

no code implementations 17 Jan 2023 Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments.

Efficient Exploration Image Captioning +1

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

no code implementations 17 Aug 2022 Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content.

Handwritten Text Recognition HTR

The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

no code implementations 16 Aug 2022 Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara

With the aim of fostering the research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years.

Handwritten Text Recognition HTR

Consistency-based Self-supervised Learning for Temporal Anomaly Localization

1 code implementation 10 Aug 2022 Aniello Panariello, Angelo Porrello, Simone Calderara, Rita Cucchiara

This work tackles Weakly Supervised Anomaly detection, in which a predictor is allowed to learn not only from normal examples but also from a few labeled anomalies made available during training.

Anomaly Detection In Surveillance Videos Anomaly Localization +4

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

1 code implementation 29 Jul 2022 Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara

In literature, this task is often used as a pre-training objective to forge architectures able to jointly deal with images and texts.

Ranked #22 on Cross-Modal Retrieval on COCO 2014 (using extra training data)

Image-text matching Retrieval +1

Retrieval-Augmented Transformer for Image Captioning

no code implementations 26 Jul 2022 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.

Image Captioning Retrieval

Maximum Class Separation as Inductive Bias in One Matrix

1 code implementation 17 Jun 2022 Tejaswi Kasarla, Gertjan J. Burghouts, Max van Spengler, Elise van der Pol, Rita Cucchiara, Pascal Mettes

This paper proposes a simple alternative: encoding maximum separation as an inductive bias in the network by adding one fixed matrix multiplication before computing the softmax activations.

Inductive Bias Long-tail Learning +3

Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

1 code implementation CVPR 2023 Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang

In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm, and their corresponding PEs are occluded.

Federated Learning Position

SeeFar: Vehicle Speed Estimation and Flow Analysis from a Moving UAV

no code implementations ICIAP 2022 Mang Ning, Xiaoliang Ma, Yao Lu, Simone Calderara, Rita Cucchiara

In this paper, we introduce SeeFar to achieve vehicle speed estimation and traffic flow analysis based on YOLOv5 and DeepSORT from a moving drone.

Vehicle Speed Estimation

Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction

1 code implementation 25 Apr 2022 Luigi Filippo Chiara, Pasquale Coscia, Sourav Das, Simone Calderara, Rita Cucchiara, Lamberto Ballan

Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and advanced video-surveillance applications.

Autonomous Vehicles Trajectory Forecasting

Embodied Navigation at the Art Gallery

1 code implementation 19 Apr 2022 Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

This feature is challenging for occupancy-based agents which are usually trained in crowded domestic environments with plenty of occupancy information.

Navigate PointGoal Navigation

Dress Code: High-Resolution Multi-Category Virtual Try-On

1 code implementation 18 Apr 2022 Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, Rita Cucchiara

Dress Code is more than 3x larger than publicly available datasets for image-based virtual try-on and features high-resolution paired images (1024x768) with front-view, full-body reference models.

Virtual Try-on Vocal Bursts Intensity Prediction

Spot the Difference: A Novel Task for Embodied Agents in Changing Environments

1 code implementation 18 Apr 2022 Federico Landi, Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

To make a step towards this setting, we propose Spot the Difference: a novel task for Embodied AI where the agent has access to an outdated map of the environment and needs to recover the correct layout in a fixed time budget.

How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting

no code implementations CVPR 2022 Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara

To this end, we conceive a novel distillation strategy that allows a knowledge transfer from a teacher network to a student one, the latter fed with fewer observations (just two ones).

Knowledge Distillation Trajectory Forecasting +1

CaMEL: Mean Teacher Learning for Image Captioning

1 code implementation 21 Feb 2022 Manuele Barraco, Matteo Stefanini, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Describing images in natural language is a fundamental step towards the automatic modeling of connections between the visual and textual modalities.

Image Captioning Knowledge Distillation

Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets

no code implementations 24 Nov 2021 Marcella Cornia, Lorenzo Baraldi, Giuseppe Fiameni, Rita Cucchiara

This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources, containing both human-annotated and web-collected captions.

Descriptive Image Captioning +2

Multi-Category Mesh Reconstruction From Image Collections

1 code implementation 21 Oct 2021 Alessandro Simoni, Stefano Pini, Roberto Vezzani, Rita Cucchiara

Recently, learning frameworks have shown the capability of inferring the accurate shape, pose, and texture of an object from a single RGB image.

Object

Focus on Impact: Indoor Exploration with Intrinsic Motivation

1 code implementation 14 Sep 2021 Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

The proposed exploration approach outperforms DRL-based competitors relying on intrinsic rewards and surpasses the agents trained with a dense extrinsic reward computed with the environment layouts.

Working Memory Connections for LSTM

no code implementations 31 Aug 2021 Federico Landi, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara

Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.

From Show to Tell: A Survey on Deep Learning-based Image Captioning

no code implementations 14 Jul 2021 Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Silvia Cascianelli, Giuseppe Fiameni, Rita Cucchiara

Starting from 2015 the task has generally been addressed with pipelines composed of a visual encoder and a language model for text generation.

Image Captioning Language Modelling +1

Learning to Select: A Fully Attentive Approach for Novel Object Captioning

no code implementations 2 Jun 2021 Marco Cagrandi, Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly.

Image Captioning Language Modelling

RMS-Net: Regression and Masking for Soccer Event Spotting

1 code implementation 15 Feb 2021 Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

The recently proposed action spotting task consists in finding the exact timestamp in which an event occurs.

Action Spotting regression

Inter-Homines: Distance-Based Risk Estimation for Human Safety

no code implementations 20 Jul 2020 Matteo Fabbri, Fabio Lanzi, Riccardo Gasparini, Simone Calderara, Lorenzo Baraldi, Rita Cucchiara

In this document, we report our proposal for modeling the risk of possible contagion in a given area monitored by RGB cameras where people freely move and interact.

Explore and Explain: Self-supervised Navigation and Recounting

no code implementations 14 Jul 2020 Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees during the path.

Navigate

DAG-Net: Double Attentive Graph Neural Network for Trajectory Forecasting

1 code implementation 26 May 2020 Alessio Monti, Alessia Bertugli, Simone Calderara, Rita Cucchiara

Understanding human motion behaviour is a critical task for several possible applications like self-driving cars or social robots, and in general for all those settings where an autonomous agent has to navigate inside a human-centric environment.

Graph Neural Network Human motion prediction +4

AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction

1 code implementation 17 May 2020 Alessia Bertugli, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara

Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications.

Graph Attention Multi-future Trajectory Prediction +2

A Novel Attention-based Aggregation Function to Combine Vision and Language

no code implementations 27 Apr 2020 Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The joint understanding of vision and language has been recently gaining a lot of attention in both the Computer Vision and Natural Language Processing communities, with the emergence of tasks such as image captioning, image-text matching, and visual question answering.

General Classification Image Captioning +4

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

1 code implementation CVPR 2020 Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara

At the core of the proposed method lies our Volumetric Heatmap Autoencoder, a fully-convolutional network tasked with the compression of ground-truth heatmaps into a dense intermediate representation.

Ranked #6 on 3D Human Pose Estimation on Panoptic (using extra training data)

3D Human Pose Estimation 3D Pose Estimation

Conditional Channel Gated Networks for Task-Aware Continual Learning

1 code implementation CVPR 2020 Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available.

Continual Learning

Meshed-Memory Transformer for Image Captioning

2 code implementations CVPR 2020 Marcella Cornia, Matteo Stefanini, Lorenzo Baraldi, Rita Cucchiara

Transformer-based architectures represent the state of the art in sequence modeling tasks like machine translation and language understanding.

Image Captioning Machine Translation +2

Multimodal Attention Networks for Low-Level Vision-and-Language Navigation

1 code implementation 27 Nov 2019 Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara

Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination.

Vision and Language Navigation

SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability

no code implementations 7 Oct 2019 Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The ability to generate natural language explanations conditioned on the visual perception is a crucial step towards autonomous agents which can explain themselves and communicate with humans.

Text Generation Video Captioning

Warp and Learn: Novel Views Generation for Vehicles and Other Objects

1 code implementation 24 Jul 2019 Andrea Palazzi, Luca Bergamini, Simone Calderara, Rita Cucchiara

An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance.

3D Object Detection Image Generation

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

1 code implementation 5 Jul 2019 Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara

In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination with the only guidance of a natural language instruction.

Vision and Language Navigation

M-VAD Names: a Dataset for Video Captioning with Naming

1 code implementation 4 Mar 2019 Stefano Pini, Marcella Cornia, Federico Bolelli, Lorenzo Baraldi, Rita Cucchiara

Current movie captioning architectures are not capable of mentioning characters with their proper name, replacing them with a generic "someone" tag.

TAG Video Captioning

Classifying Signals on Irregular Domains via Convolutional Cluster Pooling

no code implementations 13 Feb 2019 Angelo Porrello, Davide Abati, Simone Calderara, Rita Cucchiara

We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset.

Clustering General Classification

Anomaly Locality in Video Surveillance

no code implementations 29 Jan 2019 Federico Landi, Cees G. M. Snoek, Rita Cucchiara

This paper strives for the detection of real-world anomalies such as burglaries and assaults in surveillance videos.

Anomaly Detection

Can Adversarial Networks Hallucinate Occluded People With a Plausible Aspect?

1 code implementation 23 Jan 2019 Federico Fulgeri, Matteo Fabbri, Stefano Alletto, Simone Calderara, Rita Cucchiara

When you see a person in a crowd, occluded by other persons, you miss visual information that can be used to recognize, re-identify or simply classify him or her.

Attribute

Art2Real: Unfolding the Reality of Artworks via Semantically-Aware Image-to-Image Translation

1 code implementation CVPR 2019 Matteo Tomei, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

The applicability of computer vision to real paintings and artworks has been rarely investigated, even though a vast heritage would greatly benefit from techniques which can understand and process data from the artistic domain.

Image-to-Image Translation Translation

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

1 code implementation CVPR 2019 Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior.

controllable image captioning Diversity

Latent Space Autoregression for Novelty Detection

1 code implementation CVPR 2019 Davide Abati, Angelo Porrello, Simone Calderara, Rita Cucchiara

Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity.

Anomaly Detection Novelty Detection +1

A Graph Transduction Game for Multi-target Tracking

no code implementations 12 Jun 2018 Tewodros Mulugeta Dagnew, Dalia Coppi, Marcello Pelillo, Rita Cucchiara

Semi-supervised learning is a popular class of techniques to learn from labeled and unlabeled data.

Multiple People Tracking

Learning to Generate Facial Depth Maps

no code implementations 30 May 2018 Stefano Pini, Filippo Grazioli, Guido Borghi, Roberto Vezzani, Rita Cucchiara

In this paper, an adversarial architecture for facial depth map estimation from monocular intensity images is presented.

Face Verification Generative Adversarial Network

Face-from-Depth for Head Pose Estimation on Depth Images

no code implementations 12 Dec 2017 Guido Borghi, Matteo Fabbri, Roberto Vezzani, Simone Calderara, Rita Cucchiara

Therefore, we propose a complete framework for the estimation of the head and shoulder pose based on depth images only.

Head Detection Head Pose Estimation

Head Detection with Depth Images in the Wild

no code implementations 21 Jul 2017 Diego Ballotta, Guido Borghi, Roberto Vezzani, Rita Cucchiara

Two public datasets have been exploited: the first one, called Pandora, is used to train a deep binary classifier with face and non-face images.

Head Detection

Generative Adversarial Models for People Attribute Recognition in Surveillance

no code implementations 7 Jul 2017 Matteo Fabbri, Simone Calderara, Rita Cucchiara

In this paper we propose a deep architecture for detecting people attributes (e.g. gender, race, clothing, ...) in surveillance contexts.

Attribute General Classification

Learning to Map Vehicles into Bird's Eye View

3 code implementations 26 Jun 2017 Andrea Palazzi, Guido Borghi, Davide Abati, Simone Calderara, Rita Cucchiara

Awareness of the road scene is an essential component for both autonomous vehicles and Advanced Driver Assistance Systems and is gaining importance for both academia and car companies.

Autonomous Vehicles

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

no code implementations 26 Jun 2017 Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions.

Ranked #2 on Image Captioning on Flickr30k Captions test (using extra training data)

Image Captioning Saliency Prediction

From Depth Data to Head Pose Estimation: a Siamese approach

no code implementations 10 Mar 2017 Marco Venturelli, Guido Borghi, Roberto Vezzani, Rita Cucchiara

In this paper, we tackle the pose estimation problem through a deep learning network working in a regression manner.

Driver Attention Monitoring Head Pose Estimation +2

Fast Gesture Recognition with Multiple Stream Discrete HMMs on 3D Skeletons

no code implementations 8 Mar 2017 Guido Borghi, Roberto Vezzani, Rita Cucchiara

HMMs are widely used in action and gesture recognition due to their implementation simplicity, low computational requirement, scalability and high parallelism.

Classification General Classification +1

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model

2 code implementations 29 Nov 2016 Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Data-driven saliency has recently gained a lot of attention thanks to the use of Convolutional Neural Networks for predicting gaze fixations.

Saliency Prediction

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

no code implementations CVPR 2017 Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description.

Decoder Video Captioning +1

Learning Where to Attend Like a Human Driver

1 code implementation 24 Nov 2016 Andrea Palazzi, Francesco Solera, Simone Calderara, Stefano Alletto, Rita Cucchiara

Despite the advent of autonomous cars, it's likely, at least in the near future, that human attention will still maintain a central role as a guarantee in terms of legal responsibility during the driving task.

Recognizing and Presenting the Storytelling Video Structure with Deep Multimodal Networks

no code implementations 5 Oct 2016 Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

This paper presents a novel approach for temporal and semantic segmentation of edited videos into meaningful segments, from the point of view of the storytelling structure.

Change Detection Retrieval +1

Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking

22 code implementations 6 Sep 2016 Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, Carlo Tomasi

To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline.

A Deep Multi-Level Network for Saliency Prediction

2 code implementations 5 Sep 2016 Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

Current state-of-the-art models for saliency prediction employ Fully Convolutional networks that perform a non-linear combination of features extracted from the last convolutional layer to predict saliency maps.

Saliency Prediction

Video Registration in Egocentric Vision under Day and Night Illumination Changes

no code implementations 28 Jul 2016 Stefano Alletto, Giuseppe Serra, Rita Cucchiara

To effectively register an egocentric video sequence under these conditions, we propose to tackle the source of the problem: the matching process.

Scene-driven Retrieval in Edited Videos using Aesthetic and Semantic Deep Features

no code implementations 9 Apr 2016 Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

This paper presents a novel retrieval pipeline for video collections, which aims to retrieve the most significant parts of an edited video for a given query, and represent them with thumbnails which are at the same time semantically meaningful and aesthetically remarkable.

Retrieval

A Deep Siamese Network for Scene Detection in Broadcast Videos

1 code implementation 29 Oct 2015 Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

We present a model that automatically divides broadcast videos into coherent scenes by learning a distance measure between shots.

Scene Segmentation

Learning to Divide and Conquer for Online Multi-Target Tracking

no code implementations ICCV 2015 Francesco Solera, Simone Calderara, Rita Cucchiara

Online Multiple Target Tracking (MTT) is often addressed within the tracking-by-detection paradigm.

Socially Constrained Structural Learning for Groups Detection in Crowd

no code implementations 5 Aug 2015 Francesco Solera, Simone Calderara, Rita Cucchiara

Modern crowd theories agree that collective behavior is the result of the underlying interactions among small groups of individuals.

Clustering
