1 code implementation • 17 Mar 2024 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini
In particular, we introduce a quality-aware image-text alignment strategy to make CLIP generate representations that correlate with the inherent quality of the images.
Blind Image Quality Assessment, No-Reference Image Quality Assessment
1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
Given that, in this context, the speaker is typically in front of the camera and remains the same for the entire duration of the transmission, we can maintain a set of reference keyframes of the person from the higher-quality I-frames transmitted within the video stream and exploit them to guide the visual quality improvement. A novel aspect of this approach is the update policy that keeps this set of reference keyframes compact and effective.
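The idea of a compact, continuously updated set of reference keyframes can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not describe the actual update policy, so the rule used here (accept only I-frames that differ enough from the stored ones, evict the oldest when full) is hypothetical, as are the class name `ReferenceKeyframeSet`, its parameters, and the toy frame descriptors.

```python
from collections import deque


class ReferenceKeyframeSet:
    """Bounded set of reference keyframes (hypothetical policy, not the paper's)."""

    def __init__(self, capacity=3, min_distance=0.5):
        # deque with maxlen silently drops the oldest item when full
        self.frames = deque(maxlen=capacity)
        self.min_distance = min_distance

    @staticmethod
    def _distance(a, b):
        # Mean absolute difference between two toy frame descriptors
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

    def update(self, frame_type, descriptor):
        """Try to add a frame; return True if it was stored."""
        # Assumption: only higher-quality I-frames are eligible as references
        if frame_type != "I":
            return False
        # Assumption: skip frames that are near-duplicates of stored ones
        if any(self._distance(descriptor, f) < self.min_distance
               for f in self.frames):
            return False
        self.frames.append(descriptor)
        return True


# Toy stream of (frame type, descriptor) pairs; values are made up
refs = ReferenceKeyframeSet(capacity=2, min_distance=0.5)
stream = [
    ("I", [0.0, 0.0]),
    ("P", [5.0, 5.0]),   # not an I-frame, ignored
    ("I", [0.1, 0.1]),   # too close to the first keyframe, ignored
    ("I", [1.0, 1.0]),
    ("I", [2.0, 2.0]),   # set is full, oldest keyframe is evicted
]
for frame_type, descriptor in stream:
    refs.update(frame_type, descriptor)
```

A bounded buffer keeps memory and matching cost constant over a long transmission; a real system would likely use a learned or perceptual similarity in place of the toy distance.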
1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
In this paper, we present a system to restore analog videos of historical archives.
Ranked #2 on Analog Video Restoration on TAPE
2 code implementations • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks.
Ranked #1 on Analog Video Restoration on TAPE
1 code implementation • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner.
Ranked #2 on No-Reference Image Quality Assessment on CSIQ
Blind Image Quality Assessment, No-Reference Image Quality Assessment
1 code implementation • 12 Oct 2023 • Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo
Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions.
Ranked #1 on Hateful Meme Classification on HarMeme
no code implementations • 26 Jul 2023 • Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto del Bimbo
Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by matching an image and a custom textual prompt in its latent space.
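Zero-shot matching in a shared latent space can be illustrated with a minimal sketch. It assumes the image and prompt embeddings have already been produced by some encoder; the vectors and prompts below are made-up toy values rather than CLIP outputs, and `zero_shot_classify` is a hypothetical helper, not part of any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, prompt_embs):
    """Return the prompt whose embedding is closest to the image embedding."""
    scores = {label: cosine(image_emb, emb)
              for label, emb in prompt_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings standing in for encoder outputs (assumed values)
prompt_embs = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
image_emb = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(image_emb, prompt_embs)
```

Because the class set is defined only by the text prompts, new categories can be added by writing new prompts, with no retraining of the encoders.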
3 code implementations • ICCV 2023 • Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.
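The CIR task setup can be sketched with toy vectors. The fusion rule below (sum of the normalized reference-image and caption features) is a simple baseline chosen for illustration and is an assumption, not the method of the paper; `compose_query`, `retrieve`, and all embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def compose_query(ref_image_emb, caption_emb):
    """Fuse reference image and relative caption (assumed: normalized sum)."""
    ref = normalize(ref_image_emb)
    cap = normalize(caption_emb)
    return normalize([r + c for r, c in zip(ref, cap)])


def retrieve(query, gallery, k=1):
    """Rank gallery images by cosine similarity to the composed query."""
    def score(item):
        emb = normalize(item[1])
        return sum(q * e for q, e in zip(query, emb))
    ranked = sorted(gallery.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy embeddings (assumed values): axis 0 ~ reference content, axis 1 ~ requested change
ref_image_emb = [1.0, 0.0, 0.0]
caption_emb = [0.0, 1.0, 0.0]
gallery = {
    "target": [1.0, 1.0, 0.0],          # reference content plus the change
    "reference_like": [1.0, 0.0, 0.0],  # same content, change not applied
    "unrelated": [0.0, 0.0, 1.0],
}

query = compose_query(ref_image_emb, caption_emb)
top2 = retrieve(query, gallery, k=2)
```

In this toy setup the target outranks the near-copy of the reference image, since only the target reflects both the reference content and the change described by the caption.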