no code implementations • 10 Feb 2025 • Vlad Hosu, Lorenzo Agnolucci, Daisuke Iso, Dietmar Saupe
To bridge this gap, we introduce the Image Intrinsic Scale (IIS), defined as the largest scale at which an image exhibits its highest perceived quality.
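The definition above suggests a straightforward estimation procedure: score the image at several scales with a no-reference quality predictor and keep the largest scale achieving the best score. A minimal sketch in PyTorch, where `quality_model` is a hypothetical stand-in for any NR-IQA model (this is an illustration of the concept, not the paper's method):

```python
import torch
import torch.nn.functional as F

def estimate_iis(image, quality_model, scales=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)):
    """Estimate the Image Intrinsic Scale: the largest scale at which
    the predicted perceptual quality of the image is highest.

    `image` is a (1, C, H, W) tensor; `quality_model` is a hypothetical
    no-reference IQA model mapping an image to a scalar quality score.
    """
    _, _, h, w = image.shape
    scores = {}
    for s in scales:
        resized = F.interpolate(image, size=(int(h * s), int(w * s)),
                                mode="bicubic", align_corners=False)
        scores[s] = quality_model(resized).item()
    best = max(scores.values())
    # The IIS is the *largest* scale attaining (approximately) the best quality.
    return max(s for s, q in scores.items() if q >= best - 1e-6)
```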
1 code implementation • 6 Feb 2025 • Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Andrew D. Bagdanov
In this paper, we show that the common practice of individually exploiting the text or image encoders of these powerful multi-modal models is highly suboptimal for intra-modal tasks like image-to-image retrieval.
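For context, the intra-modal baseline the paper calls suboptimal looks like the following: embed the query and the gallery with the CLIP image encoder alone and rank by cosine similarity. A minimal sketch using the Hugging Face `transformers` CLIP wrappers; the file names are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# The common baseline: use the CLIP image encoder alone for
# image-to-image retrieval and rank by cosine similarity.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(images):
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

query = embed([Image.open("query.jpg")])                      # placeholder path
gallery = embed([Image.open(p) for p in ["a.jpg", "b.jpg"]])  # placeholder paths
ranking = (query @ gallery.T).squeeze(0).argsort(descending=True)
```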
1 code implementation • 24 Sep 2024 • Vlad Hosu, Marcos V. Conde, Lorenzo Agnolucci, Nabajeet Barman, Saman Zadtootaghaj, Radu Timofte
By pushing the boundaries of NR-IQA for high-resolution photos, the UHD-IQA Challenge aims to stimulate the development of practical models that can keep pace with the rapidly evolving landscape of digital photography.
1 code implementation • 25 Jun 2024 • Vlad Hosu, Lorenzo Agnolucci, Oliver Wiedemann, Daisuke Iso, Dietmar Saupe
We introduce a novel Image Quality Assessment (IQA) dataset comprising 6073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels.
2 code implementations • 5 May 2024 • Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto del Bimbo
Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption.
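To make the task concrete, here is a naive zero-shot CIR baseline, not the method proposed in the paper: fuse the reference image embedding and the caption embedding by normalized summation, then rank the gallery by cosine similarity. File names and the caption are illustrative:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def normalize(x):
    return x / x.norm(dim=-1, keepdim=True)

with torch.no_grad():
    ref = processor(images=Image.open("reference.jpg"), return_tensors="pt")
    cap = processor(text=["make the dress red"], return_tensors="pt", padding=True)
    # Fuse reference image and relative caption into a single query embedding.
    query = normalize(normalize(model.get_image_features(**ref)) +
                      normalize(model.get_text_features(**cap)))
    gallery = normalize(model.get_image_features(
        **processor(images=[Image.open(p) for p in ["t1.jpg", "t2.jpg"]],
                    return_tensors="pt")))

# Higher score = better match between fused query and candidate target.
scores = (query @ gallery.T).squeeze(0)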
1 code implementation • 17 Mar 2024 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini
At the same time, we force CLIP to generate consistent representations for images with similar content and the same level of degradation; a minimal sketch of this consistency objective follows this entry.
Ranked #2 on No-Reference Image Quality Assessment on UHD-IQA
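A minimal sketch of such a consistency objective, assuming `clip_image_encoder` maps image batches to embeddings; the actual loss used in the paper may differ:

```python
import torch.nn.functional as F

def consistency_loss(clip_image_encoder, crops_a, crops_b):
    """Illustrative consistency objective (not necessarily the paper's exact
    loss): pull together the CLIP embeddings of two views that share similar
    content and the same level of degradation.

    `crops_a`, `crops_b`: (B, C, H, W) batches of equally degraded views.
    """
    za = F.normalize(clip_image_encoder(crops_a), dim=-1)
    zb = F.normalize(clip_image_encoder(crops_b), dim=-1)
    # Maximize cosine similarity between paired views.
    return (1.0 - (za * zb).sum(dim=-1)).mean()
```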
1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
In this context, the speaker is typically in front of the camera and remains the same for the entire transmission. We can therefore maintain a set of reference keyframes of the person, drawn from the higher-quality I-frames transmitted within the video stream, and exploit them to guide the visual quality improvement. A novel aspect of this approach is the update policy that keeps the set of reference keyframes compact and effective.
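An illustrative implementation of such a keyframe set with one plausible update policy (keep the K best-quality I-frames seen so far, evicting the worst); the paper's actual policy may differ:

```python
import heapq

class KeyframeBuffer:
    """Illustrative reference-keyframe set. Update policy (an assumption,
    not necessarily the paper's): keep the `capacity` highest-quality
    I-frames observed so far, evicting the lowest-quality one when full.
    """
    def __init__(self, capacity=5):
        self.capacity = capacity
        self._heap = []  # min-heap of (quality, frame_id, frame)

    def update(self, frame, quality, frame_id):
        item = (quality, frame_id, frame)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif quality > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # evict the worst keyframe

    def references(self):
        # Best-quality references first, to guide restoration.
        return [frame for _, _, frame in sorted(self._heap, reverse=True)]
```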
1 code implementation • 7 Nov 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
In this paper, we present a system to restore analog videos of historical archives.
Ranked #2 on Analog Video Restoration on TAPE
2 code implementations • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
We design a transformer-based Swin-UNet network that exploits both neighboring and reference frames via our Multi-Reference Spatial Feature Fusion (MRSFF) blocks; a simplified fusion sketch follows this entry.
Ranked #1 on Analog Video Restoration on TAPE
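A greatly simplified stand-in for multi-reference fusion, using plain cross-attention from the current frame's tokens to tokens gathered from several reference frames; dimensions and structure here are assumptions for illustration, not the published MRSFF design:

```python
import torch
import torch.nn as nn

class MultiRefFusion(nn.Module):
    """Simplified multi-reference fusion: the tokens of the frame being
    restored attend over tokens from R reference frames."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats, ref_feats):
        # frame_feats: (B, N, D) tokens of the frame being restored
        # ref_feats:   (B, R, N, D) tokens of R reference frames
        b, r, n, d = ref_feats.shape
        kv = ref_feats.reshape(b, r * n, d)        # flatten the references
        fused, _ = self.attn(frame_feats, kv, kv)  # cross-attention
        return self.norm(frame_feats + fused)      # residual + norm

# Usage sketch with dummy shapes
block = MultiRefFusion()
out = block(torch.randn(2, 64, 256), torch.randn(2, 3, 64, 256))
```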
1 code implementation • 20 Oct 2023 • Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto del Bimbo
In this work, we propose a self-supervised approach named ARNIQA (leArning distoRtion maNifold for Image Quality Assessment) for modeling the image distortion manifold to obtain quality representations in an intrinsic manner; a sketch of a contrastive objective of this kind follows this entry.
Ranked #2 on No-Reference Image Quality Assessment on CSIQ
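A sketch of the kind of contrastive loss such self-supervised IQA pretraining can use, assuming positives are views sharing the same degradation so the encoder groups images by distortion rather than by content; this is a generic NT-Xent loss, not necessarily ARNIQA's exact formulation:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent contrastive loss over a batch of positive pairs.
    In a distortion-manifold setup, z1[i] and z2[i] are embeddings of
    views degraded in the same way (an assumption for illustration).
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)     # (2B, D)
    sim = z @ z.T / tau                                     # pairwise logits
    sim.fill_diagonal_(float("-inf"))                       # drop self-pairs
    b = z1.size(0)
    targets = torch.arange(2 * b, device=z.device).roll(b)  # pair i <-> i+B
    return F.cross_entropy(sim, targets)
```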
1 code implementation • 12 Oct 2023 • Giovanni Burbi, Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo
Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions.
Ranked #3 on Hateful Meme Classification on HarMeme
no code implementations • 26 Jul 2023 • Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto del Bimbo
Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by matching an image and a custom textual prompt in its latent space.
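A minimal example of this zero-shot matching with the Hugging Face `transformers` CLIP wrappers; the image path and prompts are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot transfer: score an image against custom textual prompts
# in CLIP's shared latent space.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=prompts, images=Image.open("photo.jpg"),
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs.squeeze(0).tolist())))
```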
2 code implementations • ICCV 2023 • Alberto Baldrati, Lorenzo Agnolucci, Marco Bertini, Alberto del Bimbo
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.