no code implementations • 11 Jan 2024 • Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian
Diffusion-based video editing has reached impressive quality and can transform the global style, local structure, or attributes of given video inputs, following textual edit prompts.
no code implementations • 14 Dec 2023 • Shijie Li, Farhad G. Zanjani, Haitam Ben Yahia, Yuki M. Asano, Juergen Gall, Amirhossein Habibian
This is because the source-view images and corresponding poses are processed separately and injected into the model at different stages.
1 code implementation • 13 Dec 2023 • Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen
This work aims to improve the efficiency of text-to-image diffusion models.
no code implementations • ICCV 2023 • Davide Abati, Haitam Ben Yahia, Markus Nagel, Amirhossein Habibian
Furthermore, we extend our model to dynamically adjust the bit-width proportional to the amount of changes in the video.
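The dynamic bit-width idea can be sketched as a simple thresholding rule — a toy illustration only, assuming a mean-absolute-delta change measure and hypothetical names/thresholds, not the paper's actual mechanism:

```python
def choose_bitwidth(prev_frame, cur_frame, levels=(2, 4, 8), thresholds=(0.1, 0.5)):
    """Pick a quantization bit-width proportional to the mean absolute
    change between two consecutive frames (given as flat pixel lists).

    Small inter-frame change -> low bit-width (cheap update);
    large change -> spend more precision.
    """
    delta = sum(abs(a - b) for a, b in zip(prev_frame, cur_frame)) / len(cur_frame)
    for level, threshold in zip(levels, thresholds):
        if delta < threshold:   # change is small enough for this precision
            return level
    return levels[-1]           # large change: use the highest bit-width
```

For example, a static background region would map to the 2-bit branch, while a fast-moving region would map to the 8-bit branch.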
no code implementations • 5 Jan 2023 • Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian
This work aims to improve the efficiency of vision transformers (ViT).
1 code implementation • 16 Jun 2022 • Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort
Though state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and they further make use of operations that are inefficient on current hardware.
no code implementations • 5 Apr 2022 • Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, Amir Ghodrati
In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection that allows for heavy down-sampling of unimportant background regions while preserving the fine-grained details of a high-resolution image.
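A minimal stand-in for the saliency-based sampling idea: under a fixed budget, keep the highest-saliency entries at full fidelity and drop the rest. The function name and the hard top-k selection are illustrative assumptions; SALISA itself performs smooth, non-uniform down-sampling of the image rather than hard selection:

```python
def saliency_sample(values, saliency, budget):
    """Keep the `budget` entries with the highest saliency scores,
    preserving their original order. Returns (kept values, kept indices).

    Crude 1-D analogue of spending resolution on salient (foreground)
    regions while heavily down-sampling unimportant background.
    """
    order = sorted(range(len(values)), key=lambda i: -saliency[i])
    keep = sorted(order[:budget])        # top-budget indices, original order
    return [values[i] for i in keep], keep
```

A 2-D version would allocate pixel density per region instead of simply keeping or dropping entries, but the budget-versus-saliency trade-off is the same.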
1 code implementation • 17 Mar 2022 • Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios Gavves, Fatih Porikli
By extensive experiments on a wide range of architectures, including the most efficient ones, we demonstrate that delta distillation sets a new state of the art in terms of accuracy vs. efficiency trade-off for semantic segmentation and object detection in videos.
Ranked #2 on Video Semantic Segmentation on Cityscapes val
no code implementations • 3 Mar 2022 • Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang, Amirhossein Habibian, Taco S. Cohen
To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.
1 code implementation • CVPR 2021 • Amir Ghodrati, Babak Ehteshami Bejnordi, Amirhossein Habibian
In this paper, we propose a conditional early exiting framework for efficient video recognition.
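The conditional early-exiting idea can be sketched as a cascade of classifiers that see progressively more frames, stopping as soon as one is confident enough. Everything here (the function name, the doubling frame schedule, the confidence threshold) is an illustrative assumption, not the paper's exact gating policy:

```python
def early_exit_predict(frames, stages, threshold=0.9):
    """Run classifier stages on growing frame prefixes; exit as soon as
    a stage's confidence clears the threshold.

    Each stage is a callable mapping a list of frames to (label, confidence).
    Returns (label, number_of_frames_used).
    """
    for i, stage in enumerate(stages):
        n = min(2 ** i, len(frames))          # each stage sees more frames
        label, confidence = stage(frames[:n])
        if confidence >= threshold or n == len(frames):
            return label, n                   # confident (or out of frames)
    return label, n                           # fall back to the last stage
```

Easy videos thus exit after a handful of frames, while ambiguous ones pay the full cost.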
1 code implementation • CVPR 2021 • Amirhossein Habibian, Davide Abati, Taco S. Cohen, Babak Ehteshami Bejnordi
We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction (e.g., foreground regions) or can be safely skipped (e.g., background regions).
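The gating idea can be illustrated per pixel on scalar frames — a toy sketch assuming a hypothetical `gated_update` helper and a magnitude threshold as the gate; the actual method learns the gates and operates on convolutional feature maps:

```python
def gated_update(prev_out, prev_frame, cur_frame, layer, gate_threshold=0.1):
    """Recompute `layer` only where the residual between frames is large
    (gate open, e.g. foreground); elsewhere reuse the cached output
    (gate closed, e.g. static background).

    Frames and outputs are 2-D lists of scalars; `layer` maps one value
    to one value.
    """
    out = []
    for y in range(len(cur_frame)):
        row = []
        for x in range(len(cur_frame[0])):
            residual = cur_frame[y][x] - prev_frame[y][x]
            if abs(residual) > gate_threshold:   # gate open: recompute
                row.append(layer(cur_frame[y][x]))
            else:                                # gate closed: skip, reuse
                row.append(prev_out[y][x])
        out.append(row)
    return out
```

The compute saved is proportional to how static the scene is, which is why the approach pays off on videos with mostly unchanged backgrounds.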
no code implementations • 20 Apr 2020 • Vijay Veerabadran, Reza Pourreza, Amirhossein Habibian, Taco Cohen
In this paper, we present a novel adversarial lossy video compression model.
no code implementations • ICCV 2019 • Amirhossein Habibian, Ties van Rozendaal, Jakub M. Tomczak, Taco S. Cohen
We employ a model that consists of a 3D autoencoder with a discrete latent space and an autoregressive prior used for entropy coding.
no code implementations • 2 Aug 2019 • Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian
In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account.
no code implementations • 8 Nov 2015 • Amirhossein Habibian, Thomas Mensink, Cees G. M. Snoek
In our proposed embedding, which we call VideoStory, the correlations between the terms are utilized to learn a more effective representation by optimizing a joint objective balancing descriptiveness and predictability. We show how learning the VideoStory using a multimodal predictability loss, including appearance, motion, and audio features, results in a more predictable representation.