Search Results for author: Leonid Sigal

Found 98 papers, 26 papers with code

Light Field Neural Rendering

1 code implementation • CVPR 2022 • Mohammed Suhail, Carlos Esteves, Leonid Sigal, Ameesh Makadia

Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene.

Neural Rendering Novel View Synthesis

32,735

LayoutVAE: Stochastic Scene Layout Generation From a Label Set

2 code implementations • ICCV 2019 • Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori

Recently, there has been increasing interest in scene generation within the research community.

Scene Generation

139

Improved Few-Shot Visual Classification

2 code implementations • CVPR 2020 • Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal

Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data.

Ranked #2 on Few-Shot Image Classification on Mini-Imagenet 10-way (5-shot) (using extra training data)

Classification Few-Shot Image Classification +3

110

Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning

2 code implementations • 13 Jan 2022 • Peyman Bateni, Jarred Barber, Raghav Goyal, Vaden Masrani, Jan-Willem van de Meent, Leonid Sigal, Frank Wood

The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state of the art neural adaptive feature extractor to achieve strong performance on Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks.

Active Learning continual few-shot learning +3

110

Energy-Based Learning for Scene Graph Generation

1 code implementation • CVPR 2021 • Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal

The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space.

Ranked #3 on Scene Graph Classification on Visual Genome (R@20 metric)

Graph Generation Inductive Bias +4

Multi-level Semantic Feature Augmentation for One-shot Learning

1 code implementation • 15 Apr 2018 • Zitian Chen, Yanwei Fu, Yinda Zhang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal

In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.

Novel Concepts One-Shot Learning

Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment

1 code implementation • 23 Jun 2020 • Jing Wang, Jiahong Chen, Jianzhe Lin, Leonid Sigal, Clarence W. de Silva

To solve this problem, we introduce a Gaussian-guided latent alignment approach to align the latent feature distributions of the two domains under the guidance of the prior distribution.

Ranked #1 on Domain Adaptation on SYNSIG-to-GTSRB

Data Augmentation Domain Generalization +3

Real-Time Monitoring of User Stress, Heart Rate and Heart Rate Variability on Mobile Devices

3 code implementations • 4 Oct 2022 • Peyman Bateni, Leonid Sigal

The user's pulse wave is then used to determine stress (according to the Baevsky Stress Index), heart rate, and heart rate variability.

Ranked #1 on Photoplethysmography (PPG) heart rate estimation on MMSE-HR

Heart rate estimation Heart Rate Variability +3

Referring Transformer: A One-step Approach to Multi-task Visual Grounding

1 code implementation • NeurIPS 2021 • Muchen Li, Leonid Sigal

As an important step towards visual reasoning, visual grounding (e.g., phrase localization, referring expression comprehension/segmentation) has been widely explored. Previous approaches to referring expression comprehension (REC) or segmentation (RES) either suffer from limited performance, due to a two-stage setup, or require designing complex task-specific one-stage architectures.

Ranked #10 on Referring Expression Segmentation on RefCOCO testB

Referring Expression Referring Expression Comprehension +4

DwNet: Dense warp-based network for pose-guided human video generation

2 code implementations • 21 Oct 2019 • Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal

In this paper, we focus on human motion transfer: generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video.

Video Generation

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

An Improved Attention for Visual Question Answering

1 code implementation • 4 Nov 2020 • Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini

We also propose a multimodal fusion module to combine both visual and textual information.

Question Answering Visual Question Answering

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments for story generation on the MUGEN, PororoSV and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.

Sentence Story Generation +1

Joint Event Detection and Description in Continuous Video Streams

1 code implementation • 28 Feb 2018 • Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko

In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.

Dense Captioning Dense Video Captioning +2

Modular Generative Adversarial Networks

2 code implementations • ECCV 2018 • Bo Zhao, Bo Chang, Zequn Jie, Leonid Sigal

Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains.

Attribute Image-to-Image Translation +1

VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

1 code implementation • 24 Oct 2022 • Sahithya Ravi, Aditya Chinchure, Leonid Sigal, Renjie Liao, Vered Shwartz

In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases.

Ranked #8 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)

Question Answering Visual Question Answering

A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)

1 code implementation • CVPR 2018 • Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross

The alignment of heterogeneous sequential data (video to text) is an important and challenging problem.

Dynamic Time Warping

TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation

1 code implementation • 26 Oct 2021 • Tanzila Rahman, Mengyu Yang, Leonid Sigal

In this work, we introduce TriBERT, a transformer-based architecture inspired by ViLBERT, which enables contextual feature learning across three modalities (vision, pose, and audio) with the use of flexible co-attention.

Pose Retrieval Representation Learning +1

TriBERT: Human-centric Audio-visual Representation Learning

1 code implementation • NeurIPS 2021 • Tanzila Rahman, Mengyu Yang, Leonid Sigal

Pose Retrieval Representation Learning +1

Probabilistic Video Generation using Holistic Attribute Control

1 code implementation • ECCV 2018 • Jiawei He, Andreas Lehrmann, Joseph Marino, Greg Mori, Leonid Sigal

Videos express highly structured spatio-temporal patterns of visual data.

Attribute Future prediction +1

Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction

1 code implementation • CVPR 2020 • Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, Alla Sheffer

Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces.

3D Shape Reconstruction Surface Reconstruction

Attribute-guided image generation from layout

2 code implementations • 27 Aug 2020 • Ke Ma, Bo Zhao, Leonid Sigal

Also, the generated images from our model have higher resolution, object classification accuracy and consistency, as compared to the previous state-of-the-art.

Attribute Image Generation +2

Show Me a Story: Towards Coherent Neural Story Illustration

1 code implementation • CVPR 2018 • Hareesh Ravi, Lezi Wang, Carlos Muniz, Leonid Sigal, Dimitris Metaxas, Mubbasir Kapadia

We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story.

Sentence Story Visualization

Vocabulary-informed Zero-shot and Open-set Learning

1 code implementation • 3 Jan 2023 • Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, Xiangyang Xue, Leonid Sigal

Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels.

Object Categorization Open Set Learning +1

DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling

1 code implementation • CVPR 2023 • Yichen Guo, Mai Xu, Lai Jiang, Leonid Sigal, Yunjin Chen

To alleviate this issue, we propose the first attempt at 360° image rescaling, which refers to downscaling a 360° image to a visually valid low-resolution (LR) counterpart and then upscaling to a high-resolution (HR) 360° image given the LR variant.

valid

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models

1 code implementation • 28 Nov 2023 • Jiayun Luo, Siddhesh Khandelwal, Leonid Sigal, Boyang Li

and even outperforms most baselines that conduct additional network training on top of pretrained VLMs.

Image Captioning Image-text matching +6

Heterogeneous Knowledge Transfer in Video Emotion Recognition, Attribution and Summarization

no code implementations • 16 Nov 2015 • Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal

Emotion is a key element in user-generated videos.

Ranked #5 on Video Emotion Recognition on Ekman6

Transfer Learning Video Emotion Recognition +1

Visual Reference Resolution using Attention Memory for Visual Dialog

no code implementations • NeurIPS 2017 • Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal

From this memory, the model retrieves the previous attention, taking into account recency, which is most relevant for the current question, in order to resolve potentially ambiguous references.

Ranked #13 on Visual Dialog on VisDial v0.9 val (R@1 metric)

Parameter Prediction Question Answering +3

Recent Advances in Zero-shot Recognition

no code implementations • 13 Oct 2017 • Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue, Leonid Sigal, Shaogang Gong

With the recent renaissance of deep convolution neural networks, encouraging breakthroughs have been achieved on the supervised recognition tasks, where each class has sufficient training data and fully annotated training data.

Open Set Learning Zero-Shot Learning

Action Classification and Highlighting in Videos

no code implementations • 31 Aug 2017 • Atousa Torabi, Leonid Sigal

Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition.

Action Classification Classification +4

Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.

Sentence Visual Grounding

Weakly-Supervised Spatial Context Networks

no code implementations • 10 Apr 2017 • Zuxuan Wu, Larry S. Davis, Leonid Sigal

In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch, within the same image, conditioned on their real-valued relative spatial offset.

Object Object Categorization

Semi-Latent GAN: Learning to generate and modify facial images from attributes

no code implementations • 7 Apr 2017 • Weidong Yin, Yanwei Fu, Leonid Sigal, Xiangyang Xue

Generating and manipulating human facial images using high-level attributal controls are important and interesting problems.

Attribute Generative Adversarial Network

Learning to Generate Posters of Scientific Papers by Probabilistic Graphical Models

no code implementations • 21 Feb 2017 • Yu-ting Qiang, Yanwei Fu, Xiao Yu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal

In order to bridge the gap between panel attributes and the composition within each panel, we also propose a recursive page splitting algorithm to generate the panel layout for a poster.

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

no code implementations • 26 Sep 2016 • Atousa Torabi, Niket Tandon, Leonid Sigal

We evaluate our models on the large-scale LSMDC16 movie dataset for two tasks: 1) standard ranking for video annotation and retrieval, and 2) our proposed movie multiple-choice test.

Ranked #39 on Video Retrieval on MSR-VTT

Multiple-choice Retrieval +1

Semi-supervised Vocabulary-informed Learning

no code implementations • CVPR 2016 • Yanwei Fu, Leonid Sigal

Despite significant progress in object categorization in recent years, a number of important challenges remain, mainly the ability to learn from limited labeled data and the ability to recognize object classes within a large, potentially open, set of labels.

Object Categorization Open Set Learning +1

Learning to Generate Posters of Scientific Papers

no code implementations • 5 Apr 2016 • Yu-ting Qiang, Yanwei Fu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal

Then, given inferred layout and attributes, composition of graphical elements within each panel is synthesized.

Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web

no code implementations • 22 Dec 2015 • Shugao Ma, Sarah Adel Bargal, Jianming Zhang, Leonid Sigal, Stan Sclaroff

In contrast, collecting action images from the Web is much easier and training on images requires much less computation.

Ranked #14 on Action Recognition on ActivityNet (using extra training data)

Action Recognition In Videos Temporal Action Localization

Robust Classification by Pre-conditioned LASSO and Transductive Diffusion Component Analysis

no code implementations • 19 Nov 2015 • Yanwei Fu, De-An Huang, Leonid Sigal

Collecting datasets in this way, however, requires robust and efficient ways for detecting and excluding outliers that are common and prevalent.

BIG-bench Machine Learning Classification +3

Learning from Synthetic Data Using a Stacked Multichannel Autoencoder

no code implementations • 17 Sep 2015 • Xi Zhang, Yanwei Fu, Shanshan Jiang, Leonid Sigal, Gady Agam

In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that enables bridging the synthetic gap and learning from synthetic data more efficiently.

Sketch Recognition

Learning Classifiers from Synthetic Data Using a Multichannel Autoencoder

no code implementations • 11 Mar 2015 • Xi Zhang, Yanwei Fu, Andi Zang, Leonid Sigal, Gady Agam

Experimental results on two datasets validate the efficiency of our MCAE model and our methodology of generating synthetic data.

General Classification

Hierarchical Maximum-Margin Clustering

no code implementations • 6 Feb 2015 • Guang-Tong Zhou, Sung Ju Hwang, Mark Schmidt, Leonid Sigal, Greg Mori

We present a hierarchical maximum-margin clustering method for unsupervised data analysis.

Clustering

A Unified Semantic Embedding: Relating Taxonomies and Attributes

no code implementations • NeurIPS 2014 • Sung Ju Hwang, Leonid Sigal

We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes.

Object Categorization

High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

no code implementations • 2 Feb 2012 • Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama

We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures.

feature selection Vocal Bursts Intensity Prediction

Learning the Compositional Spaces for Generalized Zero-shot Learning

no code implementations • ICLR 2019 • Hanze Dong, Yanwei Fu, Sung Ju Hwang, Leonid Sigal, Xiangyang Xue

This paper studies the problem of Generalized Zero-shot Learning (G-ZSL), whose goal is to classify instances belonging to both seen and unseen classes at the test time.

Generalized Zero-Shot Learning Open Set Learning

Middle-Out Decoding

no code implementations • NeurIPS 2018 • Shikib Mehri, Leonid Sigal

Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled.

Video Captioning

Image Generation from Layout

no code implementations • CVPR 2019 • Bo Zhao, Lili Meng, Weidong Yin, Leonid Sigal

The representation of each object is disentangled into a specified/certain part (category) and an unspecified/uncertain part (appearance).

Ranked #2 on Layout-to-Image Generation on Visual Genome 64x64

Layout-to-Image Generation Object

Walking on Thin Air: Environment-Free Physics-based Markerless Motion Capture

no code implementations • 4 Dec 2018 • Micha Livne, Leonid Sigal, Marcus A. Brubaker, David J. Fleet

To our knowledge, this is the first approach to take physics into account without explicit a priori knowledge of the environment or body dimensions.

Markerless Motion Capture

Traversing the Continuous Spectrum of Image Retrieval with Deep Dynamic Models

no code implementations • 1 Dec 2018 • Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal

While the proposed approaches in the literature can be roughly categorized into two main groups: category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex.

Attribute Continuous Control +2

Non-parametric Structured Output Networks

no code implementations • NeurIPS 2017 • Andreas Lehrmann, Leonid Sigal

End-to-end training methods for models with structured graphical dependencies on top of neural predictions have recently emerged as a principled way of combining these two paradigms.

Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

no code implementations • NeurIPS 2013 • Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori

We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video.

Classification General Classification +3

Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines

no code implementations • NeurIPS 2011 • Matthew D. Zeiler, Graham W. Taylor, Leonid Sigal, Iain Matthews, Rob Fergus

We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence.

Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos

no code implementations • CVPR 2015 • Alina Kuznetsova, Sung Ju Hwang, Bodo Rosenhahn, Leonid Sigal

By incrementally detecting object instances in video and adding confident detections into the model, we are able to dynamically adjust the complexity of the detector over time by instantiating new prototypes to span all domains the model has seen.

Domain Adaptation Incremental Learning +3

Where and when to look? Spatial-temporal attention for action recognition in videos

no code implementations • ICLR 2019 • Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Frederick Tung, Leonid Sigal

Our model is efficient, as it proposes a separable spatio-temporal mechanism for video attention, while being able to identify important parts of the video both spatially and temporally.

Action Recognition In Videos Temporal Action Localization +1

Poselet Key-Framing: A Model for Human Activity Recognition

no code implementations • CVPR 2013 • Michalis Raptis, Leonid Sigal

We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.

Ranked #3 on Human Interaction Recognition on UT

Human Activity Recognition Temporal Localization

Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction

no code implementations • CVPR 2014 • Gunhee Kim, Leonid Sigal, Eric P. Xing

The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with assistance of videos.

16k Video Summarization

Ranking and Retrieval of Image Sequences From Multiple Paragraph Queries

no code implementations • CVPR 2015 • Gunhee Kim, Seungwhan Moon, Leonid Sigal

While most previous work has dealt with the relations between a natural language sentence and an image or a video, our work extends to the relations between paragraphs and image sequences.

Retrieval Sentence

Joint Photo Stream and Blog Post Summarization and Exploration

no code implementations • CVPR 2015 • Gunhee Kim, Seungwhan Moon, Leonid Sigal

We alternate between solving the two coupled latent SVM problems, by first fixing the summarization and solving for the alignment from blog images to photo streams and vice versa.

Transfer Learning

Space-Time Tree Ensemble for Action Recognition

no code implementations • CVPR 2015 • Shugao Ma, Leonid Sigal, Stan Sclaroff

Using the action vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.

Action Recognition Clustering +1

Learning Activity Progression in LSTMs for Activity Detection and Early Detection

no code implementations • CVPR 2016 • Shugao Ma, Leonid Sigal, Stan Sclaroff

In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.

Action Detection Activity Detection +1

Harnessing Object and Scene Semantics for Large-Scale Video Understanding

no code implementations • CVPR 2016 • Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, Leonid Sigal

Large-scale action recognition and video categorization are important problems in computer vision.

Action Recognition Clustering +4

Storyline Representation of Egocentric Videos With an Applications to Story-Based Search

no code implementations • ICCV 2015 • Bo Xiong, Gunhee Kim, Leonid Sigal

To address this, we propose a storyline representation that expresses an egocentric video as a set of story elements, jointly inferred through MRF inference, comprising actors, locations, supporting objects and events, depicted on a timeline.

Neural Sequential Phrase Grounding (SeqGROUND)

no code implementations • CVPR 2019 • Pelin Dogan, Leonid Sigal, Markus Gross

We propose an end-to-end approach for phrase grounding in images.

Phrase Grounding

A Variational Auto-Encoder Model for Stochastic Point Processes

no code implementations • CVPR 2019 • Nazanin Mehrasa, Akash Abdu Jyothi, Thibaut Durand, Jiawei He, Leonid Sigal, Greg Mori

We propose a novel probabilistic generative model for action sequences.

Point Processes

AttentionRNN: A Structured Spatial Attention Mechanism

no code implementations • ICCV 2019 • Siddhesh Khandelwal, Leonid Sigal

Visual attention mechanisms have proven to be integrally important constituent components of many modern deep neural architectures.

Image Categorization Image Generation +1

Interpretable Spatio-temporal Attention for Video Action Recognition

no code implementations • 1 Oct 2018 • Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederick Tung, Leonid Sigal

Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition.

Action Recognition Temporal Action Localization

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

no code implementations • ICCV 2019 • Tanzila Rahman, Bicheng Xu, Leonid Sigal

Multi-modal learning, particularly among imaging and linguistic modalities, has made amazing strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning.

OptiBox: Breaking the Limits of Proposals for Visual Grounding

no code implementations • 29 Nov 2019 • Zicong Fan, Si Yi Meng, Leonid Sigal, James J. Little

The problem of language grounding has attracted much attention in recent years due to its pivotal role in more general image-lingual high-level reasoning tasks (e.g., image captioning, VQA).

Image Captioning Visual Grounding +1

Generating Videos of Zero-Shot Compositions of Actions and Objects

no code implementations • ECCV 2020 • Megha Nawhal, Mengyao Zhai, Andreas Lehrmann, Leonid Sigal, Greg Mori

Human activity videos involve rich, varied interactions between people and objects.

Human-Object Interaction Detection Object +1

Variational Hyper RNN for Sequence Modeling

no code implementations • 24 Feb 2020 • Ruizhi Deng, Yanshuai Cao, Bo Chang, Leonid Sigal, Greg Mori, Marcus A. Brubaker

In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence.

Time Series Time Series Analysis

Consistent Multiple Sequence Decoding

no code implementations • 2 Apr 2020 • Bicheng Xu, Leonid Sigal

Our formulation utilizes a consistency fusion mechanism, implemented using message passing in a Graph Neural Network (GNN), to aggregate context from related decoders.

Image Captioning

UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation

no code implementations • CVPR 2021 • Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal

Weakly-supervised approaches draw on image-level labels to build detectors/segmentors, while zero/few-shot methods assume abundant instance-level data for a set of base classes, and none to a few examples for novel classes.

object-detection Object Detection +1

Unsupervised Video Decomposition using Spatio-temporal Iterative Inference

no code implementations • 25 Jun 2020 • Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann

Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning.

Object Representation Learning

Person-in-Context Synthesis with Compositional Structural Space

no code implementations • 28 Aug 2020 • Weidong Yin, Ziwei Liu, Leonid Sigal

To handle the stark difference in input structures, we proposed two separate neural branches to attentively composite the respective (context/person) inputs into a shared "compositional structural space", which encodes shape, location and appearance information for both context and person structures in a disentangled manner.

Weakly-supervised Audio-visual Sound Source Detection and Separation

no code implementations • 25 Mar 2021 • Tanzila Rahman, Leonid Sigal

Learning how to localize and separate individual object sounds in the audio channel of the video is a difficult task.

Audio Source Separation Denoising +5

Segmentation-grounded Scene Graph Generation

no code implementations • ICCV 2021 • Siddhesh Khandelwal, Mohammed Suhail, Leonid Sigal

Our framework is agnostic to the underlying scene graph generation method and addresses the lack of segmentation annotations in target scene graph datasets (e.g., Visual Genome) through transfer and multi-task learning from, and with, an auxiliary dataset (e.g., MS COCO).

Graph Generation Multi-Task Learning +2

Saliency-Guided Image Translation

no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal

In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user specified saliency map.

Generative Adversarial Network Image-to-Image Translation +1

Layered Controllable Video Generation

no code implementations • 24 Nov 2021 • Jiahui Huang, Yuhe Jin, Kwang Moo Yi, Leonid Sigal

In the first stage, with a rich set of losses and a dynamic foreground size prior, we learn how to separate the frame into foreground and background layers and, conditioned on these layers, how to generate the next frame using a VQ-VAE generator.

Video Generation

Self-supervision through Random Segments with Autoregressive Coding (RandSAC)

no code implementations • 22 Mar 2022 • Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal

We illustrate that randomized serialization of the segments significantly improves performance and results in a distribution over spatially-long (across-segments) and -short (within-segment) predictions, which are effective for feature learning.

Representation Learning Self-Supervised Learning

Generalizable Patch-Based Neural Rendering

no code implementations • 21 Jul 2022 • Mohammed Suhail, Carlos Esteves, Leonid Sigal, Ameesh Makadia

Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on novel-view synthesis considerably.

Neural Rendering Novel View Synthesis

Iterative Scene Graph Generation

no code implementations • 27 Jul 2022 • Siddhesh Khandelwal, Leonid Sigal

In this work, we propose a novel framework for scene graph generation that addresses this limitation and also introduces dynamic conditioning on the image, using message passing in a Markov Random Field.

Graph Generation Scene Graph Generation

Paper
Add Code

GraphPNAS: Learning Distribution of Good Neural Architectures via Deep Graph Generative Models

no code implementations • 28 Nov 2022 • Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao

Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods.

Neural Architecture Search

Paper
Add Code

Framework-agnostic Semantically-aware Global Reasoning for Segmentation

no code implementations • 6 Dec 2022 • Mir Rayat Imtiaz Hossain, Leonid Sigal, James J. Little

Recent advances in pixel-level tasks (e.g., segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features.

Instance Segmentation Segmentation +1

Paper
Add Code

Self-Supervised Relation Alignment for Scene Graph Generation

no code implementations • 2 Feb 2023 • Bicheng Xu, Renjie Liao, Leonid Sigal

In the auxiliary branch, relational input features are partially masked prior to message passing and predicate prediction.

Graph Generation Relation +1

Paper
Add Code

Frustratingly Simple but Effective Zero-shot Detection and Segmentation: Analysis and a Strong Baseline

no code implementations • 14 Feb 2023 • Siddhesh Khandelwal, Anirudth Nambirajan, Behjat Siddiquie, Jayan Eledath, Leonid Sigal

Methods for object detection and segmentation often require abundant instance-level annotations for training, which are time-consuming and expensive to collect.

Object object-detection +3

Paper
Add Code

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.

Action Detection Sentence +2

Paper
Add Code

Implicit and Explicit Commonsense for Multi-sentence Video Captioning

no code implementations • 14 Mar 2023 • Shih-Han Chou, James J. Little, Leonid Sigal

We show that our commonsense knowledge enhanced approach produces significant improvements on this task (up to 57% in METEOR and 8.5% in CIDEr), as well as the state-of-the-art result on more traditional video captioning in the ActivityNet Captions dataset [29].

Imitation Learning Sentence +1

Paper
Add Code

Omnimatte3D: Associating Objects and Their Effects in Unconstrained Monocular Video

no code implementations • CVPR 2023 • Mohammed Suhail, Erika Lu, Zhengqi Li, Noah Snavely, Leonid Sigal, Forrester Cole

Instead, our method applies recent progress in monocular camera pose and depth estimation to create a full, RGBD video layer for the background, along with a video layer for each foreground object.

Depth Estimation

Paper
Add Code

INVE: Interactive Neural Video Editing

no code implementations • 15 Jul 2023 • Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee

We present Interactive Neural Video Editing (INVE), a real-time video editing solution, which can assist the video editing process by consistently propagating sparse frame edits to the entire video clip.

Video Editing

Paper
Add Code

Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching

no code implementations • ICCV 2023 • Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal

A novel Uncertainty Guided Adaptive Correlation (UGAC) module is introduced to robustly adapt the same model for different scenarios.

Stereo Matching

Paper
Add Code

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

no code implementations • 3 Dec 2023 • Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk

Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery.

counterfactual Counterfactual Reasoning

Paper
Add Code

TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking

no code implementations • 13 Dec 2023 • Raghav Goyal, Wan-Cyuan Fan, Mennatullah Siam, Leonid Sigal

In this work, we propose a novel, clip-based DETR-style encoder-decoder architecture, which focuses on systematically analyzing and addressing the aforementioned challenges.

Semantic Segmentation Video Object Segmentation +1

Paper
Add Code

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models

no code implementations • 19 Dec 2023 • Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal

Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image.

Image Generation Prompt Engineering

Paper
Add Code

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

no code implementations • 2 Jan 2024 • Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal

While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation.

Graph Generation Image Generation +2

Paper
Add Code

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

no code implementations • 23 Jan 2024 • Shih-Han Chou, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, Ian Knopke, Sedef Akinli Kocak, Leonid Sigal, Yalda Mohsenzadeh

Towards a solution for designing this ability in algorithms, we present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called the Reuters Video-Language News (ReutersViLNews) dataset, which focuses on high-level video-language understanding with an emphasis on long-form news.

Miscellaneous Video Description

Paper
Add Code

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.

Image Generation

Paper
Add Code

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

no code implementations • 21 Mar 2024 • Gaurav Bhatt, James Ross, Leonid Sigal

Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks.

Information Retrieval

Paper
Add Code

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

no code implementations • 17 Apr 2024 • Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions.

Paper
Add Code
