no code implementations • ICLR 2019 • Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Frederick Tung, Leonid Sigal
Our model is efficient, as it proposes a separable spatio-temporal mechanism for video attention, while being able to identify important parts of the video both spatially and temporally.
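The separability claim can be illustrated with a short NumPy sketch: spatial attention is computed per frame and temporal attention per clip, instead of one joint T×H×W map. The score functions here (simple channel means) are placeholder assumptions, not the paper's learned attention heads:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def separable_st_attention(features):
    """Attend over space and time separably.

    features: (T, H, W, C) frame features. Returns a (C,) video descriptor.
    Factoring attention into a per-frame spatial map and a per-clip temporal
    weighting is what keeps the mechanism cheap relative to joint attention.
    """
    T, H, W, C = features.shape
    spatial_scores = features.mean(axis=-1).reshape(T, H * W)   # (T, HW)
    spatial_w = softmax(spatial_scores, axis=1).reshape(T, H, W, 1)
    pooled = (features * spatial_w).sum(axis=(1, 2))            # (T, C)
    temporal_w = softmax(pooled.mean(axis=-1))                  # (T,)
    return (pooled * temporal_w[:, None]).sum(axis=0)           # (C,)
```

The spatial weights identify *where* to look in each frame; the temporal weights identify *when*, matching the interpretability goal of the paper.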
Action Recognition In Videos • Temporal Action Localization +1
no code implementations • 13 Aug 2024 • Shivam Chandhok, Wan-Cyuan Fan, Leonid Sigal
Vision-Language Models (VLMs) have emerged as general purpose tools for addressing a variety of complex computer vision problems.
no code implementations • 19 Jul 2024 • Wan-Cyuan Fan, Yen-Chun Chen, Mengchen Liu, Lu Yuan, Leonid Sigal
Experimental results show that CHOPINLLM exhibits strong performance in understanding both annotated and unannotated charts across a wide range of types.
no code implementations • 2 Jun 2024 • Chunjin Song, Zhijie Wu, Bastian Wandt, Leonid Sigal, Helge Rhodin
For reconstructing high-fidelity human 3D models from monocular videos, it is crucial to maintain consistent large-scale body shapes along with finely matched subtle wrinkles.
1 code implementation • CVPR 2024 • Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little
These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions.
no code implementations • 21 Mar 2024 • Gaurav Bhatt, James Ross, Leonid Sigal
Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks.
no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal
We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.
no code implementations • 23 Jan 2024 • Shih-Han Chou, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, Ian Knopke, Sedef Akinli Kocak, Leonid Sigal, Yalda Mohsenzadeh
Towards a solution for designing this ability in algorithms, we present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called the Reuters Video-Language News (ReutersViLNews) dataset, which focuses on high-level video-language understanding with an emphasis on long-form news.
no code implementations • 2 Jan 2024 • Bicheng Xu, Qi Yan, Renjie Liao, Lele Wang, Leonid Sigal
While previous works have explored image generation conditioned on scene graphs or layouts, our task is distinctive and important as it involves generating scene graphs themselves unconditionally from noise, enabling efficient and interpretable control for image generation.
no code implementations • CVPR 2024 • Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
Further, we leverage the findings that different timesteps of the diffusion process cater to different levels of detail in an image.
no code implementations • 13 Dec 2023 • Raghav Goyal, Wan-Cyuan Fan, Mennatullah Siam, Leonid Sigal
In this work we propose a novel, clip-based DETR-style encoder-decoder architecture, which focuses on systematically analyzing and addressing aforementioned challenges.
1 code implementation • 3 Dec 2023 • Aditya Chinchure, Pushkar Shukla, Gaurav Bhatt, Kiri Salij, Kartik Hosanagar, Leonid Sigal, Matthew Turk
Text-to-Image (TTI) generative models have shown great progress in the past few years in terms of their ability to generate complex and high-quality imagery.
1 code implementation • CVPR 2024 • Jiayun Luo, Siddhesh Khandelwal, Leonid Sigal, Boyang Li
From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which prove effective for tasks like visual question answering.
no code implementations • ICCV 2023 • Junpeng Jing, Jiankun Li, Pengfei Xiong, Jiangyu Liu, Shuaicheng Liu, Yichen Guo, Xin Deng, Mai Xu, Lai Jiang, Leonid Sigal
A novel Uncertainty Guided Adaptive Correlation (UGAC) module is introduced to robustly adapt the same model for different scenarios.
no code implementations • 15 Jul 2023 • Jiahui Huang, Leonid Sigal, Kwang Moo Yi, Oliver Wang, Joon-Young Lee
We present Interactive Neural Video Editing (INVE), a real-time video editing solution, which can assist the video editing process by consistently propagating sparse frame edits to the entire video clip.
no code implementations • 14 Mar 2023 • Shih-Han Chou, James J. Little, Leonid Sigal
We show that our commonsense knowledge enhanced approach produces significant improvements on this task (up to 57% in METEOR and 8.5% in CIDEr), as well as the state-of-the-art result on more traditional video captioning in the ActivityNet Captions dataset [29].
no code implementations • 16 Feb 2023 • Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran
Video understanding tasks take many forms, from action detection to visual query localization and spatio-temporal grounding of sentences.
no code implementations • 14 Feb 2023 • Siddhesh Khandelwal, Anirudth Nambirajan, Behjat Siddiquie, Jayan Eledath, Leonid Sigal
Methods for object detection and segmentation often require abundant instance-level annotations for training, which are time-consuming and expensive to collect.
no code implementations • 2 Feb 2023 • Bicheng Xu, Renjie Liao, Leonid Sigal
In the auxiliary branch, relational input features are partially masked prior to message passing and predicate prediction.
1 code implementation • 3 Jan 2023 • Yanwei Fu, Xiaomei Wang, Hanze Dong, Yu-Gang Jiang, Meng Wang, xiangyang xue, Leonid Sigal
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels.
no code implementations • CVPR 2023 • Mohammed Suhail, Erika Lu, Zhengqi Li, Noah Snavely, Leonid Sigal, Forrester Cole
Instead, our method applies recent progress in monocular camera pose and depth estimation to create a full, RGBD video layer for the background, along with a video layer for each foreground object.
1 code implementation • CVPR 2023 • Yichen Guo, Mai Xu, Lai Jiang, Leonid Sigal, Yunjin Chen
To alleviate this issue, we propose the first attempt at 360° image rescaling, which refers to downscaling a 360° image to a visually valid low-resolution (LR) counterpart and then upscaling to a high-resolution (HR) 360° image given the LR variant.
no code implementations • 6 Dec 2022 • Mir Rayat Imtiaz Hossain, Leonid Sigal, James J. Little
Recent advances in pixel-level tasks (e.g., segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features.
no code implementations • 28 Nov 2022 • Muchen Li, Jeffrey Yunfan Liu, Leonid Sigal, Renjie Liao
Moreover, our graph generator leads to a learnable probabilistic search method that is more flexible and efficient than the commonly used RNN generator and random search methods.
1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal
Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, which are consistent with the story, but also models appropriate correspondences between the characters and the background.
1 code implementation • 24 Oct 2022 • Sahithya Ravi, Aditya Chinchure, Leonid Sigal, Renjie Liao, Vered Shwartz
In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases.
Ranked #8 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)
no code implementations • 4 Oct 2022 • Peyman Bateni, Leonid Sigal
The user's pulse wave is then used to determine stress (according to the Baevsky Stress Index), heart rate, and heart rate variability.
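The metrics named in this snippet can be computed from inter-beat (RR) intervals. The sketch below is a generic implementation of heart rate, RMSSD (a common HRV measure), and the Baevsky Stress Index, not the paper's pipeline; the 50 ms histogram bin width for the modal amplitude follows the usual Baevsky convention:

```python
import numpy as np

def pulse_metrics(rr_ms):
    """Heart rate, HRV (RMSSD), and Baevsky Stress Index from RR intervals.

    rr_ms: sequence of inter-beat (RR) intervals in milliseconds,
    with at least some variability (the SI is undefined for a flat series).
    """
    rr_ms = np.asarray(rr_ms, dtype=float)
    heart_rate = 60000.0 / rr_ms.mean()                 # beats per minute
    rmssd = np.sqrt(np.mean(np.diff(rr_ms) ** 2))       # HRV: RMS of successive diffs
    # Baevsky Stress Index: SI = AMo / (2 * Mo * MxDMn), where Mo is the
    # modal RR (s), AMo the share of intervals in the 50 ms modal bin (%),
    # and MxDMn the RR range (s).
    edges = np.arange(rr_ms.min(), rr_ms.max() + 50.0, 50.0)
    hist, edges = np.histogram(rr_ms, bins=edges)
    k = hist.argmax()
    mo = (edges[k] + edges[k + 1]) / 2.0 / 1000.0       # seconds
    amo = 100.0 * hist[k] / rr_ms.size                  # percent
    mxdmn = (rr_ms.max() - rr_ms.min()) / 1000.0        # seconds
    return heart_rate, rmssd, amo / (2.0 * mo * mxdmn)
```

For a steady ~800 ms RR series this yields a heart rate of 75 bpm, with RMSSD and SI reflecting beat-to-beat variability and its concentration.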
no code implementations • 27 Jul 2022 • Siddhesh Khandelwal, Leonid Sigal
In this work, we propose a novel framework for scene graph generation that addresses this limitation, as well as introduces dynamic conditioning on the image, using message passing in a Markov Random Field.
no code implementations • 21 Jul 2022 • Mohammed Suhail, Carlos Esteves, Leonid Sigal, Ameesh Makadia
Neural rendering has received tremendous attention since the advent of Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on novel-view synthesis considerably.
no code implementations • 22 Mar 2022 • Tianyu Hua, Yonglong Tian, Sucheng Ren, Michalis Raptis, Hang Zhao, Leonid Sigal
We illustrate that randomized serialization of the segments significantly improves the performance and results in a distribution over spatially-long (across-segments) and -short (within-segment) predictions which are effective for feature learning.
2 code implementations • 13 Jan 2022 • Peyman Bateni, Jarred Barber, Raghav Goyal, Vaden Masrani, Jan-Willem van de Meent, Leonid Sigal, Frank Wood
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance-based classifier combined with a state-of-the-art neural adaptive feature extractor to achieve strong performance on the Meta-Dataset, mini-ImageNet, and tiered-ImageNet benchmarks.
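The core of a Mahalanobis-distance few-shot classifier can be sketched in NumPy: estimate a mean and a regularized covariance per class from the support set, then assign each query to the nearest class under that metric. This is a minimal sketch with plain identity shrinkage, not Simple CNAPS's hierarchical regularization or adaptive feature extractor:

```python
import numpy as np

def mahalanobis_classify(support_x, support_y, query_x, reg=1.0):
    """Nearest-class-mean classification under a regularized Mahalanobis metric.

    support_x: (N, D) support features; support_y: (N,) integer labels;
    query_x: (M, D) query features. Returns (M,) predicted labels.
    """
    classes = np.unique(support_y)
    D = support_x.shape[1]
    means, precisions = [], []
    for c in classes:
        xc = support_x[support_y == c]
        mu = xc.mean(axis=0)
        # Shrink the class covariance toward the identity so it stays
        # invertible even with very few shots per class.
        cov = np.cov(xc, rowvar=False) if len(xc) > 1 else np.zeros((D, D))
        means.append(mu)
        precisions.append(np.linalg.inv(cov + reg * np.eye(D)))
    preds = []
    for q in query_x:
        d = [(q - mu) @ P @ (q - mu) for mu, P in zip(means, precisions)]
        preds.append(classes[int(np.argmin(d))])
    return np.array(preds)
```

Unlike a plain Euclidean nearest-mean rule, the learned precision matrices let each class stretch or shrink the metric along its own feature directions.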
1 code implementation • CVPR 2022 • Mohammed Suhail, Carlos Esteves, Leonid Sigal, Ameesh Makadia
Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene.
1 code implementation • NeurIPS 2021 • Tanzila Rahman, Mengyu Yang, Leonid Sigal
In this work, we introduce TriBERT -- a transformer-based architecture, inspired by ViLBERT, which enables contextual feature learning across three modalities: vision, pose, and audio, with the use of flexible co-attention.
no code implementations • 24 Nov 2021 • Jiahui Huang, Yuhe Jin, Kwang Moo Yi, Leonid Sigal
In the first stage, with the rich set of losses and a dynamic foreground size prior, we learn how to separate the frame into foreground and background layers and, conditioned on these layers, how to generate the next frame using a VQ-VAE generator.
1 code implementation • 26 Oct 2021 • Tanzila Rahman, Mengyu Yang, Leonid Sigal
In this work, we introduce TriBERT -- a transformer-based architecture, inspired by ViLBERT, which enables contextual feature learning across three modalities: vision, pose, and audio, with the use of flexible co-attention.
no code implementations • CVPR 2021 • Lai Jiang, Mai Xu, Xiaofei Wang, Leonid Sigal
In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user specified saliency map.
Generative Adversarial Network • Image-to-Image Translation +1
1 code implementation • NeurIPS 2021 • Muchen Li, Leonid Sigal
As an important step towards visual reasoning, visual grounding (e.g., phrase localization, referring expression comprehension/segmentation) has been widely explored. Previous approaches to referring expression comprehension (REC) or segmentation (RES) either suffer from limited performance, due to a two-stage setup, or require the design of complex task-specific one-stage architectures.
no code implementations • ICCV 2021 • Siddhesh Khandelwal, Mohammed Suhail, Leonid Sigal
Our framework is agnostic to the underlying scene graph generation method and addresses the lack of segmentation annotations in target scene graph datasets (e.g., Visual Genome) through transfer and multi-task learning from, and with, an auxiliary dataset (e.g., MS COCO).
no code implementations • 25 Mar 2021 • Tanzila Rahman, Leonid Sigal
Learning how to localize and separate individual object sounds in the audio channel of the video is a difficult task.
1 code implementation • CVPR 2021 • Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal
The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space.
Ranked #4 on Scene Graph Generation on Visual Genome
1 code implementation • 4 Nov 2020 • Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
We also propose a multimodal fusion module to combine both visual and textual information.
no code implementations • 28 Aug 2020 • Weidong Yin, Ziwei Liu, Leonid Sigal
To handle the stark difference in input structures, we propose two separate neural branches to attentively composite the respective (context/person) inputs into a shared "compositional structural space", which encodes shape, location and appearance information for both context and person structures in a disentangled manner.
2 code implementations • 27 Aug 2020 • Ke Ma, Bo Zhao, Leonid Sigal
Also, the generated images from our model have higher resolution, object classification accuracy and consistency, as compared to the previous state-of-the-art.
no code implementations • 25 Jun 2020 • Polina Zablotskaia, Edoardo A. Dominici, Leonid Sigal, Andreas M. Lehrmann
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning.
1 code implementation • 23 Jun 2020 • Jing Wang, Jiahong Chen, Jianzhe Lin, Leonid Sigal, Clarence W. de Silva
To solve this problem, we introduce a Gaussian-guided latent alignment approach to align the latent feature distributions of the two domains under the guidance of the prior distribution.
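The alignment idea can be illustrated with the closed-form KL divergence between a diagonal Gaussian fitted to a batch of latent codes and a standard normal prior; pulling both domains toward the same prior encourages their feature distributions to coincide. This is a generic sketch of prior-guided alignment, not the paper's exact objective:

```python
import numpy as np

def kl_to_standard_normal(z):
    """KL( N(mu, diag(var)) || N(0, I) ) for a batch of latent codes z (N, D).

    Fit a diagonal Gaussian to the batch, then evaluate the closed-form
    KL divergence to the standard normal prior.
    """
    z = np.asarray(z, dtype=float)
    mu = z.mean(axis=0)
    var = z.var(axis=0) + 1e-8          # small floor for numerical safety
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))
```

Minimizing this term for the latents of both the source and target domains aligns each of them to the shared Gaussian prior, and hence (indirectly) to each other.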
Ranked #1 on Domain Adaptation on SYNSIG-to-GTSRB
no code implementations • CVPR 2021 • Siddhesh Khandelwal, Raghav Goyal, Leonid Sigal
Weakly-supervised approaches draw on image-level labels to build detectors/segmentors, while zero/few-shot methods assume abundant instance-level data for a set of base classes, and none to a few examples for novel classes.
no code implementations • 2 Apr 2020 • Bicheng Xu, Leonid Sigal
Our formulation utilizes a consistency fusion mechanism, implemented using message passing in a Graph Neural Network (GNN), to aggregate context from related decoders.
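A consistency fusion step of this kind reduces to message passing over a graph of decoders: each decoder's context vector is mixed with an aggregate of its neighbours'. The sketch below uses mean aggregation and a fixed residual mix as a stand-in for the learned GNN updates:

```python
import numpy as np

def fuse_decoder_contexts(h, adj, steps=2):
    """Mean-aggregation message passing between related decoder states.

    h: (K, D) context vectors, one per decoder; adj: (K, K) 0/1 adjacency
    over related decoders (no self-loops). Each step mixes a node's state
    with the mean of its neighbours' states.
    """
    h = np.asarray(h, dtype=float)
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(steps):
        msgs = (adj @ h) / deg          # mean over neighbours
        h = 0.5 * (h + msgs)            # simple residual fusion
    return h
```

Repeated rounds drive connected decoders toward mutually consistent context, which is the effect the consistency fusion mechanism is after.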
no code implementations • 24 Feb 2020 • Ruizhi Deng, Yanshuai Cao, Bo Chang, Leonid Sigal, Greg Mori, Marcus A. Brubaker
In this work, we propose a novel probabilistic sequence model that excels at capturing high variability in time series data, both across sequences and within an individual sequence.
1 code implementation • CVPR 2020 • Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, Alla Sheffer
Reconstruction of a 3D shape from a single 2D image is a classical computer vision problem, whose difficulty stems from the inherent ambiguity of recovering occluded or only partially observed surfaces.
2 code implementations • CVPR 2020 • Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal
Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data.
Ranked #2 on Few-Shot Image Classification on Mini-Imagenet 10-way (5-shot) (using extra training data)
no code implementations • ECCV 2020 • Megha Nawhal, Mengyao Zhai, Andreas Lehrmann, Leonid Sigal, Greg Mori
Human activity videos involve rich, varied interactions between people and objects.
no code implementations • 29 Nov 2019 • Zicong Fan, Si Yi Meng, Leonid Sigal, James J. Little
The problem of language grounding has attracted much attention in recent years due to its pivotal role in more general image-lingual high level reasoning tasks (e.g., image captioning, VQA).
3 code implementations • 21 Oct 2019 • Polina Zablotskaia, Aliaksandr Siarohin, Bo Zhao, Leonid Sigal
In this paper, we focus on human motion transfer - generation of a video depicting a particular subject, observed in a single image, performing a series of motions exemplified by an auxiliary (driving) video.
no code implementations • ICCV 2019 • Tanzila Rahman, Bicheng Xu, Leonid Sigal
Multi-modal learning, particularly among imaging and linguistic modalities, has made amazing strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning.
2 code implementations • ICCV 2019 • Akash Abdu Jyothi, Thibaut Durand, JiaWei He, Leonid Sigal, Greg Mori
Recently there is an increasing interest in scene generation within the research community.
no code implementations • ICCV 2019 • Siddhesh Khandelwal, Leonid Sigal
Visual attention mechanisms have proven to be integrally important constituent components of many modern deep neural architectures.
no code implementations • CVPR 2019 • Nazanin Mehrasa, Akash Abdu Jyothi, Thibaut Durand, JiaWei He, Leonid Sigal, Greg Mori
We propose a novel probabilistic generative model for action sequences.
no code implementations • CVPR 2019 • Pelin Dogan, Leonid Sigal, Markus Gross
We propose an end-to-end approach for phrase grounding in images.
no code implementations • 4 Dec 2018 • Micha Livne, Leonid Sigal, Marcus A. Brubaker, David J. Fleet
To our knowledge, this is the first approach to take physics into account without explicit a priori knowledge of the environment or body dimensions.
no code implementations • 1 Dec 2018 • Ziad Al-Halah, Andreas M. Lehrmann, Leonid Sigal
While the approaches proposed in the literature can be roughly categorized into two main groups, category- and instance-based retrieval, in this work we show that the retrieval task is much richer and more complex.
no code implementations • CVPR 2019 • Bo Zhao, Lili Meng, Weidong Yin, Leonid Sigal
The representation of each object is disentangled into a specified/certain part (category) and an unspecified/uncertain part (appearance).
Ranked #2 on Layout-to-Image Generation on Visual Genome 64x64
no code implementations • NeurIPS 2018 • Shikib Mehri, Leonid Sigal
Despite being virtually ubiquitous, sequence-to-sequence models are challenged by their lack of diversity and inability to be externally controlled.
no code implementations • ICLR 2019 • Hanze Dong, Yanwei Fu, Sung Ju Hwang, Leonid Sigal, xiangyang xue
This paper studies the problem of Generalized Zero-shot Learning (G-ZSL), whose goal is to classify instances belonging to both seen and unseen classes at the test time.
no code implementations • 1 Oct 2018 • Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederick Tung, Leonid Sigal
Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition.
1 code implementation • CVPR 2018 • Hareesh Ravi, Lezi Wang, Carlos Muniz, Leonid Sigal, Dimitris Metaxas, Mubbasir Kapadia
We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story.
1 code implementation • 15 Apr 2018 • Zitian Chen, Yanwei Fu, yinda zhang, Yu-Gang Jiang, xiangyang xue, Leonid Sigal
In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet.
1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko
To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.
2 code implementations • ECCV 2018 • Bo Zhao, Bo Chang, Zequn Jie, Leonid Sigal
Existing methods for multi-domain image-to-image translation (or generation) attempt to directly map an input image (or a random vector) to an image in one of the output domains.
1 code implementation • ECCV 2018 • Jiawei He, Andreas Lehrmann, Joseph Marino, Greg Mori, Leonid Sigal
Videos express highly structured spatio-temporal patterns of visual data.
1 code implementation • 28 Feb 2018 • Huijuan Xu, Boyang Li, Vasili Ramanishka, Leonid Sigal, Kate Saenko
In order to explicitly model temporal relationships between visual events and their captions in a single video, we also propose a two-level hierarchical captioning module that keeps track of context.
1 code implementation • CVPR 2018 • Pelin Dogan, Boyang Li, Leonid Sigal, Markus Gross
The alignment of heterogeneous sequential data (video to text) is an important and challenging problem.
no code implementations • NeurIPS 2017 • Andreas Lehrmann, Leonid Sigal
End-to-end training methods for models with structured graphical dependencies on top of neural predictions have recently emerged as a principled way of combining these two paradigms.
no code implementations • 13 Oct 2017 • Yanwei Fu, Tao Xiang, Yu-Gang Jiang, xiangyang xue, Leonid Sigal, Shaogang Gong
With the recent renaissance of deep convolutional neural networks, encouraging breakthroughs have been achieved on supervised recognition tasks, where each class has sufficient, fully annotated training data.
no code implementations • NeurIPS 2017 • Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, Leonid Sigal
From this memory, the model retrieves the previous attention, taking into account recency, which is most relevant for the current question, in order to resolve potentially ambiguous references.
Ranked #13 on Visual Dialog on VisDial v0.9 val (R@1 metric)
no code implementations • 31 Aug 2017 • Atousa Torabi, Leonid Sigal
Inspired by recent advances in neural machine translation, which jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition.
no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee
We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
no code implementations • 10 Apr 2017 • Zuxuan Wu, Larry S. Davis, Leonid Sigal
In particular, we propose spatial context networks that learn to predict a representation of one image patch from another image patch, within the same image, conditioned on their real-valued relative spatial offset.
no code implementations • 7 Apr 2017 • Weidong Yin, Yanwei Fu, Leonid Sigal, xiangyang xue
Generating and manipulating human facial images using high-level attributal controls are important and interesting problems.
no code implementations • 21 Feb 2017 • Yu-ting Qiang, Yanwei Fu, Xiao Yu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal
In order to bridge the gap between panel attributes and the composition within each panel, we also propose a recursive page splitting algorithm to generate the panel layout for a poster.
no code implementations • 26 Sep 2016 • Atousa Torabi, Niket Tandon, Leonid Sigal
We evaluate our models on large scale LSMDC16 movie dataset for two tasks: 1) Standard Ranking for video annotation and retrieval 2) Our proposed movie multiple-choice test.
Ranked #39 on Video Retrieval on MSR-VTT
no code implementations • CVPR 2016 • Zuxuan Wu, Yanwei Fu, Yu-Gang Jiang, Leonid Sigal
Large-scale action recognition and video categorization are important problems in computer vision.
no code implementations • CVPR 2016 • Shugao Ma, Leonid Sigal, Stan Sclaroff
In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.
no code implementations • CVPR 2016 • Yanwei Fu, Leonid Sigal
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels.
no code implementations • 5 Apr 2016 • Yu-ting Qiang, Yanwei Fu, Yanwen Guo, Zhi-Hua Zhou, Leonid Sigal
Then, given inferred layout and attributes, composition of graphical elements within each panel is synthesized.
no code implementations • 22 Dec 2015 • Shugao Ma, Sarah Adel Bargal, Jianming Zhang, Leonid Sigal, Stan Sclaroff
In contrast, collecting action images from the Web is much easier and training on images requires much less computation.
Ranked #14 on Action Recognition on ActivityNet (using extra training data)
no code implementations • ICCV 2015 • Bo Xiong, Gunhee Kim, Leonid Sigal
To address this, we propose a storyline representation that expresses an egocentric video as a set of jointly inferred, through MRF inference, story elements comprising actors, locations, supporting objects and events, depicted on a timeline.
no code implementations • 19 Nov 2015 • Yanwei Fu, De-An Huang, Leonid Sigal
Collecting datasets in this way, however, requires robust and efficient ways for detecting and excluding outliers that are common and prevalent.
no code implementations • 16 Nov 2015 • Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, Leonid Sigal
Emotion is a key element in user-generated videos.
Ranked #5 on Video Emotion Recognition on Ekman6
no code implementations • 17 Sep 2015 • Xi Zhang, Yanwei Fu, Shanshan Jiang, Leonid Sigal, Gady Agam
In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that enables bridging the synthetic gap and learning from synthetic data more efficiently.
no code implementations • CVPR 2015 • Gunhee Kim, Seungwhan Moon, Leonid Sigal
We alternate between solving the two coupled latent SVM problems, by first fixing the summarization and solving for the alignment from blog images to photo streams and vice versa.
no code implementations • CVPR 2015 • Shugao Ma, Leonid Sigal, Stan Sclaroff
Using the action vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.
no code implementations • CVPR 2015 • Alina Kuznetsova, Sung Ju Hwang, Bodo Rosenhahn, Leonid Sigal
By incrementally detecting object instances in video and adding confident detections into the model, we are able to dynamically adjust the complexity of the detector over time by instantiating new prototypes to span all domains the model has seen.
no code implementations • CVPR 2015 • Gunhee Kim, Seungwhan Moon, Leonid Sigal
While most previous work has dealt with the relations between a natural language sentence and an image or a video, our work extends to the relations between paragraphs and image sequences.
no code implementations • 11 Mar 2015 • Xi Zhang, Yanwei Fu, Andi Zang, Leonid Sigal, Gady Agam
Experimental results on two datasets validate the efficiency of our MCAE model and our methodology of generating synthetic data.
no code implementations • 6 Feb 2015 • Guang-Tong Zhou, Sung Ju Hwang, Mark Schmidt, Leonid Sigal, Greg Mori
We present a hierarchical maximum-margin clustering method for unsupervised data analysis.
no code implementations • NeurIPS 2014 • Sung Ju Hwang, Leonid Sigal
We propose a method that learns a discriminative yet semantic space for object categorization, where we also embed auxiliary semantic entities such as supercategories and attributes.
no code implementations • CVPR 2014 • Gunhee Kim, Leonid Sigal, Eric P. Xing
The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with assistance of videos.
no code implementations • NeurIPS 2013 • Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
We propose a new weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video.
no code implementations • CVPR 2013 • Michalis Raptis, Leonid Sigal
We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.
Ranked #3 on Human Interaction Recognition on UT
no code implementations • 2 Feb 2012 • Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama
We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures.
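A standard kernel-based independence measure of this kind is the Hilbert-Schmidt Independence Criterion (HSIC). The biased empirical estimator, tr(KHLH)/(n-1)², can be sketched in a few lines; RBF kernels and a fixed bandwidth are simplifying assumptions here, not the paper's particular kernel choices:

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """RBF Gram matrix for a 1-D sample vector x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-gamma * d2)

def hsic(x, y, gamma=1.0):
    """Biased empirical HSIC between two 1-D samples: tr(KHLH) / (n-1)^2.

    A larger value indicates stronger statistical dependence between
    x and y under the chosen kernels.
    """
    n = len(x)
    K, L = rbf_kernel(x, gamma), rbf_kernel(y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Ranking candidate features by their HSIC with the output (while penalizing HSIC among already-selected features) yields non-redundant features with strong output dependence, which is the selection principle the snippet describes.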
no code implementations • NeurIPS 2011 • Matthew D. Zeiler, Graham W. Taylor, Leonid Sigal, Iain Matthews, Rob Fergus
We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence.