no code implementations • 15 Apr 2023 • Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa
The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential.
no code implementations • 29 Nov 2022 • Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff
One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.
no code implementations • 25 Apr 2022 • Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal
In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.
2 code implementations • 21 Apr 2022 • Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Archana Venkatachalapathy, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Alice Li, Shangru Li, Rama Chellappa
The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries.
1 code implementation • CVPR 2022 • Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko
Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.
Ranked #1 on Video Frame Interpolation on Xiph-4K (Crop)
1 code implementation • 1 Apr 2022 • Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff
In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.
1 code implementation • 22 Mar 2022 • Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko
In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.
no code implementations • ICCV 2021 • Donghyun Kim, Yi-Hsuan Tsai, Bingbing Zhuang, Xiang Yu, Stan Sclaroff, Kate Saenko, Manmohan Chandraker
Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition.
2 code implementations • ICCV 2021 • Kuniaki Saito, Donghyun Kim, Piotr Teterwak, Stan Sclaroff, Trevor Darrell, Kate Saenko
Unsupervised domain adaptation (UDA) methods can dramatically improve generalization on unlabeled target domains.
no code implementations • CVPR 2022 • Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff
In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.
1 code implementation • 25 Apr 2021 • Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Xiaodong Yang, Yue Yao, Liang Zheng, Pranamesh Chakraborty, Christian E. Lopez, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff
Track 3 addressed city-scale multi-target multi-camera vehicle tracking.
1 code implementation • 12 Jan 2021 • Qi Feng, Vitaly Ablavsky, Stan Sclaroff
In this paper, we focus on two foundational tasks: the Vehicle Retrieval by NL task and the Vehicle Tracking by NL task, which take advantage of the proposed CityFlow-NL benchmark and provide a strong basis for future research on the multi-target multi-camera tracking by NL description task.
no code implementations • ICCV 2021 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.
no code implementations • NeurIPS 2020 • Ping Hu, Stan Sclaroff, Kate Saenko
Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.
no code implementations • 1 Aug 2020 • Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A. Plummer
Our approach consists of three self-supervised tasks, each designed to capture a different concept neglected in prior work; we can select among them depending on the needs of the downstream task.
1 code implementation • 7 Jul 2020 • Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.
Ranked #32 on Semantic Segmentation on DensePASS
no code implementations • 11 Jun 2020 • Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff
In this work, we develop efficient disruptions of black-box image translation deepfake generation systems.
1 code implementation • CVPR 2020 • Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi
We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation.
Ranked #2 on Video Semantic Segmentation on Cityscapes val
no code implementations • 1 Apr 2020 • Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell
Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".
no code implementations • 18 Mar 2020 • Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko
We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.
no code implementations • 13 Mar 2020 • Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko
Explanations are defined as regions of visual evidence upon which a deep classification network makes a decision.
4 code implementations • 3 Mar 2020 • Nataniel Ruiz, Sarah Adel Bargal, Stan Sclaroff
Manipulated images and videos of this type have been dubbed Deepfakes.
1 code implementation • NeurIPS 2020 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko
While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.
no code implementations • 18 Feb 2020 • Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni
We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the light-weight shallow network runs on non-keyframes at a high frame rate.
no code implementations • 12 Feb 2020 • Nataniel Ruiz, Hao Yu, Danielle A. Allessio, Mona Jalal, Ajjen Joshi, Thomas Murray, John J. Magee, Jacob R. Whitehill, Vitaly Ablavsky, Ivon Arroyo, Beverly P. Woolf, Stan Sclaroff, Margrit Betke
In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS).
1 code implementation • 23 Dec 2019 • Nuno C. Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff
In this work, we address the problem of learning an ensemble of specialist networks using multimodal data, while considering the realistic and challenging scenario of possible missing modalities at test time.
1 code implementation • CVPR 2021 • Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff
We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) descriptions task.
no code implementations • 8 Sep 2019 • Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.
no code implementations • ICCV 2019 • Andrea Burns, Reuben Tan, Kate Saenko, Stan Sclaroff, Bryan A. Plummer
Shouldn't language and vision features be treated equally in vision-language (VL) tasks?
no code implementations • 26 Jul 2019 • Qi Feng, Vitaly Ablavsky, Qinxun Bai, Guorong Li, Stan Sclaroff
In benchmarks, our method is competitive with state-of-the-art trackers, and it outperforms all other trackers on targets with unambiguous and precise language annotations.
no code implementations • 11 Jun 2019 • Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff
Learning from a few examples is a challenging task for machine learning.
no code implementations • 5 Jun 2019 • Donghyun Kim, Sarah Adel Bargal, Jianming Zhang, Stan Sclaroff
However, it has been shown that deep models are vulnerable to adversarial examples.
no code implementations • ICLR 2019 • Donghyun Kim, Sarah Adel Bargal, Jianming Zhang, Stan Sclaroff
Deep models are state-of-the-art for many computer vision tasks including image classification and object detection.
3 code implementations • ICCV 2019 • Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko
Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.
no code implementations • 6 Dec 2018 • Sarah Adel Bargal, Andrea Zunino, Vitali Petsiuk, Jianming Zhang, Kate Saenko, Vittorio Murino, Stan Sclaroff
We propose Guided Zoom, an approach that utilizes spatial grounding of a model's decision to make more informed predictions.
3 code implementations • 17 Nov 2018 • Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko
Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.
no code implementations • ICCV 2019 • Hanxiao Wang, Venkatesh Saligrama, Stan Sclaroff, Vitaly Ablavsky
We consider the problem of fine-grained classification on an edge camera device that has limited power.
2 code implementations • ECCV 2018 • Fatih Cakir, Kun He, Stan Sclaroff
We propose theoretical and empirical improvements for two-stage hashing methods.
1 code implementation • 23 May 2018 • Andrea Zunino, Sarah Adel Bargal, Pietro Morerio, Jianming Zhang, Stan Sclaroff, Vittorio Murino
In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout.
no code implementations • CVPR 2018 • Kun He, Yan Lu, Stan Sclaroff
In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, a common stage that follows descriptor extraction in local-feature-based pipelines and can be formulated as nearest neighbor retrieval.
1 code implementation • 13 Apr 2018 • Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko
To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.
2 code implementations • 2 Mar 2018 • Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff
Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval.
1 code implementation • CVPR 2018 • Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, Stan Sclaroff
Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions.
no code implementations • CVPR 2017 • Ajjen Joshi, Soumya Ghosh, Margrit Betke, Stan Sclaroff, Hanspeter Pfister
Leveraging recent work on learning Bayesian neural networks, we build fast, scalable algorithms for inferring the posterior distribution over all network weights in the hierarchy.
1 code implementation • CVPR 2018 • Kun He, Fatih Cakir, Sarah Adel Bargal, Stan Sclaroff
Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval.
no code implementations • 30 Apr 2017 • Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman
We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems.
1 code implementation • ICCV 2017 • Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff
Learning-based hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performance-complexity trade-offs by learning hash functions from streaming data.
no code implementations • 2 Feb 2017 • Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke
Image and video analysis is often a crucial step in the study of animal behavior and kinematics.
3 code implementations • 1 Aug 2016 • Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff
We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps.
no code implementations • CVPR 2015 • Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.
1 code implementation • CVPR 2016 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
Our system leverages a Convolutional-Neural-Network model to generate location proposals of salient objects.
no code implementations • CVPR 2016 • Shugao Ma, Leonid Sigal, Stan Sclaroff
In this work, we improve training of temporal deep models to better learn activity progression for activity detection and early detection.
no code implementations • 2 May 2016 • Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke
Our work introduces a novel way to increase pose estimation accuracy by discovering parts from unannotated regions of training images.
no code implementations • 22 Dec 2015 • Shugao Ma, Sarah Adel Bargal, Jianming Zhang, Leonid Sigal, Stan Sclaroff
In contrast, collecting action images from the Web is much easier and training on images requires much less computation.
Ranked #13 on Action Recognition on ActivityNet (using extra training data)
no code implementations • ICCV 2015 • Fatih Cakir, Stan Sclaroff
With the staggering growth in image and video datasets, algorithms that provide fast similarity search and compact storage are crucial.
no code implementations • ICCV 2015 • Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech
Powered by this fast MBD transform algorithm, the proposed salient object detection method runs at 80 FPS, significantly outperforms previous methods of similar speed on four large benchmark datasets, and achieves performance comparable to or better than state-of-the-art methods.
Ranked #6 on Video Salient Object Detection on DAVSOD-easy35 (using extra training data)
no code implementations • 10 Nov 2015 • Fatih Cakir, Sarah Adel Bargal, Stan Sclaroff
To address these issues, we propose an online hashing method that is amenable to changes and expansions of the datasets.
no code implementations • 8 Jul 2015 • Qinxun Bai, Henry Lam, Stan Sclaroff
We propose a Bayesian approach for recursively estimating the classifier weights in online learning of a classifier ensemble.
no code implementations • 25 Jun 2015 • Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb
Majorization-Minimization (MM) is a powerful iterative procedure for optimizing non-convex functions that works by optimizing a sequence of bounds on the function.
no code implementations • CVPR 2015 • Shugao Ma, Leonid Sigal, Stan Sclaroff
Using the action vocabulary, we then apply tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.
no code implementations • 4 Mar 2015 • Qinxun Bai, Steven Rosenberg, Zheng Wu, Stan Sclaroff
We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective.
no code implementations • 23 Jul 2014 • Fatih Cakir, Stan Sclaroff
Thus, given a training set for a particular computer vision task, a key problem is pruning a large codebook to select only a subset of visual words.
no code implementations • LREC 2012 • Zoya Gavrilov, Stan Sclaroff, Carol Neidle, Sven Dickinson
A framework is proposed for the detection of reduplication in digital videos of American Sign Language (ASL).