Search Results for author: Stan Sclaroff

Found 66 papers, 24 papers with code

Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

no code implementations 29 Oct 2023 Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently.

Computational Efficiency Motion Estimation +1

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

no code implementations 3 Aug 2023 Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

no code implementations 30 Jun 2023 Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz

We hypothesize that this power to ignore out-of-context information (which we name patch selectivity), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion.

Data Augmentation Inductive Bias

The 7th AI City Challenge

no code implementations 15 Apr 2023 Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential.

Retrieval

Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing

no code implementations 29 Nov 2022 Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

One shortcoming of this is the fact that these deep neural networks cannot be easily evaluated for robustness issues with respect to specific scene variations.

counterfactual Object

Temporal Relevance Analysis for Video Action Models

no code implementations 25 Apr 2022 Quanfu Fan, Donghyun Kim, Chun-Fu Chen, Stan Sclaroff, Kate Saenko, Sarah Adel Bargal

In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature.

Action Recognition

Many-to-many Splatting for Efficient Video Frame Interpolation

1 code implementation CVPR 2022 Ping Hu, Simon Niklaus, Stan Sclaroff, Kate Saenko

Motion-based video frame interpolation commonly relies on optical flow to warp pixels from the inputs to the desired interpolation instant.

Motion Estimation Optical Flow Estimation +1

A Unified Framework for Domain Adaptive Pose Estimation

1 code implementation 1 Apr 2022 Donghyun Kim, Kaihong Wang, Kate Saenko, Margrit Betke, Stan Sclaroff

In this paper, we investigate the problem of domain adaptive 2D pose estimation that transfers knowledge learned on a synthetic source domain to a target domain without supervision.

2D Pose Estimation Animal Pose Estimation +2

A Broad Study of Pre-training for Domain Generalization and Adaptation

1 code implementation 22 Mar 2022 Donghyun Kim, Kaihong Wang, Stan Sclaroff, Kate Saenko

In this paper, we provide a broad study and in-depth analysis of pre-training for domain adaptation and generalization, namely: network architectures, size, pre-training loss, and datasets.

Domain Generalization

Simulated Adversarial Testing of Face Recognition Models

no code implementations CVPR 2022 Nataniel Ruiz, Adam Kortylewski, Weichao Qiu, Cihang Xie, Sarah Adel Bargal, Alan Yuille, Stan Sclaroff

In this work, we propose a framework for learning how to test machine learning algorithms using simulators in an adversarial manner in order to find weaknesses in the model before deploying it in critical scenarios.

BIG-bench Machine Learning Face Recognition

CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions

1 code implementation 12 Jan 2021 Qi Feng, Vitaly Ablavsky, Stan Sclaroff

In this paper, we focus on two foundational tasks: the Vehicle Retrieval by NL task and the Vehicle Tracking by NL task, which take advantage of the proposed CityFlow-NL benchmark and provide a strong basis for future research on the multi-target multi-camera tracking by NL description task.

Multi-Object Tracking Retrieval +1

CDS: Cross-Domain Self-Supervised Pre-Training

no code implementations ICCV 2021 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We present a two-stage pre-training approach that improves the generalization ability of standard single-domain pre-training.

Domain Adaptation Transfer Learning

Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation

no code implementations NeurIPS 2020 Ping Hu, Stan Sclaroff, Kate Saenko

Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level.

Semantic correspondence Semantic Segmentation +1

Self-supervised Visual Attribute Learning for Fashion Compatibility

no code implementations 1 Aug 2020 Donghyun Kim, Kuniaki Saito, Samarth Mishra, Stan Sclaroff, Kate Saenko, Bryan A Plummer

Our approach consists of three self-supervised tasks designed to capture different concepts that are neglected in prior work that we can select from depending on the needs of our downstream tasks.

Attribute Object Recognition +3

Real-time Semantic Segmentation with Fast Attention

1 code implementation 7 Jul 2020 Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism and captures the same rich spatial context at a small fraction of the computational cost, by changing the order of operations.

Real-Time Semantic Segmentation Segmentation

Spatio-Temporal Action Detection with Multi-Object Interaction

no code implementations 1 Apr 2020 Huijuan Xu, Lizhi Yang, Stan Sclaroff, Kate Saenko, Trevor Darrell

Spatio-temporal action detection in videos requires localizing the action both spatially and temporally in the form of an "action tube".

Action Detection Human Detection +2

Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels

no code implementations 18 Mar 2020 Donghyun Kim, Kuniaki Saito, Tae-Hyun Oh, Bryan A. Plummer, Stan Sclaroff, Kate Saenko

We show that when labeled source examples are limited, existing methods often fail to learn discriminative features applicable for both source and target domains.

Self-Supervised Learning Unsupervised Domain Adaptation

Universal Domain Adaptation through Self Supervision

1 code implementation NeurIPS 2020 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Kate Saenko

While some methods address target settings with either partial or open-set categories, they assume that the particular setting is known a priori.

Clustering Partial Domain Adaptation +2

MILA: Multi-Task Learning from Videos via Efficient Inter-Frame Attention

no code implementations 18 Feb 2020 Donghyun Kim, Tian Lan, Chuhang Zou, Ning Xu, Bryan A. Plummer, Stan Sclaroff, Jayan Eledath, Gerard Medioni

We embed the attention module in a "slow-fast" architecture, where the slower network runs on sparsely sampled keyframes and the light-weight shallow network runs on non-keyframes at a high frame rate.

Multi-Task Learning

DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition

1 code implementation 23 Dec 2019 Nuno C. Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff

In this work, we address the problem of learning an ensemble of specialist networks using multimodal data, while considering the realistic and challenging scenario of possible missing modalities at test time.

Action Recognition Multiple-choice +1

Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers

1 code implementation CVPR 2021 Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff

We propose a novel Siamese Natural Language Tracker (SNLT), which brings the advancements in visual tracking to the tracking by natural language (NL) descriptions task.

Region Proposal Visual Object Tracking +1

MULE: Multimodal Universal Language Embedding

no code implementations 8 Sep 2019 Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Bryan A. Plummer

In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages.

Data Augmentation Machine Translation +2

Real-time Visual Object Tracking with Natural Language Description

no code implementations 26 Jul 2019 Qi Feng, Vitaly Ablavsky, Qinxun Bai, Guorong Li, Stan Sclaroff

In benchmarks, our method is competitive with state of the art trackers, while it outperforms all other trackers on targets with unambiguous and precise language annotations.

Object Visual Object Tracking

Semi-supervised Domain Adaptation via Minimax Entropy

3 code implementations ICCV 2019 Kuniaki Saito, Donghyun Kim, Stan Sclaroff, Trevor Darrell, Kate Saenko

Contemporary domain adaptation methods are very effective at aligning feature distributions of source and target domains without any target supervision.

Domain Adaptation Semi-supervised Domain Adaptation

Revisiting Image-Language Networks for Open-ended Phrase Detection

3 code implementations 17 Nov 2018 Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko

Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image.

object-detection Object Detection +1

Hashing with Binary Matrix Pursuit

2 code implementations ECCV 2018 Fatih Cakir, Kun He, Stan Sclaroff

We propose theoretical and empirical improvements for two-stage hashing methods.

Image Retrieval Retrieval

Excitation Dropout: Encouraging Plasticity in Deep Neural Networks

1 code implementation 23 May 2018 Andrea Zunino, Sarah Adel Bargal, Pietro Morerio, Jianming Zhang, Stan Sclaroff, Vittorio Murino

In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout.

Decision Making Video Recognition

Local Descriptors Optimized for Average Precision

no code implementations CVPR 2018 Kun He, Yan Lu, Stan Sclaroff

In this paper, we improve the learning of local feature descriptors by optimizing the performance of descriptor matching, which is a common stage that follows descriptor extraction in local feature based pipelines, and can be formulated as nearest neighbor retrieval.

Learning-To-Rank Retrieval

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

1 code implementation 13 Apr 2018 Huijuan Xu, Kun He, Bryan A. Plummer, Leonid Sigal, Stan Sclaroff, Kate Saenko

To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work.

Retrieval Sentence

Hashing with Mutual Information

2 code implementations 2 Mar 2018 Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff

Binary vector embeddings enable fast nearest neighbor retrieval in large databases of high-dimensional objects, and play an important role in many practical applications, such as image and video retrieval.

Image Retrieval Retrieval +1

Excitation Backprop for RNNs

1 code implementation CVPR 2018 Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, Stan Sclaroff

Models are trained to caption or classify activity in videos, but little is known about the evidence used to make such decisions.

Action Recognition Temporal Action Localization +1

Personalizing Gesture Recognition Using Hierarchical Bayesian Neural Networks

no code implementations CVPR 2017 Ajjen Joshi, Soumya Ghosh, Margrit Betke, Stan Sclaroff, Hanspeter Pfister

Leveraging recent work on learning Bayesian neural networks, we build fast, scalable algorithms for inferring the posterior distribution over all network weights in the hierarchy.

Active Learning Gesture Recognition

Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

no code implementations 30 Apr 2017 Danna Gurari, Kun He, Bo Xiong, Jianming Zhang, Mehrnoosh Sameki, Suyog Dutt Jain, Stan Sclaroff, Margrit Betke, Kristen Grauman

We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems.

Object Semantic Segmentation +1

MIHash: Online Hashing with Mutual Information

1 code implementation ICCV 2017 Fatih Cakir, Kun He, Sarah Adel Bargal, Stan Sclaroff

Learning-based hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performance-complexity trade-offs by learning hash functions from streaming data.

Image Retrieval Retrieval

Automating Image Analysis by Annotating Landmarks with Deep Neural Networks

no code implementations 2 Feb 2017 Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke

Image and video analysis is often a crucial step in the study of animal behavior and kinematics.

Top-down Neural Attention by Excitation Backprop

3 code implementations 1 Aug 2016 Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff

We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps.

Salient Object Subitizing

no code implementations CVPR 2015 Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

We study the problem of Salient Object Subitizing, i.e., predicting the existence and the number of salient objects in an image using holistic cues.

Image Retrieval Object +4

Learning Activity Progression in LSTMs for Activity Detection and Early Detection

no code implementations CVPR 2016 Shugao Ma, Leonid Sigal, Stan Sclaroff

In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection.

Action Detection Activity Detection +1

Discovering Useful Parts for Pose Estimation in Sparsely Annotated Datasets

no code implementations 2 May 2016 Mikhail Breslav, Tyson L. Hedrick, Stan Sclaroff, Margrit Betke

Our work introduces a novel way to increase pose estimation accuracy by discovering parts from unannotated regions of training images.

Pose Estimation

Adaptive Hashing for Fast Similarity Search

no code implementations ICCV 2015 Fatih Cakir, Stan Sclaroff

With the staggering growth in image and video datasets, algorithms that provide fast similarity search and compact storage are crucial.

Image Retrieval Retrieval

Minimum Barrier Salient Object Detection at 80 FPS

no code implementations ICCV 2015 Jianming Zhang, Stan Sclaroff, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

Powered by this fast MBD transform algorithm, the proposed salient object detection method runs at 80 FPS, and significantly outperforms previous methods with similar speed on four large benchmark datasets, and achieves comparable or better performance than state-of-the-art methods.

Ranked #6 on Video Salient Object Detection on VOS-T (using extra training data)

Object object-detection +2

Online Supervised Hashing for Ever-Growing Datasets

no code implementations 10 Nov 2015 Fatih Cakir, Sarah Adel Bargal, Stan Sclaroff

To address these issues, we propose an online hashing method that is amenable to changes and expansions of the datasets.

A Bayesian Approach for Online Classifier Ensemble

no code implementations 8 Jul 2015 Qinxun Bai, Henry Lam, Stan Sclaroff

We propose a Bayesian approach for recursively estimating the classifier weights in online learning of a classifier ensemble.

Generalized Majorization-Minimization

no code implementations 25 Jun 2015 Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb

Majorization-Minimization (MM) is a powerful iterative procedure for optimizing non-convex functions that works by optimizing a sequence of bounds on the function.

Space-Time Tree Ensemble for Action Recognition

no code implementations CVPR 2015 Shugao Ma, Leonid Sigal, Stan Sclaroff

Using the action vocabulary we then utilize tree mining with subsequent tree clustering and ranking to select a compact set of highly discriminative tree patterns.

Action Recognition Clustering +1

Class Probability Estimation via Differential Geometric Regularization

no code implementations 4 Mar 2015 Qinxun Bai, Steven Rosenberg, Zheng Wu, Stan Sclaroff

We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective.

Classification General Classification

Visual Word Selection without Re-Coding and Re-Pooling

no code implementations 23 Jul 2014 Fatih Cakir, Stan Sclaroff

Thus, given a training set for a particular computer vision task, a key problem is pruning a large codebook to select only a subset of visual words.

Detecting Reduplication in Videos of American Sign Language

no code implementations LREC 2012 Zoya Gavrilov, Stan Sclaroff, Carol Neidle, Sven Dickinson

A framework is proposed for the detection of reduplication in digital videos of American Sign Language (ASL).
