Search Results for author: Zsolt Kira

Found 87 papers, 46 papers with code

Seeing the Unseen: Visual Common Sense for Semantic Placement

no code implementations 15 Jan 2024 Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs

Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps is straightforward for objects not present in the image.

Common Sense Reasoning Object

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

no code implementations 14 Dec 2023 Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk

Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like.

Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters

no code implementations 30 Nov 2023 James Seale Smith, Yen-Chang Hsu, Zsolt Kira, Yilin Shen, Hongxia Jin

We show that STAMINA outperforms the prior SOTA for the setting of text-to-image continual customization on a 50-concept benchmark composed of landmarks and human faces, with no stored replay data.

Continual Learning Hard Attention +1

ConstraintMatch for Semi-constrained Clustering

1 code implementation 26 Nov 2023 Jann Goschenhofer, Bernd Bischl, Zsolt Kira

Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine, while still yielding full-supervision-level model performance.

Constrained Clustering

DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets

1 code implementation NeurIPS 2023 Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet

Construction of a universal detector poses a crucial question: How can we most effectively train a model on a large mixture of datasets?

object-detection Object Detection

FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects

no code implementations 19 Oct 2023 Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira, Muhammad Zubair Irshad

In this work, we address the challenging task of 3D object recognition without the reliance on real-world 3D labeled data.

3D Object Recognition 6D Pose Estimation

LatentDR: Improving Model Generalization Through Sample-Aware Latent Degradation and Restoration

1 code implementation 28 Aug 2023 Ran Liu, Sahil Khose, Jingyun Xiao, Lakshmi Sathidevi, Keerthan Ramnath, Zsolt Kira, Eva L. Dyer

To address this challenge, we propose a novel approach for distribution-aware latent augmentation that leverages the relationships across samples to guide the augmentation procedure.

Domain Generalization

NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

1 code implementation ICCV 2023 Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus

NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference.

Generalizable Novel View Synthesis Novel View Synthesis

HePCo: Data-Free Heterogeneous Prompt Consolidation for Continual Federated Learning

no code implementations 16 Jun 2023 Shaunak Halbe, James Seale Smith, Junjiao Tian, Zsolt Kira

In this paper, we attempt to tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data.

Federated Learning Image Classification

Adaptive Coordination in Social Embodied Rearrangement

no code implementations 31 May 2023 Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, Akshara Rai

We present the task of "Social Rearrangement", consisting of cooperative everyday tasks like setting up the dinner table, tidying a house or unpacking groceries in a simulated multi-agent environment.

HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning

no code implementations CVPR 2023 Chia-Wen Kuo, Zsolt Kira

The image captioning model encodes each view independently with a shared encoder efficiently, and a contrastive loss is incorporated across the encoded views in a novel way to improve their representation quality and the model's data efficiency.

Image Captioning

CLIP-GCD: Simple Language Guided Generalized Category Discovery

no code implementations 17 May 2023 Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data.

Clustering Retrieval

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

no code implementations 12 Apr 2023 James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin

We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification.

Continual Learning Image Classification

Trainable Projected Gradient Method for Robust Fine-tuning

2 code implementations CVPR 2023 Junjiao Tian, Xiaoliang Dai, Chih-Yao Ma, Zecheng He, Yen-Cheng Liu, Zsolt Kira

To solve this problem, we propose Trainable Projected Gradient Method (TPGM) to automatically learn the constraint imposed for each layer for a fine-grained fine-tuning regularization.

Transfer Learning

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav

no code implementations 14 Mar 2023 Karmesh Yadav, Arjun Majumdar, Ram Ramrakhya, Naoki Yokoyama, Alexei Baevski, Zsolt Kira, Oleksandr Maksymets, Dhruv Batra

We present a single neural network architecture composed of task-agnostic components (ViTs, convolutions, and LSTMs) that achieves state-of-art results on both the ImageNav ("go to location in <this picture>") and ObjectNav ("find a chair") tasks without any task-specific modules like object detection, segmentation, mapping, or planning modules.

object-detection Object Detection +3

Communication-Critical Planning via Multi-Agent Trajectory Exchange

no code implementations 10 Mar 2023 Nathaniel Moore Glaser, Zsolt Kira

This paper addresses the task of joint multi-agent perception and planning, especially as it relates to the real-world challenge of collision-free navigation for connected self-driving vehicles.

Navigate

System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games

no code implementations 8 Dec 2022 Indranil Sur, Zachary Daniels, Abrar Rahman, Kamil Faber, Gianmarco J. Gallardo, Tyler L. Hayes, Cameron E. Taylor, Mustafa Burak Gurbuz, James Smith, Sahana Joshi, Nathalie Japkowicz, Michael Baron, Zsolt Kira, Christopher Kanan, Roberto Corizzo, Ajay Divakaran, Michael Piacentino, Jesse Hostetler, Aswin Raghavan

In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing different aspects of the lifelong learning problem) into a unified system.

Continual Learning reinforcement-learning +2

Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation

no code implementations 20 Nov 2022 Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira

In Vision-and-Language Navigation (VLN), researchers typically take an image encoder pre-trained on ImageNet without fine-tuning on the environments that the agent will be trained or tested on.

Test unseen Vision and Language Navigation

ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

1 code implementation CVPR 2023 James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

This leads to reasoning mistakes, which need to be corrected as they occur by teaching VL models the missing SVLC skills; often this must be done using private data where the issue was found, which naturally leads to a data-free continual (no task-id) VL learning setting.

Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks

no code implementations 7 Oct 2022 Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using ~10% of their trainable parameters.

FedFOR: Stateless Heterogeneous Federated Learning with First-Order Regularization

1 code implementation 21 Sep 2022 Junjiao Tian, James Seale Smith, Zsolt Kira

For the more typical applications of FL where the number of clients is large (e.g., edge-device and mobile applications), these methods cannot be applied, motivating the need for a stateless approach to heterogeneous FL which can be used for any number of clients.

Federated Learning

On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition

no code implementations 15 Sep 2022 Farrukh Rahman, Ömer Mubarek, Zsolt Kira

Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting compared to CNNs.

Image Classification Inductive Bias +2

Open-Set Semi-Supervised Object Detection

no code implementations 29 Aug 2022 Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira

To address this problem, we consider online and offline OOD detection modules, which are integrated with SSOD methods.

Object object-detection +3

Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors

1 code implementation CVPR 2022 Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

In this paper, we present Unbiased Teacher v2, which shows the generalization of SS-OD method to anchor-free detectors and also introduces Listen2Student mechanism for the unsupervised regression loss.

Object Detection regression +1

Lifelong Wandering: A realistic few-shot online continual learning setting

no code implementations 16 Jun 2022 Mayank Lunayach, James Smith, Zsolt Kira

Online few-shot learning describes a setting where models are trained and evaluated on a stream of data while learning emerging classes.

Continual Learning Few-Shot Learning

Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

1 code implementation CVPR 2022 Chia-Wen Kuo, Zsolt Kira

A key limitation of such methods, however, is that the output of the model is conditioned only on the object detector's outputs.

Image Captioning Object

A Closer Look at Rehearsal-Free Continual Learning

no code implementations 31 Mar 2022 James Seale Smith, Junjiao Tian, Shaunak Halbe, Yen-Chang Hsu, Zsolt Kira

Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation.

Continual Learning Knowledge Distillation +2

A Closer Look at Knowledge Distillation with Features, Logits, and Gradients

no code implementations 18 Mar 2022 Yen-Chang Hsu, James Smith, Yilin Shen, Zsolt Kira, Hongxia Jin

Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.

Incremental Learning Knowledge Distillation +2

Exploring Covariate and Concept Shift for Detection and Calibration of Out-of-Distribution Data

no code implementations 28 Oct 2021 Junjiao Tian, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Zsolt Kira

We are the first to propose a method that works well across both OOD detection and calibration and under different types of shifts.

Out of Distribution (OOD) Detection

A Geometric Perspective towards Neural Calibration via Sensitivity Decomposition

1 code implementation NeurIPS 2021 Junjiao Tian, Dylan Yung, Yen-Chang Hsu, Zsolt Kira

It is well known that vision classification models suffer from poor calibration in the face of data distribution shifts.

Exploring Covariate and Concept Shift for Detection and Confidence Calibration of Out-of-Distribution Data

no code implementations 29 Sep 2021 Junjiao Tian, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Zsolt Kira

To this end, we theoretically derive two score functions for OOD detection, the covariate shift score and concept shift score, based on the decomposition of KL-divergence for both scores, and propose a geometrically-inspired method (Geometric ODIN) to improve OOD detection under both shifts with only in-distribution data.

Out of Distribution (OOD) Detection

CrossMatch: Improving Semi-Supervised Object Detection via Multi-Scale Consistency

no code implementations 29 Sep 2021 Zhuoran Yu, Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

Inspired by the fact that teacher/student pseudo-labeling approaches result in a weak and sparse gradient signal due to the difficulty of confidence-thresholding, CrossMatch leverages multi-scale feature extraction in object detection.

Object object-detection +2

Enhancing Multi-Robot Perception via Learned Data Association

no code implementations 1 Jul 2021 Nathaniel Glaser, Yen-Cheng Liu, Junjiao Tian, Zsolt Kira

In this paper, we address the multi-robot collaborative perception problem, specifically in the context of multi-view infilling for distributed semantic segmentation.

Semantic Segmentation

Overcoming Obstructions via Bandwidth-Limited Multi-Agent Spatial Handshaking

no code implementations 1 Jul 2021 Nathaniel Glaser, Yen-Cheng Liu, Junjiao Tian, Zsolt Kira

In this paper, we address bandwidth-limited and obstruction-prone collaborative perception, specifically in the context of multi-agent semantic segmentation.

Semantic Segmentation

Striking the Right Balance: Recall Loss for Semantic Segmentation

1 code implementation 28 Jun 2021 Junjiao Tian, Niluthpol Mithun, Zach Seymour, Han-Pang Chiu, Zsolt Kira

There are two major drawbacks to these methods: 1) constantly up-weighting minority classes can introduce excessive false positives in semantic segmentation; 2) a minority class is not necessarily a hard class.

Semantic Segmentation

LRGNet: Learnable Region Growing for Class-Agnostic Point Cloud Segmentation

1 code implementation 16 Mar 2021 Jingdao Chen, Zsolt Kira, Yong K. Cho

3D point cloud segmentation is an important function that helps robots understand the layout of their surrounding environment and perform tasks such as grasping objects, avoiding obstacles, and finding landmarks.

Instance Segmentation MORPH +3

Unbiased Teacher for Semi-Supervised Object Detection

4 code implementations ICLR 2021 Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda

To address this, we introduce Unbiased Teacher, a simple yet effective approach that jointly trains a student and a gradually progressing teacher in a mutually-beneficial manner.

Image Classification Object +4

Memory-Efficient Semi-Supervised Continual Learning: The World is its Own Replay Buffer

1 code implementation 23 Jan 2021 James Smith, Jonathan Balloch, Yen-Chang Hsu, Zsolt Kira

Our work investigates whether we can significantly reduce this memory budget by leveraging unlabeled data from an agent's environment in a realistic and challenging continual learning paradigm.

Continual Learning Knowledge Distillation +1

Recall Loss for Imbalanced Image Classification and Semantic Segmentation

1 code implementation 1 Jan 2021 Junjiao Tian, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Zsolt Kira

Many works have proposed to weigh the standard cross entropy loss function with pre-computed weights based on class statistics such as the number of samples and class margins.

Classification General Classification +4

Posterior Re-calibration for Imbalanced Datasets

no code implementations NeurIPS 2020 Junjiao Tian, Yen-Cheng Liu, Nathan Glaser, Yen-Chang Hsu, Zsolt Kira

Neural Networks can perform poorly when the training label distribution is heavily imbalanced, as well as when the testing data differs from the training distribution.

Long-tail Learning Semantic Segmentation

3D for Free: Crossmodal Transfer Learning using HD Maps

no code implementations 24 Aug 2020 Benjamin Wilson, Zsolt Kira, James Hays

In this work, we address the long-tail problem by leveraging both the large class-taxonomies of modern 2D datasets and the robustness of state-of-the-art 2D detection methods.

3D Object Detection Autonomous Driving +5

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning

2 code implementations ECCV 2020 Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

Recent state-of-the-art semi-supervised learning (SSL) methods use a combination of image-based transformations and consistency regularization as core components.

Clustering Data Augmentation +1

Frustratingly Simple Domain Generalization via Image Stylization

2 code implementations 19 Jun 2020 Nathan Somavarapu, Chih-Yao Ma, Zsolt Kira

Convolutional Neural Networks (CNNs) show impressive performance in the standard classification setting where training and testing data are drawn i.i.d.

Domain Generalization Image Stylization

When2com: Multi-Agent Perception via Communication Graph Grouping

2 code implementations CVPR 2020 Yen-Cheng Liu, Junjiao Tian, Nathaniel Glaser, Zsolt Kira

While significant advances have been made for single-agent perception, many applications require multiple sensing agents and cross-agent communication due to benefits such as coverage and robustness.

Who2com: Collaborative Perception via Learnable Handshake Communication

1 code implementation 21 Mar 2020 Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathan Glaser, Chia-Wen Kuo, Zsolt Kira

In this paper, we propose the problem of collaborative perception, where robots can combine their local observations with those of neighboring agents in a learnable way to improve accuracy on a perception task.

Multi-agent Reinforcement Learning Scene Understanding +1

Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

2 code implementations CVPR 2020 Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Zsolt Kira

Deep neural networks have attained remarkable performance when applied to data that comes from the same distribution as that of the training set, but can significantly degrade otherwise.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

UNO: Uncertainty-aware Noisy-Or Multimodal Fusion for Unanticipated Input Degradation

no code implementations 6 Nov 2019 Junjiao Tian, Wesley Cheung, Nathan Glaser, Yen-Cheng Liu, Zsolt Kira

Specifically, we analyze a number of uncertainty measures, each of which captures a different aspect of uncertainty, and we propose a novel way to fuse degraded inputs by scaling modality-specific output softmax probabilities.

Semantic Segmentation

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation

5 code implementations ICCV 2019 Min-Hung Chen, Zsolt Kira, Ghassan AlRegib, Jaekwon Yoo, Ruxin Chen, Jian Zheng

Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on four video DA datasets (e.g. 7.9% accuracy gain over "Source only" from 73.9% to 81.8% on "HMDB --> UCF", and 10.3% gain on "Kinetics --> Gameplay").

Unsupervised Domain Adaptation

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification

no code implementations 12 Jun 2019 Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira

We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results.

Classification General Classification +1

Learning to Generate Grounded Visual Captions without Localization Supervision

2 code implementations 1 Jun 2019 Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira

When automatically generating a sentence description for an image or video, it often remains unclear how well the generated caption is grounded, that is whether the model uses the correct image regions to output particular words, or if the model is hallucinating based on priors in the dataset and/or the language model.

Image Captioning Language Modelling +2

Leveraging Semantics for Incremental Learning in Multi-Relational Embeddings

no code implementations 29 May 2019 Angel Daruna, Weiyu Liu, Zsolt Kira, Sonia Chernova

Service robots benefit from encoding information in semantically meaningful ways to enable more robust task execution.

Incremental Learning Knowledge Graphs

Path Ranking with Attention to Type Hierarchies

1 code implementation 26 May 2019 Weiyu Liu, Angel Daruna, Zsolt Kira, Sonia Chernova

The objective of the knowledge base completion problem is to infer missing information from existing facts in a knowledge base.

Knowledge Base Completion Knowledge Graphs +1

Temporal Attentive Alignment for Video Domain Adaptation

5 code implementations 26 May 2019 Min-Hung Chen, Zsolt Kira, Ghassan AlRegib

Finally, we propose Temporal Attentive Adversarial Adaptation Network (TA3N), which explicitly attends to the temporal dynamics using domain discrepancy for more effective domain alignment, achieving state-of-the-art performance on three video DA datasets.

Domain Adaptation

RoboCSE: Robot Common Sense Embedding

no code implementations 24 Mar 2019 Angel Daruna, Weiyu Liu, Zsolt Kira, Sonia Chernova

Autonomous service robots require computational frameworks that allow them to generalize knowledge to new situations in a manner that models uncertainty while scaling to real-world problem sizes.

Common Sense Reasoning

Unsupervised Continual Learning and Self-Taught Associative Memory Hierarchies

no code implementations ICLR Workshop LLD 2019 James Smith, Seth Baer, Zsolt Kira, Constantine Dovrolis

We first pose the Unsupervised Continual Learning (UCL) problem: learning salient representations from a non-stationary stream of unlabeled data in which the number of object classes varies with time.

Continual Learning Online Clustering

The Regretful Navigation Agent for Vision-and-Language Navigation

1 code implementation CVPR 2019 (Oral) 2019 Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Decision Making Vision and Language Navigation +2

Multi-view Incremental Segmentation of 3D Point Clouds for Mobile Robots

1 code implementation 18 Feb 2019 Jingdao Chen, Yong K. Cho, Zsolt Kira

Mobile robots need to create high-definition 3D maps of the environment for applications such as remote surveillance and infrastructure mapping.

Robotics

Multi-class Classification without Multi-class Labels

1 code implementation ICLR 2019 Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom, Zsolt Kira

This work presents a new strategy for multi-class classification that requires no class-specific labels, but instead leverages pairwise similarity between examples, which is a weaker form of annotation.

Classification General Classification +1

Data-Efficient Graph Embedding Learning for PCB Component Detection

no code implementations 16 Nov 2018 Chia-Wen Kuo, Jacob Ashmore, David Huggins, Zsolt Kira

This paper presents a challenging computer vision task, namely the detection of generic components on a PCB, and a novel set of deep-learning methods that are able to jointly leverage the appearance of individual components and the propagation of information across the structure of the board to accurately detect and identify various types of components on a PCB.

Graph Embedding object-detection +2

A probabilistic constrained clustering for transfer learning and image category discovery

no code implementations 28 Jun 2018 Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom, Zsolt Kira

The proposed objective directly minimizes the negative log-likelihood of cluster assignment with respect to the pairwise constraints, has no hyper-parameters, and demonstrates improved scalability and performance on both supervised learning and unsupervised transfer learning.

Constrained Clustering Deep Clustering +2

Learning to Cluster for Proposal-Free Instance Segmentation

1 code implementation 17 Mar 2018 Yen-Chang Hsu, Zheng Xu, Zsolt Kira, Jiawei Huang

We utilize the most fundamental property of instance labeling -- the pairwise relationship between pixels -- as the supervision to formulate the learning objective, then apply it to train a fully convolutional network (FCN) for learning to perform pixel-wise clustering.

Autonomous Driving Clustering +6

Learning to cluster in order to transfer across domains and tasks

1 code implementation ICLR 2018 Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira

The key insight is that, in addition to features, we can transfer similarity information and this is sufficient to learn a similarity function and clustering network to perform both domain adaptation and cross-task transfer learning.

Constrained Clustering Transfer Learning +1

Grounded Objects and Interactions for Video Captioning

no code implementations 16 Nov 2017 Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf

We address the problem of video captioning by grounding language generation on object interactions in the video.

Object Scene Understanding +3

On Convergence and Stability of GANs

8 code implementations ICLR 2018 Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira

We propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions.

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

4 code implementations 30 Mar 2017 Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets applied to spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.

Action Classification Action Recognition +3

Deep Image Category Discovery using a Transferred Similarity Function

no code implementations 5 Dec 2016 Yen-Chang Hsu, Zhaoyang Lv, Zsolt Kira

We propose that this network can be learned with contrastive loss which is only based on weak binary pair-wise constraints.

Clustering Transfer Learning

A Continuous Optimization Approach for Efficient and Accurate Scene Flow

no code implementations 27 Jul 2016 Zhaoyang Lv, Chris Beall, Pablo F. Alcantarilla, Fuxin Li, Zsolt Kira, Frank Dellaert

We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery.

Position

Neural network-based clustering using pairwise constraints

2 code implementations 19 Nov 2015 Yen-Chang Hsu, Zsolt Kira

Robustness analysis also shows that the method is largely insensitive to the number of clusters.

Clustering
