Search Results for author: Robinson Piramuthu

Found 33 papers, 9 papers with code

"Don't forget to put the milk back!" Dataset for Enabling Embodied Agents to Detect Anomalous Situations

no code implementations • 12 Apr 2024 • James F. Mullen Jr, Prasoon Goyal, Robinson Piramuthu, Michael Johnston, Dinesh Manocha, Reza Ghanadan

Our work assists in this goal by enabling robots to inform their users of dangerous or unsanitary anomalies in their home.

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

no code implementations • 28 Nov 2023 • Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu

Regardless of their effectiveness, larger architectures unavoidably prevent the models from being extended to real-world applications, so building a lightweight VL architecture and an efficient learning schema is of great practical value.

Language Modelling Question Answering +3

Characterizing Video Question Answering with Sparsified Inputs

no code implementations • 27 Nov 2023 • Shiyuan Huang, Robinson Piramuthu, Vicente Ordonez, Shih-Fu Chang, Gunnar A. Sigurdsson

From our experiments, we have observed only 5.2%-5.8% loss of performance with only 10% of video lengths, which corresponds to 2-4 frames selected from each video.

Question Answering Video Question Answering

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

no code implementations • 12 Mar 2023 • Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie

In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.

Decision Making

RREx-BoT: Remote Referring Expressions with a Bag of Tricks

no code implementations • 30 Jan 2023 • Gunnar A. Sigurdsson, Jesse Thomason, Gaurav S. Sukhatme, Robinson Piramuthu

Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO.

Object Object Localization

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation

no code implementations • 30 Nov 2022 • Vishnu Sashank Dorbala, Gunnar Sigurdsson, Robinson Piramuthu, Jesse Thomason, Gaurav S. Sukhatme

Our results on the coarse-grained instruction following task of REVERIE demonstrate the navigational capability of CLIP, surpassing the supervised baseline in terms of both success rate (SR) and success weighted by path length (SPL).

Instruction Following Object Recognition +1

Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy

1 code implementation • 15 Oct 2022 • Shiyuan Huang, Robinson Piramuthu, Shih-Fu Chang, Gunnar A. Sigurdsson

Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model which learns to extract task-specific tiny features as little as 10 bits, which are optimal for answering certain types of questions.

Feature Compression Question Answering +1

323

A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic Search

no code implementations • 21 Jun 2022 • Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Gaurav S. Sukhatme, Ruslan Salakhutdinov

Physically rearranging objects is an important capability for embodied agents.

Semantic Segmentation

TEACh: Task-driven Embodied Agents that Chat

3 code implementations • 1 Oct 2021 • Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes.

Dialogue Understanding

126

VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator

1 code implementation • Findings (ACL) 2022 • Ayush Shrivastava, Karthik Gopalakrishnan, Yang Liu, Robinson Piramuthu, Gokhan Tür, Devi Parikh, Dilek Hakkani-Tür

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN).

Binary Classification Imitation Learning +3

Self-Attentive 3D Human Pose and Shape Estimation from Videos

no code implementations • 26 Mar 2021 • Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang

The key insights of our method are two-fold.

Ranked #53 on 3D Human Pose Estimation on MPI-INF-3DHP

3D human pose and shape estimation

Weakly-Supervised Semantic Segmentation via Sub-category Exploration

1 code implementation • CVPR 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Existing weakly-supervised semantic segmentation methods using image-level annotations typically rely on initial responses to locate object regions.

Ranked #66 on Weakly-Supervised Semantic Segmentation on PASCAL VOC 2012 val

Clustering Object +2

178

Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization

no code implementations • 3 Aug 2020 • Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Obtaining object response maps is one important step to achieve weakly-supervised semantic segmentation using image-level labels.

Classification Data Augmentation +4

Mobile Head Tracking for eCommerce and Beyond

1 code implementation • 18 Dec 2018 • Muratcan Cicek, Jinrong Xie, Qiaosong Wang, Robinson Piramuthu

Unlike desktop and laptop computers, they are also much easier to carry indoors and outdoors. To address this, we implement and open-source a button that is sensitive to head movements tracked from the front camera of the iPhone X.

Human-Computer Interaction

755

Understanding Image Quality and Trust in Peer-to-Peer Marketplaces

no code implementations • 26 Nov 2018 • Xiao Ma, Lina Mezghani, Kimberly Wilber, Hui Hong, Robinson Piramuthu, Mor Naaman, Serge Belongie

In this work, we conducted a large-scale study on the quality of user-generated images in peer-to-peer marketplaces.

Brand > Logo: Visual Analysis of Fashion Brands

1 code implementation • 23 Oct 2018 • M. Hadi Kiapour, Robinson Piramuthu

In this work, we analyze learned visual representations by deep networks that are trained to recognize fashion brands.

Marketing

153

Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback

no code implementations • 24 Sep 2018 • Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu

In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results.

Attribute Image Retrieval

Adversarial Learning for Fine-grained Image Search

no code implementations • 6 Jul 2018 • Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.

Generative Adversarial Network Image Retrieval

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations

2 code implementations • 3 Jul 2018 • Shuai Zheng, Fan Yang, M. Hadi Kiapour, Robinson Piramuthu

Understanding clothes from a single image has strong commercial and cultural impacts on modern societies.

Fashion Understanding object-detection +2

324

Conditional Image-Text Embedding Networks

1 code implementation • ECCV 2018 • Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.

Phrase Grounding

Towards the Success Rate of One: Real-time Unconstrained Salient Object Detection

no code implementations • 31 Jul 2017 • Mahyar Najibi, Fan Yang, Qiaosong Wang, Robinson Piramuthu

In this work, we propose an efficient and effective approach for unconstrained salient object detection in images using deep convolutional neural networks.

Object object-detection +2

Visual Search at eBay

no code implementations • 10 Jun 2017 • Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu

We harness the availability of a large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale.

GraB: Visual Saliency via Novel Graph Model and Background Priors

no code implementations • CVPR 2016 • Qiaosong Wang, Wen Zheng, Robinson Piramuthu

We propose an unsupervised bottom-up saliency detection approach by exploiting a novel graph structure and background priors.

Saliency Detection Superpixels

HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition

no code implementations • ICCV 2015 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu

In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.

Image Classification Object Recognition

Efficient Media Retrieval from Non-Cooperative Queries

no code implementations • 19 Nov 2014 • Kevin Shih, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu

Text is ubiquitous in the artificial world and easily attainable when it comes to book title and author names.

Optical Character Recognition (OCR) Retrieval +1

Fashion Apparel Detection: The Role of Deep Convolutional Neural Network and Pose-dependent Priors

no code implementations • 19 Nov 2014 • Kota Hara, Vignesh Jagadeesh, Robinson Piramuthu

In this work, we propose and address a new computer vision task, which we call fashion item detection, where the aim is to detect various fashion items a person in the image is wearing or carrying.

object-detection Object Detection

ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections

no code implementations • CVPR 2015 • Bolei Zhou, Vignesh Jagadeesh, Robinson Piramuthu

Discovering visual knowledge from weakly labeled data is crucial to scale up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large number of concept categories.

object-detection Object Detection +1

Im2Fit: Fast 3D Model Fitting and Anthropometrics using Single Consumer Depth Camera and Synthetic Data

no code implementations • 3 Oct 2014 • Qiaosong Wang, Vignesh Jagadeesh, Bryan Ressler, Robinson Piramuthu

In this paper, we propose a method for capturing accurate human body shape and anthropometrics from a single consumer grade depth sensor.

Virtual Try-on

HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition

4 code implementations • 3 Oct 2014 • Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis Decoste, Wei Di, Yizhou Yu

In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy.

Ranked #174 on Image Classification on CIFAR-100

Image Classification Object Recognition

When relevance is not Enough: Promoting Visual Attractiveness for Fashion E-commerce

no code implementations • 13 Jun 2014 • Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu, Elizabeth Churchill

This study aims to address the effectiveness of types of image in showcasing fashion apparel in terms of its attractiveness, i.e. the ability to draw consumer's attention, interest, and in return their engagement.

Human-Computer Interaction K.4.4; H.2.8