Search Results for author: Robinson Piramuthu

Found 33 papers, 9 papers with code

When relevance is not Enough: Promoting Visual Attractiveness for Fashion E-commerce

no code implementations13 Jun 2014 Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu, Elizabeth Churchill

This study addresses the effectiveness of different image types in showcasing fashion apparel in terms of attractiveness, i.e., the ability to draw consumers' attention and interest and, in return, their engagement.

Human-Computer Interaction K.4.4; H.2.8

Im2Fit: Fast 3D Model Fitting and Anthropometrics using Single Consumer Depth Camera and Synthetic Data

no code implementations3 Oct 2014 Qiaosong Wang, Vignesh Jagadeesh, Bryan Ressler, Robinson Piramuthu

In this paper, we propose a method for capturing accurate human body shape and anthropometrics from a single consumer-grade depth sensor.

Virtual Try-on

ConceptLearner: Discovering Visual Concepts from Weakly Labeled Image Collections

no code implementations CVPR 2015 Bolei Zhou, Vignesh Jagadeesh, Robinson Piramuthu

Discovering visual knowledge from weakly labeled data is crucial to scaling up computer vision recognition systems, since it is expensive to obtain fully labeled data for a large number of concept categories.

Object Detection +1

Efficient Media Retrieval from Non-Cooperative Queries

no code implementations19 Nov 2014 Kevin Shih, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu

Text is ubiquitous in the artificial world and easily attainable when it comes to book titles and author names.

Optical Character Recognition (OCR) Retrieval +1

Fashion Apparel Detection: The Role of Deep Convolutional Neural Network and Pose-dependent Priors

no code implementations19 Nov 2014 Kota Hara, Vignesh Jagadeesh, Robinson Piramuthu

In this work, we propose and address a new computer vision task, which we call fashion item detection, where the aim is to detect various fashion items a person in the image is wearing or carrying.

Object Detection

GraB: Visual Saliency via Novel Graph Model and Background Priors

no code implementations CVPR 2016 Qiaosong Wang, Wen Zheng, Robinson Piramuthu

We propose an unsupervised bottom-up saliency detection approach by exploiting novel graph structure and background priors.

Saliency Detection Superpixels
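
As a rough illustration of the background-prior idea, the toy sketch below treats superpixels touching the image border as probable background and scores the rest by color distance to that background model. The paper's actual graph model is considerably more sophisticated; all names and weights here are illustrative.

```python
import numpy as np
from skimage import img_as_float
from skimage.segmentation import slic

def background_prior_saliency(image: np.ndarray, n_segments: int = 200) -> np.ndarray:
    """Toy background prior: superpixels touching the image border are
    treated as probable background; saliency is each superpixel's color
    distance to that background model, normalized to [0, 1]."""
    img = img_as_float(image)
    labels = slic(img, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    means = np.array([img[labels == i].mean(axis=0) for i in range(n)])
    border = np.unique(np.concatenate(
        [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    bg_color = means[border].mean(axis=0)
    sal = np.linalg.norm(means - bg_color, axis=1)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
    return sal[labels]  # per-pixel saliency map
```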

Visual Search at eBay

no code implementations10 Jun 2017 Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu

We harness the availability of large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale.
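
The abstract describes search over a very large listing corpus; a common way to make that tractable, sketched below under the assumption of sign-binarized embeddings, is to compare compact binary hashes with Hamming distance. Function names are illustrative, not eBay's production API.

```python
import numpy as np

def to_hash(embeddings: np.ndarray) -> np.ndarray:
    """Sign-binarize float embeddings into compact binary codes."""
    return (embeddings > 0).astype(np.uint8)

def visual_search(query_emb: np.ndarray, index: np.ndarray, k: int = 10) -> np.ndarray:
    """Rank indexed listing images by Hamming distance to the query."""
    q = to_hash(query_emb[None])[0]
    dists = np.count_nonzero(index != q, axis=1)  # Hamming distance
    return np.argsort(dists)[:k]  # indices of the k nearest listings
```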

Towards the Success Rate of One: Real-time Unconstrained Salient Object Detection

no code implementations31 Jul 2017 Mahyar Najibi, Fan Yang, Qiaosong Wang, Robinson Piramuthu

In this work, we propose an efficient and effective approach for unconstrained salient object detection in images using deep convolutional neural networks.

Object Detection +2

Conditional Image-Text Embedding Networks

1 code implementation ECCV 2018 Bryan A. Plummer, Paige Kordas, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu, Svetlana Lazebnik

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model.

Phrase Grounding
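
A minimal sketch of what a text-conditioned embedding might look like: several projection heads over region features, mixed by weights predicted from the phrase. Dimensions and names are made up for illustration; see the paper for the actual concept-weight formulation.

```python
import torch
import torch.nn as nn

class ConditionalEmbedding(nn.Module):
    """Illustrative text-conditioned embedding: K projection heads for
    image regions, mixed by weights predicted from the phrase feature."""

    def __init__(self, img_dim=2048, txt_dim=300, emb_dim=256, n_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(img_dim, emb_dim) for _ in range(n_heads))
        self.gate = nn.Linear(txt_dim, n_heads)   # phrase -> mixture weights
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def forward(self, region_feats, phrase_feat):
        w = torch.softmax(self.gate(phrase_feat), dim=-1)                    # (B, K)
        stacked = torch.stack([h(region_feats) for h in self.heads], dim=1)  # (B, K, D)
        img_emb = (w.unsqueeze(-1) * stacked).sum(dim=1)                     # (B, D)
        return img_emb, self.txt_proj(phrase_feat)
```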

Adversarial Learning for Fine-grained Image Search

no code implementations6 Jul 2018 Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.

Generative Adversarial Network Image Retrieval

Give me a hint! Navigating Image Databases using Human-in-the-loop Feedback

no code implementations24 Sep 2018 Bryan A. Plummer, M. Hadi Kiapour, Shuai Zheng, Robinson Piramuthu

In this paper, we introduce an attribute-based interactive image search which can leverage human-in-the-loop feedback to iteratively refine image search results.

Attribute Image Retrieval
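
A toy version of the iterative refinement loop, assuming a precomputed binary image-attribute matrix: each user hint boosts candidates consistent with the stated attribute and penalizes the rest. The fixed weighting is a placeholder, not the paper's learned model.

```python
import numpy as np

def refine(scores: np.ndarray, attrs: np.ndarray, feedback) -> np.ndarray:
    """Re-rank images given human-in-the-loop hints.

    scores: (n_images,) initial retrieval scores.
    attrs: (n_images, n_attrs) binary attribute matrix.
    feedback: list of (attr_id, present) hints from the user.
    """
    adjusted = scores.astype(float).copy()
    for attr_id, present in feedback:
        match = attrs[:, attr_id].astype(bool)
        consistent = match if present else ~match
        adjusted += np.where(consistent, 0.5, -0.5)  # placeholder weights
    return np.argsort(-adjusted)  # best-first ranking
```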

Brand > Logo: Visual Analysis of Fashion Brands

1 code implementation23 Oct 2018 M. Hadi Kiapour, Robinson Piramuthu

In this work, we analyze learned visual representations by deep networks that are trained to recognize fashion brands.

Marketing

Understanding Image Quality and Trust in Peer-to-Peer Marketplaces

no code implementations26 Nov 2018 Xiao Ma, Lina Mezghani, Kimberly Wilber, Hui Hong, Robinson Piramuthu, Mor Naaman, Serge Belongie

In this work, we conducted a large-scale study on the quality of user-generated images in peer-to-peer marketplaces.

Mobile Head Tracking for eCommerce and Beyond

1 code implementation18 Dec 2018 Muratcan Cicek, Jinrong Xie, Qiaosong Wang, Robinson Piramuthu

Unlike desktop and laptop computers, mobile devices are much easier to carry indoors and outdoors. To address this, we implement and open-source a button that is sensitive to head movements tracked from the front camera of the iPhone X.

Human-Computer Interaction

VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator

1 code implementation Findings (ACL) 2022 Ayush Shrivastava, Karthik Gopalakrishnan, Yang Liu, Robinson Piramuthu, Gokhan Tür, Devi Parikh, Dilek Hakkani-Tür

Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN).

Binary Classification Imitation Learning +3

TEACh: Task-driven Embodied Agents that Chat

3 code implementations1 Oct 2021 Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes.

Dialogue Understanding

Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy

1 code implementation15 Oct 2022 Shiyuan Huang, Robinson Piramuthu, Shih-Fu Chang, Gunnar A. Sigurdsson

Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model, which learns to extract tiny task-specific features, as small as 10 bits, that are optimal for answering certain types of questions.

Feature Compression Question Answering +1
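
A minimal sketch of a bottleneck of this flavor, assuming a PyTorch VideoQA backbone: project the video feature down to a handful of dimensions, binarize with a straight-through estimator, and expand back for the QA head. `TinyFeatureCompressor` is a hypothetical stand-in for FeatComp, not its actual architecture.

```python
import torch
import torch.nn as nn

class TinyFeatureCompressor(nn.Module):
    """Squeeze a video feature down to n_bits binary values, then
    expand back to the original size for the downstream QA head."""

    def __init__(self, in_dim: int = 768, n_bits: int = 10):
        super().__init__()
        self.down = nn.Linear(in_dim, n_bits)  # bottleneck projection
        self.up = nn.Linear(n_bits, in_dim)    # restore the feature size

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        logits = self.down(feats)
        hard = torch.where(logits >= 0,
                           torch.ones_like(logits),
                           -torch.ones_like(logits))  # 1 bit per dim
        # Straight-through estimator: binary values on the forward
        # pass, identity gradient on the backward pass.
        code = logits + (hard - logits).detach()
        return self.up(code)
```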

CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation

no code implementations30 Nov 2022 Vishnu Sashank Dorbala, Gunnar Sigurdsson, Robinson Piramuthu, Jesse Thomason, Gaurav S. Sukhatme

Our results on the coarse-grained instruction following task of REVERIE demonstrate the navigational capability of CLIP, surpassing the supervised baseline in terms of both success rate (SR) and success weighted by path length (SPL).

Instruction Following Object Recognition +1
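
A bare-bones version of the zero-shot scoring step, using a public Hugging Face CLIP checkpoint as an assumed stand-in for the paper's setup: embed the instruction and each candidate view, then move toward the view CLIP scores highest. The paper's full method adds instruction decomposition and backtracking on top of this.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_direction(instruction: str, view_paths: list) -> int:
    """Return the index of the candidate view that CLIP scores as the
    best match for the (sub-)instruction."""
    images = [Image.open(p) for p in view_paths]
    inputs = processor(text=[instruction], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (n_views, 1)
    return int(logits.squeeze(1).argmax())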

RREx-BoT: Remote Referring Expressions with a Bag of Tricks

no code implementations30 Jan 2023 Gunnar A. Sigurdsson, Jesse Thomason, Gaurav S. Sukhatme, Robinson Piramuthu

Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO.

Object Localization
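
The "bag of tricks" intuition reduces grounding to exhaustively scoring every candidate object collected while exploring the environment. A numpy sketch, under the assumption of precomputed, L2-normalized vision-language embeddings:

```python
import numpy as np

def ground_remote_object(expr_emb: np.ndarray, obj_embs: np.ndarray) -> int:
    """Exhaustively score every candidate object seen during exploration
    against the referring-expression embedding and return the best match.
    (The paper additionally folds a 3D positional encoding into the
    object features; that step is omitted here.)"""
    return int((obj_embs @ expr_emb).argmax())  # cosine sim for unit vectors
```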

Decision Making for Human-in-the-loop Robotic Agents via Uncertainty-Aware Reinforcement Learning

no code implementations12 Mar 2023 Siddharth Singi, Zhanpeng He, Alvin Pan, Sandip Patel, Gunnar A. Sigurdsson, Robinson Piramuthu, Shuran Song, Matei Ciocarlie

In a Human-in-the-Loop paradigm, a robotic agent is able to act mostly autonomously in solving a task, but can request help from an external expert when needed.

Decision Making
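
One simple proxy for the ask-for-help behavior, sketched below with a fixed entropy threshold over the policy's action distribution; the paper instead learns when to request help via uncertainty-aware RL, so the threshold here is purely illustrative.

```python
import numpy as np

def act_or_ask(action_probs: np.ndarray, entropy_threshold: float = 1.0):
    """Act autonomously when the policy is confident; otherwise
    request expert help. A fixed threshold is a toy stand-in for the
    learned help-requesting behavior."""
    p = action_probs / action_probs.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    if entropy > entropy_threshold:
        return "ASK_EXPERT"
    return int(p.argmax())  # most likely action
```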

Characterizing Video Question Answering with Sparsified Inputs

no code implementations27 Nov 2023 Shiyuan Huang, Robinson Piramuthu, Vicente Ordonez, Shih-Fu Chang, Gunnar A. Sigurdsson

From our experiments, we have observed only 5.2%-5.8% loss of performance with only 10% of video lengths, which corresponds to 2-4 frames selected from each video.

Question Answering Video Question Answering
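
The sparsification in the 10%-of-frames setting can be as simple as uniform temporal sampling; a minimal sketch (the paper's evaluated selection schemes are more involved):

```python
import numpy as np

def sample_frames(n_total: int, n_keep: int = 4) -> np.ndarray:
    """Pick n_keep evenly spaced frame indices from an n_total-frame
    video, e.g. the 2-4 frames the paper finds largely sufficient."""
    return np.linspace(0, n_total - 1, n_keep).round().astype(int)
```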

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

no code implementations28 Nov 2023 Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu

Despite their effectiveness, larger architectures unavoidably prevent the models from being extended to real-world applications, so building a lightweight VL architecture and an efficient learning schema is of great practical value.

Language Modelling Question Answering +3
