Search Results for author: Subarna Tripathi

Found 31 papers, 13 papers with code

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

1 code implementation 6 Dec 2023 Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos.

Action Anticipation Video Understanding

Single-Stage Visual Relationship Learning using Conditional Queries

no code implementations 9 Jun 2023 Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships.

Graph Generation Multi-Task Learning +1

Unbiased Scene Graph Generation in Videos

1 code implementation CVPR 2023 Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury

The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG.

Graph Generation Unbiased Scene Graph Generation

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

2 code implementations 15 Jul 2022 Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.

Audio-Visual Active Speaker Detection Graph Learning +1

Text Spotting Transformers

1 code implementation CVPR 2022 Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu

In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.

Text Detection Text Spotting

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

1 code implementation CVPR 2022 Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang

To tackle this task, we first provide an automatic way to collect trajectory and hotspots labels on large-scale data.

Object

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

1 code implementation 18 Dec 2021 Shengyu Feng, Subarna Tripathi, Hesham Mostafa, Marcel Nassar, Somdeb Majumdar

Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions.

Graph Generation Object +3

Learning Spatial-Temporal Graphs for Active Speaker Detection

no code implementations 2 Dec 2021 Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.

Audio-Visual Active Speaker Detection Node Classification

In Defense of Scene Graphs for Image Captioning

1 code implementation ICCV 2021 Kien Nguyen, Subarna Tripathi, Bang Du, Tanaya Guha, Truong Q. Nguyen

Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions.

Human-Object Interaction Detection Image Captioning

Structured Query-Based Image Retrieval Using Scene Graphs

no code implementations 13 May 2020 Brigit Schroeder, Subarna Tripathi

A structured query can capture the complexity of object interactions (e.g. 'woman rides motorcycle') unlike single objects (e.g. 'woman' or 'motorcycle').

Image Retrieval Object +1

Triplet-Aware Scene Graph Embeddings

no code implementations 19 Sep 2019 Brigit Schroeder, Subarna Tripathi, Hanlin Tang

We see a significant performance increase in both metrics that measure the goodness of layout prediction, mean intersection-over-union (mIoU) (52.3% vs. 49.2%) and relation score (61.7% vs. 54.1%), after the addition of triplet supervision and data augmentation.

Data Augmentation Graph Embedding +7

Compact Scene Graphs for Layout Composition and Patch Retrieval

no code implementations 19 Apr 2019 Subarna Tripathi, Sharath Nittur Sridhar, Sairam Sundaresan, Hanlin Tang

Structured representations such as scene graphs serve as an efficient and compact representation that can be used for downstream rendering or retrieval tasks.

Image Generation Retrieval

Toward Joint Image Generation and Compression using Generative Adversarial Networks

no code implementations 23 Jan 2019 Byeongkeun Kang, Subarna Tripathi, Truong Q. Nguyen

The proposed method is a promising baseline method for joint image generation and compression using generative adversarial networks.

Generative Adversarial Network Image Compression +2

Using Scene Graph Context to Improve Image Generation

no code implementations 11 Jan 2019 Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang

Generating realistic images from scene graphs asks neural networks to be able to reason about object relationships and compositionality.

Image Generation from Scene Graphs Open-Ended Question Answering +1

Correction by Projection: Denoising Images with Generative Adversarial Networks

no code implementations 12 Mar 2018 Subarna Tripathi, Zachary C. Lipton, Truong Q. Nguyen

In this paper, we propose to denoise corrupted images by finding the nearest point on the GAN manifold, recovering latent vectors by minimizing distances in image space.

Denoising

Pose2Instance: Harnessing Keypoints for Person Instance Segmentation

no code implementations 4 Apr 2017 Subarna Tripathi, Maxwell Collins, Matthew Brown, Serge Belongie

In a more realistic environment, without the oracle keypoints, the proposed deep person instance segmentation model conditioned on human pose achieves 3.8% to 10.5% relative improvements comparing with its strongest baseline of a deep network trained only for segmentation.

Instance Segmentation Segmentation +1

Precise Recovery of Latent Vectors from Generative Adversarial Networks

1 code implementation 15 Feb 2017 Zachary C. Lipton, Subarna Tripathi

Generative adversarial networks (GANs) transform latent vectors into visually plausible images.

A Statistical Approach to Continuous Self-Calibrating Eye Gaze Tracking for Head-Mounted Virtual Reality Systems

no code implementations 20 Dec 2016 Subarna Tripathi, Brian Guenter

This eliminates the need for an explicit calibration step and automatically compensates for small movements of the headset with respect to the head.

Position regression

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

no code implementations 15 Jul 2016 Subarna Tripathi, Zachary C. Lipton, Serge Belongie, Truong Nguyen

Then we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages both accuracy on the target frame and consistency across consecutive frames.

Object object-detection +1

Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation

no code implementations 20 Jan 2016 Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion.

Clustering Object +2

Real-time Sign Language Fingerspelling Recognition using Convolutional Neural Networks from Depth map

1 code implementation 10 Sep 2015 Byeongkeun Kang, Subarna Tripathi, Truong Q. Nguyen

We train CNNs for the classification of 31 alphabets and numbers using a subset of collected depth data from multiple subjects.

Sign Language Recognition

Semantic Video Segmentation : Exploring Inference Efficiency

1 code implementation 4 Sep 2015 Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

We explore the efficiency of the CRF inference beyond image level semantic segmentation and perform joint inference in video frames.

Image Segmentation Segmentation +3

Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing

no code implementations 14 Feb 2014 Subarna Tripathi, Youngbae Hwang, Serge Belongie, Truong Nguyen

Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues.

Motion Segmentation Segmentation +2
