Search Results for author: Subarna Tripathi

Found 31 papers, 13 papers with code

VideoSAGE: Video Summarization with Graph Representation Learning

1 code implementation • 14 Apr 2024 • Jose M. Rojas Chaves, Subarna Tripathi

We propose a graph-based representation learning framework for video summarization.

Graph Representation Learning Node Classification +1

Action Scene Graphs for Long-Form Understanding of Egocentric Videos

1 code implementation • 6 Dec 2023 • Ivan Rodin, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella

We present Egocentric Action Scene Graphs (EASGs), a new representation for long-form understanding of egocentric videos.

Action Anticipation Video Understanding

Single-Stage Visual Relationship Learning using Conditional Queries

no code implementations • 9 Jun 2023 • Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships.

Graph Generation Multi-Task Learning +1

SViTT: Temporal Learning of Sparse Video-Text Transformers

1 code implementation • CVPR 2023 • Yi Li, Kyle Min, Subarna Tripathi, Nuno Vasconcelos

Do video-text transformers learn to model temporal relationships across frames?

Ranked #4 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)

Question Answering Retrieval +3

Unbiased Scene Graph Generation in Videos

1 code implementation • CVPR 2023 • Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury

The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG.

Graph Generation Unbiased Scene Graph Generation

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

2 code implementations • 15 Jul 2022 • Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Active speaker detection (ASD) in videos with multiple speakers is a challenging task as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows.

Ranked #1 on Node Classification on AVA

Audio-Visual Active Speaker Detection Graph Learning +1

Text Spotting Transformers

1 code implementation • CVPR 2022 • Xiang Zhang, Yongwen Su, Subarna Tripathi, Zhuowen Tu

In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework using Transformers for text detection and recognition in the wild.

Ranked #6 on Text Spotting on ICDAR 2015

Text Detection Text Spotting

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos

1 code implementation • CVPR 2022 • Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang

To tackle this task, we first provide an automatic way to collect trajectory and hotspots labels on large-scale data.

Object

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

1 code implementation • 18 Dec 2021 • Shengyu Feng, Subarna Tripathi, Hesham Mostafa, Marcel Nassar, Somdeb Majumdar

Dynamic scene graph generation from a video is challenging due to the temporal dynamics of the scene and the inherent temporal fluctuations of predictions.

Graph Generation Object +3

Learning Spatial-Temporal Graphs for Active Speaker Detection

no code implementations • 2 Dec 2021 • Sourya Roy, Kyle Min, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

We address the problem of active speaker detection through a new framework, called SPELL, that learns long-range multimodal graphs to encode the inter-modal relationship between audio and visual data.

Audio-Visual Active Speaker Detection Node Classification

Towards Panoptic 3D Parsing for Single Image in the Wild

no code implementations • 4 Nov 2021 • Sainan Liu, Vincent Nguyen, Yuan Gao, Subarna Tripathi, Zhuowen Tu

Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.

3D Reconstruction 3D Shape Reconstruction +8

Learning of Visual Relations: The Devil is in the Tails

no code implementations • ICCV 2021 • Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

Significant effort has been recently devoted to modeling visual relations.

Graph Generation Scene Graph Generation

In Defense of Scene Graphs for Image Captioning

1 code implementation • ICCV 2021 • Kien Nguyen, Subarna Tripathi, Bang Du, Tanaya Guha, Truong Q. Nguyen

Several studies have noted that the naive use of scene graphs from a black-box scene graph generator harms image captioning performance and that scene graph-based captioning models have to incur the overhead of explicit use of image features to generate decent captions.

Human-Object Interaction Detection Image Captioning

Structured Query-Based Image Retrieval Using Scene Graphs

no code implementations • 13 May 2020 • Brigit Schroeder, Subarna Tripathi

A structured query can capture the complexity of object interactions (e.g. 'woman rides motorcycle') unlike single objects (e.g. 'woman' or 'motorcycle').

Image Retrieval Object +1

Triplet-Aware Scene Graph Embeddings

no code implementations • 19 Sep 2019 • Brigit Schroeder, Subarna Tripathi, Hanlin Tang

We see a significant performance increase in both metrics that measure the goodness of layout prediction, mean intersection-over-union (mIoU) (52.3% vs. 49.2%) and relation score (61.7% vs. 54.1%), after the addition of triplet supervision and data augmentation.

Data Augmentation Graph Embedding +7

Compact Scene Graphs for Layout Composition and Patch Retrieval

no code implementations • 19 Apr 2019 • Subarna Tripathi, Sharath Nittur Sridhar, Sairam Sundaresan, Hanlin Tang

Structured representations such as scene graphs serve as an efficient and compact representation that can be used for downstream rendering or retrieval tasks.

Image Generation Retrieval

Heuristics for Image Generation from Scene Graphs

no code implementations • ICLR Workshop LLD 2019 • Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang

Existing scene graph to image models have two stages: (1) a scene composition stage, and (2) an image generation stage.

Image Generation from Scene Graphs Relation

Toward Joint Image Generation and Compression using Generative Adversarial Networks

no code implementations • 23 Jan 2019 • Byeongkeun Kang, Subarna Tripathi, Truong Q. Nguyen

The proposed method is a promising baseline method for joint image generation and compression using generative adversarial networks.

Generative Adversarial Network Image Compression +2

Using Scene Graph Context to Improve Image Generation

no code implementations • 11 Jan 2019 • Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang

Generating realistic images from scene graphs asks neural networks to be able to reason about object relationships and compositionality.

Image Generation from Scene Graphs Open-Ended Question Answering +1

PartNet: A Large-scale Benchmark for Fine-grained and Hierarchical Part-level 3D Object Understanding

5 code implementations • CVPR 2019 • Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, Hao Su

We present PartNet: a consistent, large-scale dataset of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information.

Ranked #3 on 3D Instance Segmentation on PartNet

3D Instance Segmentation 3D Semantic Segmentation +2

Correction by Projection: Denoising Images with Generative Adversarial Networks

no code implementations • 12 Mar 2018 • Subarna Tripathi, Zachary C. Lipton, Truong Q. Nguyen

In this paper, we propose to denoise corrupted images by finding the nearest point on the GAN manifold, recovering latent vectors by minimizing distances in image space.

Denoising

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

no code implementations • 16 May 2017 • Subarna Tripathi, Gokce Dane, Byeongkeun Kang, Vasudev Bhaskaran, Truong Nguyen

Thus the consolidation of a CNN-based object detection for an embedded system is more challenging.

Face Detection Image Classification +4

Pose2Instance: Harnessing Keypoints for Person Instance Segmentation

no code implementations • 4 Apr 2017 • Subarna Tripathi, Maxwell Collins, Matthew Brown, Serge Belongie

In a more realistic environment, without the oracle keypoints, the proposed deep person instance segmentation model conditioned on human pose achieves 3.8% to 10.5% relative improvements compared with its strongest baseline of a deep network trained only for segmentation.

Instance Segmentation Segmentation +1

Precise Recovery of Latent Vectors from Generative Adversarial Networks

1 code implementation • 15 Feb 2017 • Zachary C. Lipton, Subarna Tripathi

Generative adversarial networks (GANs) transform latent vectors into visually plausible images.

A Statistical Approach to Continuous Self-Calibrating Eye Gaze Tracking for Head-Mounted Virtual Reality Systems

no code implementations • 20 Dec 2016 • Subarna Tripathi, Brian Guenter

This eliminates the need for an explicit calibration step and automatically compensates for small movements of the headset with respect to the head.

Position regression

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

no code implementations • 15 Jul 2016 • Subarna Tripathi, Zachary C. Lipton, Serge Belongie, Truong Nguyen

Then we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages both accuracy on the target frame and consistency across consecutive frames.

Object object-detection +1

Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation

no code implementations • 20 Jan 2016 • Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion.

Clustering Object +2

Real-time Sign Language Fingerspelling Recognition using Convolutional Neural Networks from Depth map

1 code implementation • 10 Sep 2015 • Byeongkeun Kang, Subarna Tripathi, Truong Q. Nguyen

We train CNNs for the classification of 31 alphabets and numbers using a subset of collected depth data from multiple subjects.

Sign Language Recognition

Semantic Video Segmentation : Exploring Inference Efficiency

1 code implementation • 4 Sep 2015 • Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

We explore the efficiency of the CRF inference beyond image level semantic segmentation and perform joint inference in video frames.

Image Segmentation Segmentation +3

Beyond Semantic Image Segmentation : Exploring Efficient Inference in Video

no code implementations • 1 Jul 2015 • Subarna Tripathi, Serge Belongie, Truong Nguyen

We explore the efficiency of the CRF inference module beyond image level semantic segmentation.

Image Segmentation Segmentation +2

Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing

no code implementations • 14 Feb 2014 • Subarna Tripathi, Youngbae Hwang, Serge Belongie, Truong Nguyen

Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues.

Motion Segmentation Segmentation +2
