Search Results for author: Ashish Shah

Found 11 papers, 2 papers with code

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

1 code implementation • 8 Apr 2024 • Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

Ranked #1 on Video Classification on COIN

Question Answering • Video Captioning • +4

Object-Centric Unsupervised Image Captioning

1 code implementation • 2 Dec 2021 • Zihang Meng, David Yang, Xuefei Cao, Ashish Shah, Ser-Nam Lim

Our work in this paper overcomes this by harvesting objects corresponding to a given sentence from the training set, even if they don't belong to the same image.

Image Captioning • Object • +1

Testing-Time Adaptation through Online Normalization Estimation

no code implementations • 29 Sep 2021 • Xuefeng Hu, Mustafa Uzunbas, Bor-Chun Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation • Unsupervised Domain Adaptation • +1

Fast Adaptive Anomaly Detection

no code implementations • 29 Sep 2021 • Ze Wang, Yipin Zhou, Rui Wang, Tsung-Yu Lin, Ashish Shah, Ser-Nam Lim

Anything outside of a given normal population is by definition an anomaly.

Anomaly Detection • Meta-Learning

Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation

no code implementations • 9 Oct 2021 • Peirong Liu, Rui Wang, Xuefei Cao, Yipin Zhou, Ashish Shah, Ser-Nam Lim

Key findings are twofold: (1) by capturing the motion transfer with an ordinary differential equation (ODE), it helps to regularize the motion field, and (2) by utilizing the source image itself, we are able to inpaint occluded/missing regions arising from large motion changes.

Image Animation • Motion Estimation

MixNorm: Test-Time Adaptation Through Online Normalization Estimation

no code implementations • 21 Oct 2021 • Xuefeng Hu, Gokhan Uzunbas, Sirius Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation • Unsupervised Domain Adaptation • +1

Raising the Bar on the Evaluation of Out-of-Distribution Detection

no code implementations • 24 Sep 2022 • Jishnu Mukhoti, Tsung-Yu Lin, Bor-Chun Chen, Ashish Shah, Philip H. S. Torr, Puneet K. Dokania, Ser-Nam Lim

In this paper, we define 2 categories of OoD data using the subtly different concepts of perceptual/visual and semantic similarity to in-distribution (iD) data.

Out-of-Distribution Detection • Out of Distribution (OOD) Detection • +2

Unifying Tracking and Image-Video Object Detection

no code implementations • 20 Nov 2022 • Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim

We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model.

Multi-Object Tracking • Object • +2

Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning

no code implementations • CVPR 2023 • Jishnu Mukhoti, Tsung-Yu Lin, Omid Poursaeed, Rui Wang, Ashish Shah, Philip H. S. Torr, Ser-Nam Lim

We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility function for CLIP's contrastive loss, intending to train an alignment between the patch tokens of the vision encoder and the CLS token of the text encoder.

Ranked #1 on Open Vocabulary Semantic Segmentation on Cityscape-171

Contrastive Learning • Image Classification • +5

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

no code implementations • 20 Sep 2023 • Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Ser-Nam Lim

While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length.

Temporal Action Localization • Video Classification • +1

Universal Pyramid Adversarial Training for Improved ViT Performance

no code implementations • 26 Dec 2023 • Ping-Yeh Chiang, Yipin Zhou, Omid Poursaeed, Satya Narayan Shukla, Ashish Shah, Tom Goldstein, Ser-Nam Lim

Recently, Pyramid Adversarial training (Herrmann et al., 2022) has been shown to be very effective for improving clean accuracy and distribution-shift robustness of vision transformers.
