Search Results for author: Hardik Shah

Found 9 papers, 2 papers with code

Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability

no code implementations • 31 Jan 2024 • Navin Kamuni, Hardik Shah, Sathishkumar Chintala, Naveen Kunchakuri, Sujatha Alla Old Dominion

This policy is trained by reinforcement learning algorithms by taking advantage of an environment in which an agent receives feedback in the form of a reward signal.

reinforcement-learning, Semantic Similarity, +1

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

no code implementations • 17 Nov 2023 • Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation.

Image Generation, Prompt Engineering

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

1 code implementation • ICCV 2023 • Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.

Ranked #1 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition, Moment Queries, +4

End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

no code implementations • 9 Jun 2023 • Anshul Nasery, Hardik Shah, Arun Sai Suggala, Prateek Jain

Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization.

Neural Architecture Search, Neural Network Compression, +2

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

no code implementations • 8 Nov 2022 • Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection.

Benchmarking, Retrieval

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

no code implementations • 10 Oct 2022 • Pedro Rodriguez, Mahmoud Azab, Becka Silvert, Renato Sanchez, Linzy Labson, Hardik Shah, Seungwhan Moon

Searching troves of videos with textual descriptions is a core multimodal retrieval task.

Retrieval, Text to Video Retrieval, +2

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

1 code implementation • 9 Oct 2022 • Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.

object-detection, Object Detection, +2

