Search Results for author: Shraman Pramanick

Found 11 papers, 6 papers with code

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

no code implementations • 19 Dec 2023 • Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework.

Attribute Language Modelling +1

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

no code implementations • 30 Nov 2023 • Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei HUANG, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge.

Video Understanding

UniVTG: Towards Unified Video-Language Temporal Grounding

1 code implementation • ICCV 2023 • Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou

Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their abilities to generalize to various VTG tasks and labels.

Ranked #3 on Natural Language Moment Retrieval on TACoS

Highlight Detection Moment Retrieval +3

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

1 code implementation • ICCV 2023 • Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.

Ranked #1 on Video Summarization on Query-Focused Video Summarization Dataset

Action Recognition Moment Queries +4

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

1 code implementation • 9 Oct 2022 • Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.

Object Detection +2

Where in the World is this Image? Transformer-based Geo-localization in the Wild

1 code implementation • 29 Apr 2022 • Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem.

Ranked #3 on Photo geolocation estimation on Im2GPS3k

Photo geolocation estimation Scene Recognition +1

Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

no code implementations • 21 Oct 2021 • Shraman Pramanick, Aniket Roy, Vishal M. Patel

Multimodal learning is an emerging yet challenging research area.

Humor Detection Sarcasm Detection

Detecting Harmful Memes and Their Targets

no code implementations • Findings (ACL) 2021 • Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

In this work, we propose two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target.

Hateful Meme Classification

MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

1 code implementation • Findings (EMNLP) 2021 • Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

We focus on two tasks: (i) detecting harmful memes, and (ii) identifying the social entities they target.

See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization

no code implementations • 20 May 2021 • Yash Kumar Atri, Shraman Pramanick, Vikram Goyal, Tanmoy Chakraborty

However, existing methods use short videos as the visual modality and short summaries as the ground truth, and therefore perform poorly on lengthy videos and long ground-truth summaries.

Abstractive Text Summarization Language Modelling

Exercise? I thought you said 'Extra Fries': Leveraging Sentence Demarcations and Multi-hop Attention for Meme Affect Analysis

1 code implementation • 23 Mar 2021 • Shraman Pramanick, Md Shad Akhtar, Tanmoy Chakraborty

We evaluate MHA-MEME on the 'Memotion Analysis' dataset for all three sub-tasks: sentiment classification, affect classification, and affect class quantification.

Sentence Sentiment Analysis +1
