Exposure to violent, sexual, or substance-abuse content in media increases the willingness of children and adolescents to imitate similar behaviors.
This technical report presents the modeling approaches used in our submission to the ICML Expressive Vocalizations Workshop & Competition multitask track (ExVo-MultiTask).
In this paper, we systematically evaluate the biases present in speaker recognition systems with respect to gender across a range of system operating points.
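As a concrete illustration of this kind of analysis (not the paper's actual evaluation code), the sketch below sweeps the decision threshold of a hypothetical speaker verification system and reports false-accept and false-reject rates per gender group at each operating point; the score, label, and gender arrays are synthetic placeholders.

```python
# Hedged sketch: probing gender bias in speaker verification across
# operating points by sweeping the decision threshold and comparing
# per-group error rates. All arrays below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
scores = rng.normal(size=n)            # verification scores (higher = same speaker)
labels = rng.integers(0, 2, size=n)    # 1 = genuine trial, 0 = impostor trial
genders = rng.choice(["f", "m"], n)    # gender of the enrolled speaker

def error_rates(scores, labels, threshold):
    """False-accept / false-reject rates at one operating point."""
    accept = scores >= threshold
    far = np.mean(accept[labels == 0])   # impostors wrongly accepted
    frr = np.mean(~accept[labels == 1])  # genuine trials wrongly rejected
    return far, frr

for thr in np.linspace(-2, 2, 5):        # a range of operating points
    for g in ("f", "m"):
        m = genders == g
        far, frr = error_rates(scores[m], labels[m], thr)
        print(f"thr={thr:+.1f} group={g} FAR={far:.3f} FRR={frr:.3f}")
```

A gap between the two groups' error-rate curves across thresholds, rather than at a single operating point, is what a systematic evaluation of this kind would surface.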
Large-scale databases with high-quality manual annotations are scarce in the audio domain.
Computational modeling of the emotions evoked by art in humans is a challenging problem because of the subjective and nuanced nature of art and affective signals.
We analyze the frequency and sentiment trends of different occupations, study the effect of media attributes such as genre, country of production, and title type on these trends, and investigate whether the incidence of professions in media subtitles correlates with their real-world employment statistics.
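For illustration, a minimal version of the final correlation check could look like the sketch below; the profession counts are invented placeholders, not data from the study, and Spearman rank correlation is one reasonable choice of statistic for comparing the two rankings.

```python
# Hedged sketch: comparing how often professions appear in subtitles
# against real-world employment counts. All counts are invented
# placeholders, not data from the study.
from scipy.stats import spearmanr

subtitle_mentions = {"teacher": 1200, "doctor": 950, "lawyer": 700,
                     "farmer": 150, "engineer": 300}
employment_counts = {"teacher": 3_800_000, "doctor": 1_000_000,
                     "lawyer": 800_000, "farmer": 900_000,
                     "engineer": 1_700_000}

professions = sorted(subtitle_mentions)
rho, pval = spearmanr([subtitle_mentions[name] for name in professions],
                      [employment_counts[name] for name in professions])
print(f"Spearman rho={rho:.2f}, p={pval:.3f}")
```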
Our work in this paper focuses on two key aspects of this problem: the lack of domain-specific training or benchmark datasets, and the adaptation of face embeddings learned on web images to long-form content, specifically movies.
To the best of our knowledge, we are the first to show that the language used in movie scripts is a strong indicator of violent content, and that, in a large dataset, certain demographics are systematically portrayed as victims and perpetrators.
A key objective in multi-view learning is to model the information common to multiple parallel views of a class of objects/events to improve downstream learning tasks.
To avoid the need for manual annotations of active speakers in visual frames, which are expensive to acquire, we present a weakly supervised system for localizing active speakers in movie content.
In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations.
We propose Deep Multiset Canonical Correlation Analysis (dMCCA), an extension of CCA-based representation learning to settings where the underlying signal is observed across multiple (more than two) modalities.
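As a rough illustration of the underlying objective, the sketch below solves the classical linear multiset CCA problem on synthetic data as a generalized eigenvalue problem; dMCCA replaces the linear projections with deep networks trained on an analogous objective, so this is a simplification, not the paper's method.

```python
# Hedged sketch: linear multiset CCA on synthetic data. Three
# "modalities" share a common latent signal; MCCA is solved as the
# generalized eigenvalue problem R v = lambda * D v, where R is the
# covariance of the stacked views and D its block-diagonal
# (within-view) part.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, d, n_views = 500, 10, 3
shared = rng.normal(size=(n, 2))              # common latent signal
views = [shared @ rng.normal(size=(2, d)) + 0.5 * rng.normal(size=(n, d))
         for _ in range(n_views)]

centered = [v - v.mean(0) for v in views]     # center each view
X = np.hstack(centered)                       # stacked views, shape (n, 3*d)
R = X.T @ X / (n - 1)                         # full covariance
D = np.zeros_like(R)                          # block-diagonal (within-view) part
for i in range(n_views):
    s = slice(i * d, (i + 1) * d)
    D[s, s] = R[s, s]

# Generalized eigenvectors with the largest eigenvalues maximize
# between-view relative to within-view covariance.
vals, vecs = eigh(R, D)
top = vecs[:, np.argsort(vals)[::-1][:2]]     # top-2 canonical directions
projections = [centered[i] @ top[i * d:(i + 1) * d] for i in range(n_views)]
```

The per-view slices of the top eigenvectors play the role that the per-modality networks play in the deep variant: each maps its view into a space where the shared signal is maximally correlated across views.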