no code implementations • 1 May 2024 • Prateek Verma, Minh-Hao Van, Xintao Wu
VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning.
no code implementations • 21 Feb 2024 • Minh-Hao Van, Prateek Verma, Xintao Wu
Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.
no code implementations • 15 Sep 2023 • Prateek Verma
With the advent of modern AI architectures, a shift has happened towards end-to-end architectures.
no code implementations • 20 Aug 2023 • Prateek Verma
Further, we can also use the convolution operation with a signal to be learned from scratch, and we will explore papers in the literature that use this with robust Transformer architectures.
no code implementations • 3 Aug 2023 • Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan
Query autocomplete (QAC), also known as typeahead, suggests a list of complete queries as the user types a prefix in the search box.
no code implementations • 2 Jul 2023 • Prateek Verma
This work aims to adapt these architectures in a causal setup for training LLMs.
no code implementations • 9 Jun 2023 • Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky
We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops.
no code implementations • 18 Mar 2023 • Prateek Verma, Chris Chafe
In this work, we propose a way of computing a content-adaptive learnable time-frequency representation.
no code implementations • 27 Oct 2022 • Prateek Verma, Chris Chafe, Jonathan Berger
Typically, researchers use an excitation such as a pistol shot or balloon pop as an impulse signal with which an auralization can be created.
no code implementations • 1 Sep 2022 • Rishabh Dahale, Vaibhav Talwadker, Preeti Rao, Prateek Verma
We use a transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly because it is, as expected, relatively underrepresented in the training data.
no code implementations • 16 Aug 2022 • Prateek Verma, Jonathan Berger
A convolutional architecture is first trained to take in an audio sample and mimic expert ratings of perceptual qualities with about 78% accuracy across various instrument families and notes.
no code implementations • 16 Jun 2022 • Prateek Verma
Modeling long-term dependencies for audio signals is a particularly challenging problem, as even small time scales yield on the order of a hundred thousand samples.
no code implementations • 7 Oct 2021 • Prateek Verma
A classification head (a feed-forward layer), similar to the approach in SimCLR, is trained on a learned representation.
no code implementations • 30 Jun 2021 • Prateek Verma, Chris Chafe
We show how causal transformer generative models can be used for raw waveform synthesis.
no code implementations • 1 May 2021 • Prateek Verma, Jonathan Berger
In addition, we show how multi-rate signal processing ideas inspired by wavelets can be applied to the Transformer embeddings to improve the results.
Ranked #8 on Audio Classification on FSD50K
no code implementations • 3 Dec 2020 • Bibit Bianchini, Prateek Verma, Kenneth Salisbury
We demonstrate high classification accuracies among our proposed gesture definitions on a test set, emphasizing that neural network classifiers on the raw data outperform other combinations of feature sets and algorithms.
no code implementations • 22 Oct 2020 • Prateek Verma, Julius Smith
In this paper, we present a framework for contrastive learning of audio representations in a self-supervised framework, without access to any ground truth labels.
no code implementations • 17 Jul 2020 • Camille Noufi, Prateek Verma
We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks.
no code implementations • 14 Jul 2020 • Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi
Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network.
no code implementations • 10 Feb 2020 • Prateek Verma, Kenneth Salisbury
Daisy-chaining high-precision sound event detectors using signal processing combined with neural architectures and high-dimensional clustering of unlabelled data is a vastly powerful idea, and can be explored in a variety of ways in the future.
no code implementations • 15 Jul 2019 • Prateek Verma, Aliasgar Kutiyanawala, Ke Shen
Products in an ecommerce catalog contain information-rich fields like description and bullets that can be used to extract entities (attributes) with NER-based systems.
no code implementations • 23 Apr 2019 • Michelle Guo, Albert Haque, Prateek Verma
In this paper, we address the task of spoken language understanding.
no code implementations • 10 Apr 2019 • Prateek Verma, Chris Chafe, Jonathan Berger
We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture.
1 code implementation • 20 Feb 2019 • Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei
We propose spoken sentence embeddings which capture both acoustic and linguistic content.
no code implementations • 30 Mar 2018 • Albert Haque, Michelle Guo, Prateek Verma
We present an end-to-end method for transforming audio from one style to another.
2 code implementations • 4 Jan 2018 • Prateek Verma, Julius O. Smith
In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem, starting from a random-noise input signal and iteratively using back-propagation to optimize the sound to conform to filter-outputs from a pre-trained neural architecture of interest.
Sound • Multimedia • Audio and Speech Processing
no code implementations • TACL 2018 • Vinodkumar Prabhakaran, Camilla Griffiths, Hang Su, Prateek Verma, Nelson Morgan, Jennifer L. Eberhardt, Dan Jurafsky
We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops.
no code implementations • Stanford 2015 • Sevy Harris, Prateek Verma
The goal of this project was to design an image processing algorithm that scans in sheet music and plays the music described on the page.