Search Results for author: Prateek Verma

Found 28 papers, 2 papers with code

Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis

no code implementations • 1 May 2024 • Prateek Verma, Minh-Hao Van, Xintao Wu

VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning.

Image Captioning, Question Answering, +2

On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study

no code implementations • 21 Feb 2024 • Minh-Hao Van, Prateek Verma, Xintao Wu

Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.

Diverse Neural Audio Embeddings -- Bringing Features back !

no code implementations • 15 Sep 2023 • Prateek Verma

With the advent of modern AI architectures, a shift has happened towards end-to-end architectures.

Audio Classification

Neural Architectures Learning Fourier Transforms, Signal Processing and Much More....

no code implementations • 20 Aug 2023 • Prateek Verma

Further, we can also use the convolution operation with a signal to be learned from scratch, and we will explore papers in the literature that use this with robust Transformer architectures.

Audio Signal Processing

Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

no code implementations • 3 Aug 2023 • Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan

Query autocomplete (QAC), also known as typeahead, suggests a list of complete queries as the user types a prefix in the search box.

Natural Language Queries

Developing Speech Processing Pipelines for Police Accountability

no code implementations • 9 Jun 2023 • Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky

We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops.

Content Adaptive Front End For Audio Signal Processing

no code implementations • 18 Mar 2023 • Prateek Verma, Chris Chafe

In this work, we propose a way of computing a content-adaptive learnable time-frequency representation.

Audio Signal Processing, Scene Understanding

One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/ Concert Hall

no code implementations • 27 Oct 2022 • Prateek Verma, Chris Chafe, Jonathan Berger

Typically, researchers use an excitation such as a pistol shot or balloon pop as an impulse signal with which an auralization can be created.

Generating Coherent Drum Accompaniment With Fills And Improvisations

no code implementations • 1 Sep 2022 • Rishabh Dahale, Vaibhav Talwadker, Preeti Rao, Prateek Verma

We use a transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly because of its expected low representation in the training data.

Music Generation

Enhancing Audio Perception of Music By AI Picked Room Acoustics

no code implementations • 16 Aug 2022 • Prateek Verma, Jonathan Berger

A convolutional architecture is first trained to take in an audio sample and mimic the ratings of experts with about 78% accuracy for various instrument families and notes for perceptual qualities.

A Language Model With Million Sample Context For Raw Audio Using Transformer Architectures

no code implementations • 16 Jun 2022 • Prateek Verma

Modeling long-term dependencies for audio signals is a particularly challenging problem, as even small time scales yield on the order of a hundred thousand samples.

Language Modelling

A Generative Model for Raw Audio Using Transformer Architectures

no code implementations • 30 Jun 2021 • Prateek Verma, Chris Chafe

We show how causal transformer generative models can be used for raw waveform synthesis.

Audio Synthesis

Audio Transformers:Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions

no code implementations • 1 May 2021 • Prateek Verma, Jonathan Berger

In addition, we show how multi-rate signal processing ideas inspired by wavelets can be applied to the Transformer embeddings to improve the results.

Audio Classification, Unsupervised Pre-training

Towards Human Haptic Gesture Interpretation for Robotic Systems

no code implementations • 3 Dec 2020 • Bibit Bianchini, Prateek Verma, Kenneth Salisbury

We demonstrate high classification accuracies among our proposed gesture definitions on a test set, emphasizing that neural network classifiers on the raw data outperform other combinations of feature sets and algorithms.

Classification, General Classification

A Framework for Generative and Contrastive Learning of Audio Representations

no code implementations • 22 Oct 2020 • Prateek Verma, Julius Smith

In this paper, we present a framework for contrastive learning of audio representations in a self-supervised framework, without access to any ground truth labels.

Contrastive Learning

Self-Supervised Learning of Context-Aware Pitch Prosody Representations

no code implementations • 17 Jul 2020 • Camille Noufi, Prateek Verma

We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks.

General Classification, Information Retrieval, +5

A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications

no code implementations • 14 Jul 2020 • Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi

Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network.

Packet Loss Concealment

Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space

no code implementations • 10 Feb 2020 • Prateek Verma, Kenneth Salisbury

Daisy-chaining high-precision sound event detectors using signal processing combined with neural architectures and high-dimensional clustering of unlabelled data is a vastly powerful idea, and can be explored in a variety of ways in the future.

Clustering

Ranking sentences from product description & bullets for better search

no code implementations • 15 Jul 2019 • Prateek Verma, Aliasgar Kutiyanawala, Ke Shen

Products in an e-commerce catalog contain information-rich fields, like description and bullets, that can be useful for extracting entities (attributes) using NER-based systems.

Extractive Summarization, NER, +3

Neuralogram: A Deep Neural Network Based Representation for Audio Signals

no code implementations • 10 Apr 2019 • Prateek Verma, Chris Chafe, Jonathan Berger

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture.

Music Recommendation

Audio-Linguistic Embeddings for Spoken Sentences

1 code implementation • 20 Feb 2019 • Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

We propose spoken sentence embeddings which capture both acoustic and linguistic content.

Decoder, Emotion Recognition, +5

Conditional End-to-End Audio Transforms

no code implementations • 30 Mar 2018 • Albert Haque, Michelle Guo, Prateek Verma

We present an end-to-end method for transforming audio from one style to another.

Neural Style Transfer for Audio Spectograms

2 code implementations • 4 Jan 2018 • Prateek Verma, Julius O. Smith

In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem, starting from a random-noise input signal and iteratively using back-propagation to optimize the sound to conform to filter-outputs from a pre-trained neural architecture of interest.

Sound, Multimedia, Audio and Speech Processing

Detecting Institutional Dialog Acts in Police Traffic Stops

no code implementations • TACL 2018 • Vinodkumar Prabhakaran, Camilla Griffiths, Hang Su, Prateek Verma, Nelson Morgan, Jennifer L. Eberhardt, Dan Jurafsky

We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops.

Speech Recognition

Sheet Music Reader

no code implementations • Stanford 2015 • Sevy Harris, Prateek Verma

The goal of this project was to design an image processing algorithm that scans in sheet music and plays the music described on the page.
