Search Results for author: Prateek Verma

Found 28 papers, 2 papers with code

Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis

no code implementations • 1 May 2024 • Prateek Verma, Minh-Hao Van, Xintao Wu

VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning.

Image Captioning Question Answering +2

On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study

no code implementations • 21 Feb 2024 • Minh-Hao Van, Prateek Verma, Xintao Wu

Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.

Diverse Neural Audio Embeddings -- Bringing Features back !

no code implementations • 15 Sep 2023 • Prateek Verma

With the advent of modern AI architectures, a shift has happened towards end-to-end architectures.

Audio Classification

Neural Architectures Learning Fourier Transforms, Signal Processing and Much More....

no code implementations • 20 Aug 2023 • Prateek Verma

Further, we can also use the convolution operation with a signal to be learned from scratch, and we will explore papers in the literature that use this with robust Transformer architectures.

Audio Signal Processing

Seasonality Based Reranking of E-commerce Autocomplete Using Natural Language Queries

no code implementations • 3 Aug 2023 • Prateek Verma, Shan Zhong, Xiaoyu Liu, Adithya Rajan

Query autocomplete (QAC), also known as typeahead, suggests a list of complete queries as the user types a prefix in the search box.

Natural Language Queries

Conformer LLMs -- Convolution Augmented Large Language Models

no code implementations • 2 Jul 2023 • Prateek Verma

This work aims to adapt these architectures in a causal setup for training LLMs.

Automatic Speech Recognition Language Modelling +2

Developing Speech Processing Pipelines for Police Accountability

no code implementations • 9 Jun 2023 • Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky

We investigate the potential of large pre-trained speech models for facilitating reviews, focusing on ASR and officer speech detection in footage from traffic stops.

Content Adaptive Front End For Audio Signal Processing

no code implementations • 18 Mar 2023 • Prateek Verma, Chris Chafe

In this work, we propose a way of computing a content-adaptive learnable time-frequency representation.

Audio Signal Processing Scene Understanding

One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/ Concert Hall

no code implementations • 27 Oct 2022 • Prateek Verma, Chris Chafe, Jonathan Berger

Typically, researchers use an excitation such as a pistol shot or balloon pop as an impulse signal with which an auralization can be created.

Generating Coherent Drum Accompaniment With Fills And Improvisations

no code implementations • 1 Sep 2022 • Rishabh Dahale, Vaibhav Talwadker, Preeti Rao, Prateek Verma

We use a transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly because of its expectedly low representation in the training data.

Music Generation

Enhancing Audio Perception of Music By AI Picked Room Acoustics

no code implementations • 16 Aug 2022 • Prateek Verma, Jonathan Berger

A convolutional architecture is first trained to take in an audio sample and mimic the ratings of experts with about 78% accuracy for various instrument families and notes for perceptual qualities.

A Language Model With Million Sample Context For Raw Audio Using Transformer Architectures

no code implementations • 16 Jun 2022 • Prateek Verma

Modeling long-term dependencies for audio signals is a particularly challenging problem, as even small time scales yield on the order of a hundred thousand samples.

Language Modelling

Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....

no code implementations • 7 Oct 2021 • Prateek Verma

A classification head (a feed-forward layer), similar to the approach in SimCLR, is trained on a learned representation.

Decoder Representation Learning

A Generative Model for Raw Audio Using Transformer Architectures

no code implementations • 30 Jun 2021 • Prateek Verma, Chris Chafe

We show how causal transformer generative models can be used for raw waveform synthesis.

Audio Synthesis

Audio Transformers:Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions

no code implementations • 1 May 2021 • Prateek Verma, Jonathan Berger

In addition, we also show how multi-rate signal processing ideas inspired by wavelets can be applied to the Transformer embeddings to improve the results.

Ranked #8 on Audio Classification on FSD50K

Audio Classification Unsupervised Pre-training

Towards Human Haptic Gesture Interpretation for Robotic Systems

no code implementations • 3 Dec 2020 • Bibit Bianchini, Prateek Verma, Kenneth Salisbury

We demonstrate high classification accuracies among our proposed gesture definitions on a test set, emphasizing that neural network classifiers on the raw data outperform other combinations of feature sets and algorithms.

Classification General Classification

A Framework for Generative and Contrastive Learning of Audio Representations

no code implementations • 22 Oct 2020 • Prateek Verma, Julius Smith

In this paper, we present a framework for contrastive learning of audio representations in a self-supervised framework, without access to any ground truth labels.

Contrastive Learning

Self-Supervised Learning of Context-Aware Pitch Prosody Representations

no code implementations • 17 Jul 2020 • Camille Noufi, Prateek Verma

We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks.

General Classification Information Retrieval +5

A Deep Learning Approach for Low-Latency Packet Loss Concealment of Audio Signals in Networked Music Performance Applications

no code implementations • 14 Jul 2020 • Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi

Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications: it aims at revolutionizing the traditional concept of musical interaction by enabling remote musicians to interact and perform together through a telecommunication network.

Packet Loss Concealment

Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space

no code implementations • 10 Feb 2020 • Prateek Verma, Kenneth Salisbury

Daisy-chaining high-precision sound event detectors using signal processing combined with neural architectures and high-dimensional clustering of unlabelled data is a vastly powerful idea and can be explored in a variety of ways in the future.

Clustering

Ranking sentences from product description & bullets for better search

no code implementations • 15 Jul 2019 • Prateek Verma, Aliasgar Kutiyanawala, Ke Shen

Products in an ecommerce catalog contain information-rich fields like description and bullets that can be useful for extracting entities (attributes) using NER-based systems.

Extractive Summarization NER +3

End-to-End Spoken Language Translation

no code implementations • 23 Apr 2019 • Michelle Guo, Albert Haque, Prateek Verma

In this paper, we address the task of spoken language understanding.

Sentence Spoken Language Understanding +1

Neuralogram: A Deep Neural Network Based Representation for Audio Signals

no code implementations • 10 Apr 2019 • Prateek Verma, Chris Chafe, Jonathan Berger

We propose the Neuralogram -- a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture.

Music Recommendation

Audio-Linguistic Embeddings for Spoken Sentences

1 code implementation • 20 Feb 2019 • Albert Haque, Michelle Guo, Prateek Verma, Li Fei-Fei

We propose spoken sentence embeddings which capture both acoustic and linguistic content.

Decoder Emotion Recognition +5

Conditional End-to-End Audio Transforms

no code implementations • 30 Mar 2018 • Albert Haque, Michelle Guo, Prateek Verma

We present an end-to-end method for transforming audio from one style to another.

Neural Style Transfer for Audio Spectograms

2 code implementations • 4 Jan 2018 • Prateek Verma, Julius O. Smith

In our work, we present a method for creating new sounds using a similar approach, treating it as a style-transfer problem, starting from a random-noise input signal and iteratively using back-propagation to optimize the sound to conform to filter-outputs from a pre-trained neural architecture of interest.

Sound Multimedia Audio and Speech Processing

Detecting Institutional Dialog Acts in Police Traffic Stops

no code implementations • TACL 2018 • Vinodkumar Prabhakaran, Camilla Griffiths, Hang Su, Prateek Verma, Nelson Morgan, Jennifer L. Eberhardt, Dan Jurafsky

We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops.

Speech Recognition

Sheet Music Reader

no code implementations • Stanford 2015 • Sevy Harris, Prateek Verma

The goal of this project was to design an image processing algorithm that scans in sheet music and plays the music described on the page.
