Search Results for author: Jason Li

Found 20 papers, 11 papers with code

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

1 code implementation • 13 Oct 2023 • Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg

We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

9,988

Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments

no code implementations • 15 Sep 2023 • Anthony Olakangil, Cindy Wang, Justin Nguyen, Qunbo Zhou, Kaavya Jethwa, Jason Li, Aryan Narendra, Nishk Patel, Arjun Rajaram

This paper investigates sentiment and semantic relationships among comments across various social media platforms and discusses the importance of shared opinions across these platforms, using word embeddings to analyze components in sentences and documents.

Word Embeddings

AstroLLaMA: Towards Specialized Foundation Models in Astronomy

no code implementations • 12 Sep 2023 • Tuan Dung Nguyen, Yuan-Sen Ting, Ioana Ciucă, Charlie O'Neill, Ze-Chang Sun, Maja Jabłońska, Sandor Kruk, Ernest Perkowski, Jack Miller, Jason Li, Josh Peek, Kartheik Iyer, Tomasz Różański, Pranav Khetarpal, Sharaf Zaman, David Brodrick, Sergio J. Rodríguez Méndez, Thang Bui, Alyssa Goodman, Alberto Accomazzi, Jill Naiman, Jesse Cranney, Kevin Schawinski, UniverseTBD

Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy.

Astronomy Causal Language Modeling +2

zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training

1 code implementation • 30 Jul 2023 • Haochen Sun, Tonghe Bai, Jason Li, Hongyang Zhang

In response to this challenge, we present zero-knowledge deep learning (zkDL), an efficient zero-knowledge proof for deep learning training.

NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models

no code implementations • 12 Apr 2023 • Chi-en Amy Tai, Jason Li, Sriram Kumar, Saeejith Nair, Yuhao Chen, Pengcheng Xi, Alexander Wong

With the growth in capabilities of generative models, there has been growing interest in using photo-realistic renders of common 3D food items to improve downstream tasks such as food printing, nutrition prediction, or management of food wastage.

Management Nutrition

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

no code implementations • 16 Feb 2023 • Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning.

Self-Supervised Learning Speaker Verification +1

Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task

1 code implementation • 20 Dec 2022 • Jason Li, Nicholas Watters, Yingting Wang, Hansem Sohn, Mehrdad Jazayeri

This not only provides a generative model of eye movements in this task but also suggests a computational theory for how humans solve the task, namely that humans use mental simulation.

Adapting TTS models For New Speakers using Transfer Learning

no code implementations • 12 Oct 2021 • Paarth Neekhara, Jason Li, Boris Ginsburg

We address this challenge by proposing transfer-learning guidelines for adapting high quality single-speaker TTS models for a new speaker, using only a few minutes of speech data.

Transfer Learning Voice Cloning

Offensive Language and Hate Speech Detection with Deep Learning and Transfer Learning

no code implementations • 6 Aug 2021 • Bencheng Wei, Jason Li, Ajay Gupta, Hafiza Umair, Atsu Vovor, Natalie Durzynski

Distinguishing whether a text message is hate speech or offensive language is a key challenge in the automatic detection of toxic text content.

Data Augmentation Hate Speech Detection +4

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

no code implementations • 7 Sep 2020 • Jin Cao, Yibo Zhao, Linjun Zhang, Jason Li

The key to our approach is a computationally lightweight forward addition algorithm that we developed to recursively extract the functional dependencies between table columns and that scales to tables with many columns.

Cycle Text-To-Image GAN with BERT

4 code implementations • 26 Mar 2020 • Trevor Tsue, Samir Sen, Jason Li

We explore novel approaches to the task of generating images from their respective captions, building on state-of-the-art GAN architectures.

Image Generation Word Embeddings

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

2 code implementations • 15 Feb 2020 • Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Taroon Bharti, Ming Zhou

However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks.

Ranked #2 on Action Segmentation on COIN (using extra training data)

Action Segmentation Language Modelling +2

327

Training Deep Networks with Stochastic Gradient Normalized by Layerwise Adaptive Second Moments

no code implementations • ICLR 2020 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

General Classification Image Classification +5

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

4 code implementations • 26 Oct 2019 • Rafael Valle, Jason Li, Ryan Prenger, Bryan Catanzaro

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

Style Transfer

847

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

15 code implementations • 22 Oct 2019 • Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

We propose a new end-to-end neural acoustic model for automatic speech recognition.

Ranked #33 on Speech Recognition on LibriSpeech test-clean

Speech Recognition Audio and Speech Processing

9,988

NeMo: a toolkit for building AI applications using Neural Modules

1 code implementation • 14 Sep 2019 • Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen

NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition.

Ranked #1 on Speech Recognition on Common Voice Spanish (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

9,988

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

3 code implementations • 27 May 2019 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

General Classification speech-recognition +2

1,534

Jasper: An End-to-End Convolutional Neural Acoustic Model

10 code implementations • 5 Apr 2019 • Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

Ranked #3 on Speech Recognition on Hub5'00 SwitchBoard

Language Modelling Speech Recognition

2,917

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

no code implementations • 2 Nov 2018 • Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin

Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

3 code implementations • 25 May 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

1,534
