Search Results for author: Kevin Zhang

Found 21 papers, 5 papers with code

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

no code implementations 10 Apr 2024 Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

no code implementations 6 Apr 2024 Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla

In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis.

Autonomous Navigation Novel View Synthesis

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

no code implementations 5 Feb 2024 Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring.

3D Scene Reconstruction Neural Rendering +2

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations 26 Dec 2023 Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration Knowledge Distillation +1

A Scalable Training Strategy for Blind Multi-Distribution Noise Removal

no code implementations 30 Oct 2023 Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler

Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape.

Active Learning Denoising

Seeing the World through Your Eyes

no code implementations 15 Jun 2023 Hadi AlZayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like.

Machine learning reveals features of spinon Fermi surface

no code implementations 5 Jun 2023 Kevin Zhang, Shi Feng, Yuri D. Lensky, Nandini Trivedi, Eun-Ah Kim

With rapid progress in simulation of strongly interacting quantum Hamiltonians, the challenge in characterizing unknown phases becomes a bottleneck for scientific progress.

G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer

no code implementations 4 May 2023 Kevin Zhang, Vipul Mann, Venkat Venkatasubramanian

Additional analyses of G-MATT attention maps demonstrate the ability to retain chemistry knowledge without relying on excessively complex model architectures.

Retrosynthesis Single-step retrosynthesis

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

1 code implementation CVPR 2023 Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.

3D Object Detection Decoder +3

T-SEA: Transfer-based Self-Ensemble Attack on Object Detection

1 code implementation CVPR 2023 Hao Huang, Ziyan Chen, Huanran Chen, Yongtao Wang, Kevin Zhang

Then, we analogize patch optimization with regular model optimization, proposing a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch to efficiently make use of the limited information and prevent the patch from overfitting.

Adversarial Attack Model Optimization +2

i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

2 code implementations 20 Oct 2022 Kevin Zhang, Zhiqiang Shen

(2) Whether we can enhance the representations in the latent feature space by controlling the degree of semantics during sampling on Masked Autoencoders?

Image Reconstruction

MetaDIP: Accelerating Deep Image Prior with Meta Learning

no code implementations 18 Sep 2022 Kevin Zhang, Mingyang Xie, Maharshi Gor, Yi-Ting Chen, Yvonne Zhou, Christopher A. Metzler

Deep image prior (DIP) is a recently proposed technique for solving imaging inverse problems by fitting the reconstructed images to the output of an untrained convolutional neural network.

Denoising Meta-Learning +1

Sequential Models in the Synthetic Data Vault

1 code implementation 28 Jul 2022 Kevin Zhang, Neha Patki, Kalyan Veeramachaneni

After building the Sequential SDV, we used it to generate synthetic data and compared its quality against an existing, non-sequential generative adversarial network based model called CTGAN.

Generative Adversarial Network

Fomite transmission and disinfection strategies for SARS-CoV-2 and related viruses

no code implementations 23 May 2020 Nicolas Castaño, Seth Cordts, Myra Kurosu Jalil, Kevin Zhang, Saisneha Koppaka, Alison Bick, Rajorshi Paul, Sindy KY Tang

Contaminated objects or surfaces, referred to as fomites, play a critical role in the spread of viruses, including SARS-CoV-2, the virus responsible for the COVID-19 pandemic.

Memory-efficient Learning for Large-scale Computational Imaging

no code implementations NeurIPS Workshop Deep_Invers 2019 Michael Kellman, Kevin Zhang, Jon Tamir, Emrah Bostan, Michael Lustig, Laura Waller

Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks).

Experimental Design Super-Resolution

Leveraging Multimodal Haptic Sensory Data for Robust Cutting

no code implementations 27 Sep 2019 Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer

In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events.
