Search Results for author: Kevin Zhang

Found 29 papers, 6 papers with code

Acoustic Neural 3D Reconstruction Under Pose Drift

no code implementations • 11 Mar 2025 • Tianxiang Lin, Mohamad Qadri, Kevin Zhang, Adithya Pediredla, Christopher A. Metzler, Michael Kaess

We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses.

3D Reconstruction, Pose Estimation

MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives

no code implementations • 7 Jan 2025 • Wisdom O. Ikezogwo, Kevin Zhang, Mehmet Saygin Seyfioglu, Fatemeh Ghezloo, Linda Shapiro, Ranjay Krishna

We propose MedicalNarratives, a dataset curated from medical pedagogical videos, similar in nature to data collected in Think-Aloud studies and inspired by Localized Narratives; it collects grounded image-text data by curating instructors' speech and mouse cursor movements synchronized in time.

Articles

SCBench: A Sports Commentary Benchmark for Video LLMs

no code implementations • 23 Dec 2024 • Kuangzhi Ge, Lingjun Chen, Kevin Zhang, Yulin Luo, Tianyu Shi, Liaoyuan Fan, Xiang Li, Guanqun Wang, Shanghang Zhang

Inspired by these challenges, we propose a novel task, sports video commentary generation, and develop SCBench for Video LLMs.

Benchmarking

Aligning AI-driven discovery with human intuition

no code implementations • 9 Oct 2024 • Kevin Zhang, Hod Lipson

We propose a new general principle for distilling representations that are naturally more aligned with human intuition, without relying on prior physical knowledge.

Using Deep Autoregressive Models as Causal Inference Engines

no code implementations • 27 Sep 2024 • Daniel Jiwoong Im, Kevin Zhang, Nakul Verma, Kyunghyun Cho

We propose an autoregressive (AR) CI framework capable of handling complex confounders and sequential actions common in modern applications.

Causal Inference

CaBaGe: Data-Free Model Extraction using ClAss BAlanced Generator Ensemble

no code implementations • 16 Sep 2024 • Jonathan Rosenthal, Shanchao Liang, Kevin Zhang, Lin Tan

Machine Learning as a Service (MLaaS) is often provided as a pay-per-query, black-box system to clients.

Model extraction

MR-MLLM: Mutual Reinforcement of Multimodal Comprehension and Vision Perception

no code implementations • 22 Jun 2024 • Guanqun Wang, Xinyu Wei, Jiaming Liu, Ray Zhang, Yichi Zhang, Kevin Zhang, Maurice Chong, Shanghang Zhang

In recent years, multimodal large language models (MLLMs) have shown remarkable capabilities in tasks like visual question answering and common sense reasoning, while visual perception models have made significant strides in perception tasks, such as detection and segmentation.

Common Sense Reasoning, Language Modelling

Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

no code implementations • 10 Apr 2024 • Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

This paper presents Conformer-1, an end-to-end Automatic Speech Recognition (ASR) model trained on an extensive dataset of 570k hours of speech audio data, 91% of which was acquired from publicly available sources.

Automatic Speech Recognition (ASR)

Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion

1 code implementation • 6 Apr 2024 • Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla

In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis.

3D geometry, Autonomous Navigation

AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion

no code implementations • 5 Feb 2024 • Mohamad Qadri, Kevin Zhang, Akshay Hinduja, Michael Kaess, Adithya Pediredla, Christopher A. Metzler

Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring.

3D Scene Reconstruction, Neural Rendering

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations • CVPR 2024 • Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration, Knowledge Distillation

A Scalable Training Strategy for Blind Multi-Distribution Noise Removal

no code implementations • 30 Oct 2023 • Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler

Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape.

Active Learning, Denoising

Seeing the World through Your Eyes

no code implementations • CVPR 2024 • Hadi AlZayer, Kevin Zhang, Brandon Feng, Christopher Metzler, Jia-Bin Huang

The reflective nature of the human eye is an underappreciated source of information about what the world around us looks like.

Machine learning reveals features of spinon Fermi surface

no code implementations • 5 Jun 2023 • Kevin Zhang, Shi Feng, Yuri D. Lensky, Nandini Trivedi, Eun-Ah Kim

With rapid progress in the simulation of strongly interacting quantum Hamiltonians, the challenge of characterizing unknown phases has become a bottleneck for scientific progress.

G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer

no code implementations • 4 May 2023 • Kevin Zhang, Vipul Mann, Venkat Venkatasubramanian

Additional analyses of G-MATT attention maps demonstrate the model's ability to retain chemistry knowledge without relying on excessively complex model architectures.

Retrosynthesis, Single-step retrosynthesis

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

1 code implementation • CVPR 2023 • Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings.

3D Object Detection, Decoder

T-SEA: Transfer-based Self-Ensemble Attack on Object Detection

1 code implementation • CVPR 2023 • Hao Huang, Ziyan Chen, Huanran Chen, Yongtao Wang, Kevin Zhang

Then, drawing an analogy between patch optimization and regular model optimization, we propose a series of self-ensemble approaches on the input data, the attacked model, and the adversarial patch to make efficient use of the limited information and prevent the patch from overfitting.

Adversarial Attack, Model Optimization

i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

2 code implementations • 20 Oct 2022 • Kevin Zhang, Zhiqiang Shen

(2) Can we enhance the representations in the latent feature space by controlling the degree of semantics during sampling in Masked Autoencoders?

Image Reconstruction

MetaDIP: Accelerating Deep Image Prior with Meta Learning

no code implementations • 18 Sep 2022 • Kevin Zhang, Mingyang Xie, Maharshi Gor, Yi-Ting Chen, Yvonne Zhou, Christopher A. Metzler

Deep image prior (DIP) is a recently proposed technique for solving imaging inverse problems by fitting the reconstructed images to the output of an untrained convolutional neural network.

Denoising, Meta-Learning
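The deep image prior idea described above can be illustrated with a toy sketch: the signal is reparametrized as the output of an untrained network with a fixed random input, and only the network weights are optimized to match the noisy measurement. This is a minimal 1D NumPy sketch of plain DIP, not MetaDIP's meta-learning method. It assumes a two-layer fully connected generator as a stand-in for the convolutional network DIP actually uses (so it lacks the convolutional structure the DIP prior relies on), and the name `dip_fit`, the layer sizes, learning rate and step count are all hypothetical choices.

```python
import numpy as np

def dip_fit(y, hidden=64, z_dim=32, steps=300, lr=5e-3, seed=0):
    """DIP-style fit (hypothetical helper): optimize the weights of an untrained
    network g(z; W1, W2) = W2 @ relu(W1 @ z) with a FIXED random input z so that
    its output matches the measurement y. The step count acts as early stopping,
    since DIP relies on stopping before the network starts fitting the noise."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(z_dim)                          # fixed code, never optimized
    W1 = rng.standard_normal((hidden, z_dim)) / np.sqrt(z_dim)
    W2 = rng.standard_normal((y.size, hidden)) / np.sqrt(hidden)
    losses = []
    for _ in range(steps):
        pre = W1 @ z
        h = np.maximum(pre, 0.0)                            # ReLU hidden layer
        r = W2 @ h - y                                      # residual of current output
        losses.append(0.5 * float(r @ r))
        # Gradients of 0.5 * ||W2 relu(W1 z) - y||^2 with respect to W2 and W1
        g_W2 = np.outer(r, h)
        g_W1 = np.outer((W2.T @ r) * (pre > 0), z)
        W2 -= lr * g_W2
        W1 -= lr * g_W1
    x_hat = W2 @ np.maximum(W1 @ z, 0.0)                    # reconstruction after training
    return x_hat, losses

# Usage on a synthetic noisy sinusoid (toy data, not from the paper)
t = np.linspace(0.0, 1.0, 128)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * np.random.default_rng(1).standard_normal(t.size)
x_hat, losses = dip_fit(noisy)
```

In real DIP the choice of when to stop matters: run long enough and the network reproduces the noise too, which is one motivation for accelerating or regularizing the fit.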

Sequential Models in the Synthetic Data Vault

1 code implementation • 28 Jul 2022 • Kevin Zhang, Neha Patki, Kalyan Veeramachaneni

After building the Sequential SDV, we used it to generate synthetic data and compared its quality against CTGAN, an existing non-sequential model based on generative adversarial networks.

Generative Adversarial Network

Fomite transmission and disinfection strategies for SARS-CoV-2 and related viruses

no code implementations • 23 May 2020 • Nicolas Castaño, Seth Cordts, Myra Kurosu Jalil, Kevin Zhang, Saisneha Koppaka, Alison Bick, Rajorshi Paul, Sindy KY Tang

Contaminated objects or surfaces, referred to as fomites, play a critical role in the spread of viruses, including SARS-CoV-2, the virus responsible for the COVID-19 pandemic.

Memory-efficient Learning for Large-scale Computational Imaging

no code implementations • NeurIPS Workshop Deep_Invers 2019 • Michael Kellman, Kevin Zhang, Jon Tamir, Emrah Bostan, Michael Lustig, Laura Waller

Critical aspects of computational imaging systems, such as experimental design and image priors, can be optimized through deep networks formed by the unrolled iterations of classical model-based reconstructions (termed physics-based networks).

compressed sensing, Experimental Design
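To make "unrolled iterations of classical model-based reconstructions" concrete, here is a minimal NumPy sketch, assuming a linear forward model `A` and a plain gradient-descent solver for the data-consistency loss 0.5 * ||A x - y||^2. Each iteration becomes one "layer" whose step size would be a trainable parameter in a physics-based network. The function `unrolled_gd`, the random toy matrix and the fixed step sizes are hypothetical; this is not the paper's memory-efficient training method.

```python
import numpy as np

def unrolled_gd(A, y, step_sizes):
    """Unrolled gradient descent (hypothetical helper) on 0.5 * ||A x - y||^2,
    starting from x = 0. Each entry of step_sizes is one unrolled "layer";
    in a physics-based network these per-layer values would be learned."""
    x = np.zeros(A.shape[1])
    for alpha in step_sizes:
        x = x - alpha * (A.T @ (A @ x - y))   # gradient step using the forward model A
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))             # toy linear forward model (assumption)
x_true = rng.standard_normal(20)
y = A @ x_true                                # noiseless measurements
alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # stable fixed step, 1 / sigma_max(A)^2
x_hat = unrolled_gd(A, y, [alpha] * 500)      # 500 unrolled layers
```

Training the per-layer parameters by backpropagation normally requires keeping every intermediate `x` in memory, so memory use grows with the number of unrolled layers and the size of the image; that cost is what makes such networks hard to train at large scale.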

Leveraging Multimodal Haptic Sensory Data for Robust Cutting

no code implementations • 27 Sep 2019 • Kevin Zhang, Mohit Sharma, Manuela Veloso, Oliver Kroemer

In this paper, we propose using vibrations and force-torque feedback from the interactions to adapt the slicing motions and monitor for contact events.
