no code implementations • 18 Sep 2023 • Hsuan Su, Ting-yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel
In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains.
Automatic Speech Recognition (ASR)
no code implementations • 13 Jun 2023 • Haoping Bai, Shancong Mou, Tatiana Likhomanenko, Ramazan Gokberk Cinbis, Oncel Tuzel, Ping Huang, Jiulong Shan, Jianjun Shi, Meng Cao
We introduce the VISION Datasets, a diverse collection of 14 industrial inspection datasets, uniquely poised to meet these challenges.
no code implementations • CVPR 2023 • Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel
Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray.
no code implementations • CVPR 2023 • Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel
We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation.
no code implementations • 27 Mar 2023 • Karren Yang, Ting-yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases?
Automatic Speech Recognition (ASR)
4 code implementations • ICCV 2023 • Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan
To this end, we introduce a novel token mixing operator, RepMixer, a building block of FastViT, that uses structural reparameterization to lower the memory access cost by removing skip-connections in the network.
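The general idea behind removing a skip connection by structural reparameterization can be shown in one dimension. This is a minimal sketch and not FastViT's RepMixer itself: a training-time block `x + conv(x, k)` becomes a single convolution once 1 is added to the kernel's center tap. The function names and the toy kernel are hypothetical, and RepMixer operates on feature maps with depthwise convolutions, which this toy does not model.

```python
# Hypothetical 1D illustration of structural reparameterization:
# a residual block y = x + conv(x, k) is folded into one conv with kernel k'.

def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding and an odd kernel size ('same' length)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_block(x, kernel):
    """Training-time form: identity skip connection plus a convolution branch."""
    return [a + b for a, b in zip(x, conv1d_same(x, kernel))]

def reparameterize(kernel):
    """Inference-time form: fold the identity into the kernel's center tap."""
    folded = list(kernel)
    folded[len(kernel) // 2] += 1.0
    return folded

x = [1.0, 2.0, -1.0, 0.5, 3.0]
k = [0.25, -0.5, 0.25]
print(residual_block(x, k))
print(conv1d_same(x, reparameterize(k)))
```

Because convolution is linear in the kernel, both lines print the same values up to floating-point rounding; the folded form avoids reading the input a second time for the skip branch.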
1 code implementation • ICCV 2023 • Fartash Faghri, Hadi Pouransari, Sachin Mehta, Mehrdad Farajtabar, Ali Farhadi, Mohammad Rastegari, Oncel Tuzel
Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+ reach up to 3.4% improved accuracy.
1 code implementation • 8 Mar 2023 • Florian Jaeckle, Fartash Faghri, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
The task of retrieving, from a gallery set, the data most similar to a given query is performed by comparing features.
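Retrieval by feature similarity can be sketched as a cosine similarity between the query feature and every gallery feature, followed by a top-k sort. This assumes features are plain float vectors and that cosine similarity is the comparison used; the paper's feature extractor and similarity measure are not shown, and `retrieve` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve(query, gallery, k=1):
    """Return indices of the k gallery features most similar to the query."""
    scores = [(cosine(query, g), i) for i, g in enumerate(gallery)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]

gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(retrieve([0.9, 0.1], gallery, k=2))
```

For the query `[0.9, 0.1]`, the first gallery item scores highest, followed by `[0.7, 0.7]`.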
1 code implementation • 20 Dec 2022 • Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari
To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations.
no code implementations • 24 Oct 2022 • Mohammad Samragh, Arnav Kundu, Ting-yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik
This paper explores the possibility of using visual object detection techniques for word localization in speech data.
no code implementations • 8 Oct 2022 • Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri
Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.
7 code implementations • CVPR 2023 • Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan
Furthermore, we show that our model generalizes to multiple tasks - image classification, object detection, and semantic segmentation with significant improvements in latency and accuracy as compared to existing efficient architectures when deployed on a mobile device.
Ranked #544 on Image Classification on ImageNet
1 code implementation • 23 Mar 2022 • Wei Jiang, Kwang Moo Yi, Golnoosh Samei, Oncel Tuzel, Anurag Ranjan
Photorealistic rendering and reposing of humans is important for enabling augmented reality experiences.
1 code implementation • CVPR 2022 • Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.
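The BCT idea described in this snippet, training the new model so its features remain compatible with the old model, can be sketched as an extra loss term that scores the new model's features with the old model's frozen classifier. This is a minimal sketch assuming a linear classifier head and softmax cross-entropy; `bct_loss`, `lam`, and the head layout are hypothetical, and the paper's own method, which addresses the limitations of BCT, is not shown.

```python
import math

def softmax_cross_entropy(logits, label):
    """Numerically stable -log softmax(logits)[label]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def linear(weights, feature):
    """Classifier logits: one weight row per class."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]

def bct_loss(new_feature, label, new_head, old_head, lam=1.0):
    """New-model loss plus a term scoring the new feature with the frozen old head."""
    new_term = softmax_cross_entropy(linear(new_head, new_feature), label)
    compat_term = softmax_cross_entropy(linear(old_head, new_feature), label)
    return new_term + lam * compat_term

identity = [[1.0, 0.0], [0.0, 1.0]]
print(bct_loss([1.0, 0.0], 0, new_head=identity, old_head=identity))
```

If the new feature drifts to a region the old classifier assigns to the wrong class, the second term grows, which pushes the new embedding to stay compatible.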
no code implementations • 21 Oct 2021 • Ting-yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models.
Automatic Speech Recognition (ASR)
no code implementations • 13 Oct 2021 • Jen-Hao Rick Chang, Martin Bresler, Youssouf Chherawala, Adrien Delaye, Thomas Deselaers, Ryan Dixon, Oncel Tuzel
We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.
no code implementations • 8 Oct 2021 • Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel
Token Pooling is a simple and effective operator that can benefit many architectures.
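Token pooling reduces the number of tokens a transformer layer passes on. Assuming a clustering-based formulation, one way to sketch it is to replace N token vectors with k cluster centers. This minimal sketch uses plain k-means with a deterministic initialization (the first k tokens); the paper's exact clustering variant, initialization, and weighting are not reproduced, and `kmeans_pool` is a hypothetical name.

```python
def kmeans_pool(tokens, k, iters=10):
    """Reduce N token vectors to k by k-means; the cluster centers are the pooled tokens."""
    centers = [list(t) for t in tokens[:k]]  # deterministic init: first k tokens
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for t in tokens:
            # Assign each token to its nearest center (squared Euclidean distance).
            dists = [sum((a - b) ** 2 for a, b in zip(t, c)) for c in centers]
            clusters[dists.index(min(dists))].append(t)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

tokens = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.0], [10.0, 9.8], [0.0, 0.2], [9.8, 10.0]]
pooled = kmeans_pool(tokens, k=2)
print(pooled)
```

Six tokens are reduced to two, each the mean of a group of nearby tokens, so redundant tokens are merged rather than dropped.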
no code implementations • 6 Oct 2021 • Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel
However, under an unsupervised-style setting, typical training algorithms for controllable sequence generative models suffer from the training-inference mismatch, where the same sample is used as content and style input during training but unpaired samples are given during inference.
no code implementations • 11 Jun 2021 • Pavan Kumar Anasosalu Vasu, Shreyas Saxena, Oncel Tuzel
When applied to datasets where one or more tasks can have noisy annotations, the proposed method learns to prioritize learning from clean labels for a given task, e.g., reducing surface estimation errors by up to 60%.
no code implementations • 2 Nov 2020 • Ting-yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel
Our policy adapts the augmentation parameters based on the training loss of the data samples.
Automatic Speech Recognition (ASR)
no code implementations • 2 Nov 2020 • Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel
In prior methods, the DNN is trained independently of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities.
Ranked #2 on Keyword Spotting on hey Siri
no code implementations • 30 Jun 2020 • Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel
We propose extracurricular learning, a novel knowledge distillation method, that bridges this gap by (1) modeling student and teacher output distributions; (2) sampling examples from an approximation to the underlying data distribution; and (3) matching student and teacher output distributions over this extended set including uncertain samples.
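Step (3), matching student and teacher output distributions, is commonly written as a KL divergence between temperature-softened softmax outputs. This minimal sketch shows only that matching term, as in standard knowledge distillation; the temperature of 2.0 is an arbitrary assumption, and the paper's distribution modeling and sampling of extra examples (steps 1 and 2) are not shown.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise; applying it to additional, uncertain inputs is what extends the matching beyond the training set.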
1 code implementation • 30 Jun 2020 • Joseph Y. Cheng, Hanlin Goh, Kaan Dogrusoz, Oncel Tuzel, Erdrin Azemi
Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100).
no code implementations • 9 Mar 2020 • Ting-yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir
We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required.
1 code implementation • 9 Jan 2020 • Hadi Pouransari, Zhucheng Tu, Oncel Tuzel
We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.
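For the 1-bit case, least-squares quantization has a standard closed form: approximating x by alpha*s with s = sign(x), the squared error ||x - alpha*s||^2 is minimized by alpha = sum(|x_i|)/n. This is a sketch of that textbook result only; the paper's algorithms are not reproduced, and `binary_quantize` and `sq_error` are hypothetical names.

```python
def binary_quantize(x):
    """1-bit least-squares quantization: x ~ alpha * s with s in {-1, +1}^n.

    For s = sign(x), d/d(alpha) ||x - alpha*s||^2 = 0 gives
    alpha = <x, s> / <s, s> = sum(|x_i|) / n.
    """
    s = [1.0 if v >= 0 else -1.0 for v in x]
    alpha = sum(abs(v) for v in x) / len(x)
    return alpha, s

def sq_error(x, alpha, s):
    """Squared reconstruction error of the quantized vector."""
    return sum((v - alpha * si) ** 2 for v, si in zip(x, s))

x = [0.9, -0.3, 0.1, -1.2]
alpha, s = binary_quantize(x)
print(alpha, s, sq_error(x, alpha, s))
```

Moving alpha in either direction from the returned value increases the reconstruction error, which is what the least-squares criterion guarantees.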
1 code implementation • NeurIPS 2019 • Shreyas Saxena, Oncel Tuzel, Dennis Decoste
To the best of our knowledge, our work is the first curriculum learning method to show gains on large scale image classification and detection tasks.
no code implementations • 25 Sep 2019 • Hadi Pouransari, Oncel Tuzel
We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed optimal quantization algorithms.
1 code implementation • 2 Apr 2019 • Vishwanath A. Sindagi, Yin Zhou, Oncel Tuzel
Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data.
Ranked #5 on 3D Object Detection on DAIR-V2X-I
1 code implementation • 7 Dec 2018 • Saurabh Adya, Vinay Palakkode, Oncel Tuzel
In this work, we propose and evaluate the stochastic preconditioned nonlinear conjugate gradient algorithm for large scale DNN training tasks.
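Preconditioned nonlinear conjugate gradient can be sketched on a small quadratic, where an exact line search is available in closed form. Assumptions: a Jacobi (diagonal) preconditioner, the Polak-Ribière coefficient clipped at zero, and full rather than stochastic gradients. In the stochastic setting described above, minibatch gradients would replace `grad`, and the paper's preconditioner and line search may differ; `pncg_quadratic` is a hypothetical name.

```python
def pncg_quadratic(A, b, w, iters=10):
    """Minimize f(w) = 0.5 w^T A w - b^T w with preconditioned nonlinear CG."""
    n = len(b)

    def grad(v):
        return [sum(A[i][j] * v[j] for j in range(n)) - b[i] for i in range(n)]

    P = [1.0 / A[i][i] for i in range(n)]  # Jacobi (diagonal) preconditioner
    g = grad(w)
    z = [P[i] * g[i] for i in range(n)]    # preconditioned gradient
    d = [-zi for zi in z]                  # first direction: preconditioned steepest descent
    for _ in range(iters):
        gz = sum(gi * zi for gi, zi in zip(g, z))
        if gz < 1e-20:  # converged
            break
        Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
        # Exact line search along d, valid because f is quadratic.
        step = -sum(gi * di for gi, di in zip(g, d)) / sum(di * adi for di, adi in zip(d, Ad))
        w = [wi + step * di for wi, di in zip(w, d)]
        g_new = grad(w)
        z_new = [P[i] * g_new[i] for i in range(n)]
        # Preconditioned Polak-Ribiere coefficient, clipped at 0 (restart if negative).
        beta = max(0.0, sum(zn * (gn - go) for zn, gn, go in zip(z_new, g_new, g)) / gz)
        d = [-zn + beta * di for zn, di in zip(z_new, d)]
        g, z = g_new, z_new
    return w

w = pncg_quadratic([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
print(w)
```

On a quadratic with exact line searches this reduces to preconditioned linear CG, so the 2×2 example reaches the minimizer [1/11, 7/11] in two iterations up to rounding.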
no code implementations • 19 Feb 2018 • Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava, Oncel Tuzel
Improving the robustness of neural networks against these attacks is important, especially for security-critical applications.
45 code implementations • CVPR 2018 • Yin Zhou, Oncel Tuzel
Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.
Ranked #1 on Object Localization on KITTI Cars Hard
no code implementations • 6 Feb 2017 • Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-Massoud Farahmand
We propose augmenting deep neural networks with an attention mechanism for the visual object detection task.
9 code implementations • CVPR 2017 • Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, Russ Webb
With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations.
Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Per-class Accuracy metric)
4 code implementations • NeurIPS 2016 • Ming-Yu Liu, Oncel Tuzel
We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images.
Ranked #3 on Image-to-Image Translation on Cityscapes Labels-to-Photo (Class IOU metric)
no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu, Rama Chellappa
In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose to use a Gaussian CRF model for the task of semantic segmentation.
no code implementations • CVPR 2016 • Bharat Singh, Tim K. Marks, Michael Jones, Oncel Tuzel, Ming Shao
We present a multi-stream bi-directional recurrent neural network for fine-grained action detection.
Action Recognition In Videos
Fine-Grained Action Detection
no code implementations • 23 Mar 2016 • Oncel Tuzel, Yuichi Taguchi, John R. Hershey
In our deep network architecture the global and local constraints that define a face can be efficiently modeled and learned end-to-end using training data.
no code implementations • 13 Nov 2015 • Oncel Tuzel, Tim K. Marks, Salil Tambe
Face alignment is particularly challenging when there are large variations in pose (in-plane and out-of-plane rotations) and facial expression.
no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu
We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model.
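A Gaussian CRF has a quadratic energy, so MAP inference amounts to solving a linear system. A minimal 1D sketch, assuming a data term (y_i - x_i)^2 and a smoothness term lam * (y_i - y_{i+1})^2 with a hand-set weight `lam`: setting the gradient to zero gives (I + lam*L) y = x, where L is the chain's graph Laplacian, solved here by Gauss-Seidel sweeps. The paper's network learns its GCRF parameters from data, which this does not model, and `gcrf_denoise_1d` is a hypothetical name.

```python
def gcrf_denoise_1d(x, lam=1.0, iters=200):
    """MAP estimate of a 1D chain Gaussian CRF via Gauss-Seidel on (I + lam*L) y = x."""
    y = list(x)
    n = len(x)
    for _ in range(iters):
        for i in range(n):
            neighbors = [y[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Row i of the system: (1 + lam*deg(i)) y_i - lam * sum(neighbors) = x_i
            y[i] = (x[i] + lam * sum(neighbors)) / (1.0 + lam * len(neighbors))
    return y

noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
smooth = gcrf_denoise_1d(noisy, lam=1.0)
print([round(v, 3) for v in smooth])
```

Because every row of L sums to zero, the output preserves the sum of the input while damping the oscillation; larger `lam` smooths more.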
no code implementations • 15 Jun 2015 • Ming-Yu Liu, Shuoxin Lin, Srikumar Ramalingam, Oncel Tuzel
We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving.
no code implementations • CVPR 2015 • Abhishek Sharma, Oncel Tuzel, David W. Jacobs
We propose to tackle this problem by including the classification loss of the internal nodes of the random parse trees in the original RCPN loss function.
no code implementations • 28 Feb 2015 • Chinmay Hegde, Oncel Tuzel, Fatih Porikli
1) For the edge layer, we use a nonparametric approach by constructing a dictionary of patches from a given image, and synthesize edge regions in a higher-resolution version of the image.
no code implementations • NeurIPS 2014 • Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu
Then, a top-down propagation of the aggregated information enhances the contextual information of each local feature.
no code implementations • CVPR 2013 • Ming-Yu Liu, Oncel Tuzel, Yuichi Taguchi
We propose an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image.
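Geodesic-distance upsampling can be illustrated in 1D: a path that crosses a large color change is long, so a pixel takes its depth from a low-resolution sample on the same side of a color edge. This is a simplified sketch, assuming a single nearest-seed rule and a per-step cost of 1 + beta*|color difference|; the paper works on 2D images and its exact weighting is not reproduced, and all names here are hypothetical.

```python
def geodesic_upsample_1d(color, seeds, beta=10.0):
    """color: high-resolution intensities; seeds: {pixel index: depth} from the low-res image.

    Each pixel takes the depth of the seed with the smallest geodesic distance,
    where stepping across a large color change is expensive.
    """
    # prefix[k] = geodesic distance from pixel 0 to pixel k along the 1D path
    prefix = [0.0]
    for a, b in zip(color, color[1:]):
        prefix.append(prefix[-1] + 1.0 + beta * abs(b - a))
    out = []
    for p in range(len(color)):
        nearest = min(seeds, key=lambda q: abs(prefix[p] - prefix[q]))
        out.append(seeds[nearest])
    return out

color = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
seeds = {0: 5.0, 4: 2.0}  # low-resolution depth samples
print(geodesic_upsample_1d(color, seeds))
```

Pixel 2 is equally far from both seeds in pixel units, but it receives the left seed's depth because reaching the right seed requires crossing the color edge between pixels 2 and 3.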