By leveraging a multi-band decomposition of the raw waveform, we show that our model is the first able to generate 48kHz audio signals, while simultaneously running 20 times faster than real-time on a standard laptop CPU.
We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos.
To this end, we present SensorX2car, a calibration toolbox for the online calibration of sensor-to-car coordinate systems in road scenes.
To democratize this, we train and release a family of large language models up to 16. 1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER.
Ranked #1 on
Program Synthesis
on HumanEval
We propose WOFT -- a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i. e. the homography w. r. t.
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Ranked #1 on
Speech Recognition
on CHiME6
For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example that aggregates the available annotations; (2) A confidence score for how likely each consensus label is correct; (3) A rating for each annotator quantifying the overall correctness of their labels.
Pre-trained language models (PLMs) have shown their effectiveness in multiple scenarios.
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more.
Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience.