Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation.
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Ranked #1 on Speech Recognition on CHiME6
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
Ranked #11 on Text-to-Image Generation on COCO
Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.
Ranked #1 on Question Answering on Natural Questions
To this end, we present SensorX2car, a calibration toolbox for the online calibration of sensor-to-car coordinate systems in road scenes.
Our experiments show that subword model performs best for Chinese-to-English translation with the vocabulary which is not so big while hybrid word-character model is most suitable for English-to-Chinese translation.
We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale.
Modern 3D semantic instance segmentation approaches predominantly rely on specialized voting mechanisms followed by carefully designed geometric clustering techniques.
Ranked #1 on 3D Instance Segmentation on ScanNet200