Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.
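The attention module referenced here computes a weighted sum of value vectors, with weights given by query-key similarity. A minimal single-head sketch (no masking or multi-head logic; NumPy only, names illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V: (seq_len, d_k) arrays; no masking or multi-head logic.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values
```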
Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate high-quality instruction data for code.
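Conceptually, OSS-Instruct seeds an LLM with a real open-source snippet and asks it to invent a matching coding problem and solution. The sketch below is an illustrative reconstruction of that idea, not the released pipeline; the prompt wording and the `call_llm` placeholder are assumptions:

```python
# Illustrative reconstruction of the OSS-Instruct idea: seed an LLM with a
# real open-source snippet, ask it to invent a coding problem + solution.
# `call_llm` is a placeholder for any chat-completion API.

PROMPT_TEMPLATE = """You are given a code snippet from an open-source project.
Inspired by it, write a self-contained coding problem and its solution.

Code snippet:
{snippet}
"""

def synthesize_instruction(snippet: str, call_llm) -> str:
    """Turn one OSS code snippet into one synthetic instruction example."""
    return call_llm(PROMPT_TEMPLATE.format(snippet=snippet))
```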
TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.
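A plugin mechanism of this kind can be pictured as named Python callables that LLM-generated code may invoke. The decorator, registry, and example plugin below are hypothetical, in the spirit of the design rather than TaskWeaver's actual API:

```python
# Hypothetical plugin registry sketch; not TaskWeaver's real API.

PLUGINS = {}

def plugin(name: str):
    """Register a callable so generated code can invoke it by name."""
    def wrap(fn):
        PLUGINS[name] = fn
        return fn
    return wrap

@plugin("sql_query")
def sql_query(query: str):
    """Toy plugin; a real one would query a database and return rows."""
    return [{"query": query, "rows": []}]
```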
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
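The core recipe, roughly: map each image to discrete tokens (e.g., with a VQ encoder), concatenate them into "visual sentences", and train with plain next-token prediction. A minimal sketch under those assumptions; the tokenizer and `model` are stand-ins, not the paper's exact components:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_seq: torch.Tensor) -> torch.Tensor:
    """Next-token prediction over discrete visual tokens.

    token_seq: (batch, length) integer codes from an image tokenizer;
    model returns logits of shape (batch, length - 1, vocab_size).
    """
    logits = model(token_seq[:, :-1])        # predict each following token
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        token_seq[:, 1:].reshape(-1),
    )
```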
Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios.
Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities.
We also introduce latent DiffiT, which consists of a transformer model with the proposed self-attention layers, for high-resolution image generation.
Ranked #2 on Image Generation on ImageNet 256x256.
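One common way to make self-attention diffusion-aware is to inject a timestep embedding into the token stream before attending. The sketch below shows that generic conditioning pattern, not DiffiT's specific layer; all names are illustrative:

```python
import torch
import torch.nn as nn

class TimeConditionedSelfAttention(nn.Module):
    """Generic pattern: add a timestep embedding to every token before
    multi-head self-attention. Illustrative, not DiffiT's exact layer."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.time_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); t_emb: (batch, dim)
        h = x + self.time_proj(t_emb).unsqueeze(1)  # broadcast over tokens
        out, _ = self.attn(h, h, h)
        return out
```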
Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity.
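For reference, the standard DDPM formulation behind such models, in common notation (not specific to any one paper): noise is injected over many steps and a network is trained to predict it.

```latex
% Forward noising with schedule \beta_t,\; \alpha_t = 1 - \beta_t,\; \bar\alpha_t = \prod_{s \le t} \alpha_s:
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar\alpha_t}\, x_0,\; (1 - \bar\alpha_t)\, I\right)
% Training objective: predict the injected noise \epsilon with network \epsilon_\theta:
\mathcal{L} = \mathbb{E}_{x_0,\,\epsilon,\,t}\!\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon,\; t\right)\right\rVert^2\right]
```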
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?
Tasks: Automatic Speech Recognition, Speech-to-Speech Translation, +3
To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes.
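At a high level, joint reconstruction and segmentation can be realized by attaching a small identity embedding to each Gaussian and alpha-compositing it the way color is composited. The sketch below shows that general idea only; the shapes, the 16-dim feature size, and the dense weight matrix are assumptions for illustration:

```python
import torch

# Sketch: give every 3D Gaussian a learnable identity embedding, composite
# it per pixel like color, then classify pixels into group labels that can
# be supervised with 2D masks (e.g., from a segmentation model).
num_gaussians, id_dim, num_groups = 100_000, 16, 256
identity = torch.randn(num_gaussians, id_dim, requires_grad=True)
classifier = torch.nn.Linear(id_dim, num_groups)

def composite_identity(weights: torch.Tensor) -> torch.Tensor:
    """weights: (pixels, num_gaussians) alpha-blending weights produced by
    the splatting pass; returns one identity embedding per pixel."""
    return weights @ identity  # (pixels, id_dim)

# Per-pixel group logits for mask supervision:
# logits = classifier(composite_identity(weights))
```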