The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting as-yet-incurable diseases.
We study the capabilities of speech processing systems trained simply to predict transcripts for large amounts of audio on the internet.
Ranked #1 on Speech Recognition on CHiME6
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences.
Ranked #1 on Long-range modeling on LRA
Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency.
Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text.