We investigate efficient methods for training Large Language Models (LLMs) to acquire capabilities in multiple specialized domains, such as coding, mathematical reasoning, and world knowledge.
To overcome these limitations, we introduce StreamingT2V, an autoregressive approach to long video generation that produces 80, 240, 600, 1200, or more frames with smooth transitions.
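The abstract suggests that long videos are produced chunk by chunk, with each new chunk conditioned on the tail of the previous one to keep transitions smooth. The sketch below illustrates that autoregressive loop only; `generate_chunk` is a hypothetical stand-in for a text-to-video model call, not StreamingT2V's actual API, and the frame shapes are arbitrary.

```python
import numpy as np

def generate_chunk(prompt, length, condition=None):
    # Hypothetical stand-in for a text-to-video diffusion call; returns
    # `length` dummy frames. StreamingT2V's real conditioning is learned,
    # not the naive copy shown here.
    frames = np.random.rand(length, 64, 64, 3)
    if condition is not None:
        frames[: len(condition)] = condition  # anchor on the overlap frames
    return frames

def generate_long_video(prompt, total_frames, chunk=16, overlap=8):
    video = generate_chunk(prompt, chunk)
    while len(video) < total_frames:
        tail = video[-overlap:]                     # condition on recent frames
        nxt = generate_chunk(prompt, chunk, condition=tail)
        video = np.concatenate([video, nxt[overlap:]])  # drop re-generated overlap
    return video[:total_frames]

print(generate_long_video("a cat surfing", 80).shape)  # (80, 64, 64, 3)
```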
Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding.
This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e., a minimap).
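The snippet does not say how image coordinates are mapped to the minimap; a standard choice for a planar pitch is a homography, sketched below with OpenCV. The landmark correspondences and player detections are illustrative values, not the paper's data.

```python
import numpy as np
import cv2

# Four known pitch landmarks in image pixels, and their positions on the
# top-view minimap (illustrative values, in metres).
image_pts = np.array([[102, 408], [1180, 395], [640, 220], [640, 690]], dtype=np.float32)
pitch_pts = np.array([[0, 0], [105, 0], [52.5, 34], [52.5, 0]], dtype=np.float32)

# Homography from the camera image plane to the 2D pitch plane.
H, _ = cv2.findHomography(image_pts, pitch_pts)

# Tracked player foot positions in the image (e.g. bottom-centre of each
# detection box), projected onto the minimap.
players_img = np.array([[[300, 500]], [[900, 450]]], dtype=np.float32)
players_pitch = cv2.perspectiveTransform(players_img, H)
print(players_pitch.reshape(-1, 2))  # (x, y) pitch coordinates per player
```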
Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization.
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.
To address this, we propose OneChart: a reliable agent specifically devised for the structural extraction of chart information.
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
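If this refers to the Whisper line of work, its open-source `whisper` package exposes a simple transcription API; a minimal example follows, with the model size and audio file name as placeholders.

```python
# Minimal transcription sketch (pip install openai-whisper); "base" and the
# file name are placeholders, not values from the paper.
import whisper

model = whisper.load_model("base")              # weights download on first use
result = model.transcribe("meeting_audio.mp3")  # language is auto-detected
print(result["text"])
```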
For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms. These mechanisms can be naturally combined with the Mamba architecture and fully exploit its properties to achieve spatio-temporal interaction among multi-temporal features, thereby obtaining accurate change information.
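As a rough illustration of spatio-temporal token mixing with a state-space model (not the paper's exact mechanisms), the sketch below interleaves bi-temporal feature tokens so a single Mamba block scans both dates jointly. It assumes the `mamba_ssm` package and a CUDA device, and the tensor shapes are arbitrary.

```python
import torch
from mamba_ssm import Mamba  # pip install mamba-ssm (requires CUDA)

# Bi-temporal feature maps flattened to token sequences: (batch, tokens, dim).
B, L, D = 2, 256, 128
feat_t1 = torch.randn(B, L, D, device="cuda")
feat_t2 = torch.randn(B, L, D, device="cuda")

# One illustrative spatio-temporal scheme: interleave tokens from the two
# dates so the state-space scan alternates between them, letting the
# recurrence carry information across time as well as spatial position.
tokens = torch.stack((feat_t1, feat_t2), dim=2).reshape(B, 2 * L, D)

mixer = Mamba(d_model=D, d_state=16, d_conv=4, expand=2).to("cuda")
mixed = mixer(tokens)  # (B, 2L, D)

# De-interleave back into per-date features carrying change information.
out_t1, out_t2 = mixed.reshape(B, L, 2, D).unbind(dim=2)
```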
TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%.
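TSMixer's core idea is alternating MLPs that mix along the time axis and along the feature axis. Below is a minimal sketch of one such block in PyTorch; the hidden size and normalization placement are plausible assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class TSMixerBlock(nn.Module):
    """One mixing block: an MLP across time steps, then an MLP across
    features, each with a residual connection (hyperparameters are guesses)."""
    def __init__(self, seq_len: int, n_features: int, hidden: int = 64):
        super().__init__()
        self.time_norm = nn.LayerNorm(n_features)
        self.time_mlp = nn.Linear(seq_len, seq_len)  # mixes along the time axis
        self.feat_norm = nn.LayerNorm(n_features)
        self.feat_mlp = nn.Sequential(               # mixes along the feature axis
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, features)
        y = self.time_mlp(self.time_norm(x).transpose(1, 2)).transpose(1, 2)
        x = x + y
        return x + self.feat_mlp(self.feat_norm(x))

x = torch.randn(8, 96, 7)             # 96 past steps, 7 variates
block = TSMixerBlock(seq_len=96, n_features=7)
print(block(x).shape)                 # torch.Size([8, 96, 7])
```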