In this paper we propose to study generalization of neural networks on small algorithmically generated datasets.
To this end, we release the first open-access decompilation LLMs, ranging from 1B to 33B parameters, pre-trained on 4 billion tokens of C source code and the corresponding assembly code.
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.
Despite recent advances in image-to-video generation, fine-grained controllability and local animation remain underexplored.
We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968.
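Winograd's FIP identity, which FFIP builds on, trades roughly half of an inner product's multiplications for additions by pairing adjacent elements. A minimal sketch of the identity (illustrative code under my own naming, not the paper's FFIP implementation):

```python
def winograd_inner_product(x, y):
    """Inner product of two even-length vectors via Winograd's 1968 identity:

        x . y = sum_j (x[2j] + y[2j+1]) * (x[2j+1] + y[2j])
                - sum_j x[2j] * x[2j+1]
                - sum_j y[2j] * y[2j+1]
    """
    assert len(x) == len(y) and len(x) % 2 == 0
    half = len(x) // 2
    # One multiplication per element pair instead of two.
    cross = sum((x[2 * j] + y[2 * j + 1]) * (x[2 * j + 1] + y[2 * j])
                for j in range(half))
    xi = sum(x[2 * j] * x[2 * j + 1] for j in range(half))   # depends on x only
    eta = sum(y[2 * j] * y[2 * j + 1] for j in range(half))  # depends on y only
    return cross - xi - eta
```

In a matrix multiplication, the `xi` term is fixed per row and `eta` per column, so both are amortized over many inner products; this reuse is what makes the identity attractive for hardware.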
The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of vision-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.
Due to its simple design, this paradigm holds promise for narrowing the architectural gap between vision and language.
The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.
We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations.
Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.