The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad.
Generative models, e. g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts.
We show that SAMMO generalizes previous methods and improves the performance of complex prompts on (1) instruction tuning, (2) RAG pipeline tuning, and (3) prompt compression, across several different LLMs.
Next, we employ TFB to perform a thorough evaluation of 21 Univariate Time Series Forecasting (UTSF) methods on 8, 068 univariate time series and 14 Multivariate Time Series Forecasting (MTSF) methods on 25 datasets.
In this study, we propose AniPortrait, a novel framework for generating high-quality animation driven by audio and a reference portrait image.
Our formulation directly provides a 3D model of the scene as well as depth information, but interestingly, we can seamlessly recover from it, pixel matches, relative and absolute camera.
To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.
It can simultaneously perform the three common retrieval functionalities of embedding model: dense retrieval, multi-vector retrieval, and sparse retrieval, which provides a unified model foundation for real-world IR applications.
Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in mathematical language to rigorous derivations in formal systems.
To this end, we propose a novel filter-based VINS framework named SchurVINS, which could guarantee both high accuracy by building a complete residual model and low computational complexity with Schur complement.