We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task.
Ranked #1 on Speech Enhancement on VoiceBank + DEMAND
In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker label sets via the power set.
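A minimal sketch of the power-set idea: every subset of the speaker set becomes one class, turning a multi-label problem into single-label classification. The speaker names and helper functions below are hypothetical, not taken from the paper.

```python
from itertools import chain, combinations

def powerset_classes(speakers):
    """Enumerate all subsets of the speaker set; each subset is one class."""
    s = list(speakers)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

# Hypothetical 3-speaker setup: 2^3 = 8 single-label classes,
# from the empty set (silence) up to all three speakers active.
classes = powerset_classes(["spk0", "spk1", "spk2"])
class_id = {c: i for i, c in enumerate(classes)}

def encode(active_speakers):
    """Map a multi-speaker label set to a single class index."""
    return class_id[frozenset(active_speakers)]
```

Because the class is a set, the encoding is order-invariant: `encode(["spk0", "spk1"])` and `encode(["spk1", "spk0"])` yield the same index.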
To the best of our knowledge, emotion2vec is the first universal representation model for various emotion-related tasks, filling a gap in the field.
We first construct the Feedback Collection, a new dataset that consists of 1K fine-grained score rubrics, 20K instructions, and 100K responses and language feedback generated by GPT-4.
We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.
Ranked #6 on Visual Question Answering on MM-Vet
Recent advancements in Multimodal Large Language Models (MLLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.
Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)
We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.
However, because memory grows quadratically when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.
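The quadratic growth is easy to see from the attention score matrix alone: token count scales with pixel area, so the score matrix scales with its square. The patch size and fp16 storage below are illustrative assumptions, not figures from the paper.

```python
def attention_memory_gib(resolution, patch=16, bytes_per_elem=2):
    """Rough size (GiB) of one full self-attention score matrix
    for a square image split into patch x patch tokens (fp16)."""
    tokens = (resolution // patch) ** 2      # tokens grow with pixel area
    return tokens * tokens * bytes_per_elem / 2 ** 30  # scores grow with tokens^2

# Moving from 1024x1024 to 4096x4096 multiplies the token count by 16,
# so a single attention map needs 16^2 = 256x the memory.
```

With these assumptions, one head's score matrix goes from ~32 MiB at 1024×1024 to ~8 GiB at 4096×4096, which is why naive generation beyond 1024×1024 is impractical.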