Trending Research

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

microsoft/monitors4codegen • NeurIPS 2023

We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it.

153

0.40 stars / hour

Paper
Code

An Investigation of Incorporating Mamba for Speech Enhancement

roychao19477/semamba • 10 May 2024

This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task.

Ranked #1 on Speech Enhancement on VoiceBank + DEMAND

Speech Enhancement

0.40 stars / hour

Paper
Code

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR • 28 Nov 2021

In this paper, we reformulate speaker diarization as a single-label prediction problem by encoding the multi-speaker labels with a power set.

Action Detection Activity Detection +2

3,749

0.40 stars / hour

Paper
Code
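
The power-set idea in the summary above can be illustrated with a minimal sketch. It assumes each frame carries one boolean activity flag per speaker and maps that multi-label vector to a single class index via a bitmask. The function names `powerset_encode` and `powerset_decode` are hypothetical, and the exact mapping used in the paper or in FunASR may differ.

```python
from typing import List


def powerset_encode(active: List[bool]) -> int:
    """Map per-speaker activity flags to one class index in [0, 2**N)."""
    # Speaker i contributes bit i, so every subset of speakers gets a unique index.
    return sum(1 << i for i, is_active in enumerate(active) if is_active)


def powerset_decode(label: int, num_speakers: int) -> List[bool]:
    """Recover the per-speaker activity flags from a class index."""
    return [bool((label >> i) & 1) for i in range(num_speakers)]


# Speakers 0 and 2 overlap in this frame; speaker 1 is silent.
frame = [True, False, True]
label = powerset_encode(frame)  # 1 + 4 = 5
assert powerset_decode(label, num_speakers=3) == frame
```

With N speakers this gives 2**N classes, so a single softmax over the classes can replace N independent binary decisions per frame.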

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR • 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

3,750

0.40 stars / hour

Paper
Code

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

stanford-oval/storm • 12 Oct 2023

We first construct the Feedback Collection, a new dataset that consists of 1K fine-grained score rubrics, 20K instructions, and 100K responses and language feedback generated by GPT-4.

Language Modelling Large Language Model

4,323

0.40 stars / hour

Paper
Code

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO • 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,711

0.39 stars / hour

Paper
Code

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl • 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k Language Modelling +3

2,038

0.38 stars / hour

Paper
Code

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

shi-labs/cumo • 9 May 2024

Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.

Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

Image Captioning visual instruction following +1

0.38 stars / hour

Paper
Code

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

vikparuchuri/marker • 11 Jan 2021

We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Language Modelling Question Answering

8,745

0.37 stars / hour

Paper
Code

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

thudm/inf-dit • 7 May 2024

However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.

Image Generation Super-Resolution

125

0.37 stars / hour

Paper
Code