Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

microsoft/monitors4codegen NeurIPS 2023

We construct PragmaticCode, a repository-level dataset for method completion in Java, and evaluate monitor-guided decoding (MGD) on it.

153
0.40 stars / hour

An Investigation of Incorporating Mamba for Speech Enhancement

roychao19477/semamba 10 May 2024

This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task.

Speech Enhancement

29
0.40 stars / hour

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR 28 Nov 2021

In this paper, we reformulate overlapping speech diarization as a single-label prediction problem by encoding the multi-speaker labels with a power set.

Action Detection, Activity Detection, +2

3,749
0.40 stars / hour
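The power-set reformulation in the speaker-diarization entry above can be sketched as follows. With at most N speakers, each frame's multi-label activity vector (one bit per speaker) maps to exactly one of 2^N classes, so a single-label classifier can also cover silence and overlapping speech. This is a minimal illustration that assumes a fixed maximum speaker count N and binary weighting of speaker slots. The function names are hypothetical, and this is not FunASR's implementation.

```python
def power_set_encode(activity: list[int]) -> int:
    """Map a per-frame multi-speaker activity vector to one class index.

    activity[k] is 1 if (hypothetical) speaker slot k is active in the frame,
    else 0. Slot k contributes 2**k, so every subset of active speakers,
    including silence (no speakers) and overlaps, gets a unique index.
    """
    index = 0
    for k, bit in enumerate(activity):
        if bit not in (0, 1):
            raise ValueError(f"activity[{k}] must be 0 or 1, got {bit}")
        index |= bit << k
    return index


def power_set_decode(index: int, num_speakers: int) -> list[int]:
    """Invert power_set_encode: recover the per-speaker activity bits."""
    if not 0 <= index < 2 ** num_speakers:
        raise ValueError("index out of range for this number of speakers")
    return [(index >> k) & 1 for k in range(num_speakers)]


if __name__ == "__main__":
    # Three speaker slots -> 2**3 = 8 single-label classes.
    frames = [[0, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 1]]
    labels = [power_set_encode(f) for f in frames]
    print(labels)  # [0, 1, 5, 6]
```

Under this sketch, a single softmax over 2^N classes replaces N independent binary outputs. The trade-off is that the number of classes grows exponentially with N.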

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model for various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning, Sentiment Analysis, +1

3,750
0.40 stars / hour

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

stanford-oval/storm 12 Oct 2023

We first construct the Feedback Collection, a new dataset that consists of 1K fine-grained score rubrics, 20K instructions, and 100K responses and language feedback generated by GPT-4.

Language Modelling, Large Language Model

4,323
0.40 stars / hour

UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,711
0.39 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results on 8 of 18 benchmarks.

4k, Language Modelling, +3

2,038
0.38 stars / hour

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

shi-labs/cumo 9 May 2024

Recent advancements in Multimodal Large Language Models (MLLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks.

 Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

Image Captioning, visual instruction following, +1

70
0.38 stars / hour

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

vikparuchuri/marker 11 Jan 2021

We design models based on T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Language Modelling, Question Answering

8,745
0.37 stars / hour

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

thudm/inf-dit 7 May 2024

However, due to the quadratic increase in memory when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.

Image Generation, Super-Resolution

125
0.37 stars / hour