Saturn: An Optimized Data System for Large Model Deep Learning Workloads

knagrecha/saturn 3 Sep 2023

Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of "model parallelism" techniques and tools.

Model Selection Scheduling

78
0.60 stars / hour

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

filapro/oneformer3d 24 Nov 2023

Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design.

3D Instance Segmentation 3D Object Detection +4

29
0.58 stars / hour

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

yule-buaa/mergelm 6 Nov 2023

Based on this observation, we further sparsify delta parameters of multiple SFT homologous models with DARE and subsequently merge them into a single model by parameter averaging.

GSM8K Instruction Following

308
0.55 stars / hour
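
The merging step described in the entry above (sparsify each fine-tuned model's delta parameters with DARE, then average) can be illustrated with a minimal sketch. It assumes DARE's drop-and-rescale scheme: each delta is zeroed with probability `drop_rate`, and the survivors are rescaled by `1 / (1 - drop_rate)`. The function names `dare_sparsify` and `merge_by_averaging` are hypothetical, flat Python lists stand in for model tensors, and none of this is the API of the yule-buaa/mergelm repository.

```python
import random


def dare_sparsify(base, finetuned, drop_rate, rng):
    """Return sparsified delta parameters (finetuned - base).

    Each delta is dropped (set to 0) with probability drop_rate;
    surviving deltas are rescaled by 1 / (1 - drop_rate).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    scale = 1.0 / (1.0 - drop_rate)
    deltas = []
    for b, f in zip(base, finetuned):
        delta = f - b
        keep = rng.random() >= drop_rate
        deltas.append(delta * scale if keep else 0.0)
    return deltas


def merge_by_averaging(base, finetuned_models, drop_rate, seed=0):
    """Merge several fine-tuned models that share the same base.

    Averaging the models (base + sparse delta) is equivalent to
    adding the mean of the sparse deltas to the base parameters.
    """
    if not finetuned_models:
        raise ValueError("need at least one fine-tuned model")
    rng = random.Random(seed)
    sparse = [dare_sparsify(base, ft, drop_rate, rng) for ft in finetuned_models]
    n = len(sparse)
    return [b + sum(d[i] for d in sparse) / n for i, b in enumerate(base)]


if __name__ == "__main__":
    base = [0.0, 1.0, 2.0, 3.0]
    sft_a = [0.5, 1.0, 2.5, 3.0]   # toy "fine-tuned" parameters
    sft_b = [0.0, 2.0, 2.0, 4.0]
    print(merge_by_averaging(base, [sft_a, sft_b], drop_rate=0.5))
```

With `drop_rate=0.0` nothing is dropped and the result reduces to a plain average of the fine-tuned parameters, which gives a quick sanity check on the sketch.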

Emergence of Segmentation with Minimalistic White-Box Transformers

ma-lab-berkeley/crate 30 Aug 2023

Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection.

Segmentation Self-Supervised Learning

793
0.55 stars / hour

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

vinthony/video-retalking 27 Nov 2022

Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism.

Video Editing Video Generation

3,983
0.52 stars / hour

PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

mbzuai-oryx/video-llava 22 Nov 2023

Extending image-based Large Multimodal Models (LMM) to videos is challenging due to the inherent complexity of video data.

Benchmarking Question Answering +1

84
0.49 stars / hour

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

ai-forever/kandinskyvideo 22 Nov 2023

The first stage synthesizes keyframes to outline the storyline of a video, while the second generates interpolation frames to make the movements of the scene and objects smooth.

SSIM Text-to-Video Generation +1

88
0.48 stars / hour

Noise-injected Consistency Training and Entropy-constrained Pseudo Labeling for Semi-supervised Extractive Summarization

opensum/cpsum COLING 2022

Labeling large amounts of extractive summarization data is often prohibitively expensive due to time, financial, and expertise constraints, which poses great challenges to incorporating summarization systems in practical applications.

Extractive Summarization

53
0.45 stars / hour

ProAgent: From Robotic Process Automation to Agentic Process Automation

openbmb/proagent 2 Nov 2023

Empirical experiments are conducted to detail its workflow construction and execution procedure, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents.

Decision Making

335
0.45 stars / hour

GLM-130B: An Open Bilingual Pre-trained Model

thudm/chatglm 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

Language Modelling Multi-task Language Understanding +1

6,122
0.42 stars / hour