GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

6,023
3.21 stars / hour
11,871
2.31 stars / hour

Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws

wassimtenachi/physo 6 Mar 2023

Here we present $\Phi$-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints.

Symbolic Regression

1,114
1.86 stars / hour

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

chenyangqiqi/fatezero 16 Mar 2023

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Video Editing

283
1.78 stars / hour

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

microsoft/visual-chatgpt 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only language but also images, and 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models over multiple steps.

24,532
1.57 stars / hour

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

winfredy/sadtalker 22 Nov 2022

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Talking Head Generation

447
1.22 stars / hour

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

ganwanshui/simpleoccupancy 16 Mar 2023

Towards a more comprehensive perception of 3D scenes, we propose SurroundOcc, a method to predict 3D occupancy from multi-camera images.

3D Object Detection, Autonomous Driving +2

36
1.13 stars / hour

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

thu-ml/unidiffuser 12 Mar 2023

Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model -- perturbs data in all modalities instead of a single modality, inputs individual timesteps in different modalities, and predicts the noise of all modalities instead of a single modality.

Text-to-Image Generation

715
1.07 stars / hour

GLM-130B: An Open Bilingual Pre-trained Model

THUDM/GLM 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

Language Modelling, Multi-task Language Understanding +1

923
1.03 stars / hour

FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

jiawei-yang/freenerf 13 Mar 2023

One is to regularize the frequency range of NeRF's inputs, while the other is to penalize the near-camera density fields.

Neural Rendering, Novel View Synthesis

184
1.00 stars / hour