GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

5,886
7.86 stars / hour

Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws

wassimtenachi/physo 6 Mar 2023

Here we present $\Phi$-SO, a Physical Symbolic Optimization framework for recovering analytical symbolic expressions from physics data using deep reinforcement learning techniques by learning units constraints.

Symbolic Regression

1,067
3.23 stars / hour
10,890
3.09 stars / hour

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

chenyangqiqi/fatezero 16 Mar 2023

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Video Editing

245
2.75 stars / hour

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

microsoft/visual-chatgpt 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only language but also images and 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models over multiple steps.

23,338
1.93 stars / hour

One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

thu-ml/unidiffuser 12 Mar 2023

Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model: it perturbs data in all modalities instead of a single modality, inputs individual timesteps for the different modalities, and predicts the noise of all modalities instead of a single modality.

Text-to-Image Generation

703
1.41 stars / hour

FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

jiawei-yang/freenerf 13 Mar 2023

One is to regularize the frequency range of NeRF's inputs, while the other is to penalize the near-camera density fields.

Neural Rendering, Novel View Synthesis

180
1.37 stars / hour

GLM-130B: An Open Bilingual Pre-trained Model

THUDM/GLM 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

Language Modelling, Multi-task Language Understanding, +1

890
1.16 stars / hour

DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception

jiayuzou2020/diffbev 15 Mar 2023

Diffusion models naturally have the ability to denoise noisy samples to the ideal data, which motivates us to utilize the diffusion model to get a better BEV representation.

3D Object Detection, Autonomous Driving, +3

96
1.10 stars / hour

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

opengvlab/internimage 18 Nov 2022

The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset.

3D Object Detection

757
1.07 stars / hour