Data-centric Artificial Intelligence: A Survey

daochenzha/data-centric-ai 17 Mar 2023

Artificial Intelligence (AI) is making a profound impact in almost every domain.

ADAPT: Action-aware Driving Caption Transformer

jxbbb/adapt 1 Feb 2023

To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision making step of autonomous vehicular control and action.

Autonomous Driving Decision Making

Anti-DreamBooth: Protecting users from personalized text-to-image synthesis

vinairesearch/anti-dreambooth 27 Mar 2023

Despite the complicated formulation of DreamBooth and Diffusion-based text-to-image models, our methods effectively defend users from the malicious use of those models.

Image Generation

Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

dvlab-research/ref-npr 6 Dec 2022

We propose a ray registration process based on the stylized reference view to obtain pseudo-ray supervision in novel views.

Semantic correspondence

Masked Diffusion Transformer is a Strong Image Synthesizer

sail-sg/mdt 25 Mar 2023

Despite their success in image synthesis, we observe that diffusion probabilistic models (DPMs) often lack the contextual reasoning ability needed to learn the relations among object parts in an image, which leads to a slow learning process.

Image Generation

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

winfredy/sadtalker 22 Nov 2022

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face renderer for talking head generation.

Talking Head Generation
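A 3D Morphable Model (3DMM) is commonly written as a linear model: a mean face shape deformed by identity and expression bases, then placed in space by a rigid head pose. The block below uses that standard formulation to show what "3D motion coefficients (head pose, expression)" refers to. The notation is ours and is not necessarily the paper's exact notation. Under this formulation, the motion coefficients are the expression vector beta and the pose (R, t). The identity vector alpha describes the person rather than the motion.

```latex
% Linear 3DMM: mean shape plus identity and expression deformations
% \bar{S}: mean shape; U_{id}, U_{exp}: identity / expression bases
% \alpha: identity coefficients; \beta: expression coefficients
\mathbf{S} = \bar{\mathbf{S}} + \mathbf{U}_{id}\,\boldsymbol{\alpha} + \mathbf{U}_{exp}\,\boldsymbol{\beta}

% Head pose: rotation R \in SO(3) and translation t applied to each vertex v of S
\mathbf{v}' = \mathbf{R}\,\mathbf{v} + \mathbf{t}
```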

You Only Segment Once: Towards Real-Time Panoptic Segmentation

hujiecpp/yoso 26 Mar 2023

To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decoder for the panoptic kernel generation.

Panoptic Segmentation

OCR-free Document Understanding Transformer

clovaai/donut 30 Nov 2021

Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.

Optical Character Recognition (OCR)

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases

neuralgraphdatabases/awesome-logical-query 26 Mar 2023

Extending the idea of graph databases (graph DBs), a Neural Graph Database (NGDB) consists of a Neural Graph Storage and a Neural Graph Engine.

Link Prediction, Logical Reasoning
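To make "complex logical query answering" concrete, here is a minimal sketch of one family of approaches. A learned link predictor scores individual triples, and a query is answered by combining those scores with fuzzy-logic operators, in the spirit of methods such as CQD. This is not presented as the NGDB design. The entity names, relations, score table, and the functions `score` and `answer_two_hop` are all hypothetical. The query answered is the two-hop chain "?y : exists x. r1(anchor, x) AND r2(x, y)". Conjunction uses the product t-norm, and the existential variable is resolved by taking the max over candidate entities.

```python
# Hypothetical toy link-predictor scores in [0, 1] for (head, relation, tail).
# In a neural graph storage these would come from learned embeddings;
# here they are hard-coded purely for illustration.
SCORES = {
    ("alice", "works_at", "acme"): 0.9,
    ("alice", "works_at", "globex"): 0.2,
    ("acme", "located_in", "berlin"): 0.8,
    ("globex", "located_in", "paris"): 0.95,
}
ENTITIES = ["alice", "acme", "globex", "berlin", "paris"]


def score(head: str, relation: str, tail: str) -> float:
    """Plausibility of a single triple; unseen triples score 0."""
    return SCORES.get((head, relation, tail), 0.0)


def answer_two_hop(anchor: str, r1: str, r2: str) -> list[tuple[str, float]]:
    """Rank every entity y for: exists x. r1(anchor, x) AND r2(x, y).

    AND  -> product t-norm (multiply the two triple scores)
    EXISTS x -> max over all candidate intermediate entities x
    """
    ranking = {
        y: max(score(anchor, r1, x) * score(x, r2, y) for x in ENTITIES)
        for y in ENTITIES
    }
    # Highest-scoring answers first.
    return sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # "In which city is the company where alice works located?"
    print(answer_two_hop("alice", "works_at", "located_in")[:2])
```

The max and product pair is only one choice of fuzzy operators. Other t-norms, such as min, plug into the same structure.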

