On the Measure of Intelligence

fchollet/ARC 5 Nov 2019

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans.


Mercury: A Code Efficiency Benchmark for Code Large Language Models

elfsong/mercury 12 Feb 2024

Based on the distribution, we introduce a new metric, Beyond, which computes a runtime-percentile-weighted Pass score to reflect functional correctness and code efficiency simultaneously.

Code Generation, Computational Efficiency
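
As a rough illustration of what a runtime-percentile-weighted Pass score could look like, the sketch below gives a failing solution a score of 0 and a passing solution the fraction of reference runtimes it beats, then averages over tasks. This is one plausible reading of the sentence above, not the definition used by Mercury; the function names (`runtime_percentile`, `beyond_score`) and the strict "slower than" comparison are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def runtime_percentile(runtime: float, reference_runtimes: Sequence[float]) -> float:
    """Fraction of reference runtimes that are strictly slower than `runtime`."""
    ordered = sorted(reference_runtimes)
    # bisect_right counts the references that are as fast as or faster than `runtime`.
    not_slower = bisect_right(ordered, runtime)
    return (len(ordered) - not_slower) / len(ordered)


def beyond_score(
    runtimes: Sequence[Optional[float]],
    references: Sequence[Sequence[float]],
) -> float:
    """Average percentile-weighted pass score over tasks.

    `runtimes[i]` is the measured runtime of the generated solution for task i,
    or None if the solution failed the functional tests (hypothetical convention).
    """
    total = 0.0
    for runtime, refs in zip(runtimes, references):
        if runtime is not None:  # failed solutions contribute 0
            total += runtime_percentile(runtime, refs)
    return total / len(runtimes)


# Three tasks sharing the same (made-up) reference runtime distribution.
refs = [[1.0, 2.0, 3.0, 4.0]] * 3
score = beyond_score([0.5, None, 2.5], refs)
```

Under this reading, a model that passes every task but only with slow code scores well below a plain Pass rate of 1.0, which is the behaviour the metric description suggests.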

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

whu-sigma/hypersigma 17 Jun 2024

To tackle the spectral and spatial redundancy challenges in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA.

MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

megvii-research/megfaceanimate 31 May 2024

Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks, they are seldom the subject of research in the field of portrait animation.

Style Transfer, Synthetic Data Generation

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

foundationvision/omnitokenizer 13 Jun 2024

To exploit the complementary nature of image and video data, we further propose a progressive training strategy, where OmniTokenizer is first trained on image data at a fixed resolution to develop the spatial encoding capacity and then jointly trained on image and video data at multiple resolutions to learn the temporal dynamics.

Language Modelling

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

gojasper/flash-diffusion 4 Jun 2024

In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the generation of pre-trained diffusion models: Flash Diffusion.

Face Swapping, Image Inpainting +1

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Open Vocabulary Object Detection, Real-Time Object Detection

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

gair-nlp/olympicarena 18 Jun 2024

We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions.


X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Molecular Design

ericlbuehler/mistral.rs 11 Feb 2024

Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks.

graph construction, Knowledge Graphs +2
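
The gating idea described above can be sketched for a single layer: a gate maps the hidden state to one scaling per adapter, and the layer output is the base projection plus the scaled low-rank updates. This is a minimal sketch under stated assumptions, not the X-LoRA implementation; the linear-plus-softmax gate, the helper names (`matvec`, `softmax`, `mixed_lora_layer`), and the toy weights are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_lora_layer(
    hidden: Sequence[float],
    base_weight: Matrix,
    adapters: Sequence[Tuple[Matrix, Matrix]],
    gate_weight: Matrix,
) -> Vector:
    """Base projection plus a gated mix of low-rank adapter updates.

    Each adapter is a pair (A, B) whose update is B @ (A @ hidden).
    The gate here is a single linear map followed by softmax (an assumption).
    """
    scalings = softmax(matvec(gate_weight, hidden))  # one weight per adapter
    output = matvec(base_weight, hidden)
    for scale, (a, b) in zip(scalings, adapters):
        delta = matvec(b, matvec(a, hidden))
        output = [o + scale * d for o, d in zip(output, delta)]
    return output


# Toy example: two rank-1 adapters and an all-zero gate, so both get weight 0.5.
hidden = [1.0, 2.0]
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),
    ([[0.0, 1.0]], [[0.0], [1.0]]),
]
gate = [[0.0, 0.0], [0.0, 0.0]]
mixed = mixed_lora_layer(hidden, base, adapters, gate)
```

Because the scalings depend on the hidden state, different inputs can weight the same set of adapters differently, which is the dynamic mixing the abstract refers to.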

Unveiling Encoder-Free Vision-Language Models

baaivision/eve 17 Jun 2024

Training pure VLMs that accept vision and language inputs seamlessly, i.e., without vision encoders, remains challenging and rarely explored.

Decoder, Inductive Bias +1

