Law of the Weakest Link: Cross Capabilities of Large Language Models

facebookresearch/llm-cross-capabilities 30 Sep 2024

The development and evaluation of Large Language Models (LLMs) have largely focused on individual capabilities.

28
0.33 stars / hour

Archon: An Architecture Search Framework for Inference-Time Techniques

scalingintelligence/archon 23 Sep 2024

Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space.

Hyperparameter Optimization Instruction Following +2

60
0.32 stars / hour

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

qwenlm/qwen2-vl 18 Sep 2024

We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.

Temporal Relation Extraction Visual Question Answering

2,462
0.31 stars / hour

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

time-moe/time-moe 24 Sep 2024

However, despite the success of large-scale pre-training in language and vision domains, pre-trained time series models remain limited in scale and operate at a high cost, hindering the development of larger, more capable forecasting models in real-world applications.

Computational Efficiency Time Series +1

142
0.31 stars / hour

KISS-Matcher: Fast and Robust Point Cloud Registration Revisited

mit-spark/kiss-matcher 23 Sep 2024

While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers.

Point Cloud Registration

116
0.30 stars / hour

Augmented Dual-Contrastive Aggregation Learning for Unsupervised Visible-Infrared Person Re-Identification

yangbincv/ADCA ACM MM 2022

Visible-infrared person re-identification (VI-ReID) aims at searching out the corresponding infrared (visible) images from a gallery set captured by other spectrum cameras.

Contrastive Learning Person Re-Identification

50
0.30 stars / hour

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

oryx-mllm/oryx 19 Sep 2024

Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours.

Document Understanding Video Question Answering

240
0.27 stars / hour

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

redaigc/storymaker 19 Sep 2024

However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative.

Personalized Image Generation Text-to-Image Generation

474
0.27 stars / hour

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

fudan-generative-vision/champ 21 Mar 2024

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

4,328
0.27 stars / hour

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

gair-nlp/prox 25 Sep 2024

Large language model pre-training has traditionally relied on human experts to craft heuristics for improving the corpora quality, resulting in numerous rules developed to date.

Large Language Model

132
0.27 stars / hour