Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

sterzhang/image-textualization 11 Jun 2024

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval.

Hallucination, Image Retrieval +1

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

kvcache-ai/Mooncake 24 Jun 2024

Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs.

KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

pat-jj/KG-FIT 26 May 2024

Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery.

Informativeness, Knowledge Graph Embedding +3

On the Measure of Intelligence

fchollet/ARC 5 Nov 2019

To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans.


4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

apple/ml-4m 13 Jun 2024

In this paper, we expand upon the capabilities of existing multimodal and multitask foundation models by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora.

Instance Segmentation, Multimodal Generation +1

MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis

limuloo/migc 2 Jul 2024

Lastly, we introduced the Consistent-MIG algorithm to enhance the iterative MIG ability of MIGC and MIGC++.

Attribute, Image Generation +1

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

imlixinyang/director3d 25 Jun 2024

To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions.

3D Generation, Denoising +2

OpenStreetView-5M: The Many Roads to Global Visual Geolocation

gastruc/osv5m CVPR 2024

Determining the location of an image anywhere on Earth is a complex visual task, which makes it particularly relevant for evaluating computer vision algorithms.

Photo geolocation estimation

Adam-mini: Use Fewer Learning Rates To Gain More

zyushun/adam-mini 24 Jun 2024

We find that $\geq$ 90% of these learning rates in $v$ could be harmlessly removed if we (1) carefully partition the parameters into blocks following our proposed principle on Hessian structure; (2) assign a single but good learning rate to each parameter block.
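
The idea in this excerpt, one second-moment estimate (and hence one effective learning rate) per parameter block instead of one per parameter, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `init_state` and `adam_mini_step` are hypothetical, the block boundaries are simply passed in by the caller rather than derived from the Hessian-based partitioning principle the paper proposes, and the hyperparameter defaults are the usual Adam ones, assumed here for convenience.

```python
import math


def init_state(num_params, num_blocks):
    """Hypothetical helper: first moment per parameter, second moment per block."""
    return {"t": 0, "m": [0.0] * num_params, "v": [0.0] * num_blocks}


def adam_mini_step(params, grads, blocks, state,
                   lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step that keeps a single second-moment value per block.

    params, grads: flat lists of floats of equal length.
    blocks: list of (start, end) index ranges; assumed to be supplied by the
            caller (the paper's Hessian-based partitioning is not shown here).
    """
    state["t"] += 1
    t = state["t"]
    new_params = list(params)
    for b, (start, end) in enumerate(blocks):
        block_grads = grads[start:end]
        # One scalar per block: the mean of squared gradients in the block.
        mean_sq = sum(g * g for g in block_grads) / len(block_grads)
        state["v"][b] = beta2 * state["v"][b] + (1 - beta2) * mean_sq
        v_hat = state["v"][b] / (1 - beta2 ** t)  # bias correction
        step_size = lr / (math.sqrt(v_hat) + eps)
        for i in range(start, end):
            # The first moment is still tracked per parameter, as in Adam.
            state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * grads[i]
            m_hat = state["m"][i] / (1 - beta1 ** t)
            new_params[i] = params[i] - step_size * m_hat
    return new_params


params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, -0.5, 2.0, 2.0]
blocks = [(0, 2), (2, 4)]  # two blocks of two parameters each (arbitrary choice)
state = init_state(len(params), len(blocks))
updated = adam_mini_step(params, grads, blocks, state)
```

In this toy setup the optimizer stores two second-moment values instead of four; with realistic block sizes the saving in stored values grows accordingly, which is the kind of reduction the excerpt refers to.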

TALENT: A Tabular Analytics and Learning Toolbox

qile2000/LAMDA-TALENT 4 Jul 2024

Tabular data is one of the most common data sources in machine learning.

