We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3) can perform linear and non-linear regression when given in-context examples, without any additional training or gradient updates.
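As a rough illustration of this setup, a minimal sketch follows; the prompt wording, feature formatting, and example values are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal sketch: build an in-context regression prompt from (x, y) pairs and
# leave the final target blank for the model to complete. The numeric
# completion returned by the LLM is parsed as the regression prediction.

def build_regression_prompt(examples, query_x):
    """examples: list of (x, y) floats shown in-context; query_x: new input."""
    lines = []
    for x, y in examples:
        lines.append(f"Input: {x:.3f}\nOutput: {y:.3f}")
    lines.append(f"Input: {query_x:.3f}\nOutput:")  # model fills in the value
    return "\n\n".join(lines)

if __name__ == "__main__":
    train = [(0.1, 0.31), (0.5, 1.49), (0.9, 2.72)]  # roughly y = 3x + noise
    prompt = build_regression_prompt(train, query_x=0.7)
    print(prompt)
    # The prompt string would be sent to an LLM via whichever API client is
    # used; no gradient updates are involved.
```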
In this paper, we adapt vision-language joint learning to scene text detection, a task that intrinsically involves cross-modal interaction between vision and language, since text is the written form of language.
A novel scene text recognizer based on a Vision-Language Transformer (VLT) is presented.
The core of LayoutLLM is a layout instruction tuning strategy, which is specially designed to enhance the comprehension and utilization of document layouts.
We propose OmniFusion, a model based on a pretrained LLM and adapters for the visual modality.
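A minimal sketch of the general adapter idea, assuming a common design in which a small trainable module maps frozen visual-encoder features into the LLM's token-embedding space and prepends them as soft tokens; the dimensions and module names below are illustrative, not the actual OmniFusion implementation.

```python
import torch
import torch.nn as nn

class VisualAdapter(nn.Module):
    """Projects visual features into the LLM embedding space (illustrative)."""

    def __init__(self, vision_dim=1024, llm_dim=4096, num_tokens=32):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.num_tokens = num_tokens

    def forward(self, vision_features, text_embeddings):
        # vision_features: (batch, num_patches, vision_dim) from a frozen visual encoder
        # text_embeddings: (batch, seq_len, llm_dim) from the frozen LLM embedding layer
        visual_tokens = self.proj(vision_features[:, : self.num_tokens])
        # Prepend visual soft tokens to the text sequence; the combined sequence
        # is fed to the (frozen) LLM while only the adapter is trained.
        return torch.cat([visual_tokens, text_embeddings], dim=1)
```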
LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.
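For intuition, a minimal sketch of a low-rank representation intervention in the spirit of LoReFT: hidden states are edited only within a rank-r subspace, h + R^T(Wh + b - Rh), with R, W, b as the trainable parameters. The shapes and the omission of the orthonormality constraint on R are simplifications for illustration, not the reference implementation.

```python
import torch
import torch.nn as nn

class LowRankIntervention(nn.Module):
    """Rank-r edit of hidden states (illustrative sketch, not pyreft)."""

    def __init__(self, hidden_dim=4096, rank=4):
        super().__init__()
        # R spans the low-rank subspace being edited; W (with bias) produces targets.
        self.R = nn.Parameter(torch.randn(rank, hidden_dim) * 0.01)
        self.W = nn.Linear(hidden_dim, rank)

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim) hidden states at the intervened positions
        delta = self.W(h) - h @ self.R.T   # (batch, seq_len, rank) edit in the subspace
        return h + delta @ self.R          # project the edit back to hidden_dim
```

Because the trainable parameters scale with rank r rather than the full hidden dimension, such interventions stay far smaller than weight-based PEFT modules.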
In light of this, we propose a novel Decoupled Space-Time Aggregation network (DSTA) to separately capture the spatial contexts between adjacent joints and the temporal cues of each individual joint, thereby avoiding the conflation of spatiotemporal dimensions.
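A minimal sketch of the decoupling idea, assuming a generic formulation: one block attends across joints within each frame (spatial), another attends across frames for each joint independently (temporal). Layer choices and dimensions are illustrative, not the paper's exact DSTA architecture.

```python
import torch
import torch.nn as nn

class DecoupledSpaceTimeBlock(nn.Module):
    """Separate spatial and temporal aggregation over skeleton features (sketch)."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, joints, dim) per-joint features of a skeleton sequence
        b, t, j, d = x.shape
        # Spatial: treat each frame independently and attend over the joint axis.
        xs = x.reshape(b * t, j, d)
        xs, _ = self.spatial_attn(xs, xs, xs)
        x = x + xs.reshape(b, t, j, d)
        # Temporal: treat each joint independently and attend over the frame axis.
        xt = x.permute(0, 2, 1, 3).reshape(b * j, t, d)
        xt, _ = self.temporal_attn(xt, xt, xt)
        x = x + xt.reshape(b, j, t, d).permute(0, 2, 1, 3)
        return x
```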
To this end, we study LLMs on the popular Oogiri game, which requires participants to have strong creativity and associative thinking to respond unexpectedly and humorously to a given image, text, or both, and is thus well suited to studying LoT.
More specifically, we establish correspondences for adjacent costs and propagate them to the reference pixel by leveraging local geometric smoothness in conjunction with surface normals.
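A minimal sketch of one way such normal-guided propagation can work under a local planar (geometric smoothness) assumption: the depth hypothesis of an adjacent pixel is carried to the reference pixel by intersecting the reference ray with the neighbor's tangent plane. The function and variable names below are illustrative, not the paper's implementation.

```python
import numpy as np

def propagate_depth(K_inv, q, depth_q, normal_q, p):
    """Depth induced at reference pixel p by neighbor q's (depth, normal) plane.

    K_inv:    3x3 inverse camera intrinsics
    q, p:     pixel coordinates (u, v)
    depth_q:  depth hypothesis at neighbor q
    normal_q: unit surface normal at q, in the camera frame
    """
    ray_q = K_inv @ np.array([q[0], q[1], 1.0])  # back-projected ray through q
    ray_p = K_inv @ np.array([p[0], p[1], 1.0])  # back-projected ray through p
    point_q = depth_q * ray_q                    # 3D point lying on the local plane
    # Plane: normal_q . X = normal_q . point_q ; intersect p's ray with it.
    return float(normal_q @ point_q) / float(normal_q @ ray_p)

# The neighbor's matching cost evaluated at this induced depth can then be
# accumulated into the reference pixel's cost, enforcing local smoothness.
```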
The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models.