Search Results

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

1 code implementation 15 Nov 2024

The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent.

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

3 code implementations 24 Oct 2023

We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.

Attention Is All You Need

590 code implementations NeurIPS 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.

Ranked #2 on Multimodal Machine Translation on Multi30K (BLEU (DE-EN) metric)

MojiTalk: Generating Emotional Responses at Scale

2 code implementations ACL 2018

In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis.

End-to-end Learning for GMI Optimized Geometric Constellation Shape

1 code implementation 19 Jul 2019

Autoencoder-based geometric shaping is proposed that includes optimizing bit mappings.

MegaPortraits: One-shot Megapixel Neural Head Avatars

1 code implementation 15 Jul 2022

In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image.

Chinese Spelling Correction as Rephrasing Language Model

2 code implementations 17 Aug 2023

However, we note a critical flaw in the process of tagging one character to another: the correction is excessively conditioned on the error.

Does Chain-of-Thought Reasoning Help Mobile GUI Agent? An Empirical Study

1 code implementation 21 Mar 2025

Reasoning capabilities have significantly improved the performance of vision-language models (VLMs) in domains such as mathematical problem-solving, coding, and visual question-answering.

Do large language vision models understand 3D shapes?

1 code implementation 14 Dec 2024

Mainly, it seems that the models can easily identify the same object in a different orientation, as well as match identical 3D shapes of the same orientation but with different materials and textures.

Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records

1 code implementation 20 Jan 2025

We explore the ability of two LLMs -- GPT-4o and Claude Sonnet 3.5 -- to transcribe historical handwritten documents in a tabular format and compare their performance to traditional OCR/HTR systems: EasyOCR, Keras, Pytesseract, and TrOCR.
