Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

voidism/lookback-lens 9 Jul 2024

We find that a linear classifier based on lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model.
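The excerpt above does not define the lookback ratio, so the following is only a minimal sketch under a stated assumption: for each attention head, the ratio is the mean attention on context tokens divided by the sum of that mean and the mean attention on newly generated tokens. The function names, the context-first token layout, and the classifier weights are all hypothetical and chosen for illustration; they are not taken from the voidism/lookback-lens code.

```python
import numpy as np

def lookback_ratio(attn, n_context):
    """Per-head lookback ratio for one decoding step (assumed definition).

    attn: array of shape (heads, positions) holding one generated token's
          attention weights over all earlier positions.
    n_context: number of leading positions that belong to the context
          (assumed layout: context first, generated tokens after; >= 1).
    """
    attn = np.asarray(attn, dtype=float)
    ctx = attn[:, :n_context].mean(axis=1)
    new_part = attn[:, n_context:]
    # With no generated tokens yet, treat attention to them as zero.
    if new_part.shape[1] > 0:
        new = new_part.mean(axis=1)
    else:
        new = np.zeros(attn.shape[0])
    return ctx / (ctx + new)

def hallucination_score(features, weights, bias):
    """Logistic score from a linear classifier over lookback ratios."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

# Toy attention map: 2 heads, 2 context positions, 2 generated positions.
attn = np.array([[0.2, 0.2, 0.3, 0.3],
                 [0.4, 0.4, 0.1, 0.1]])
ratios = lookback_ratio(attn, n_context=2)

# Hypothetical weights; a real detector would learn them from labeled spans.
score = hallucination_score(ratios, weights=np.array([-1.0, -1.0]), bias=1.0)
```

In this toy case the first head attends more to generated tokens than to context, so its ratio is lower than the second head's. A trained classifier would consume such ratios from every layer and head as one feature vector.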


Simplifying Deep Temporal Difference Learning

mttga/purejaxql 5 Jul 2024

Our key theoretical result demonstrates for the first time that regularisation techniques such as LayerNorm can yield provably convergent TD algorithms without the need for a target network, even with off-policy data.

Q-Learning, Reinforcement Learning (RL)
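As a rough illustration of the setting described in this entry, the sketch below computes a one-step TD error in which the bootstrap target comes from the same Q-network that is being updated, with LayerNorm applied inside that network. It assumes a two-layer network, LayerNorm placed before the ReLU, and no learned scale or shift; all names and shapes are hypothetical and are not taken from mttga/purejaxql. It shows only the error computation, not the training loop or the convergence result.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def q_values(params, state):
    """Two-layer Q-network with LayerNorm before the ReLU (assumed layout)."""
    w1, w2 = params
    hidden = np.maximum(layer_norm(state @ w1), 0.0)
    return hidden @ w2

def td_error(params, s, a, r, s_next, done, gamma=0.99):
    """One-step Q-learning TD error without a separate target network.

    The bootstrap value is computed with the same parameters as q(s, a).
    """
    q_sa = q_values(params, s)[a]
    bootstrap = 0.0 if done else q_values(params, s_next).max()
    return r + gamma * bootstrap - q_sa

# Toy example with random weights: 4 state features, 8 hidden units, 2 actions.
rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 8)), rng.normal(size=(8, 2)))
s, s_next = rng.normal(size=4), rng.normal(size=4)
delta = td_error(params, s, a=1, r=1.0, s_next=s_next, done=False)
```

A conventional DQN-style setup would evaluate the bootstrap term with a frozen copy of the parameters; here the single `params` tuple is used for both terms.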

Gradient Boosting Reinforcement Learning

nvlabs/gbrl 11 Jul 2024

GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBT within the RL paradigm, particularly in domains characterized by structured or categorical features.

reinforcement-learning, Reinforcement Learning (RL)

Agentless: Demystifying LLM-based Software Engineering Agents

OpenAutoCoder/Agentless 1 Jul 2024

However, the complexity of these agent-based approaches, together with the limited abilities of current LLMs, raises the following question: Do we really have to employ complex autonomous software agents?

Program Repair

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Open Vocabulary Object Detection, Real-Time Object Detection

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Text-to-Audio/AudioLCM 1 Jun 2024

To overcome the convergence issue inherent in LDMs with reduced sample iterations, we propose the Guided Latent Consistency Distillation with a multi-step Ordinary Differential Equation (ODE) solver.

Audio Generation, Audio Synthesis

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

gair-nlp/anole 8 Jul 2024

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation.

multimodal generation, Text Generation

A Single Transformer for Scalable Vision-Language Modeling

yangyi-chen/solo 8 Jul 2024

We present SOLO, a single transformer for Scalable visiOn-Language mOdeling.

Language Modelling, Mathematical Reasoning

Decomposition Betters Tracking Everything Everywhere

qianduoduolr/decomotion 9 Jul 2024

DecoMotion explicitly decomposes video content into static scenes and dynamic objects, each of which is represented by a quasi-3D canonical volume.

Motion Estimation, Point Tracking

WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving

wayveai/wayve_scenes 11 Jul 2024

We present WayveScenes101, a dataset designed to help the community advance the state of the art in novel view synthesis that focuses on challenging driving scenes containing many dynamic and deformable elements with changing geometry and texture.

Autonomous Driving, Benchmarking

