We propose a novel approach for building accurate maps of dynamic environments utilizing a sequence of LiDAR scans.
Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge.
As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.
Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (\textit{e. g.,} BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.
To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English.
Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data.
Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision.
How to eliminate pronominal reference in group chats?
Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.
PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.
Dense Captioning Video-based Generative Performance Benchmarking +1