Continual Learning of Large Language Models: A Comprehensive Survey

wang-ml-lab/llm-continual-learning-survey 25 Apr 2024

In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL.

Continual Learning

73 stars · 0.64 stars / hour

Dynamic Generation of Personalities with Large Language Models

hiyouga/llama-factory 10 Apr 2024

We propose a new metric to assess personality generation capability based on this evaluation method.

GPT-4 · Personality Generation

20,645 stars · 0.62 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.
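
The sentence above describes policy-gradient training of sequence models via sampling. Below is a minimal, generic REINFORCE-style sketch of that idea (not the ESRL method itself), assuming a Hugging Face seq2seq model and tokenizer with a pad token and a user-supplied reward_fn such as sentence-level BLEU; all names and hyperparameters are illustrative.

```python
import torch

def rl_step(model, tokenizer, src_text, reward_fn, optimizer, num_samples=4):
    """One REINFORCE-style update on sequences sampled from the current policy."""
    enc = tokenizer(src_text, return_tensors="pt")
    # 1) Sample candidate outputs from the current policy (no gradient here).
    with torch.no_grad():
        seqs = model.generate(**enc, do_sample=True,
                              num_return_sequences=num_samples, max_new_tokens=64)
    texts = tokenizer.batch_decode(seqs, skip_special_tokens=True)
    rewards = torch.tensor([reward_fn(t) for t in texts], dtype=torch.float)
    baseline = rewards.mean()                      # simple variance-reduction baseline

    # 2) Re-score the sampled sequences with a differentiable forward pass.
    input_ids = enc["input_ids"].repeat(num_samples, 1)
    logits = model(input_ids=input_ids, labels=seqs).logits.log_softmax(dim=-1)
    tok_logp = logits.gather(-1, seqs.unsqueeze(-1)).squeeze(-1)
    mask = (seqs != tokenizer.pad_token_id).float() # ignore padded positions
    seq_logp = (tok_logp * mask).sum(dim=-1)

    # 3) Policy gradient: raise the likelihood of above-baseline samples.
    loss = -((rewards - baseline) * seq_logp).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```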

Abstractive Text Summarization · Language Modelling (+5 more)

20,657 stars · 0.61 stars / hour

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

qinghew/CharacterFactory 24 Apr 2024

In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models.
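
As a rough illustration of drawing identities from a generator's latent space and splicing them into a diffusion model's text conditioning, here is a hypothetical PyTorch sketch; the IdentityGenerator module, dimensions, and placeholder positions are assumptions for illustration, not the CharacterFactory architecture.

```python
import torch
import torch.nn as nn

class IdentityGenerator(nn.Module):
    """Maps random noise to pseudo-word embeddings in the text-encoder space."""
    def __init__(self, z_dim=64, embed_dim=768, num_pseudo_tokens=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, num_pseudo_tokens * embed_dim),
        )
        self.num_pseudo_tokens = num_pseudo_tokens
        self.embed_dim = embed_dim

    def forward(self, z):
        return self.net(z).view(-1, self.num_pseudo_tokens, self.embed_dim)

# Sample one "character": reusing the same z across prompts keeps the identity fixed.
gen = IdentityGenerator()
z = torch.randn(1, 64)
identity_embeds = gen(z)                      # (1, 2, 768)

# Splice the pseudo-word embeddings into a prompt's token embeddings at the
# positions of placeholder tokens (here, positions 2 and 3 of a toy sequence).
prompt_embeds = torch.randn(1, 77, 768)       # stand-in for CLIP text embeddings
prompt_embeds[:, 2:4, :] = identity_embeds    # reuse for every new prompt or scene
```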

Consistent Character Generation · Word Embeddings

78 stars · 0.59 stars / hour

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

3dtopia/lgm 7 Feb 2024

2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models.
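
A shape-level sketch of the data flow described above, under stated assumptions: several views (here RGB plus a 6-channel ray embedding) go through a backbone and come out as per-pixel 3D Gaussian parameters. The tiny conv stack below stands in for the asymmetric U-Net, and the 14-channel output layout is an illustrative assumption, not the released LGM code.

```python
import torch
import torch.nn as nn

num_views, in_ch, out_ch = 4, 9, 14   # RGB + ray embedding in, Gaussian params out
backbone = nn.Sequential(             # stand-in for the high-throughput U-Net
    nn.Conv2d(in_ch, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, out_ch, 3, padding=1),
)

views = torch.randn(num_views, in_ch, 128, 128)   # e.g. from a multi-view diffusion model
feats = backbone(views)                            # (V, 14, H, W)
gaussians = feats.permute(0, 2, 3, 1).reshape(-1, out_ch)
# Assumed column layout: 3 position, 1 opacity, 3 scale, 4 rotation, 3 color.
print(gaussians.shape)                             # torch.Size([65536, 14])
```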

1,268 stars · 0.58 stars / hour

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

yvanyin/metric3d Under review for a Transactions journal, 2024

For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training.
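
One common way to frame that metric ambiguity is to predict depth in a canonical camera space and rescale by the focal-length ratio; the sketch below illustrates only that idea, and its constant and function names are hypothetical, not the released Metric3D v2 code.

```python
CANONICAL_FOCAL = 1000.0  # hypothetical canonical focal length, in pixels

def to_metric_depth(canonical_depth, focal_px):
    """Convert depth predicted under the canonical camera into metric depth
    for an image captured with focal length `focal_px` (pixels)."""
    return canonical_depth * focal_px / CANONICAL_FOCAL

# Example: a pixel predicted at 5.0 canonical units, captured at f = 1500 px,
# maps to 7.5 units of metric depth.
print(to_metric_depth(5.0, 1500.0))
```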

 Ranked #1 on Surface Normals Estimation on NYU Depth v2 (using extra training data)

Depth Estimation · Surface Normal Estimation (+1 more)

708 stars · 0.56 stars / hour

Retrieval Head Mechanistically Explains Long-Context Factuality

nightdessert/retrieval_head 24 Apr 2024

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context.
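
A hedged sketch of how one might score candidate retrieval heads: for each head, measure how often its strongest attention during decoding lands on the needle tokens being copied from the context. The tensor layout and function below are illustrative assumptions, not the paper's released detection code.

```python
import torch

def retrieval_score(attentions, needle_positions, copied_steps):
    """attentions: (layers, heads, query_len, key_len) attention weights;
    needle_positions: context indices holding the needle tokens;
    copied_steps: decoding steps at which the model emits a copied needle token.
    Returns a (layers, heads) tensor: fraction of copy steps whose strongest
    attention lands on a needle token."""
    L, H, Q, K = attentions.shape
    top_keys = attentions.argmax(dim=-1)                 # (L, H, Q): strongest key per step
    is_needle = torch.zeros(K, dtype=torch.bool)
    is_needle[torch.tensor(needle_positions)] = True
    hits = is_needle[top_keys[:, :, torch.tensor(copied_steps)]]   # (L, H, n_steps)
    return hits.float().mean(dim=-1)

# Toy usage: 2 layers, 4 heads, 10 decoding steps over a 50-token context.
scores = retrieval_score(torch.rand(2, 4, 10, 50), [17, 18, 19], [3, 4, 5])
print(scores.shape)  # torch.Size([2, 4])
```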

Continual Pretraining · Hallucination (+3 more)

85 stars · 0.56 stars / hour

Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels

baofff/U-ViT NeurIPS 2023

In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models.

Classification

734 stars · 0.53 stars / hour

Learning Visuotactile Skills with Two Multifingered Hands

ToruOwO/hato 25 Apr 2024

Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing.

57 stars · 0.50 stars / hour

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

zzxslp/som-llava 25 Apr 2024

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.
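
A minimal sketch of Set-of-Mark style visual prompting, assuming object centers are already known (e.g., from a segmenter): draw a numbered tag at each location, then prompt a multimodal model to list the tagged items one by one. The function name, mark positions, and prompt wording are illustrative, not the SoM-LLaVA data pipeline.

```python
from PIL import Image, ImageDraw

def add_marks(image_path, centers, out_path="marked.png"):
    """Draw a numbered tag at each (x, y) object center."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for i, (x, y) in enumerate(centers, start=1):
        draw.ellipse([x - 12, y - 12, x + 12, y + 12], fill="red")
        draw.text((x - 4, y - 8), str(i), fill="white")
    img.save(out_path)
    return out_path

marked = add_marks("scene.jpg", [(120, 200), (340, 180), (410, 330)])
prompt = "List the items in the image one by one, referring to each numeric tag."
# `marked` and `prompt` would then be passed to a multimodal LLM such as GPT-4V or LLaVA.
```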

Visual Grounding · Visual Question Answering (+1 more)

56 stars · 0.49 stars / hour