Paper List
Return a paginated listing of all papers.
GET /api/v1/papers/?ordering=-title&q=Large+Language+Models
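A minimal client-side sketch of the request above, in Python with only the standard library. It assumes the response is a JSON object carrying the `next`, `previous`, and `results` keys seen in the sample response, and that `page` is accepted as a query parameter because it appears in the sample `next` URL. The helper names (`build_list_url`, `fetch_json`, `iter_papers`) are hypothetical and not part of the API; availability of the host and any rate limits are not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and path taken from the sample "next" URL in the response.
BASE_URL = "https://paperswithcode.com/api/v1/papers/"


def build_list_url(query, ordering="-title", page=None):
    """Build the list URL; `page` is assumed from the sample `next` link."""
    params = {"ordering": ordering}
    if page is not None:
        params["page"] = page
    params["q"] = query
    # urlencode uses quote_plus, so spaces become "+", as in the request above.
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url):
    """Fetch one page and decode it (performs a real network call)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_papers(first_url, fetch=fetch_json, max_pages=None):
    """Yield paper records page by page, following `next` until it is null."""
    url, pages_read = first_url, 0
    while url and (max_pages is None or pages_read < max_pages):
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop
        pages_read += 1


if __name__ == "__main__":
    start = build_list_url("Large Language Models")
    for paper in iter_papers(start, max_pages=1):
        print(paper["published"], paper["id"], paper["title"])
```

The `fetch` argument is injectable so the pagination loop can be exercised against canned pages without touching the network, and `max_pages` bounds how many requests a single call may issue.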
https://paperswithcode.com/api/v1/papers/?ordering=-title&page=2&q=Large+Language+Models", "previous": null, "results": [ { "id": "l-tune-harnessing-large-language-models-for", "arxiv_id": "2411.03500", "nips_id": null, "url_abs": "https://arxiv.org/abs/2411.03500v1", "url_pdf": "https://arxiv.org/pdf/2411.03500v1.pdf", "title": "λ-Tune: Harnessing Large Language Models for Automated Database System Tuning", "abstract": "We introduce {\\lambda}-Tune, a framework that leverages Large Language Models (LLMs) for automated database system tuning. The design of {\\lambda}-Tune is motivated by the capabilities of the latest generation of LLMs. Different from prior work, leveraging LLMs to extract tuning hints for single parameters, {\\lambda}-Tune generates entire configuration scripts, based on a large input document, describing the tuning context. {\\lambda}-Tune generates alternative configurations, using a principled approach to identify the best configuration, out of a small set of candidates. In doing so, it minimizes reconfiguration overheads and ensures that evaluation costs are bounded as a function of the optimal run time. By treating prompt generation as a cost-based optimization problem, {\\lambda}-Tune conveys the most relevant context to the LLM while bounding the number of input tokens and, therefore, monetary fees for LLM invocations. 
We compare {\\lambda}-Tune to various baselines, using multiple benchmarks and PostgreSQL and MySQL as target systems for tuning, showing that {\\lambda}-Tune is significantly more robust than prior approaches.", "authors": [], "published": "2024-11-05", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "l-eclipse-multi-concept-personalized-text-to", "arxiv_id": "2402.05195", "nips_id": null, "url_abs": "https://arxiv.org/abs/2402.05195v2", "url_pdf": "https://arxiv.org/pdf/2402.05195v2.pdf", "title": "$λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space", "abstract": "Despite the recent advances in personalized text-to-image (P-T2I) generative models, it remains challenging to perform finetuning-free multi-subject-driven T2I in a resource-efficient manner. Predominantly, contemporary approaches, involving the training of Hypernetworks and Multimodal Large Language Models (MLLMs), require heavy computing resources that range from 600 to 12300 GPU hours of training. These subject-driven T2I methods hinge on Latent Diffusion Models (LDMs), which facilitate T2I mapping through cross-attention layers. While LDMs offer distinct advantages, P-T2I methods' reliance on the latent space of these diffusion models significantly escalates resource demands, leading to inconsistent results and necessitating numerous iterations for a single desired image. In this paper, we present $\\lambda$-ECLIPSE, an alternative prior-training strategy that works in the latent space of a pre-trained CLIP model without relying on the diffusion UNet models. $\\lambda$-ECLIPSE leverages the image-text interleaved pre-training for fast and effective multi-subject-driven P-T2I. 
Through extensive experiments, we establish that $\\lambda$-ECLIPSE surpasses existing baselines in composition alignment while preserving concept alignment performance, even with significantly lower resource utilization. $\\lambda$-ECLIPSE performs multi-subject driven P-T2I with just 34M parameters and is trained on a mere 74 GPU hours. Additionally, $\\lambda$-ECLIPSE demonstrates the unique ability to perform multi-concept interpolations.", "authors": [ "Yezhou Yang", "Chitta Baral", "Sangmin Jung", "Maitreya Patel" ], "published": "2024-02-07", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "g-mod-exploring-mixture-of-depth-adaptation", "arxiv_id": "2410.13859", "nips_id": null, "url_abs": "https://arxiv.org/abs/2410.13859v1", "url_pdf": "https://arxiv.org/pdf/2410.13859v1.pdf", "title": "$γ-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models", "abstract": "Despite the significant progress in multimodal large language models (MLLMs), their high computational cost remains a barrier to real-world deployment. Inspired by the mixture of depths (MoDs) in natural language processing, we aim to address this limitation from the perspective of ``activated tokens''. Our key insight is that if most tokens are redundant for the layer computation, then can be skipped directly via the MoD layer. However, directly converting the dense layers of MLLMs to MoD layers leads to substantial performance degradation. To address this issue, we propose an innovative MoD adaptation strategy for existing MLLMs called $\\gamma$-MoD. In $\\gamma$-MoD, a novel metric is proposed to guide the deployment of MoDs in the MLLM, namely rank of attention maps (ARank). Through ARank, we can effectively identify which layer is redundant and should be replaced with the MoD layer. 
Based on ARank, we further propose two novel designs to maximize the computational sparsity of MLLM while maintaining its performance, namely shared vision-language router and masked routing learning. With these designs, more than 90% dense layers of the MLLM can be effectively converted to the MoD ones. To validate our method, we apply it to three popular MLLMs, and conduct extensive experiments on 9 benchmark datasets. Experimental results not only validate the significant efficiency benefit of $\\gamma$-MoD to existing MLLMs but also confirm its generalization ability on various MLLMs. For example, with a minor performance drop, i.e., -1.5%, $\\gamma$-MoD can reduce the training and inference time of LLaVA-HR by 31.0% and 53.2%, respectively.", "authors": [ "Rongrong Ji", "Zhiqiang Shen", "Xiaoshuai Sun", "Yiyi Zhou", "Jiayi Ji", "Gen Luo", "Yaxin Luo" ], "published": "2024-10-17", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "b-dpo-direct-preference-optimization-with", "arxiv_id": "2407.08639", "nips_id": null, "url_abs": "https://arxiv.org/abs/2407.08639v2", "url_pdf": "https://arxiv.org/pdf/2407.08639v2.pdf", "title": "$β$-DPO: Direct Preference Optimization with Dynamic $β$", "abstract": "Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\\beta$, as well as to the quality of the preference data. We analyze the impact of $\\beta$ and data quality on DPO, uncovering that optimal $\\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\\beta$ values, we introduce a novel framework that dynamically calibrates $\\beta$ at the batch level, informed by data quality considerations. 
Additionally, our method incorporates $\\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\\beta$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at \\url{https://github.com/junkangwu/beta-DPO}.", "authors": [ "Xiangnan He", "Xiang Wang", "Bolin Ding", "Jinyang Gao", "Jiancan Wu", "Zhengyi Yang", "Yuexiang Xie", "Junkang Wu" ], "published": "2024-07-11", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "a-dpo-adaptive-reward-margin-is-what-direct", "arxiv_id": "2410.10148", "nips_id": null, "url_abs": "https://arxiv.org/abs/2410.10148v3", "url_pdf": "https://arxiv.org/pdf/2410.10148v3.pdf", "title": "$α$-DPO: Adaptive Reward Margin is What Direct Preference Optimization Needs", "abstract": "Aligning large language models (LLMs) with human values and intentions is crucial for their utility, honesty, and safety. Reinforcement learning from human feedback (RLHF) is a popular approach to achieve this alignment, but it faces challenges in computational efficiency and training stability. Recent methods like Direct Preference Optimization (DPO) and Simple Preference Optimization (SimPO) have proposed offline alternatives to RLHF, simplifying the process by reparameterizing the reward function. However, DPO depends on a potentially suboptimal reference model, and SimPO's assumption of a fixed target reward margin may lead to suboptimal decisions in diverse data settings. In this work, we propose $\\alpha$-DPO, an adaptive preference optimization algorithm designed to address these limitations by introducing a dynamic reward margin. 
Specifically, $\\alpha$-DPO employs an adaptive preference distribution, balancing the policy model and the reference model to achieve personalized reward margins. We provide theoretical guarantees for $\\alpha$-DPO, demonstrating its effectiveness as a surrogate optimization objective and its ability to balance alignment and diversity through KL divergence control. Empirical evaluations on AlpacaEval 2 and Arena-Hard show that $\\alpha$-DPO consistently outperforms DPO and SimPO across various model settings, establishing it as a robust approach for fine-tuning LLMs. Our method achieves significant improvements in win rates, highlighting its potential as a powerful tool for LLM alignment. The code is available at https://github.com/junkangwu/alpha-DPO", "authors": [ "Xiangnan He", "Xiang Wang", "Bolin Ding", "Jinyang Gao", "Jiancan Wu", "Zhengyi Yang", "Xue Wang", "Junkang Wu" ], "published": "2024-10-14", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zzzgpt-an-interactive-gpt-approach-to-enhance", "arxiv_id": "2310.16242", "nips_id": null, "url_abs": "https://arxiv.org/abs/2310.16242v2", "url_pdf": "https://arxiv.org/pdf/2310.16242v2.pdf", "title": "ZzzGPT: An Interactive GPT Approach to Enhance Sleep Quality", "abstract": "This paper explores the intersection of technology and sleep pattern comprehension, presenting a cutting-edge two-stage framework that harnesses the power of Large Language Models (LLMs). The primary objective is to deliver precise sleep predictions paired with actionable feedback, addressing the limitations of existing solutions. This innovative approach involves leveraging the GLOBEM dataset alongside synthetic data generated by LLMs. The results highlight significant improvements, underlining the efficacy of merging advanced machine-learning techniques with a user-centric design ethos. 
Through this exploration, we bridge the gap between technological sophistication and user-friendly design, ensuring that our framework yields accurate predictions and translates them into actionable insights.", "authors": [ "Flora D. Salim", "Hao Xue", "Kaixin Ji", "Marwah Alaofi", "Hiruni Kegalle", "Thuc Hanh Nguyen", "Yonchanok Khaokaew" ], "published": "2023-10-24", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zyda-a-1-3t-dataset-for-open-language", "arxiv_id": "2406.01981", "nips_id": null, "url_abs": "https://arxiv.org/abs/2406.01981v2", "url_pdf": "https://arxiv.org/pdf/2406.01981v2.pdf", "title": "Zyda: A 1.3T Dataset for Open Language Modeling", "abstract": "The size of large language models (LLMs) has scaled dramatically in recent years and their computational and data requirements have surged correspondingly. State-of-the-art language models, even at relatively smaller sizes, typically require training on at least a trillion tokens. This rapid advancement has eclipsed the growth of open-source datasets available for large-scale LLM pretraining. In this paper, we introduce Zyda (Zyphra Dataset), a dataset under a permissive license comprising 1.3 trillion tokens, assembled by integrating several major respected open-source datasets into a single, high-quality corpus. We apply rigorous filtering and deduplication processes, both within and across datasets, to maintain and enhance the quality derived from the original datasets. Our evaluations show that Zyda not only competes favorably with other open datasets like Dolma, FineWeb, and RefinedWeb, but also substantially improves the performance of comparable models from the Pythia suite. 
Our rigorous data processing methods significantly enhance Zyda's effectiveness, outperforming even the best of its constituent datasets when used independently.", "authors": [ "Quentin Anthony", "James Whittington", "Adam Ibrahim", "Jonathan Pilault", "Paolo Glorioso", "Beren Millidge", "Yury Tokpanov" ], "published": "2024-06-04", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zsllmcode-an-effective-approach-for", "arxiv_id": "2409.14644", "nips_id": null, "url_abs": "https://arxiv.org/abs/2409.14644v1", "url_pdf": "https://arxiv.org/pdf/2409.14644v1.pdf", "title": "zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning", "abstract": "Regarding software engineering (SE) tasks, Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning, unlike pre-trained models (PTMs). However, LLMs are primarily designed for natural language output, and cannot directly produce intermediate embeddings from source code. They also face some challenges, for example, the restricted context length may prevent them from handling larger inputs, limiting their applicability to many SE tasks; while hallucinations may occur when LLMs are applied to complex downstream tasks. Motivated by the above facts, we propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs. Our approach utilizes LLMs to convert source code into concise summaries through zero-shot learning, which is then transformed into functional code embeddings using specialized embedding models. This unsupervised approach eliminates the need for training and addresses the issue of hallucinations encountered with LLMs. To the best of our knowledge, this is the first approach that combines LLMs and embedding models to generate code embeddings. We conducted experiments to evaluate the performance of our approach. 
The results demonstrate the effectiveness and superiority of our approach over state-of-the-art unsupervised methods.", "authors": [ "Zhenyu Chen", "Chunrong Fang", "Rubing Huang", "Chenhui Cui", "Zixiang Xian" ], "published": "2024-09-23", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zs4c-zero-shot-synthesis-of-compilable-code", "arxiv_id": "2401.14279", "nips_id": null, "url_abs": "https://arxiv.org/abs/2401.14279v3", "url_pdf": "https://arxiv.org/pdf/2401.14279v3.pdf", "title": "ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using LLMs", "abstract": "Technical Q&A sites are valuable for software developers seeking knowledge, but the code snippets they provide are often uncompilable and incomplete due to unresolved types and missing libraries. This poses a challenge for users who wish to reuse or analyze these snippets. Existing methods either do not focus on creating compilable code or have low success rates. To address this, we propose ZS4C, a lightweight approach for zero-shot synthesis of compilable code from incomplete snippets using Large Language Models (LLMs). ZS4C operates in two stages: first, it uses an LLM, like GPT-3.5, to identify missing import statements in a snippet; second, it collaborates with a validator (e.g., compiler) to fix compilation errors caused by incorrect imports and syntax issues. We evaluated ZS4C on the StatType-SO benchmark and a new dataset, Python-SO, which includes 539 Python snippets from Stack Overflow across the 20 most popular Python libraries. ZS4C significantly outperforms existing methods, improving the compilation rate from 63% to 95.1% compared to the state-of-the-art SnR, marking a 50.1% improvement. 
On average, ZS4C can infer more accurate import statements (with an F1 score of 0.98) than SnR, with an improvement of 8.5% in the F1.", "authors": [ "Wenbin Zhang", "Muhammad Asaduzzaman", "Tse-Hsun Chen", "Yuan Tian", "Shaowei Wang", "Azmain Kabir" ], "published": "2024-01-25", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-relational-learning-on-temporal", "arxiv_id": "2311.10112", "nips_id": null, "url_abs": "https://arxiv.org/abs/2311.10112v2", "url_pdf": "https://arxiv.org/pdf/2311.10112v2.pdf", "title": "zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models", "abstract": "Modeling evolving knowledge over temporal knowledge graphs (TKGs) has become a heated topic. Various methods have been proposed to forecast links on TKGs. Most of them are embedding-based, where hidden representations are learned to represent knowledge graph (KG) entities and relations based on the observed graph contexts. Although these methods show strong performance on traditional TKG forecasting (TKGF) benchmarks, they face a strong challenge in modeling the unseen zero-shot relations that have no prior graph context. In this paper, we try to mitigate this problem as follows. We first input the text descriptions of KG relations into large language models (LLMs) for generating relation representations, and then introduce them into embedding-based TKGF methods. LLM-empowered representations can capture the semantic information in the relation descriptions. This makes the relations, whether seen or unseen, with similar semantic meanings stay close in the embedding space, enabling TKGF models to recognize zero-shot relations even without any observed graph context. 
Experimental results show that our approach helps TKGF models to achieve much better performance in forecasting the facts with previously unseen relations, while still maintaining their ability in link forecasting regarding seen relations.", "authors": [ "Volker Tresp", "Bo Xiong", "Ruotong Liao", "Yunpu Ma", "Jingpei Wu", "Heling Cai", "Zifeng Ding" ], "published": "2023-11-15", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zoqo-zero-order-quantized-optimization", "arxiv_id": "2501.06736", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.06736v1", "url_pdf": "https://arxiv.org/pdf/2501.06736v1.pdf", "title": "ZOQO: Zero-Order Quantized Optimization", "abstract": "The increasing computational and memory demands in deep learning present significant challenges, especially in resource-constrained environments. We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations. We demonstrate the effectiveness of ZOQO through experiments in fine-tuning of large language models and black-box adversarial attacks. 
Despite the limitations of zero-order and quantized operations training, our method achieves competitive performance compared to full-precision methods, highlighting its potential for low-resource environments.", "authors": [ "Raja Giryes", "Noga Bar" ], "published": "2025-01-12", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zoomeye-enhancing-multimodal-llms-with-human", "arxiv_id": "2411.16044", "nips_id": null, "url_abs": "https://arxiv.org/abs/2411.16044v1", "url_pdf": "https://arxiv.org/pdf/2411.16044v1.pdf", "title": "ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration", "abstract": "An image, especially with high-resolution, typically consists of numerous visual elements, ranging from dominant large objects to fine-grained detailed objects. When perceiving such images, multimodal large language models~(MLLMs) face limitations due to the restricted input resolution of the pretrained vision encoder and the cluttered, dense context of the image, resulting in a focus on primary objects while easily overlooking detailed ones. In this paper, we propose Zoom Eye, a tree search algorithm designed to navigate the hierarchical and visual nature of images to capture relevant information. Zoom Eye conceptualizes an image as a tree, with each children node representing a zoomed sub-patch of the parent node and the root represents the overall image. Moreover, Zoom Eye is model-agnostic and training-free, so it enables any MLLMs to simulate human zooming actions by searching along the image tree from root to leaf nodes, seeking out pertinent information, and accurately responding to related queries. 
We experiment on a series of elaborate high-resolution benchmarks and the results demonstrate that Zoom Eye not only consistently improves the performance of a series base MLLMs with large margin~(e.g., LLaVA-v1.5-7B increases by 34.57\\% on $V^*$ Bench and 17.88\\% on HR-Bench), but also enables small 7B MLLMs to outperform strong large models such as GPT-4o. Our code is available at \\href{https://github.com/om-ai-lab/ZoomEye}{https://github.com/om-ai-lab/ZoomEye}.", "authors": [ "Jianwei Yin", "Mingwei Zhu", "Zilun Zhang", "Ruochen Xu", "Tiancheng Zhao", "Kangjia Zhao", "Haozhan Shen" ], "published": "2024-11-25", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zodiac-a-cardiologist-level-llm-framework-for", "arxiv_id": "2410.02026", "nips_id": null, "url_abs": "https://arxiv.org/abs/2410.02026v1", "url_pdf": "https://arxiv.org/pdf/2410.02026v1.pdf", "title": "Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics", "abstract": "Large language models (LLMs) have demonstrated remarkable progress in healthcare. However, a significant gap remains regarding LLMs' professionalism in domain-specific clinical practices, limiting their application in real-world diagnostics. In this work, we introduce ZODIAC, an LLM-powered framework with cardiologist-level professionalism designed to engage LLMs in cardiological diagnostics. ZODIAC assists cardiologists by extracting clinically relevant characteristics from patient data, detecting significant arrhythmias, and generating preliminary reports for the review and refinement by cardiologists. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework, enabling the processing of patient data across multiple modalities. Each LLM agent is fine-tuned using real-world patient data adjudicated by cardiologists, reinforcing the model's professionalism. 
ZODIAC undergoes rigorous clinical validation with independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address security concerns. Results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs like Microsoft's BioGPT. ZODIAC demonstrates the transformative potential of specialized LLMs in healthcare by delivering domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, exemplifying the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD).", "authors": [ "Zhaohan Xi", "Yong Chen", "Zhiheng Liu", "Yiwen Lu", "Alice Zheng", "Mengya Song", "Peng Zhang", "Yuan Zhou" ], "published": "2024-10-02", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zo-adamu-optimizer-adapting-perturbation-by", "arxiv_id": "2312.15184", "nips_id": null, "url_abs": "https://arxiv.org/abs/2312.15184v1", "url_pdf": "https://arxiv.org/pdf/2312.15184v1.pdf", "title": "ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization", "abstract": "Lowering the memory requirement in full-parameter training on large models has become a hot research area. MeZO fine-tunes the large language models (LLMs) by just forward passes in a zeroth-order SGD optimizer (ZO-SGD), demonstrating excellent performance with the same GPU memory usage as inference. However, the simulated perturbation stochastic approximation for gradient estimate in MeZO leads to severe oscillations and incurs a substantial time overhead. Moreover, without momentum regularization, MeZO shows severe over-fitting problems. Lastly, the perturbation-irrelevant momentum on ZO-SGD does not improve the convergence rate. 
This study proposes ZO-AdaMU to resolve the above problems by adapting the simulated perturbation with momentum in its stochastic approximation. Unlike existing adaptive momentum methods, we relocate momentum on simulated perturbation in stochastic gradient approximation. Our convergence analysis and experiments prove this is a better way to improve convergence stability and rate in ZO-SGD. Extensive experiments demonstrate that ZO-AdaMU yields better generalization for LLMs fine-tuning across various NLP tasks than MeZO and its momentum variants.", "authors": [ "Xiaobao Song", "Chuanyi Liu", "XiangPing Wu", "Yukang Lin", "Yang Xiang", "Youchen Pan", "Qingcai Chen", "Shuoran Jiang" ], "published": "2023-12-23", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zno-eval-benchmarking-reasoning-capabilities", "arxiv_id": "2501.06715", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.06715v1", "url_pdf": "https://arxiv.org/pdf/2501.06715v1.pdf", "title": "ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian", "abstract": "As the usage of large language models for problems outside of simple text understanding or generation increases, assessing their abilities and limitations becomes crucial. While significant progress has been made in this area over the last few years, most research has focused on benchmarking English, leaving other languages underexplored. This makes evaluating the reasoning and robustness level of language models in Ukrainian particularly challenging. The purpose of this work is to establish a comprehensive benchmark for the reasoning capabilities evaluation of large language models in the Ukrainian language. This paper presents the ZNO-Eval benchmark based on real exam tasks from Ukraine's standardized educational testing system: the External Independent Evaluation and the National Multi-subject Test. 
With single-answer options, multiple-choice, matching, and open-ended questions from diverse subjects, including Ukrainian language, mathematics, history, and geography, this dataset paves the way toward a thorough analysis of reasoning capabilities across different domains and complexities. Evaluation of several well-known language models, such as GPT-3.5-Turbo, GPT-4o, GPT-4-Turbo, Mistral Large, Claude 3 Opus, and Gemini-1.5 Pro on this benchmark demonstrated the superiority of GPT-4o in both common knowledge reasoning and intricate language tasks. At the same time, Gemini Pro and GPT-4 Turbo excelled in the arithmetic domain, leading in single-answer and open-ended math problems. While all models were close to max performance in text-only common knowledge tasks like history and geography, there still is a gap for Ukrainian language and math, thus highlighting the importance of developing specialized language benchmarks for more accurate assessments of model capabilities and limitations across different languages and contexts.", "authors": [ "Anastasiya Troynina", "Victoria Ruvinskaya", "Mykyta Syromiatnikov" ], "published": "2025-01-12", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zklora-efficient-zero-knowledge-proofs-for", "arxiv_id": "2501.13965", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.13965v1", "url_pdf": "https://arxiv.org/pdf/2501.13965v1.pdf", "title": "ZKLoRA: Efficient Zero-Knowledge Proofs for LoRA Verification", "abstract": "Low-Rank Adaptation (LoRA) is a widely adopted method for customizing large-scale language models. 
In distributed, untrusted training environments, an open source base model user may want to use LoRA weights created by an external contributor, leading to two requirements: (1) the base model user must confirm that the LoRA weights are effective when paired with the intended base model, and (2) the LoRA contributor must keep their proprietary weights private until compensation is assured. We present ZKLoRA, a zero-knowledge verification protocol that relies on succinct proofs and our novel Multi-Party Inference procedure to verify LoRA-base model compatibility without exposing LoRA weights. ZKLoRA produces deterministic correctness guarantees and validates each LoRA module in only 1-2 seconds on state-of-the-art large language models. This low-latency approach enables nearly real-time verification and promotes secure collaboration among geographically decentralized teams and contract-based training pipelines. The protocol ensures that the delivered LoRA module works as claimed, safeguarding the contributor's intellectual property while providing the base model user with verification of compatibility and lineage.", "authors": [ "Marcos Villagra", "Peter Potash", "Bidhan Roy" ], "published": "2025-01-21", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zkllm-zero-knowledge-proofs-for-large", "arxiv_id": "2404.16109", "nips_id": null, "url_abs": "https://arxiv.org/abs/2404.16109v1", "url_pdf": "https://arxiv.org/pdf/2404.16109v1.pdf", "title": "zkLLM: Zero Knowledge Proofs for Large Language Models", "abstract": "The recent surge in artificial intelligence (AI), characterized by the prominence of large language models (LLMs), has ushered in fundamental transformations across the globe. However, alongside these advancements, concerns surrounding the legitimacy of LLMs have grown, posing legal challenges to their extensive applications. 
Compounding these concerns, the parameters of LLMs are often treated as intellectual property, restricting direct investigations. In this study, we address a fundamental challenge within the realm of AI legislation: the need to establish the authenticity of outputs generated by LLMs. To tackle this issue, we present zkLLM, which stands as the inaugural specialized zero-knowledge proof tailored for LLMs to the best of our knowledge. Addressing the persistent challenge of non-arithmetic operations in deep learning, we introduce tlookup, a parallelized lookup argument designed for non-arithmetic tensor operations in deep learning, offering a solution with no asymptotic overhead. Furthermore, leveraging the foundation of tlookup, we introduce zkAttn, a specialized zero-knowledge proof crafted for the attention mechanism, carefully balancing considerations of running time, memory usage, and accuracy. Empowered by our fully parallelized CUDA implementation, zkLLM emerges as a significant stride towards achieving efficient zero-knowledge verifiable computations over LLMs. Remarkably, for LLMs boasting 13 billion parameters, our approach enables the generation of a correctness proof for the entire inference process in under 15 minutes. 
The resulting proof, compactly sized at less than 200 kB, is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.", "authors": [ "Hongyang Zhang", "Jason Li", "Haochen Sun" ], "published": "2024-04-24", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "ziya-vl-bilingual-large-vision-language-model", "arxiv_id": "2310.08166", "nips_id": null, "url_abs": "https://arxiv.org/abs/2310.08166v3", "url_pdf": "https://arxiv.org/pdf/2310.08166v3.pdf", "title": "Ziya-Visual: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning", "abstract": "Recent advancements enlarge the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs. However, such success is typically limited to English scenarios due to the lack of large-scale and high-quality non-English multi-modal resources, making it extremely difficult to establish competitive counterparts in other languages. In this paper, we introduce the Ziya-Visual series, a set of bilingual large-scale vision-language models (LVLMs) designed to incorporate visual semantics into LLM for multi-modal dialogue. Composed of Ziya-Visual-Base and Ziya-Visual-Chat, our models adopt the Querying Transformer from BLIP-2, further exploring the assistance of optimization schemes such as instruction tuning, multi-stage training and low-rank adaptation module for visual-language alignment. In addition, we stimulate the understanding ability of GPT-4 in multi-modal scenarios, translating our gathered English image-text datasets into Chinese and generating instruction-response through the in-context learning method. 
The experiment results demonstrate that compared to the existing LVLMs, Ziya-Visual achieves competitive performance across a wide range of English-only tasks including zero-shot image-text retrieval, image captioning, and visual question answering. The evaluation leaderboard accessed by GPT-4 also indicates that our models possess satisfactory image-text understanding and generation capabilities in Chinese multi-modal scenario dialogues. Code, demo and models are available at ~\\url{https://huggingface.co/IDEA-CCNL/Ziya-BLIP2-14B-Visual-v1}.", "authors": [ "Pingjian Zhang", "Yan Song", "Jiaxing Zhang", "Ruyi Gan", "Xinyu Gao", "XiaoJun Wu", "Dixiang Zhang", "Junyu Lu" ], "published": "2023-10-12", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "ziya2-data-centric-learning-is-all-llms-need", "arxiv_id": "2311.03301", "nips_id": null, "url_abs": "https://arxiv.org/abs/2311.03301v2", "url_pdf": "https://arxiv.org/pdf/2311.03301v2.pdf", "title": "Ziya2: Data-centric Learning is All LLMs Need", "abstract": "Various large language models (LLMs) have been proposed in recent years, including closed- and open-source ones, continually setting new records on multiple benchmarks. However, the development of LLMs still faces several issues, such as high cost of training models from scratch, and continual pre-training leading to catastrophic forgetting, etc. Although many such issues are addressed along the line of research on LLMs, an important yet practical limitation is that many studies overly pursue enlarging model sizes without comprehensively analyzing and optimizing the use of pre-training data in their learning process, as well as appropriate organization and leveraging of such data in training LLMs under cost-effective settings. 
In this work, we propose Ziya2, a model with 13 billion parameters adopting LLaMA2 as the foundation model, and further pre-trained on 700 billion tokens, where we focus on pre-training techniques and use data-centric optimization to enhance the learning process of Ziya2 on different stages. We define three data attributes and firstly establish data-centric scaling laws to illustrate how different data impacts LLMs. Experiments show that Ziya2 significantly outperforms other models in multiple benchmarks especially with promising results compared to representative open-source ones. Ziya2 (Base) is released at https://huggingface.co/IDEA-CCNL/Ziya2-13B-Base and https://modelscope.cn/models/Fengshenbang/Ziya2-13B-Base/summary.", "authors": [ "Hao Wang", "Qi Yang", "Yuanhe Tian", "Junqing He", "Yan Song", "Jiaxing Zhang", "Ping Yang", "Kunhao Pan", "Dixiang Zhang", "XiaoJun Wu", "Junyu Lu", "Renliang Sun", "Ziwei Wu", "Ruyi Gan" ], "published": "2023-11-06", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "ziplm-inference-aware-structured-pruning-of-1", "arxiv_id": "2302.04089", "nips_id": null, "url_abs": "https://arxiv.org/abs/2302.04089v2", "url_pdf": "https://arxiv.org/pdf/2302.04089v2.pdf", "title": "ZipLM: Inference-Aware Structured Pruning of Language Models", "abstract": "The breakthrough performance of large language models (LLMs) comes with major computational footprints and high deployment costs. In this paper, we progress towards resolving this problem by proposing a novel structured compression approach for LLMs, called ZipLM. ZipLM achieves state-of-the-art accuracy-vs-speedup, while matching a set of desired target runtime speedups in any given inference environment. Specifically, given a model, a dataset, an inference environment, as well as a set of speedup targets, ZipLM iteratively identifies and removes components with the worst loss-runtime trade-off. 
Unlike prior methods that specialize in either the post-training/one-shot or the gradual compression setting, and only for specific families of models such as BERT (encoder) or GPT (decoder), ZipLM produces state-of-the-art compressed models across all these settings. Furthermore, ZipLM achieves superior results for a fraction of the computational cost relative to prior distillation and pruning techniques, making it a cost-effective approach for generating an entire family of smaller, faster, and highly accurate models, guaranteed to meet the desired inference specifications. In particular, ZipLM outperforms all prior BERT-base distillation and pruning techniques, such as CoFi, MiniLM, and TinyBERT. Moreover, it matches the performance of the heavily optimized MobileBERT model, obtained via extensive architecture search, by simply pruning the baseline BERT-large model. When compressing GPT2, ZipLM outperforms DistilGPT2 while being 60% smaller and 30% faster. Our code is available at: https://github.com/IST-DASLab/ZipLM.", "authors": [ "Dan Alistarh", "Elias Frantar", "Eldar Kurtic" ], "published": "2023-02-07", "conference": "ziplm-inference-aware-structured-pruning-of", "conference_url_abs": "https://openreview.net/forum?id=d8j3lsBWpV", "conference_url_pdf": "https://openreview.net/pdf?id=d8j3lsBWpV", "proceeding": "neurips-2023-11" }, { "id": "zigzagkv-dynamic-kv-cache-compression-for", "arxiv_id": "2412.09036", "nips_id": null, "url_abs": "https://arxiv.org/abs/2412.09036v1", "url_pdf": "https://arxiv.org/pdf/2412.09036v1.pdf", "title": "ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty", "abstract": "Large Language models (LLMs) have become a research hotspot. To accelerate the inference of LLMs, storing computed caches in memory has become the standard technique. However, as the inference length increases, growing KV caches might lead to out-of-memory issues. 
Many existing methods address this issue through KV cache compression, primarily by preserving key tokens throughout all layers to reduce information loss. Most of them allocate a uniform budget size for each layer to retain. However, we observe that the minimum budget sizes needed to retain essential information vary across layers and models based on the perspectives of attention and hidden state output. Building on this observation, this paper proposes a simple yet effective KV cache compression method that leverages layer uncertainty to allocate budget size for each layer. Experimental results show that the proposed method can reduce memory usage of the KV caches to only $\\sim$20\\% when compared to Full KV inference while achieving nearly lossless performance.", "authors": [ "Min Zhang", "Kehai Chen", "Yao Hu", "Yan Gao", "Yikun Lei", "Chen Zhang", "Xikai Liu", "Meizhi Zhong" ], "published": "2024-12-12", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "z-icl-zero-shot-in-context-learning-with", "arxiv_id": "2212.09865", "nips_id": null, "url_abs": "https://arxiv.org/abs/2212.09865v2", "url_pdf": "https://arxiv.org/pdf/2212.09865v2.pdf", "title": "Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations", "abstract": "Although large language models can be prompted for both zero- and few-shot learning, performance drops significantly when no demonstrations are available. In this paper, we introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input using a raw text corpus. Concretely, pseudo-demonstrations are constructed by (1) finding the nearest neighbors to the test input from the corpus and pairing them with random task labels, and (2) applying a set of techniques to reduce the amount of direct copying the model does from the resulting demonstrations. 
Evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin, and is on par with in-context learning with labeled training data in the few-shot setting. Overall, Z-ICL provides a significantly higher estimate of the zero-shot performance levels of a model, and supports future efforts to develop better pseudo-demonstrations that further improve zero-shot results.", "authors": [ "Hannaneh Hajishirzi", "Luke Zettlemoyer", "Iz Beltagy", "Sewon Min", "Xinxi Lyu" ], "published": "2022-12-19", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zhujiu-a-multi-dimensional-multi-faceted", "arxiv_id": "2308.14353", "nips_id": null, "url_abs": "https://arxiv.org/abs/2308.14353v1", "url_pdf": "https://arxiv.org/pdf/2308.14353v1.pdf", "title": "ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models", "abstract": "The unprecedented performance of large language models (LLMs) requires comprehensive and accurate evaluation. We argue that for LLMs evaluation, benchmarks need to be comprehensive and systematic. To this end, we propose the ZhuJiu benchmark, which has the following strengths: (1) Multi-dimensional ability coverage: We comprehensively evaluate LLMs across 7 ability dimensions covering 51 tasks. Especially, we also propose a new benchmark that focuses on knowledge ability of LLMs. (2) Multi-faceted evaluation methods collaboration: We use 3 different yet complementary evaluation methods to comprehensively evaluate LLMs, which can ensure the authority and accuracy of the evaluation results. (3) Comprehensive Chinese benchmark: ZhuJiu is the pioneering benchmark that fully assesses LLMs in Chinese, while also providing equally robust evaluation abilities in English. (4) Avoiding potential data leakage: To avoid data leakage, we construct evaluation data specifically for 37 tasks. 
We evaluate 10 current mainstream LLMs and conduct an in-depth discussion and analysis of their results. The ZhuJiu benchmark and open-participation leaderboard are publicly released at http://www.zhujiu-benchmark.com/ and we also provide a demo video at https://youtu.be/qypkJ89L1Ic.", "authors": [ "Jun Zhao", "Kang Liu", "Shengping Liu", "Yubo Chen", "Pengfei Cao", "JunHao Chen", "Pengfan Du", "Haining Xie", "Baoli Zhang" ], "published": "2023-08-28", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zhongjing-enhancing-the-chinese-medical", "arxiv_id": "2308.03549", "nips_id": null, "url_abs": "https://arxiv.org/abs/2308.03549v3", "url_pdf": "https://arxiv.org/pdf/2308.03549v3.pdf", "title": "Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue", "abstract": "Recent advances in Large Language Models (LLMs) have achieved remarkable breakthroughs in understanding and responding to user intents. However, their performance lag behind general use cases in some expertise domains, such as Chinese medicine. Existing efforts to incorporate Chinese medicine into LLMs rely on Supervised Fine-Tuning (SFT) with single-turn and distilled dialogue data. These models lack the ability for doctor-like proactive inquiry and multi-turn comprehension and cannot align responses with experts' intentions. In this work, we introduce Zhongjing, the first Chinese medical LLaMA-based LLM that implements an entire training pipeline from continuous pre-training, SFT, to Reinforcement Learning from Human Feedback (RLHF). Additionally, we construct a Chinese multi-turn medical dialogue dataset of 70,000 authentic doctor-patient dialogues, CMtMedQA, which significantly enhances the model's capability for complex dialogue and proactive inquiry initiation. 
We also define a refined annotation rule and evaluation criteria given the unique characteristics of the biomedical domain. Extensive experimental results show that Zhongjing outperforms baselines in various capacities and matches the performance of ChatGPT in some abilities, despite the 100x parameters. Ablation studies also demonstrate the contributions of each component: pre-training enhances medical knowledge, and RLHF further improves instruction-following ability and safety. Our code, datasets, and models are available at https://github.com/SupritYoung/Zhongjing.", "authors": [ "Hanjie Zhao", "Hongying Zan", "Yuxiang Jia", "Hongfei Xu", "Guangyu Zhou", "Senbin Zhu", "Songhua Yang" ], "published": "2023-08-07", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-to-strong-generalization-eliciting", "arxiv_id": "2409.12425", "nips_id": null, "url_abs": "https://arxiv.org/abs/2409.12425v1", "url_pdf": "https://arxiv.org/pdf/2409.12425v1.pdf", "title": "Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels", "abstract": "Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. However, this paradigm is limited by the availability of gold labels, while in certain scenarios, LLMs may need to perform tasks that are too complex for humans to provide such labels. To tackle this challenge, this study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization. We iteratively prompt LLMs to annotate unlabeled data and retain high-quality labels by filtering. Surprisingly, we observe that this iterative process gradually unlocks LLMs' potential on downstream tasks. 
Our experiments on extensive classification and reasoning tasks confirm the effectiveness of our proposed framework. Our analysis indicates that this paradigm is effective for both in-context learning and fine-tuning, and for various model sizes.", "authors": [ "Lidong Bing", "Anh Tuan Luu", "Boyang Li", "Xiaobao Wu", "Wenxuan Zhang", "Qin Chao", "Chaoqun Liu" ], "published": "2024-09-19", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zerotop-zero-shot-task-oriented-semantic", "arxiv_id": "2212.10815", "nips_id": null, "url_abs": "https://arxiv.org/abs/2212.10815v1", "url_pdf": "https://arxiv.org/pdf/2212.10815v1.pdf", "title": "ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models", "abstract": "We explore the use of large language models (LLMs) for zero-shot semantic parsing. Semantic parsing involves mapping natural language utterances to task-specific meaning representations. Language models are generally trained on the publicly available text and code and cannot be expected to directly generalize to domain-specific parsing tasks in a zero-shot setting. In this work, we propose ZEROTOP, a zero-shot task-oriented parsing method that decomposes a semantic parsing problem into a set of abstractive and extractive question-answering (QA) problems, enabling us to leverage the ability of LLMs to zero-shot answer reading comprehension questions. For each utterance, we prompt the LLM with questions corresponding to its top-level intent and a set of slots and use the LLM generations to construct the target meaning representation. We observe that current LLMs fail to detect unanswerable questions; and as a result, cannot handle questions corresponding to missing slots. To address this problem, we fine-tune a language model on public QA datasets using synthetic negative samples. 
Experimental results show that our QA-based decomposition paired with the fine-tuned LLM can correctly parse ~16% of utterances in the MTOP dataset without requiring any annotated data.", "authors": [ "Subhro Roy", "Jason Wolfe", "Dheeraj Mekala" ], "published": "2022-12-21", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zeroth-order-policy-gradient-for", "arxiv_id": "2409.17401", "nips_id": null, "url_abs": "https://arxiv.org/abs/2409.17401v1", "url_pdf": "https://arxiv.org/pdf/2409.17401v1.pdf", "title": "Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference", "abstract": "Reward inference (learning a reward model from human preferences) is a critical intermediate step in Reinforcement Learning from Human Feedback (RLHF) for fine-tuning Large Language Models (LLMs) such as ChatGPT. In practice, reward inference faces several fundamental challenges, including double problem misspecification, reward model evaluation without ground truth, distribution shift, and overfitting in joint reward model and policy training. An alternative approach that avoids these pitfalls is direct policy optimization without reward inference, such as Direct Preference Optimization (DPO), which provides a much simpler pipeline and has shown empirical success in LLMs. However, DPO utilizes the closed-form expression between the optimal policy and the reward function, which only works under the bandit setting or deterministic MDPs. This paper develops two RLHF algorithms without reward inference, which work for general RL problems beyond bandits and deterministic MDPs, and general preference models beyond the Bradley-Terry model. The key idea is to estimate the local value function difference from human preferences and then approximate the policy gradient with a zeroth-order gradient approximator. 
For both algorithms, we establish rates of convergence in terms of the number of policy gradient iterations, as well as the number of trajectory samples and human preference queries per iteration. Our results show there exist provably efficient methods to solve general RLHF problems without reward inference.", "authors": [ "Lei Ying", "Qining Zhang" ], "published": "2024-09-25", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zeroth-order-optimization-meets-human", "arxiv_id": "2303.03751", "nips_id": null, "url_abs": "https://arxiv.org/abs/2303.03751v3", "url_pdf": "https://arxiv.org/pdf/2303.03751v3.pdf", "title": "Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles", "abstract": "In this study, we delve into an emerging optimization challenge involving a black-box objective function that can only be gauged via a ranking oracle-a situation frequently encountered in real-world scenarios, especially when the function is evaluated by human judges. Such challenge is inspired from Reinforcement Learning with Human Feedback (RLHF), an approach recently employed to enhance the performance of Large Language Models (LLMs) using human guidance. We introduce ZO-RankSGD, an innovative zeroth-order optimization algorithm designed to tackle this optimization problem, accompanied by theoretical assurances. Our algorithm utilizes a novel rank-based random estimator to determine the descent direction and guarantees convergence to a stationary point. Moreover, ZO-RankSGD is readily applicable to policy optimization problems in Reinforcement Learning (RL), particularly when only ranking oracles for the episode reward are available. Last but not least, we demonstrate the effectiveness of ZO-RankSGD in a novel application: improving the quality of images generated by a diffusion generative model with human ranking feedback. 
Throughout experiments, we found that ZO-RankSGD can significantly enhance the detail of generated images with only a few rounds of human feedback. Overall, our work advances the field of zeroth-order optimization by addressing the problem of optimizing functions with only ranking feedback, and offers a new and effective approach for aligning Artificial Intelligence (AI) with human intentions.", "authors": [ "Tsung-Hui Chang", "Dmitry Rybin", "Zhiwei Tang" ], "published": "2023-03-07", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zeroth-order-fine-tuning-of-llms-with-extreme", "arxiv_id": "2406.02913", "nips_id": null, "url_abs": "https://arxiv.org/abs/2406.02913v1", "url_pdf": "https://arxiv.org/pdf/2406.02913v1.pdf", "title": "Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity", "abstract": "Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of \"sensitive parameters\" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning 0.1% sensitive parameters in the LLM with ZO can outperform the full ZO fine-tuning performance, while offering wall-clock time speedup. 
Additionally, we show that ZO fine-tuning targeting these 0.1% sensitive parameters, combined with 4 bit quantization, enables efficient ZO fine-tuning of an Llama2-7B model on a GPU device with less than 8 GiB of memory and notably reduced latency.", "authors": [ "Zhaozhuo Xu", "Beidi Chen", "Xiaodong Yu", "Christopher De Sa", "Osbert Bastani", "Jacob R. Gardner", "Yide Ran", "Xinyu Yang", "Zirui Liu", "Yimeng Zeng", "Jikai Long", "Wentao Guo" ], "published": "2024-06-05", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "subzero-random-subspace-zeroth-order", "arxiv_id": "2410.08989", "nips_id": null, "url_abs": "https://arxiv.org/abs/2410.08989v2", "url_pdf": "https://arxiv.org/pdf/2410.08989v2.pdf", "title": "Zeroth-Order Fine-Tuning of LLMs in Random Subspaces", "abstract": "Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands for backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of gradient estimates typically scales linearly with the model's parameter dimension$\\unicode{x2013}$a significant issue for LLMs. In this paper, we propose the random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored for LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimation closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. 
Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence compared to standard ZO approaches like MeZO across various language modeling tasks.", "authors": [ "Hua Huang", "Jia Li", "Sike Wang", "Pan Zhou", "Ziming Yu" ], "published": "2024-10-11", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-th-order-algorithm-for-softmax-attention", "arxiv_id": "2307.08352", "nips_id": null, "url_abs": "https://arxiv.org/abs/2307.08352v1", "url_pdf": "https://arxiv.org/pdf/2307.08352v1.pdf", "title": "Zero-th Order Algorithm for Softmax Attention Optimization", "abstract": "Large language models (LLMs) have brought about significant transformations in human society. Among the crucial computations in LLMs, the softmax unit holds great importance. Its helps the model generating a probability distribution on potential subsequent words or phrases, considering a series of input words. By utilizing this distribution, the model selects the most probable next word or phrase, based on the assigned probabilities. The softmax unit assumes a vital function in LLM training as it facilitates learning from data through the adjustment of neural network weights and biases. With the development of the size of LLMs, computing the gradient becomes expensive. However, Zero-th Order method can approximately compute the gradient with only forward passes. In this paper, we present a Zero-th Order algorithm specifically tailored for Softmax optimization. We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs. 
By leveraging the Zeroth-Order method, our work contributes to the advancement of optimization techniques in the context of complex language models.", "authors": [ "Zhao Song", "Sridhar Mahadevan", "Zhihang Li", "Yichuan Deng" ], "published": "2023-07-17", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-visual-relation-detection-via-1", "arxiv_id": "2305.12476", "nips_id": null, "url_abs": "https://arxiv.org/abs/2305.12476v4", "url_pdf": "https://arxiv.org/pdf/2305.12476v4.pdf", "title": "Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models", "abstract": "Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies relationship (or interaction) types between object pairs within an image. However, naively utilizing CLIP with prevalent class-based prompts for zero-shot VRD has several weaknesses, e.g., it struggles to distinguish between different fine-grained relation types and it neglects essential spatial information of two objects. To this end, we propose a novel method for zero-shot VRD: RECODE, which solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component. Different visual cues enhance the discriminability of similar relation categories from different perspectives, which significantly boosts performance in VRD. To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues. 
Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.", "authors": [ "Long Chen", "Yueting Zhuang", "Jian Shao", "Guikun Chen", "Jun Xiao", "Lin Li" ], "published": "2023-05-21", "conference": "zero-shot-visual-relation-detection-via", "conference_url_abs": "https://openreview.net/forum?id=wiv21EJ0Vd", "conference_url_pdf": "https://openreview.net/pdf?id=wiv21EJ0Vd", "proceeding": "neurips-2023-11" }, { "id": "zero-shot-video-moment-retrieval-via-off-the", "arxiv_id": "2501.07972", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.07972v1", "url_pdf": "https://arxiv.org/pdf/2501.07972v1.pdf", "title": "Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models", "abstract": "The target of video moment retrieval (VMR) is predicting temporal spans within a video that semantically match a given linguistic query. Existing VMR methods based on multimodal large language models (MLLMs) overly rely on expensive high-quality datasets and time-consuming fine-tuning. Although some recent studies introduce a zero-shot setting to avoid fine-tuning, they overlook inherent language bias in the query, leading to erroneous localization. To tackle the aforementioned challenges, this paper proposes Moment-GPT, a tuning-free pipeline for zero-shot VMR utilizing frozen MLLMs. Specifically, we first employ LLaMA-3 to correct and rephrase the query to mitigate language bias. Subsequently, we design a span generator combined with MiniGPT-v2 to produce candidate spans adaptively. Finally, to leverage the video comprehension capabilities of MLLMs, we apply VideoChatGPT and span scorer to select the most appropriate spans. 
Our proposed method substantially outperforms the state-of-the-art MLLM-based and zero-shot models on several public datasets, including QVHighlights, ActivityNet-Captions, and Charades-STA.", "authors": [ "Sidan Du", "Yang Li", "Wenxin Liang", "Ming Li", "Benxiang Zhai", "Yunzhuo Sun", "Yifang Xu" ], "published": "2025-01-14", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-verification-guided-chain-of", "arxiv_id": "2501.13122", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.13122v1", "url_pdf": "https://arxiv.org/pdf/2501.13122v1.pdf", "title": "Zero-Shot Verification-guided Chain of Thoughts", "abstract": "Previous works have demonstrated the effectiveness of Chain-of-Thought (COT) prompts and verifiers in guiding Large Language Models (LLMs) through the space of reasoning. However, most such studies either use a fine-tuned verifier or rely on manually handcrafted few-shot examples. In contrast, in this paper, we focus on LLM-based self-verification of self-generated reasoning steps via COT prompts in a completely zero-shot regime. To explore this setting, we design a new zero-shot prompt, which we call COT STEP, to aid zero-shot decomposition of reasoning steps and design two new zero-shot prompts for LLM-based verifiers. 
We evaluate the verifiers' ability to classify the correctness of reasoning chains and explore different ways to use verifier scores in guiding reasoning for various mathematical and commonsense reasoning tasks with different LLMs.", "authors": [ "Cornelia Caragea", "Jishnu Ray Chowdhury" ], "published": "2025-01-21", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "text-classification-of-column-headers-with-a", "arxiv_id": "2403.00884", "nips_id": null, "url_abs": "https://arxiv.org/abs/2403.00884v3", "url_pdf": "https://arxiv.org/pdf/2403.00884v3.pdf", "title": "Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment", "abstract": "Traditional dataset retrieval systems rely on metadata for indexing, rather than on the underlying data values. However, high-quality metadata creation and enrichment often require manual annotations, which is a labour-intensive and challenging process to automate. In this study, we propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini. Our analysis focuses on classifying column headers based on domain-specific topics from the Consortium of European Social Science Data Archives (CESSDA), a Linked Data controlled vocabulary. Our approach operates in a zero-shot setting, integrating the controlled topic vocabulary directly within the input prompt. This integration serves as a Large Context Windows approach, with the aim of improving the results of the topic classification task. We evaluated the performance of the LLMs in terms of internal consistency, inter-machine alignment, and agreement with human classification. Additionally, we investigate the impact of contextual information (i.e., dataset description) on the classification outcomes. 
Our findings suggest that ChatGPT and GoogleGemini outperform GoogleBard in terms of internal consistency as well as LLM-human agreement. Interestingly, we found that contextual information had no significant impact on LLM performance. This work proposes a novel approach that leverages LLMs for topic classification of column headers using a controlled vocabulary, presenting a practical application of LLMs and Large Context Windows within the Semantic Web domain. This approach has the potential to facilitate automated metadata enrichment, thereby enhancing dataset retrieval and the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data on the Web.", "authors": [ "Jacco van Ossenbruggen", "Lise Stork", "Tobias Kuhn", "Margherita Martorana" ], "published": "2024-03-01", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-text-guided-infinite-image", "arxiv_id": "2407.12642", "nips_id": null, "url_abs": "https://arxiv.org/abs/2407.12642v2", "url_pdf": "https://arxiv.org/pdf/2407.12642v2.pdf", "title": "Zero-shot Text-guided Infinite Image Synthesis with LLM guidance", "abstract": "Text-guided image editing and generation methods have diverse real-world applications. However, text-guided infinite image synthesis faces several challenges. First, there is a lack of text-image paired datasets with high resolution and contextual diversity. Second, expanding images based on text requires global coherence and rich local context understanding. Previous studies have mainly focused on limited categories, such as natural landscapes, and also required training on high-resolution images with paired text. To address these challenges, we propose a novel approach utilizing Large Language Models (LLMs) for both global coherence and local context understanding, without any high-resolution text-image paired training dataset. 
We train the diffusion model to expand an image conditioned on global and local captions generated from the LLM and visual feature. At the inference stage, given an image and a global caption, we use the LLM to generate the next local caption to expand the input image. Then, we expand the image using the global caption, generated local caption and the visual feature to consider global consistency and spatial local context. In experiments, our model outperforms the baselines both quantitatively and qualitatively. Furthermore, our model demonstrates the capability of text-guided arbitrary-sized image generation in a zero-shot manner with LLM guidance.", "authors": [ "Taehwan Kim", "Taegyeong Lee", "Soyeong Kwon" ], "published": "2024-07-17", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-strategies-for-length-controllable", "arxiv_id": "2501.00233", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.00233v1", "url_pdf": "https://arxiv.org/pdf/2501.00233v1.pdf", "title": "Zero-Shot Strategies for Length-Controllable Summarization", "abstract": "Large language models (LLMs) struggle with precise length control, particularly in zero-shot settings. We conduct a comprehensive study evaluating LLMs' length control capabilities across multiple measures and propose practical methods to improve controllability. Our experiments with LLaMA 3 reveal stark differences in length adherence across measures and highlight inherent biases of the model. To address these challenges, we introduce a set of methods: length approximation, target adjustment, sample filtering, and automated revisions. By combining these methods, we demonstrate substantial improvements in length compliance while maintaining or enhancing summary quality, providing highly effective zero-shot strategies for precise length control without the need for model fine-tuning or architectural changes. 
With our work, we not only advance our understanding of LLM behavior in controlled text generation but also pave the way for more reliable and adaptable summarization systems in real-world applications.", "authors": [ "Alexander Waibel", "Fabian Retkowski" ], "published": "2024-12-31", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "who-wrote-this-zero-shot-statistical-tests", "arxiv_id": "2501.02406", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.02406v2", "url_pdf": "https://arxiv.org/pdf/2501.02406v2.pdf", "title": "Zero-Shot Statistical Tests for LLM-Generated Text Detection using Finite Sample Concentration Inequalities", "abstract": "Verifying the provenance of content is crucial to the function of many organizations, e.g., educational institutions, social media platforms, firms, etc. This problem is becoming increasingly difficult as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions utilize in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within the institution. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by LLM $A$ or $B$ (where $B$ can be a human)? We model LLM-generated text as a sequential stochastic process with complete dependence on history and design zero-shot statistical tests to distinguish between (i) the text generated by two different sets of LLMs $A$ (in-house) and $B$ (non-sanctioned) and also (ii) LLM-generated and human-generated texts. We prove that the type I and type II errors for our tests decrease exponentially in the text length. In designing our tests, we derive concentration inequalities on the difference between log-perplexity and the average entropy of the string under $A$. 
Specifically, for a given string, we demonstrate that if the string is generated by $A$, the log-perplexity of the string under $A$ converges to the average entropy of the string under $A$, except with an exponentially small probability in string length. We also show that if $B$ generates the text, except with an exponentially small probability in string length, the log-perplexity of the string under $A$ converges to the average cross-entropy of $B$ and $A$. Lastly, we present preliminary experimental results to support our theoretical results. By enabling guaranteed (with high probability) finding of the origin of harmful LLM-generated text with arbitrary size, we can help combat misinformation.", "authors": [ "Ambuj Tewari", "Mohamed Mostagir", "Mojtaba Abdolmaleki", "Tara Radvand" ], "published": "2025-01-04", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-stance-detection-using-contextual", "arxiv_id": "2405.11637", "nips_id": null, "url_abs": "https://arxiv.org/abs/2405.11637v1", "url_pdf": "https://arxiv.org/pdf/2405.11637v1.pdf", "title": "Zero-Shot Stance Detection using Contextual Data Generation with LLMs", "abstract": "Stance detection, the classification of attitudes expressed in a text towards a specific topic, is vital for applications like fake news detection and opinion mining. However, the scarcity of labeled data remains a challenge for this task. To address this problem, we propose Dynamic Model Adaptation with Contextual Data Generation (DyMoAdapt) that combines Few-Shot Learning and Large Language Models. In this approach, we aim to fine-tune an existing model at test time. We achieve this by generating new topic-specific data using GPT-3. This method could enhance performance by allowing the adaptation of the model to new topics. However, the results did not increase as we expected. 
Furthermore, we introduce the Multi Generated Topic VAST (MGT-VAST) dataset, which extends VAST using GPT-3. In this dataset, each context is associated with multiple topics, allowing the model to understand the relationship between contexts and various potential topics.", "authors": [ "Sauleh Eetemadi", "Babak Behkamkia", "Ghazaleh Mahmoudi" ], "published": "2024-05-19", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-spam-email-classification-using-pre", "arxiv_id": "2405.15936", "nips_id": null, "url_abs": "https://arxiv.org/abs/2405.15936v1", "url_pdf": "https://arxiv.org/pdf/2405.15936v1.pdf", "title": "Zero-Shot Spam Email Classification Using Pre-trained Large Language Models", "abstract": "This paper investigates the application of pre-trained large language models (LLMs) for spam email classification using zero-shot prompting. We evaluate the performance of both open-source (Flan-T5) and proprietary LLMs (ChatGPT, GPT-4) on the well-known SpamAssassin dataset. Two classification approaches are explored: (1) truncated raw content from email subject and body, and (2) classification based on summaries generated by ChatGPT. Our empirical analysis, leveraging the entire dataset for evaluation without further training, reveals promising results. Flan-T5 achieves a 90% F1-score on the truncated content approach, while GPT-4 reaches a 95% F1-score using summaries. While these initial findings on a single dataset suggest the potential for classification pipelines of LLM-based subtasks (e.g., summarisation and classification), further validation on diverse datasets is necessary. 
The high operational costs of proprietary models, coupled with the general inference costs of LLMs, could significantly hinder real-world deployment for spam filtering.", "authors": [ "Sergio Rojas-Galeano" ], "published": "2024-05-24", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-slot-filling-in-the-age-of-llms-for", "arxiv_id": "2411.18980", "nips_id": null, "url_abs": "https://arxiv.org/abs/2411.18980v1", "url_pdf": "https://arxiv.org/pdf/2411.18980v1.pdf", "title": "Zero-shot Slot Filling in the Age of LLMs for Dialogue Systems", "abstract": "Zero-shot slot filling is a well-established subtask of Natural Language Understanding (NLU). However, most existing methods primarily focus on single-turn text data, overlooking the unique complexities of conversational dialogue. Conversational data is highly dynamic, often involving abrupt topic shifts, interruptions, and implicit references that make it difficult to directly apply zero-shot slot filling techniques, even with the remarkable capabilities of large language models (LLMs). This paper addresses these challenges by proposing strategies for automatic data annotation with slot induction and black-box knowledge distillation (KD) from a teacher LLM to a smaller model, outperforming vanilla LLMs on internal datasets by 26% absolute increase in F1 score. 
Additionally, we introduce an efficient system architecture for call center product settings that surpasses off-the-shelf extractive models by 34% relative F1 score, enabling near real-time inference on dialogue streams with higher accuracy, while preserving low latency.", "authors": [ "Maragathamani Boothalingam", "Sindhuja Gopalan", "Kadri Hacioglu", "Mansi Rana" ], "published": "2024-11-28", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-slot-and-intent-detection-in-low", "arxiv_id": "2304.13292", "nips_id": null, "url_abs": "https://arxiv.org/abs/2304.13292v1", "url_pdf": "https://arxiv.org/pdf/2304.13292v1.pdf", "title": "Zero-Shot Slot and Intent Detection in Low-Resource Languages", "abstract": "Intent detection and slot filling are critical tasks in spoken and natural language understanding for task-oriented dialog systems. In this work we describe our participation in the slot and intent detection for low-resource language varieties (SID4LR; Aepli et al. (2023)). We investigate the slot and intent detection (SID) tasks using a wide range of models and settings. Given the recent success of multitask-prompted finetuning of large language models, we also test the generalization capability of the recent encoder-decoder model mT0 (Muennighoff et al., 2022) on new tasks (i.e., SID) in languages they have never intentionally seen. 
We show that our best model outperforms the baseline by a large margin (up to +30 F1 points) in both SID tasks.", "authors": [ "Muhammad Abdul-Mageed", "Alcides Alcoba Inciarte", "El Moatez Billah Nagoudi", "Gagan Bhatia", "Sang Yun Kwon" ], "published": "2023-04-26", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-sentiment-analysis-in-low-resource", "arxiv_id": "2402.02113", "nips_id": null, "url_abs": "https://arxiv.org/abs/2402.02113v1", "url_pdf": "https://arxiv.org/pdf/2402.02113v1.pdf", "title": "Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon", "abstract": "Improving the capabilities of multilingual language models in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets, and large language models like GPT-3.5, BLOOMZ, and XGLM. 
These findings are observable across settings ranging from unseen low-resource languages to code-mixed scenarios involving high-resource languages.", "authors": [ "Timothy Baldwin", "Iryna Gurevych", "Zeerak Talat", "Tilman Beck", "Fajri Koto" ], "published": "2024-02-03", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-rtl-code-generation-with-attention", "arxiv_id": "2401.08683", "nips_id": null, "url_abs": "https://arxiv.org/abs/2401.08683v1", "url_pdf": "https://arxiv.org/pdf/2401.08683v1.pdf", "title": "Zero-Shot RTL Code Generation with Attention Sink Augmented Large Language Models", "abstract": "The design and optimization of hardware have traditionally been resource-intensive, demanding considerable expertise and dependence on established design automation tools. This paper discusses the possibility of exploiting large language models to streamline the code generation process in hardware design. In contrast to earlier studies, this paper aims to use large language models that accept high-level design specifications through a single prompt to generate corresponding Register-Transfer Level (RTL) code. The ability to use large language models on RTL code generation not only expedites design iteration cycles but also facilitates the exploration of design spaces that have computational challenges for conventional techniques. Through our evaluation, we demonstrate the shortcoming of existing attention mechanisms, and present the abilities of language models to produce functional, optimized, and industry-standard compliant RTL code when a novel attention mechanism is used. 
These findings underscore the expanding role of large language models in shaping the future landscape of architectural exploration and automation in hardware design.", "authors": [ "Ismail Akturk", "Selim Sandal" ], "published": "2024-01-12", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-robotic-manipulation-with-language", "arxiv_id": "2501.15214", "nips_id": null, "url_abs": "https://arxiv.org/abs/2501.15214v1", "url_pdf": "https://arxiv.org/pdf/2501.15214v1.pdf", "title": "Zero-shot Robotic Manipulation with Language-guided Instruction and Formal Task Planning", "abstract": "Robotic manipulation is often challenging due to long-horizon tasks and complex object relationships. A common solution is to develop a task and motion planning framework that integrates planning for high-level tasks and low-level motion. Recently, inspired by the powerful reasoning ability of Large Language Models (LLMs), LLM-based planning approaches have achieved remarkable progress. However, these methods still heavily rely on expert-specific knowledge, often generating invalid plans for unseen and unfamiliar tasks. To address this issue, we propose an innovative language-guided symbolic task planning (LM-SymOpt) framework with optimization. It is the first expert-free planning framework since we combine the world knowledge from LLMs with formal reasoning, resulting in improved generalization capability to new tasks. Specifically, unlike most existing work, our LM-SymOpt employs LLMs to translate natural language instructions into symbolic representations, thereby representing actions as high-level symbols and reducing the search space for planning. Next, after evaluating the action probability of completing the task using LLMs, a weighted random sampling method is introduced to generate candidate plans. 
Their feasibility is assessed through symbolic reasoning and their cost efficiency is then evaluated using trajectory optimization to select the optimal plan. Our experimental results show that LM-SymOpt outperforms existing LLM-based planning approaches.", "authors": [ "Yaochu Jin", "Ting Gao", "Ziqi Zheng", "Yuping Yan", "Zihan Ye", "Junfeng Tang" ], "published": "2025-01-25", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-recommendations-with-pre-trained", "arxiv_id": "2309.01026", "nips_id": null, "url_abs": "https://arxiv.org/abs/2309.01026v2", "url_pdf": "https://arxiv.org/pdf/2309.01026v2.pdf", "title": "Zero-Shot Recommendations with Pre-Trained Large Language Models for Multimodal Nudging", "abstract": "We present a method for zero-shot recommendation of multimodal non-stationary content that leverages recent advancements in the field of generative AI. We propose rendering inputs of different modalities as textual descriptions and utilizing pre-trained LLMs to obtain their numerical representations by computing semantic embeddings. Once unified representations of all content items are obtained, the recommendation can be performed by computing an appropriate similarity metric between them without any additional learning. We demonstrate our approach on a synthetic multimodal nudging environment, where the inputs consist of tabular, textual, and visual data.", "authors": [ "Rachel M. 
Harrison", "Anton Bibin", "Anton Dereventsov" ], "published": "2023-09-02", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-reasoning-personalized-content", "arxiv_id": "2402.10133", "nips_id": null, "url_abs": "https://arxiv.org/abs/2402.10133v2", "url_pdf": "https://arxiv.org/pdf/2402.10133v2.pdf", "title": "Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem", "abstract": "Procedural content generation uses algorithmic techniques to create large amounts of new content for games at much lower production costs. In newer approaches, procedural content generation utilizes machine learning. However, these methods usually require expensive collection of large amounts of data, as well as the development and training of fairly complex learning models, which can be both extremely time-consuming and expensive. The core of our research is to explore whether we can lower the barrier to the use of personalized procedural content generation through a more practical and generalizable approach with large language models. Matching game content with player preferences benefits both players, who enjoy the game more, and developers, who increasingly depend on players enjoying the game before being able to monetize it. Therefore, this paper presents a novel approach to achieving personalization by using large language models to propose levels based on the gameplay data continuously collected from individual players. We compared the levels generated using our approach with levels generated with more traditional procedural generation techniques. 
Our easily reproducible method has proven viable in a production setting and outperformed levels generated by traditional methods in the probability that a player will not quit the game mid-level.", "authors": [ "Jure Demšar", "Davor Hafnar" ], "published": "2024-02-15", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-question-answering-over-financial", "arxiv_id": "2311.14722", "nips_id": null, "url_abs": "https://arxiv.org/abs/2311.14722v1", "url_pdf": "https://arxiv.org/pdf/2311.14722v1.pdf", "title": "Zero-Shot Question Answering over Financial Documents using Large Language Models", "abstract": "We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports. While LLMs have exhibited remarkable performance on various natural language and reasoning tasks, complex reasoning problems often rely on few-shot prompts that require carefully crafted examples. In contrast, our approach uses novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language. The generated program is then executed by a program interpreter, thus mitigating the limitations of LLM in performing accurate arithmetic calculations. We evaluate the proposed approach on three financial datasets using some of the recently developed generative pretrained transformer (GPT) models and perform comparisons with various zero-shot baselines. The experimental results demonstrate that our approach significantly improves the accuracy for all the LLMs over their respective baselines. We provide a detailed analysis of the results, generating insights to support our findings. 
The success of our approach demonstrates the enormous potential to extract complex domain specific numerical reasoning by designing zero-shot prompts to effectively exploit the knowledge embedded in LLMs.", "authors": [ "Sai Akhil Puranam", "Shashishekar Ramakrishna", "Sridhar Dasaratha", "Chetan Harsha", "Karmvir Singh Phogat" ], "published": "2023-11-19", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-prompting-and-few-shot-fine-tuning", "arxiv_id": "2412.13859", "nips_id": null, "url_abs": "https://arxiv.org/abs/2412.13859v1", "url_pdf": "https://arxiv.org/pdf/2412.13859v1.pdf", "title": "Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models", "abstract": "Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. 
In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.", "authors": [ "Andreas Fischer", "Jean-Marc Spat", "Lars Vögtlin", "Michael Jungo", "Anna Scius-Bertrand" ], "published": "2024-12-18", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null }, { "id": "zero-shot-persuasive-chatbots-with-llm", "arxiv_id": "2407.03585", "nips_id": null, "url_abs": "https://arxiv.org/abs/2407.03585v3", "url_pdf": "https://arxiv.org/pdf/2407.03585v3.pdf", "title": "Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval", "abstract": "Persuasion plays a pivotal role in a wide range of applications from health intervention to the promotion of social good. Persuasive chatbots employed responsibly for social good can be an enabler of positive individual and social change. Existing methods rely on fine-tuning persuasive chatbots with task-specific training data which is costly, if not infeasible, to collect. Furthermore, they employ only a handful of pre-defined persuasion strategies. We propose PersuaBot, a zero-shot chatbot based on Large Language Models (LLMs) that is factual and more persuasive by leveraging many more nuanced strategies. PersuaBot uses an LLM to first generate natural responses, from which the strategies used are extracted. To combat hallucination of LLMs, PersuaBot replaces any unsubstantiated claims in the response with retrieved facts supporting the extracted strategies. We applied our chatbot, PersuaBot, to three significantly different domains needing persuasion skills: donation solicitation, recommendations, and health intervention. 
Our experiments on simulated and human conversations show that our zero-shot approach is more persuasive than prior work, while achieving factual accuracy surpassing state-of-the-art knowledge-oriented chatbots.", "authors": [ "Monica S. Lam", "Weiyan Shi", "Kazushi Ikeda", "Sina J. Semnani", "Yasutaka Nishimura", "Yudai Yamazaki", "Julio Vizcarra", "Roberto Legaspi", "Kazuaki Furumai" ], "published": "2024-07-04", "conference": null, "conference_url_abs": null, "conference_url_pdf": null, "proceeding": null } ] }{ "count": 24708, "next": "