Search Results for author: Yungi Kim

Found 8 papers, 4 papers with code

Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora

no code implementations • 15 Sep 2024 • Yungi Kim, Hyunsoo Ha, Sukyung Lee, Jihoo Kim, Seonghoon Yang, Chanjun Park

With the increasing demand for substantial amounts of high-quality data to train large language models (LLMs), efficiently filtering large web corpora has become a critical challenge.

Language Modelling
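
The paper's exact good/bad ensemble rule is not reproduced in this excerpt; the sketch below only illustrates the common pattern it builds on: scoring each document with a KenLM model trained on high-quality text and one trained on noisy web text, then keeping documents the "good" model finds less surprising. The model paths, margin, and decision rule are placeholders.

```python
# Minimal sketch of perplexity-based quality filtering with two KenLM models:
# one assumed to be trained on high-quality text ("good") and one on noisy web
# text ("bad"). Paths, margin, and the keep/drop rule are illustrative only.
import kenlm

good_lm = kenlm.Model("good_corpus.arpa")  # placeholder path
bad_lm = kenlm.Model("bad_corpus.arpa")    # placeholder path

def keep_document(text: str, margin: float = 0.0) -> bool:
    """Keep a document if the 'good' LM assigns it lower perplexity than the 'bad' LM."""
    good_ppl = good_lm.perplexity(text)
    bad_ppl = bad_lm.perplexity(text)
    return good_ppl + margin < bad_ppl

docs = [
    "This is a well-formed sentence about language models.",
    "buy now!!! cheap cheap $$$ click here",
]
filtered = [d for d in docs if keep_document(d)]
print(filtered)
```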

Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark

no code implementations • 31 May 2024 • Chanjun Park, Hyeonwoo Kim, Dahyun Kim, Seonghwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee

This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating Large Language Models (LLMs) in Korean.

Diversity

SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models

no code implementations • 5 Apr 2024 • Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park

We focus on integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) learning, hypothesizing that prioritizing the learning of mathematical reasoning ability helps amplify problem-solving ability.

Mathematical Reasoning
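
As a schematic illustration of the sequential schedule the abstract hints at, the sketch below fine-tunes on Chain-of-Thought data first (reasoning) and on Program-of-Thought data second (solving ability). The `fine_tune` stub and the tiny datasets are hypothetical stand-ins, not the paper's training code.

```python
# Schematic sketch of a two-stage schedule: CoT (reasoning) first, PoT (solving) second.
# `fine_tune` and the example data are hypothetical placeholders.

def fine_tune(model_name: str, dataset: list[str], stage: str) -> str:
    """Hypothetical stub: pretend to fine-tune and return the new checkpoint name."""
    print(f"[{stage}] fine-tuning {model_name} on {len(dataset)} examples")
    return f"{model_name}+{stage}"

cot_data = ["Q: 12*7? Let's think step by step: 12*7 = 84. A: 84"]     # natural-language reasoning
pot_data = ["Q: 12*7? Program: print(12*7)  # evaluates to 84. A: 84"]  # code-based reasoning

checkpoint = "base-llm"
checkpoint = fine_tune(checkpoint, cot_data, stage="cot")  # stage 1: build mathematical reasoning
checkpoint = fine_tune(checkpoint, pot_data, stage="pot")  # stage 2: amplify solving ability
print("final checkpoint:", checkpoint)
```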

Evalverse: Unified and Accessible Library for Large Language Model Evaluation

1 code implementation • 1 Apr 2024 • Jihoo Kim, Wonho Song, Dahyun Kim, Yunsu Kim, Yungi Kim, Chanjun Park

This paper introduces Evalverse, a novel library that streamlines the evaluation of Large Language Models (LLMs) by unifying disparate evaluation tools into a single, user-friendly framework.

Language Modelling • Large Language Model
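
The sketch below shows the kind of single-entry-point interface the abstract describes, where heterogeneous benchmark backends sit behind one registration-and-run API. The class `UnifiedEvaluator`, its methods, and the benchmark names are assumptions for illustration, not Evalverse's documented API.

```python
# Hypothetical sketch of a unified evaluation entry point; names are illustrative,
# not Evalverse's actual interface.
from typing import Callable

class UnifiedEvaluator:
    """Registers heterogeneous benchmark backends behind one interface."""

    def __init__(self) -> None:
        self._benchmarks: dict[str, Callable[[str], float]] = {}

    def register(self, name: str, runner: Callable[[str], float]) -> None:
        self._benchmarks[name] = runner

    def run(self, model_id: str, benchmarks: list[str]) -> dict[str, float]:
        # Every backend is invoked through the same signature, whatever harness it wraps.
        return {name: self._benchmarks[name](model_id) for name in benchmarks}

evaluator = UnifiedEvaluator()
evaluator.register("toy_mmlu", lambda model_id: 0.62)   # stand-in for a real harness call
evaluator.register("toy_gsm8k", lambda model_id: 0.41)
print(evaluator.run("my-llm", ["toy_mmlu", "toy_gsm8k"]))
```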

Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

1 code implementation • 28 Mar 2024 • Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park

To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core.
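
The generic sketch below illustrates the Extract-Transform-Load shape of such a pipeline: raw documents are pulled in, a configurable chain of transforms (here, deduplication and whitespace normalization) is applied in order, and the cleaned corpus is written out. The stage functions are assumptions for illustration; this is not Dataverse's actual API.

```python
# Generic ETL sketch for LLM training data; stage functions are illustrative placeholders.
from typing import Callable, Iterable

def extract(paths: Iterable[str]) -> list[str]:
    """Stand-in extractor: in practice this would read raw web documents."""
    return [f"raw text from {p}" for p in paths]

def dedupe(docs: list[str]) -> list[str]:
    return list(dict.fromkeys(docs))             # exact-match dedup as a placeholder

def normalize(docs: list[str]) -> list[str]:
    return [" ".join(d.split()) for d in docs]   # collapse whitespace

def load(docs: list[str], out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(docs))

def run_pipeline(paths: Iterable[str],
                 transforms: list[Callable[[list[str]], list[str]]],
                 out_path: str) -> None:
    docs = extract(paths)
    for step in transforms:    # transforms run in order, like registered ETL stages
        docs = step(docs)
    load(docs, out_path)

run_pipeline(["a.txt", "b.txt"], [dedupe, normalize], "clean_corpus.txt")
```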

sDPO: Don't Use Your Data All at Once

no code implementations • 28 Mar 2024 • Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park

As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important.
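
In the spirit of the title, the sketch below splits the preference data into chunks and aligns the model one chunk at a time, carrying the previously aligned model forward as the reference for the next step. The `dpo_step` helper is a hypothetical stand-in for an actual DPO training call, not the paper's released code.

```python
# Schematic sketch of stepwise preference tuning: use the data chunk by chunk,
# not all at once. `dpo_step` is a hypothetical stub for a real DPO trainer call.

def dpo_step(policy: str, reference: str, chunk: list[dict]) -> str:
    """Hypothetical stub: align `policy` against `reference` on one data chunk."""
    print(f"aligning {policy} (ref={reference}) on {len(chunk)} preference pairs")
    return f"{policy}-step"

preference_data = [
    {"prompt": "p1", "chosen": "a", "rejected": "b"},
    {"prompt": "p2", "chosen": "c", "rejected": "d"},
    {"prompt": "p3", "chosen": "e", "rejected": "f"},
    {"prompt": "p4", "chosen": "g", "rejected": "h"},
]
chunks = [preference_data[:2], preference_data[2:]]  # stepwise use of the data

policy, reference = "sft-model", "sft-model"
for chunk in chunks:
    aligned = dpo_step(policy, reference, chunk)
    reference = aligned   # the freshly aligned model becomes the next step's reference
    policy = aligned
print("final model:", policy)
```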

MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation

1 code implementation • 15 Dec 2023 • Yungi Kim, Taeri Kim, Won-Yong Shin, Sang-Wook Kim

In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs), in which multimodal features and user-item interactions are employed together.

Multimedia recommendation
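
As a toy illustration of target-aware attention over per-modality representations, the sketch below uses the target item's embedding as the query to weight a user's visual and textual views before scoring. The dimensions and fusion rule are illustrative assumptions, not the exact MONET architecture.

```python
# Toy sketch of target-aware attention over modality views; shapes and the
# fusion rule are illustrative, not the paper's exact model.
import torch
import torch.nn.functional as F

dim, n_modalities = 16, 2                    # e.g., visual and textual modalities
user_modal = torch.randn(n_modalities, dim)  # user's per-modality representations (e.g., from a GCN)
target_item = torch.randn(dim)               # embedding of the candidate item to score

# Attention logits: similarity between the target item and each modality view.
logits = user_modal @ target_item / dim ** 0.5
weights = F.softmax(logits, dim=0)           # target-aware modality weights
user_repr = weights @ user_modal             # fused, target-conditioned user representation

score = torch.dot(user_repr, target_item)    # recommendation score for this user-item pair
print(weights, score)
```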
