1 code implementation • 10 Aug 2024 • Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park
Recently, there has been an extensive research effort in building efficient large language model (LLM) inference serving systems.
no code implementations • 1 Aug 2024 • Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim
We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos.
no code implementations • 12 Jun 2024 • Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, Joanna Hong, Jeong Hun Yeo, Yong Man Ro
In this paper, we introduce a novel Face-to-Face spoken dialogue model.
no code implementations • 1 Jun 2024 • Minsu Kim, Walid Saad, Merouane Debbah, Choong Seon Hong
To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby learning how to prune.
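The core idea, communicating a scalar threshold instead of full parameter vectors, can be sketched with simple magnitude pruning. This is an illustrative sketch, not the paper's exact federated protocol; `prune_by_threshold` is a hypothetical helper:

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold.

    Communicating only `threshold` (one scalar) instead of the full
    weight vector is what keeps server-client traffic small.
    """
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Client side: apply a server-sent threshold locally.
w = np.array([0.5, -0.01, 0.3, 0.002, -0.7])
pruned, mask = prune_by_threshold(w, threshold=0.05)
# Only the three large-magnitude weights survive.
```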
1 code implementation • 31 May 2024 • Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem.
no code implementations • 28 May 2024 • Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs).
no code implementations • 28 May 2024 • Seong-Hyeon Hwang, Minsu Kim, Steven Euijong Whang
We study the problem of robust data augmentation for regression tasks in the presence of noisy data.
no code implementations • 2 May 2024 • Minsu Kim, Giseop Kim, Sunwook Choi
Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps: their large feature maps cause significant increases in costs, including GPU memory consumption and computing latency, an issue we term diverging training costs.
no code implementations • 2 May 2024 • Minsu Kim, Joachim Schaeffer, Marc D. Berliner, Berta Pedret Sagnier, Rolf Findeisen, Richard D. Braatz
Our results demonstrate that uncertainty in ambient temperature results in violations of constraints on the voltage and temperature.
no code implementations • 19 Mar 2024 • Minsu Kim, James Thorne
This paper investigates the inherent knowledge in language models from the perspective of epistemological holism.
no code implementations • 18 Mar 2024 • Minsu Kim, Jinwoo Hwang, Guseul Heo, Seiyeon Cho, Divya Mahajan, Jongse Park
Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes.
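The key-to-position mapping can be illustrated with a minimal learned index: fit a linear model over sorted keys, then correct the prediction with an error-bounded local search. This is a generic sketch of the learned-index idea, not this paper's model; `LinearLearnedIndex` is a hypothetical name:

```python
import bisect
import numpy as np

class LinearLearnedIndex:
    """Minimal learned index: fit position ~ a*key + b on sorted keys,
    then correct the prediction with a bounded binary search."""

    def __init__(self, sorted_keys):
        self.keys = np.asarray(sorted_keys, dtype=float)
        positions = np.arange(len(self.keys))
        # Least-squares fit of position as a linear function of key.
        self.a, self.b = np.polyfit(self.keys, positions, deg=1)
        preds = np.round(self.a * self.keys + self.b).astype(int)
        # Worst-case prediction error bounds the search window.
        self.max_err = int(np.max(np.abs(preds - positions)))

    def lookup(self, key):
        guess = int(round(self.a * key + self.b))
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the error-bounded window.
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
        return i if i < len(self.keys) and self.keys[i] == key else -1

keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
idx = LinearLearnedIndex(keys)
```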
2 code implementations • 11 Mar 2024 • Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jiwoo Son, Jinkyoo Park, Yoshua Bengio
This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a neural-guided probabilistic search algorithm for solving combinatorial optimization (CO).
1 code implementation • 25 Feb 2024 • Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro
We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text.
1 code implementation • 23 Feb 2024 • Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro
In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize context modeling ability by leveraging the power of LLMs.
Ranked #5 on Lipreading on LRS3-TED (using extra training data)
1 code implementation • 8 Feb 2024 • Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park
Antibody design plays a pivotal role in advancing therapeutics.
1 code implementation • 7 Feb 2024 • Marcin Sendera, Minsu Kim, Sarthak Mittal, Pablo Lemos, Luca Scimeca, Jarrid Rector-Brooks, Alexandre Adam, Yoshua Bengio, Nikolay Malkin
We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function.
2 code implementations • 5 Feb 2024 • Hyeonah Kim, Minsu Kim, Sanghyeok Choi, Jinkyoo Park
The challenge of discovering new molecules with desired properties is crucial in domains like drug discovery and material design.
no code implementations • 18 Jan 2024 • Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro
By using the visual speech units as the inputs of our system, we propose to pre-train a VSR model to predict corresponding text outputs on multilingual data constructed by merging several VSR databases.
no code implementations • 15 Dec 2023 • Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang
However, we contend that explicitly utilizing the drifted data together leads to much better model accuracy and propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.
1 code implementation • CVPR 2024 • Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro
To mitigate the problem of the absence of a parallel AV2AV translation dataset, we propose to train our spoken language translation system with the audio-only dataset of A2A.
no code implementations • 5 Oct 2023 • Hyosoon Jang, Minsu Kim, Sungsoo Ahn
In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions.
1 code implementation • 4 Oct 2023 • Minsu Kim, Joohwan Ko, Taeyoung Yun, Dinghuai Zhang, Ling Pan, Woochang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio
We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly.
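The logit-scaling idea can be sketched as follows, with the learned function of temperature replaced by a placeholder identity; names here are illustrative, not the paper's:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def tempered_policy(logits, beta, g=lambda b: b):
    """Scale the policy's logits by a function g of the inverse
    temperature beta before the softmax. Here g is the identity as a
    placeholder; in the paper's setting g would be learned."""
    return softmax(g(beta) * logits)

logits = np.array([2.0, 1.0, 0.0])
p_low = tempered_policy(logits, beta=0.1)   # near-uniform, exploratory
p_high = tempered_policy(logits, beta=5.0)  # sharply peaked on argmax
```

A single conditioned policy thus covers a whole family of tempered distributions by varying `beta` at inference time.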
2 code implementations • 4 Oct 2023 • Minsu Kim, Taeyoung Yun, Emmanuel Bengio, Dinghuai Zhang, Yoshua Bengio, Sungsoo Ahn, Jinkyoo Park
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
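The proportionality target that a trained GFlowNet aims for can be written down exactly on a toy discrete space:

```python
# A trained GFlowNet's terminal distribution should satisfy p(x) ~ R(x).
# For a tiny discrete space the target is computable in closed form.
rewards = {"A": 1.0, "B": 3.0, "C": 6.0}
Z = sum(rewards.values())                       # partition function
target = {x: r / Z for x, r in rewards.items()}
# e.g. object "C" should be sampled 60% of the time.
```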
no code implementations • 20 Sep 2023 • Minsu Kim, Giseop Kim, Kyong Hwan Jin, Sunwook Choi
The method boosts the camera branch's depth estimation learning and induces accurate localization of dense camera features in BEV space.
no code implementations • 18 Sep 2023 • Minsu Kim, Walid Saad
The generalization and memorization performance of the proposed framework are theoretically analyzed.
1 code implementation • 15 Sep 2023 • Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro
Unlike previous methods that improve VSR performance for the target language using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for different languages without human intervention.
no code implementations • 15 Sep 2023 • Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
To this end, we start by importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp.
1 code implementation • 4 Sep 2023 • Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin
Existing frameworks for image stitching often provide visually reasonable stitchings.
1 code implementation • 4 Sep 2023 • Minsu Kim, Yongjun Lee, Woo Kyoung Han, Kyong Hwan Jin
Recent learning-based elastic warps enable deep image stitching to align images exposed to large parallax errors.
no code implementations • 23 Aug 2023 • Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro
We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF to enable the synthesis of variations in identities, poses, and facial motions of 3D face mesh.
no code implementations • ICCV 2023 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro
In order to mitigate the challenge, we try to learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units.
no code implementations • 15 Aug 2023 • Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.
1 code implementation • 3 Aug 2023 • Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
By setting both the inputs and outputs of our learning problem as speech units, we propose to train an encoder-decoder model in a many-to-many spoken language translation setting, namely Unit-to-Unit Translation (UTUT).
3 code implementations • 29 Jun 2023 • Federico Berto, Chuanbo Hua, Junyoung Park, Laurin Luttmann, Yining Ma, Fanchen Bu, Jiarui Wang, Haoran Ye, Minsu Kim, Sanghyeok Choi, Nayeli Gast Zepeda, André Hottung, Jianan Zhou, Jieyi Bi, Yu Hu, Fei Liu, Hyeonah Kim, Jiwoo Son, Haeyeon Kim, Davide Angioni, Wouter Kool, Zhiguang Cao, Qingfu Zhang, Joungho Kim, Jie Zhang, Kijung Shin, Cathy Wu, Sungsoo Ahn, Guojie Song, Changhyun Kwon, Kevin Tierney, Lin Xie, Jinkyoo Park
To fill this gap, we introduce RL4CO, a unified and extensive benchmark with in-depth library coverage of 23 state-of-the-art methods and more than 20 CO problems.
no code implementations • 28 Jun 2023 • Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro
The visual speaker embedding is derived from a single target face image and enables improved mapping of input text to the learned audio latent space by incorporating the speaker characteristics inherent in the audio.
1 code implementation • NeurIPS 2023 • Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park
The subsequent stage involves bootstrapping, which augments the training dataset with self-generated data labeled by a proxy score function.
1 code implementation • 5 Jun 2023 • Jiwoo Son, Minsu Kim, Hyeonah Kim, Jinkyoo Park
First, SML transforms the context embedding for subsequent adaptation of SAGE based on scale information.
1 code implementation • 5 Jun 2023 • Jiwoo Son, Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jinkyoo Park
Notably, our method reduces runtime by approximately 335 times and cost values by about 53% compared to a competitive heuristic (LKH3) in the case of 100 vehicles with 1,000 cities of mTSP.
1 code implementation • 2 Jun 2023 • Hyeonah Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park
Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO).
1 code implementation • 31 May 2023 • Jeongsoo Choi, Minsu Kim, Yong Man Ro
Therefore, the proposed L2S model is trained to generate multiple targets, mel-spectrogram and speech units.
no code implementations • 31 May 2023 • Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro
The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.
no code implementations • 8 May 2023 • Jeong Hun Yeo, Minsu Kim, Yong Man Ro
Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements.
no code implementations • CVPR 2023 • Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn
Modern data augmentation using a mixture-based technique can regularize the models from overfitting to the training data in various computer vision applications, but a proper data augmentation technique tailored for the part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.
1 code implementation • CVPR 2023 • Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro
Thus, we first show that previous AVSR models are in fact not robust to the corruption of multimodal input streams, the audio and the visual inputs, compared to uni-modal models.
no code implementations • 27 Feb 2023 • Minsu Kim, Chae Won Kim, Yong Man Ro
The proposed DVFA can align the input transcription (i.e., sentence) with the talking face video without accessing the speech audio.
3 code implementations • 17 Feb 2023 • Minsu Kim, Joanna Hong, Yong Man Ro
To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of acoustic feature reconstruction loss.
no code implementations • 16 Feb 2023 • Minsu Kim, Hyung-Il Kim, Yong Man Ro
As it focuses on visual information to model speech, its performance is inherently sensitive to personal lip appearances and movements, which causes VSR models to degrade when applied to unseen speakers.
no code implementations • 2 Nov 2022 • Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro
It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time.
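This audio-keyed retrieval can be sketched as a soft key-value lookup, with stored audio features as keys and stored lip-motion features as values. An illustrative attention sketch, not the paper's exact memory architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def retrieve_lip_features(audio_query, key_memory, value_memory):
    """Match the audio query against stored audio keys, then return the
    attention-weighted sum of the paired lip-motion values."""
    scores = key_memory @ audio_query          # similarity to each key
    attn = softmax(scores)
    return attn @ value_memory                 # blended lip features

rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 16))     # 8 memory slots, 16-dim audio keys
values = rng.normal(size=(8, 32))   # paired 32-dim lip-motion values
query = rng.normal(size=16)
lip_feat = retrieve_lip_features(query, keys, values)
```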
no code implementations • 21 Oct 2022 • Minsu Kim, Youngjoon Yu, Sungjune Park, Yong Man Ro
The proposed meta input can be optimized with a small number of testing data only by considering the relation between testing input data and its output prediction.
1 code implementation • 9 Aug 2022 • Minsu Kim, Hyunjun Kim, Yong Man Ro
In this paper, to remedy the performance degradation of lip reading models on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding.
no code implementations • 19 Jul 2022 • Minsu Kim, Walid Saad, Mohammad Mozaffari, Merouane Debbah
In this paper, a green-quantized FL framework, which represents data with a finite precision level in both local training and uplink transmission, is proposed.
1 code implementation • 13 Jul 2022 • Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro
The enhanced audio features are fused with the visual features and taken to an encoder-decoder model composed of Conformer and Transformer for speech recognition.
1 code implementation • 2 Jul 2022 • Jinwoo Hwang, Minsu Kim, Daeun Kim, Seungho Nam, Yoonsung Kim, Dohee Kim, Hardik Sharma, Jongse Park
This paper presents CoVA, a novel cascade architecture that splits the cascade computation between compressed domain and pixel domain to address the decoding bottleneck, supporting both temporal and spatial queries.
no code implementations • 15 Jun 2022 • Joanna Hong, Minsu Kim, Yong Man Ro
Thus, the proposed framework brings the advantage of synthesizing the speech containing the right content even with the silent talking face video of an unseen subject.
1 code implementation • 14 Jun 2022 • Daoguang Zan, Bei Chen, Dejian Yang, Zeqi Lin, Minsu Kim, Bei guan, Yongji Wang, Weizhu Chen, Jian-Guang Lou
Usually, expensive text-code paired data is essential for training a code generation model.
Ranked #130 on Code Generation on HumanEval
1 code implementation • 26 May 2022 • Minsu Kim, Junyoung Park, Jinkyoo Park
Deep reinforcement learning (DRL)-based combinatorial optimization (CO) methods (i.e., DRL-NCO) have shown significant merit over conventional CO solvers, as DRL-NCO can learn CO solvers with less reliance on problem-specific expert domain knowledge (heuristic methods) and supervised labeled data (supervised learning methods).
4 code implementations • 26 May 2022 • Haeyeon Kim, Minsu Kim, Federico Berto, Joungho Kim, Jinkyoo Park
In this paper, we present DevFormer, a novel transformer-based architecture for addressing the complex and computationally demanding problem of hardware design optimization.
1 code implementation • The AAAI Conference on Artificial Intelligence (AAAI) 2022 • Minsu Kim, Jeong Hun Yeo, Yong Man Ro
With the multi-head key memories, MVM extracts possible candidate audio features from the memory, which allows the lip reading model to consider the possibility of which pronunciations can be represented from the input lip movement.
Ranked #3 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • NeurIPS 2021 • Minsu Kim, Joanna Hong, Yong Man Ro
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
1 code implementation • ICCV 2021 • Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
By learning the interrelationship through the associative bridge, the proposed bridging framework is able to obtain the target modal representations inside the memory network, even with the source modal input only, and it provides rich information for its downstream tasks.
Ranked #4 on Lipreading on CAS-VSR-W1k (LRW-1000)
no code implementations • 29 Mar 2022 • HyunWook Park, Minsu Kim, Seongguk Kim, Keunwoo Kim, Haeyeon Kim, Taein Shin, Keeyoung Son, Boogyo Sim, Subin Kim, Seungtaek Jeong, Chulsoon Hwang, Joungho Kim
Therefore, without additional training, the trained network can solve new decap optimization problems.
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 • Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro
Our key contributions are: (1) proposing the Visual Voice memory that brings rich information of audio that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen speaker training by memorizing auditory features and the corresponding visual features.
no code implementations • 15 Nov 2021 • Minsu Kim, Walid Saad, Mohammad Mozaffari, Merouane Debbah
In this paper, a quantized FL framework, that represents data with a finite level of precision in both local training and uplink transmission, is proposed.
1 code implementation • NeurIPS 2021 • Minsu Kim, Jinkyoo Park, Joungho Kim
Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge.
no code implementations • ICCV 2021 • Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object and achieves competitive performance on fine-grained image recognition and vehicle re-identification.
no code implementations • 17 Jun 2021 • Minsu Kim, Jeonghun Park, Jemin Lee
We consider an optimization problem that maximizes the sum spectral efficiency of delay-tolerant users while satisfying the latency constraint of delay-constrained users, and propose a generalized power iteration (GPI) precoding algorithm that finds a principal precoding vector.
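Plain power iteration, which GPI-style precoding generalizes, finds the principal eigenvector by repeated multiplication and normalization. A minimal sketch, not the paper's GPI algorithm:

```python
import numpy as np

def power_iteration(A, iters=200):
    """Find the principal eigenvector of a symmetric matrix by
    repeatedly multiplying and renormalizing a vector."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v

A = np.array([[4.0, 1.0], [1.0, 3.0]])
v = power_iteration(A)
# Rayleigh quotient of the converged unit vector approximates the
# largest eigenvalue, here (7 + sqrt(5)) / 2.
lam = v @ A @ v
```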
no code implementations • 14 Apr 2021 • Seongjun Kim, Minsu Kim, Jong Yeol Ryu, Jemin Lee, Tony Q. S. Quek
By considering the antenna tilt angle-based channel gain, we derive the network outage probability for both IS-BS and ES-BS schemes and, after analyzing the conflicting impacts of the antenna tilt angle, show that there exists an optimal tilt angle minimizing the network outage probability.
no code implementations • 17 Dec 2020 • Minsu Kim, Seongjun Kim, Jemin Lee
In this paper, we analyze the impact of a friendly unmanned aerial vehicle (UAV) jammer on UAV communications in the presence of multiple eavesdroppers.
1 code implementation • 15 Dec 2020 • Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn
Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner.
no code implementations • 12 Nov 2020 • Minsu Kim, Sungho Lee, Chanwon Park, Jemin Lee, Walid Saad
The age of information (AoI) is a recently proposed metric for quantifying data freshness in real-time status monitoring systems where timeliness is of importance.
no code implementations • 28 Oct 2020 • Minsu Kim, Sungho Lee, Chanwon Park, Jemin Lee
In this paper, we explore the data freshness in the Hyperledger Fabric Blockchain-enabled monitoring network (HeMN) by leveraging the AoI metric.
no code implementations • CVPR 2020 • Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn
To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit cylindrical representation of a convolutional kernel defined in the 3D space.