Search Results for author: Minsu Kim

Found 63 papers, 29 papers with code

Epistemology of Language Models: Do Language Models Have Holistic Knowledge?

no code implementations19 Mar 2024 Minsu Kim, James Thorne

This paper investigates the inherent knowledge in language models from the perspective of epistemological holism.

Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training

no code implementations18 Mar 2024 Minsu Kim, Jinwoo Hwang, Guseul Heo, Seiyeon Cho, Divya Mahajan, Jongse Park

Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes.
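
As a rough illustration of the learned-index idea (a minimal sketch assuming a plain linear fit; the paper's memoization-based incremental training is not reproduced here): the model predicts an approximate position for a key, and a bounded local search around that prediction recovers the exact slot.

```python
# Minimal learned-index sketch: a linear model approximates key -> position,
# and a search window bounded by the model's worst-case error finds the exact slot.
import numpy as np

rng = np.random.default_rng(0)
keys = np.sort(rng.choice(1_000_000, size=10_000, replace=False))
positions = np.arange(len(keys))

slope, intercept = np.polyfit(keys, positions, deg=1)   # stand-in for a learned model
max_err = int(np.ceil(np.max(np.abs(slope * keys + intercept - positions)))) + 1

def lookup(key: int) -> int:
    pred = int(slope * key + intercept)                  # predicted position
    lo = max(pred - max_err, 0)                          # bounded search window
    hi = min(pred + max_err + 1, len(keys))
    return lo + int(np.searchsorted(keys[lo:hi], key))   # exact position via local search

assert lookup(int(keys[1234])) == 1234
```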

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

2 code implementations11 Mar 2024 Minsu Kim, Sanghyeok Choi, Jiwoo Son, Hyeonah Kim, Jinkyoo Park, Yoshua Bengio

This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a novel neural-guided meta-heuristic algorithm for combinatorial optimization.

Combinatorial Optimization

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

1 code implementation23 Feb 2024 Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.

Ranked #4 on Lipreading on LRS3-TED (using extra training data)

Lipreading Lip Reading +3

Genetic-guided GFlowNets: Advancing in Practical Molecular Optimization Benchmark

no code implementations5 Feb 2024 Hyeonah Kim, Minsu Kim, Sanghyeok Choi, Jinkyoo Park

This paper proposes a novel variant of GFlowNet, genetic-guided GFlowNet (Genetic GFN), which integrates an iterative genetic search into GFlowNet.

Bayesian Optimization

Multilingual Visual Speech Recognition with a Single Model by Learning with Discrete Visual Speech Units

no code implementations18 Jan 2024 Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro

By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.

Sentence speech-recognition +1

Quilt: Robust Data Segment Selection against Concept Drifts

no code implementations15 Dec 2023 Minsu Kim, Seong-Hyeon Hwang, Steven Euijong Whang

However, we contend that explicitly utilizing the drifted data together leads to much better model accuracy and propose Quilt, a data-centric framework for identifying and selecting data segments that maximize model accuracy.

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

1 code implementation5 Dec 2023 Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro

To mitigate the problem of the absence of a parallel AV2AV translation dataset, we propose to train our spoken language translation system with the audio-only dataset of A2A.

Self-Supervised Learning Speech-to-Speech Translation +1

Learning Energy Decompositions for Partial Inference of GFlowNets

no code implementations5 Oct 2023 Hyosoon Jang, Minsu Kim, Sungsoo Ahn

In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions.

Local Search GFlowNets

2 code implementations4 Oct 2023 Minsu Kim, Taeyoung Yun, Emmanuel Bengio, Dinghuai Zhang, Yoshua Bengio, Sungsoo Ahn, Jinkyoo Park

Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards.
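
For context, the defining property of a GFlowNet sampler and one standard training objective (trajectory balance) can be written as follows; this is generic GFlowNet background, not the local-search modification introduced in the paper:

$$P_\top(x) \;\propto\; R(x), \qquad
\mathcal{L}_{\mathrm{TB}}(\tau) \;=\; \Bigl(\log \frac{Z_\theta \prod_{t} P_F(s_{t+1}\mid s_t;\theta)}{R(x)\,\prod_{t} P_B(s_t\mid s_{t+1};\theta)}\Bigr)^{2},$$

where $\tau = (s_0 \to \cdots \to x)$ is a sampled trajectory, $P_F$ and $P_B$ are the forward and backward policies, and $Z_\theta$ estimates the partition function.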

Learning to Scale Logits for Temperature-Conditional GFlowNets

1 code implementation4 Oct 2023 Minsu Kim, Joohwan Ko, Taeyoung Yun, Dinghuai Zhang, Ling Pan, Woochang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio

We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly.
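
A minimal sketch of that idea (illustrative PyTorch with assumed module names, not the authors' implementation): a small learned function of the temperature produces a positive scalar that multiplies the policy's logits.

```python
# Illustrative sketch: a learned function g(T) outputs a positive scalar that
# directly scales the policy's logits, so one conditional policy can cover a
# range of temperatures.
import torch
import torch.nn as nn

class TemperatureScaledPolicy(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_actions))
        # g(T): learned positive multiplier for the logits (assumed form).
        self.g = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                               nn.Linear(32, 1), nn.Softplus())

    def forward(self, state: torch.Tensor, temperature: torch.Tensor) -> torch.Tensor:
        logits = self.policy(state)
        scale = self.g(temperature.unsqueeze(-1))  # shape (batch, 1)
        return logits * scale                      # temperature-scaled logits

policy = TemperatureScaledPolicy(state_dim=16, n_actions=8)
probs = torch.softmax(policy(torch.randn(4, 16), torch.tensor([0.5, 1.0, 2.0, 4.0])), dim=-1)
```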

BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird's Eye View Map Construction

no code implementations20 Sep 2023 Minsu Kim, Giseop Kim, Kyong Hwan Jin, Sunwook Choi

The method boosts the camera branch's depth estimation learning and induces accurate localization of dense camera features in BEV space.

Depth Estimation Sensor Fusion

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens

no code implementations15 Sep 2023 Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro

To this end, we start by importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp.

Image Comprehension Language Modelling +1

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

no code implementations15 Sep 2023 Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro

Unlike previous methods that tried to improve VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for different languages without human intervention.

Language Identification speech-recognition +1

Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

1 code implementation4 Sep 2023 Minsu Kim, Yongjun Lee, Woo Kyoung Han, Kyong Hwan Jin

Recent proposals for learning-based elastic warps enable deep image stitching to align images exposed to large parallax errors.

Image Inpainting Image Stitching

Implicit Neural Image Stitching

1 code implementation4 Sep 2023 Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

Existing frameworks for image stitching often produce visually reasonable stitching results.

Image Stitching Super-Resolution

DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion

no code implementations23 Aug 2023 Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro

We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF to enable the synthesis of variations in identities, poses, and facial motions of 3D face mesh.

3D Face Animation

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

no code implementations ICCV 2023 Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro

In order to mitigate the challenge, we try to learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units.

Lip Reading

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

1 code implementation3 Aug 2023 Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST).

Representation Learning Speech-to-Speech Translation +4

Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models

no code implementations28 Jun 2023 Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro

The visual speaker embedding is derived from a single target face image and enables improved mapping of input text to the learned audio latent space by incorporating the speaker characteristics inherent in the audio.

Face Generation

Equity-Transformer: Solving NP-hard Min-Max Routing Problems as Sequential Generation with Equity Context

1 code implementation5 Jun 2023 Jiwoo Son, Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jinkyoo Park

Notably, our method achieves significant reductions in runtime (approximately 335 times) and cost values (about 53%) compared to a competitive heuristic (LKH3) on mTSP instances with 100 vehicles and 1,000 cities.

Decision Making Traveling Salesman Problem

Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences

1 code implementation NeurIPS 2023 Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park

The subsequent stage involves bootstrapping, which augments the training dataset with self-generated data labeled by a proxy score function.
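
A rough sketch of that bootstrapping loop (hypothetical `generator` and `proxy_score` interfaces; only the general idea of proxy-labeled data augmentation is shown, not the paper's exact procedure):

```python
# Generic bootstrapping loop (illustrative only): sample candidates from the
# score-conditioned generator, label them with a proxy score function, keep the
# top candidates, and fine-tune the generator on the augmented dataset.
def bootstrap(generator, proxy_score, dataset, rounds=5, n_samples=1024, top_k=128):
    for _ in range(rounds):
        candidates = generator.sample(n_samples)                     # self-generated sequences
        ranked = sorted(candidates, key=proxy_score, reverse=True)   # proxy-based ranking
        dataset.extend((x, proxy_score(x)) for x in ranked[:top_k])  # augment training data
        generator.fit(dataset)                                       # retrain / fine-tune
    return generator
```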

Intelligible Lip-to-Speech Synthesis with Speech Units

1 code implementation31 May 2023 Jeongsoo Choi, Minsu Kim, Yong Man Ro

Therefore, the proposed L2S model is trained to generate multiple targets, mel-spectrogram and speech units.

Lip to Speech Synthesis Speech Synthesis

Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation

no code implementations31 May 2023 Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro

The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.

Talking Face Generation

PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification

no code implementations CVPR 2023 Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn

Modern data augmentation using mixture-based techniques can regularize models against overfitting to the training data in various computer vision applications, but a proper data augmentation technique tailored to part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.

Contrastive Learning Data Augmentation +1

Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

1 code implementation CVPR 2023 Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro

Thus, we first show that previous AVSR models are in fact not robust to corruption of the multimodal input streams, the audio and the visual inputs, compared to uni-modal models.

Audio-Visual Speech Recognition speech-recognition +1

Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

no code implementations27 Feb 2023 Minsu Kim, Chae Won Kim, Yong Man Ro

The proposed DVFA can align the input transcription (i.e., sentence) with the talking face video without accessing the speech audio.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Lip-to-Speech Synthesis in the Wild with Multi-task Learning

3 code implementations17 Feb 2023 Minsu Kim, Joanna Hong, Yong Man Ro

To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of acoustic feature reconstruction loss.

Lip to Speech Synthesis Multi-Task Learning +1

Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

no code implementations16 Feb 2023 Minsu Kim, Hyung-Il Kim, Yong Man Ro

As it relies on visual information to model speech, its performance is inherently sensitive to personal lip appearance and movements, which makes VSR models show degraded performance when applied to unseen speakers.

Sentence speech-recognition +1

SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

no code implementations2 Nov 2022 Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro

It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time.

Audio-Visual Synchronization Representation Learning +1

Meta Input: How to Leverage Off-the-Shelf Deep Neural Networks

no code implementations21 Oct 2022 Minsu Kim, Youngjoon Yu, Sungjune Park, Yong Man Ro

The proposed meta input can be optimized with only a small amount of test data by considering the relation between the test input data and its output prediction.

Speaker-adaptive Lip Reading with User-dependent Padding

1 code implementation9 Aug 2022 Minsu Kim, Hyunjun Kim, Yong Man Ro

In this paper, to remedy the performance degradation of lip reading model on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding.

Lip Reading speech-recognition +1

Green, Quantized Federated Learning over Wireless Networks: An Energy-Efficient Design

no code implementations19 Jul 2022 Minsu Kim, Walid Saad, Mohammad Mozaffari, Merouane Debbah

In this paper, a green-quantized FL framework, which represents data with a finite precision level in both local training and uplink transmission, is proposed.

Federated Learning Quantization
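
As background on what a finite precision level means here (a generic stochastic quantizer, not necessarily the exact scheme used in the paper): with $n$ bits, a value $w \in [w_{\min}, w_{\max}]$ is mapped to one of $2^n$ levels spaced by $\delta = (w_{\max} - w_{\min})/(2^n - 1)$, with randomized rounding so the quantizer is unbiased:

$$Q(w) = w_{\min} + \delta\bigl(\lfloor k \rfloor + b\bigr), \quad k = \frac{w - w_{\min}}{\delta}, \quad b \sim \mathrm{Bernoulli}\bigl(k - \lfloor k \rfloor\bigr), \qquad \mathbb{E}[Q(w)] = w.$$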

Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition

1 code implementation13 Jul 2022 Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro

The enhanced audio features are fused with the visual features and taken to an encoder-decoder model composed of Conformer and Transformer for speech recognition.

Audio-Visual Speech Recognition Noisy Speech Recognition +2

CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics

1 code implementation2 Jul 2022 Jinwoo Hwang, Minsu Kim, Daeun Kim, Seungho Nam, Yoonsung Kim, Dohee Kim, Hardik Sharma, Jongse Park

This paper presents CoVA, a novel cascade architecture that splits the cascade computation between compressed domain and pixel domain to address the decoding bottleneck, supporting both temporal and spatial queries.

VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection

no code implementations15 Jun 2022 Joanna Hong, Minsu Kim, Yong Man Ro

Thus, the proposed framework has the advantage of synthesizing speech with the correct content even from the silent talking face video of an unseen subject.

feature selection Speech Synthesis

DevFormer: A Symmetric Transformer for Context-Aware Device Placement

2 code implementations26 May 2022 Haeyeon Kim, Minsu Kim, Federico Berto, Joungho Kim, Jinkyoo Park

In this paper, we present DevFormer, a novel transformer-based architecture for addressing the complex and computationally demanding problem of hardware design optimization.

Combinatorial Optimization Meta-Learning

Sym-NCO: Leveraging Symmetricity for Neural Combinatorial Optimization

1 code implementation26 May 2022 Minsu Kim, Junyoung Park, Jinkyoo Park

Deep reinforcement learning (DRL)-based combinatorial optimization (CO) methods (i.e., DRL-NCO) have shown significant merit over conventional CO solvers, as DRL-NCO can learn CO solvers while relying less on problem-specific expert domain knowledge (heuristic methods) and supervised labeled data (supervised learning methods).

Combinatorial Optimization Traveling Salesman Problem

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

1 code implementation The AAAI Conference on Artificial Intelligence (AAAI) 2022 Minsu Kim, Jeong Hun Yeo, Yong Man Ro

With the multi-head key memories, MVM extracts possible candidate audio features from the memory, which allows the lip reading model to consider which pronunciations the input lip movement can represent.

Lip Reading

Lip to Speech Synthesis with Visual Context Attentional GAN

1 code implementation NeurIPS 2021 Minsu Kim, Joanna Hong, Yong Man Ro

In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.

Contrastive Learning Generative Adversarial Network +2

Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

1 code implementation ICCV 2021 Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro

By learning the interrelationship through the associative bridge, the proposed bridging framework can obtain the target modal representations inside the memory network even with only the source modal input, and it provides rich information for its downstream tasks.

Lip Reading

Speech Reconstruction with Reminiscent Sound via Visual Voice Memory

1 code implementation IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro

Our key contributions are: (1) proposing the Visual Voice memory, which provides rich audio information that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen speaker training by memorizing auditory features and the corresponding visual features.

Speaker-Specific Lip to Speech Synthesis

On the Tradeoff between Energy, Precision, and Accuracy in Federated Quantized Neural Networks

no code implementations15 Nov 2021 Minsu Kim, Walid Saad, Mohammad Mozaffari, Merouane Debbah

In this paper, a quantized FL framework, that represents data with a finite level of precision in both local training and uplink transmission, is proposed.

Federated Learning Quantization

Learning Collaborative Policies to Solve NP-hard Routing Problems

1 code implementation NeurIPS 2021 Minsu Kim, Jinkyoo Park, Joungho Kim

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge.

Traveling Salesman Problem

Learning Canonical 3D Object Representation for Fine-Grained Recognition

no code implementations ICCV 2021 Sunghun Joung, Seungryong Kim, Minsu Kim, Ig-Jae Kim, Kwanghoon Sohn

By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object and achieves competitive performance on fine-grained image recognition and vehicle re-identification.

3D Shape Reconstruction Fine-Grained Image Recognition +3

Precoding Design for Multi-user MIMO Systems with Delay-Constrained and -Tolerant Users

no code implementations17 Jun 2021 Minsu Kim, Jeonghun Park, Jemin Lee

We consider an optimization problem that maximizes the sum spectral efficiency of delay-tolerant users while satisfying the latency constraint of delay-constrained users, and propose a generalized power iteration (GPI) precoding algorithm that finds a principal precoding vector.

Non-Terrestrial Networks for UAVs: Base Station Service Provisioning Schemes with Antenna Tilt

no code implementations14 Apr 2021 Seongjun Kim, Minsu Kim, Jong Yeol Ryu, Jemin Lee, Tony Q. S. Quek

By considering the antenna tilt angle-based channel gain, we derive the network outage probability for both IS-BS and ES-BS schemes, and show the existence of the optimal tilt angle that minimizes the network outage probability after analyzing the conflicting impacts of the antenna tilt angle.

Securing Communications with Friendly Unmanned Aerial Vehicle Jammers

no code implementations17 Dec 2020 Minsu Kim, Seongjun Kim, Jemin Lee

In this paper, we analyze the impact of a friendly unmanned aerial vehicle (UAV) jammer on UAV communications in the presence of multiple eavesdroppers.

Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation

1 code implementation15 Dec 2020 Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn

Existing techniques to adapt semantic segmentation networks across the source and target domains within deep convolutional neural networks (CNNs) deal with all the samples from the two domains in a global or category-aware manner.

Clustering Domain Adaptation +2

Ensuring Data Freshness for Blockchain-enabled Monitoring Networks

no code implementations12 Nov 2020 Minsu Kim, Sungho Lee, Chanwon Park, Jemin Lee, Walid Saad

The age of information (AoI) is a recently proposed metric for quantifying data freshness in real-time status monitoring systems where timeliness is of importance.
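
For reference, the standard definition of AoI (general background, not specific to this paper): at time $t$, the age is

$$\Delta(t) = t - u(t),$$

where $u(t)$ is the generation time of the most recently received status update; the time-average AoI over a window $[0, T]$ is $\bar{\Delta} = \tfrac{1}{T}\int_0^T \Delta(t)\,dt$.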

Age of Information Analysis in Hyperledger Fabric Blockchain-enabled Monitoring Networks

no code implementations28 Oct 2020 Minsu Kim, Sungho Lee, Chanwon Park, Jemin Lee

In this paper, we explore the data freshness in the Hyperledger Fabric Blockchain-enabled monitoring network (HeMN) by leveraging the AoI metric.

Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation

no code implementations CVPR 2020 Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig-Jae Kim, Junghyun Cho, Kwanghoon Sohn

To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit a cylindrical representation of a convolutional kernel defined in 3D space.

Object object-detection +2
