Search Results for author: Yong Jae Lee

Found 69 papers, 38 papers with code

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

no code implementations • 22 Mar 2024 • Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

Based on this, we propose PruMerge, a novel adaptive visual token reduction approach, which largely reduces the number of visual tokens while maintaining comparable model performance.

Language Modelling • Large Language Model +2
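
The snippet above describes the mechanism only at a high level. As a rough illustration (not the authors' exact PruMerge algorithm), the sketch below reduces a set of visual tokens by keeping the ones that receive the most attention from a class/query token and merging each pruned token into its most similar kept token; the scoring rule, keep ratio, and merging scheme are all illustrative assumptions.

```python
import numpy as np

def reduce_visual_tokens(tokens, cls_attn, keep_ratio=0.25):
    """Prune-and-merge sketch: keep the most-attended visual tokens and
    fold each pruned token into its nearest kept token.

    tokens   : (N, D) visual token embeddings
    cls_attn : (N,)   attention weight each token receives from a class/query token
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(cls_attn)[-n_keep:]              # highest-scoring tokens survive
    prune_idx = np.setdiff1d(np.arange(len(tokens)), keep_idx)

    kept = tokens[keep_idx].copy()
    weights = np.ones(n_keep)                              # how many tokens each kept slot represents

    for i in prune_idx:
        # cosine similarity between the pruned token and every kept token
        sims = kept @ tokens[i] / (
            np.linalg.norm(kept, axis=1) * np.linalg.norm(tokens[i]) + 1e-8
        )
        j = int(np.argmax(sims))
        # running weighted average: merge the pruned token into its nearest kept token
        kept[j] = (kept[j] * weights[j] + tokens[i]) / (weights[j] + 1)
        weights[j] += 1
    return kept

# Toy usage: 576 visual tokens of dimension 64 reduced to 144.
toks = np.random.randn(576, 64)
attn = np.random.rand(576)
print(reduce_visual_tokens(toks, attn).shape)  # (144, 64)
```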

LLM Inference Unveiled: Survey and Roofline Model Insights

2 code implementations • 26 Feb 2024 • Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model for the systematic analysis of LLM inference techniques.

Knowledge Distillation • Language Modelling +3
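
The roofline model referenced above is a standard analysis tool: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. The sketch below applies it to a hypothetical decode-phase matrix-vector product; the hardware numbers are illustrative placeholders, not figures from the survey.

```python
def roofline(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Attainable FLOP/s = min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = flops / bytes_moved                      # FLOPs per byte of memory traffic
    attainable = min(peak_flops, peak_bandwidth * intensity)
    bound = "memory-bound" if attainable < peak_flops else "compute-bound"
    return attainable, intensity, bound

# Illustrative accelerator: 300 TFLOP/s peak compute, 2 TB/s memory bandwidth (placeholders).
PEAK_FLOPS = 300e12
PEAK_BW = 2e12

# Decode-phase GEMV with a (d x d) fp16 weight matrix: ~2*d*d FLOPs, ~2*d*d bytes of weight traffic.
d = 4096
flops = 2 * d * d
bytes_moved = 2 * d * d

attainable, intensity, bound = roofline(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
print(f"intensity = {intensity:.1f} FLOP/byte -> {attainable / 1e12:.1f} TFLOP/s ({bound})")
```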

CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.

counterfactual • Data Augmentation +2

Edit One for All: Interactive Batch Image Editing

no code implementations • 18 Jan 2024 • Thao Nguyen, Utkarsh Ojha, Yuheng Li, Haotian Liu, Yong Jae Lee

With increased human control, it is now possible to edit an image in a plethora of ways; from specifying in text what we want to change, to straight up dragging the contents of the image in an interactive point-based manner.

Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images

no code implementations • 4 Dec 2023 • Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee

We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts.

Domain Generalization • Text-to-Image Generation

Making Large Multimodal Models Understand Arbitrary Visual Prompts

no code implementations • 1 Dec 2023 • Mu Cai, Haotian Liu, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Dennis Park, Yong Jae Lee

Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain.

Visual Commonsense Reasoning • Visual Prompting

Testing learning-enabled cyber-physical systems with Large-Language Models: A Formal Approach

no code implementations • 13 Nov 2023 • Xi Zheng, Aloysius K. Mok, Ruzica Piskac, Yong Jae Lee, Bhaskar Krishnamachari, Dakai Zhu, Oleg Sokolsky, Insup Lee

The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations.

Autonomous Vehicles

A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

1 code implementation • ICCV 2023 • Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee

Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain.

Domain Generalization • Knowledge Distillation +2

Investigating the Catastrophic Forgetting in Multimodal Large Language Models

no code implementations • 19 Sep 2023 • Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain performance comparable to the pre-trained model, remains an inherent problem in multimodal LLMs (MLLMs).

Image Classification • Language Modelling +1

Visual Instruction Inversion: Image Editing via Visual Prompting

1 code implementation • 26 Jul 2023 • Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

Given pairs of examples that represent the "before" and "after" images of an edit, our goal is to learn a text-based editing direction that can be used to perform the same edit on new images.

Visual Prompting

Benchmarking and Analyzing Generative Data for Visual Recognition

no code implementations • 25 Jul 2023 • Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu

Advancements in large pre-trained generative models have expanded their potential as effective data generators in visual recognition.

Benchmarking • Retrieval

Generate Anything Anywhere in Any Scene

no code implementations • 29 Jun 2023 • Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields.

Data Augmentation • Object

Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding

no code implementations • 9 Jun 2023 • Mu Cai, Zeyi Huang, Yuheng Li, Haohan Wang, Yong Jae Lee

By leveraging the XML-based textual descriptions of SVG representations instead of raster images, we aim to bridge the gap between the visual and textual modalities, allowing LLMs to directly understand and manipulate images without the need for parameterized visual components.

Image Classification • In-Context Learning +2
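
A minimal sketch of the setup described above, with a tiny inline SVG standing in for a real image and query_llm left as a placeholder for whatever text-only LLM interface is used: the XML markup is simply placed in the prompt, so the model reasons over shapes and coordinates as text.

```python
# A small inline SVG stands in for a real file; query_llm is a placeholder for any text-only LLM API.
EXAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="red"/>
</svg>"""

def build_svg_prompt(svg_markup: str, question: str) -> str:
    """Embed raw SVG/XML markup in a text prompt so a text-only LLM can reason about the image."""
    return (
        "The following SVG code describes an image:\n\n"
        f"{svg_markup}\n\n"
        f"Question: {question}\n"
        "Answer using only the shapes, colors, and coordinates in the SVG."
    )

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any text-only LLM client here")

print(build_svg_prompt(EXAMPLE_SVG, "What shape is drawn, and what color is it?"))
```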

Visual Instruction Tuning

8 code implementations • NeurIPS 2023 • Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee

Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field.

visual instruction following • Visual Question Answering +1

Segment Everything Everywhere All at Once

2 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Image Segmentation • Interactive Segmentation +4

InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning

no code implementations • 13 Mar 2023 • Zhuoran Yu, Yin Li, Yong Jae Lee

Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data.

Out-of-Distribution Detection

Towards Universal Fake Image Detectors that Generalize Across Generative Models

1 code implementation • CVPR 2023 • Utkarsh Ojha, Yuheng Li, Yong Jae Lee

In this work, we first show that the existing paradigm, which consists of training a deep network for real-vs-fake classification, fails to detect fake images from newer breeds of generative models when trained to detect GAN fake images.

Classification • Language Modelling
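
The finding above is that an end-to-end real-vs-fake network latches onto one generator family's artifacts. One remedy explored in this line of work is to classify in a frozen, general-purpose feature space instead; the sketch below is a hedged illustration that assumes features have already been extracted with some frozen pretrained encoder (e.g., a CLIP image encoder) and uses a simple nearest-neighbour rule. The toy data and its feature-space shift are fabricated for demonstration only.

```python
import numpy as np

def nn_real_vs_fake(train_feats, train_labels, test_feats):
    """Label each test feature by its nearest neighbour in a frozen feature space.

    train_feats  : (N, D) features of known real/fake images (from a frozen encoder)
    train_labels : (N,)   0 = real, 1 = fake
    test_feats   : (M, D) features of images from a possibly unseen generator
    """
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = te @ tr.T                      # (M, N) cosine similarities
    return train_labels[sims.argmax(axis=1)]

# Toy stand-in for frozen-encoder features (in practice: CLIP or similar, not trained on this task).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (50, 32))
fake = rng.normal(0.6, 1.0, (50, 32))     # pretend fakes share a subtle feature-space shift
X = np.vstack([real, fake])
y = np.array([0] * 50 + [1] * 50)
test = rng.normal(0.6, 1.0, (5, 32))      # samples from a "new generator", still shifted
print(nn_real_vs_fake(X, y, test))        # predicted labels for the held-out samples
```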

EnergyMatch: Energy-based Pseudo-Labeling for Semi-Supervised Learning

no code implementations • 13 Jun 2022 • Zhuoran Yu, Yin Li, Yong Jae Lee

However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, and thus, the pseudo-labels for even high-confidence unlabeled samples may still be unreliable.

Out-of-Distribution Detection
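
A minimal NumPy sketch of the energy-score idea behind this line of work (the selection rule and threshold are illustrative, not the paper's exact recipe): instead of thresholding softmax confidence, score each unlabeled sample by the free energy of its logits, -logsumexp(logits), and pseudo-label only the low-energy samples that look in-distribution.

```python
import numpy as np

def energy_score(logits):
    """Free energy E(x) = -logsumexp(logits); lower energy ~ more in-distribution."""
    m = logits.max(axis=1, keepdims=True)                  # subtract the max for numerical stability
    return -(m.squeeze(1) + np.log(np.exp(logits - m).sum(axis=1)))

def select_pseudo_labels(logits, energy_threshold):
    """Keep only low-energy samples and use their argmax class as the pseudo-label."""
    e = energy_score(logits)
    mask = e <= energy_threshold
    return np.argmax(logits, axis=1)[mask], mask

# Toy example: 5 unlabeled samples, 3 classes (illustrative threshold).
logits = np.array([[8.0, 1.0, 0.5],   # large logits -> low energy, likely in-distribution
                   [0.2, 0.1, 0.3],   # small logits -> high energy, likely out-of-distribution
                   [5.0, 4.8, 0.1],
                   [9.0, 0.2, 0.1],
                   [0.5, 0.4, 0.6]])
labels, kept = select_pseudo_labels(logits, energy_threshold=-4.0)
print(energy_score(logits).round(2), kept, labels)
```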

The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization

1 code implementation • 9 Apr 2022 • Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing

Training with an emphasis on "hard-to-learn" components of the data has been proven to be an effective method for improving the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued.

BIG-bench Machine Learning • Domain Generalization
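
Not the paper's exact formulation, but a minimal sketch of the generic "emphasize hard-to-learn examples" idea it studies: instead of averaging the loss over a whole batch, average only the top-k highest per-sample losses so that optimization focuses on the worst-case portion of the data.

```python
import numpy as np

def topk_worst_case_loss(per_sample_losses, k):
    """Average only the k largest per-sample losses in the batch (hard examples dominate)."""
    losses = np.asarray(per_sample_losses)
    hardest = np.sort(losses)[-k:]
    return hardest.mean()

batch_losses = np.array([0.1, 0.2, 2.3, 0.05, 1.7, 0.3])
print(topk_worst_case_loss(batch_losses, k=2))   # (2.3 + 1.7) / 2 = 2.0
```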

End-to-End Instance Edge Detection

no code implementations • 6 Apr 2022 • Xueyan Zou, Haotian Liu, Yong Jae Lee

We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.

Edge Detection • Instance Segmentation +5

GIRAFFE HD: A High-Resolution 3D-aware Generative Model

1 code implementation • CVPR 2022 • Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee

3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation.

Disentanglement • Image Generation +2

Masked Discrimination for Self-Supervised Learning on Point Clouds

1 code implementation • 21 Mar 2022 • Haotian Liu, Mu Cai, Yong Jae Lee

Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.

3D Shape Classification • Binary Classification +4

The Two Dimensions of Worst-Case Training and Their Integrated Effect for Out-of-Domain Generalization

no code implementations • CVPR 2022 • Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing

Training with an emphasis on "hard-to-learn" components of the data has been proven to be an effective method for improving the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued.

BIG-bench Machine Learning • Domain Generalization

Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features

1 code implementation • 5 Nov 2021 • Haohan Wang, Zeyi Huang, Hanlin Zhang, Yong Jae Lee, Eric Xing

Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when tested with data from another distribution.

BIG-bench Machine Learning

Progressive Temporal Feature Alignment Network for Video Inpainting

1 code implementation • CVPR 2021 • Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee

To achieve this goal, it is necessary to find correspondences from neighbouring frames to faithfully hallucinate the unknown content.

Optical Flow Estimation • Video Inpainting

Generating Furry Cars: Disentangling Object Shape & Appearance across Multiple Domains

no code implementations • 5 Apr 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee

We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).

Disentanglement • Object

Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains

no code implementations • ICLR 2021 • Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee

We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars).

Disentanglement • Object

YolactEdge: Real-time Instance Segmentation on the Edge

2 code implementations • 22 Dec 2020 • Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee

We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.

Real-time Instance Segmentation • Semantic Segmentation

Delving Deeper into Anti-aliasing in ConvNets

2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee

Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling.

Instance Segmentation • Segmentation +1
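
For context, the classical fix this paper builds on (the paper itself proposes learned, content-aware filtering) is to low-pass filter feature maps before downsampling so that high frequencies do not alias. The sketch below uses a fixed 3x3 binomial kernel; the kernel choice and stride are illustrative.

```python
import numpy as np

def blur_pool(x, stride=2):
    """Anti-aliased downsampling: blur with a fixed 3x3 binomial kernel, then subsample.

    x : (H, W) single-channel feature map.
    """
    k = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k, k)
    kernel /= kernel.sum()                                 # normalized low-pass (binomial) filter

    pad = np.pad(x, 1, mode="reflect")
    H, W = x.shape
    blurred = np.zeros_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            blurred[i, j] = (pad[i:i + 3, j:j + 3] * kernel).sum()
    return blurred[::stride, ::stride]

# A high-frequency checkerboard aliases badly under naive striding but not after blurring.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
print(checker[::2, ::2])            # naive stride-2 sampling collapses the pattern to all zeros
print(blur_pool(checker).round(2))  # blur-then-subsample keeps values near the 0.5 average
```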

YOLACT++: Better Real-time Instance Segmentation

36 code implementations • 3 Dec 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee

Then we produce instance masks by linearly combining the prototypes with the mask coefficients.

Ranked #15 on Real-time Instance Segmentation on MSCOCO (using extra training data)

Real-time Instance Segmentation • Segmentation +1
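
A minimal NumPy sketch of the mask-assembly step quoted above (shapes, cropping, and thresholding are simplified relative to the released implementation): each instance mask is a linear combination of shared prototype masks weighted by that instance's predicted coefficients, followed by a sigmoid.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Combine k prototype masks into per-instance masks.

    prototypes   : (k, H, W) shared prototype masks from the prototype branch
    coefficients : (n, k)    per-instance mask coefficients from the prediction head
    returns      : (n, H, W) instance masks in [0, 1]
    """
    k, H, W = prototypes.shape
    linear = coefficients @ prototypes.reshape(k, H * W)   # linear combination per instance
    masks = 1.0 / (1.0 + np.exp(-linear))                  # sigmoid squashes to [0, 1]
    return masks.reshape(-1, H, W)

# Toy example: 4 prototypes of size 16x16, 3 detected instances.
protos = np.random.randn(4, 16, 16)
coeffs = np.random.randn(3, 4)
masks = assemble_masks(protos, coeffs)
print(masks.shape)   # (3, 16, 16)
```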

Password-conditioned Anonymization and Deanonymization with Face Identity Transformers

1 code implementation • 26 Nov 2019 • Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee

Cameras are prevalent in our daily lives, and enable many useful systems built upon computer vision technologies such as smart cameras and home robots for service applications.

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

3 code implementations • CVPR 2020 • Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation.

Conditional Image Generation • Disentanglement

Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Class-Imbalanced Data

1 code implementation • NeurIPS 2020 • Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee

We propose a novel unsupervised generative model that learns to disentangle object identity from other low-level aspects in class-imbalanced data.

Object • Representation Learning

YOLACT: Real-time Instance Segmentation

48 code implementations • ICCV 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee

Then we produce instance masks by linearly combining the prototypes with the mask coefficients.

Ranked #21 on Real-time Instance Segmentation on MSCOCO (using extra training data)

Real-time Instance Segmentation • Segmentation +2

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

1 code implementation • CVPR 2019 • Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories.

Conditional Image Generation • Disentanglement +3

DOCK: Detecting Objects by transferring Common-sense Knowledge

no code implementations • ECCV 2018 • Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee

We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.

Attribute • Common Sense Reasoning +3

Learning to Anonymize Faces for Privacy Preserving Action Detection

1 code implementation • ECCV 2018 • Zhongzheng Ren, Yong Jae Lee, Michael S. Ryoo

The end result is a video anonymizer that performs pixel-level modifications to anonymize each person's face, with minimal effect on action detection performance.

Action Detection • Privacy Preserving

Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks

no code implementations • 25 May 2017 • Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee

Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest.

Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.

Sentence • Visual Grounding

Identifying First-person Camera Wearers in Third-person Videos

no code implementations • CVPR 2017 • Chenyou Fan, Jang-Won Lee, Mingze Xu, Krishna Kumar Singh, Yong Jae Lee, David J. Crandall, Michael S. Ryoo

We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene.

Activity Recognition • Object Tracking +1

Interspecies Knowledge Transfer for Facial Keypoint Detection

1 code implementation • CVPR 2017 • Maheen Rashid, Xiuye Gu, Yong Jae Lee

Instead of directly finetuning a network trained to detect keypoints on human faces to animal faces (which is sub-optimal since human and animal faces can look quite different), we propose to first adapt the animal images to the pre-trained human detection network by correcting for the differences in animal and human face shape.

Human Detection • Keypoint Detection +1

End-to-End Localization and Ranking for Relative Attributes

no code implementations • 9 Aug 2016 • Krishna Kumar Singh, Yong Jae Lee

We propose an end-to-end deep convolutional network to simultaneously localize and rank relative visual attributes, given only weakly-supervised pairwise image comparisons.

Attribute

Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals

no code implementations • CVPR 2016 • Fanyi Xiao, Yong Jae Lee

We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video.

Segmentation

Discovering the Spatial Extent of Relative Attributes

no code implementations • ICCV 2015 • Fanyi Xiao, Yong Jae Lee

We present a weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images.

Attribute

FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences

no code implementations • CVPR 2015 • Tinghui Zhou, Yong Jae Lee, Stella X. Yu, Alyosha A. Efros

Given a set of poorly aligned images of the same visual concept without any annotations, we propose an algorithm to jointly bring them into pixel-wise correspondence by estimating a FlowWeb representation of the image set.

Optical Flow Estimation

Predicting Important Objects for Egocentric Video Summarization

no code implementations • 18 May 2015 • Yong Jae Lee, Kristen Grauman

Our results on two egocentric video datasets show the method's promise relative to existing techniques for saliency and summarization.

Event Detection • Video Summarization

Weakly-supervised Discovery of Visual Pattern Configurations

no code implementations • NeurIPS 2014 • Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, Trevor Darrell

The increasing prominence of weakly labeled data nurtures a growing demand for object detection methods that can cope with minimal supervision.

Object • object-detection +1
