Search Results for author: Kwonjoon Lee

Found 12 papers, 5 papers with code

Vamos: Versatile Action Models for Video Understanding

no code implementations · 22 Nov 2023 · Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

What makes good video representations for video understanding, such as anticipating future activities or answering video-conditioned questions?

Language Modelling · Large Language Model · +2

Object-centric Video Representation for Long-term Action Anticipation

1 code implementation · 31 Oct 2023 · Ce Zhang, Changcheng Fu, Shijie Wang, Nakul Agarwal, Kwonjoon Lee, Chiho Choi, Chen Sun

To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.

Action Anticipation · Human-Object Interaction Detection · +4
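The "retrieval" described above maps naturally onto cross-attention: action queries attend over detected object features. A minimal PyTorch sketch of that idea, with all shapes and names hypothetical (this is not the paper's released architecture):

```python
import torch
import torch.nn as nn

class ObjectRetrievalBlock(nn.Module):
    """Hypothetical cross-attention block: action queries attend over
    per-frame object features to 'retrieve' relevant objects."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, action_queries, object_feats):
        # action_queries: (B, num_queries, dim) -- one query per anticipation horizon
        # object_feats:   (B, num_objects, dim) -- detected object embeddings
        retrieved, attn_weights = self.attn(action_queries, object_feats, object_feats)
        return self.norm(action_queries + retrieved), attn_weights

# toy usage: 4 anticipation time scales attending over 20 objects
block = ObjectRetrievalBlock()
queries, objects = torch.randn(2, 4, 256), torch.randn(2, 20, 256)
out, weights = block(queries, objects)
print(out.shape, weights.shape)  # (2, 4, 256) and (2, 4, 20)
```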

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

no code implementations · 31 Jul 2023 · Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Action Anticipation · counterfactual · +1
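The two formulations can be sketched as plain control flow: the bottom-up path rolls a next-action predictor forward autoregressively, while the top-down path first names a goal and then plans toward it. A toy Python sketch, with `next_action`, `infer_goal`, and `plan` as hypothetical stand-ins for the paper's LLM-backed predictors:

```python
from typing import Callable, List

def bottom_up_lta(observed: List[str], next_action: Callable[[List[str]], str],
                  horizon: int) -> List[str]:
    """Bottom-up: autoregressively roll out future actions from the history."""
    history, future = list(observed), []
    for _ in range(horizon):
        a = next_action(history)  # model predicts one step ahead
        future.append(a)
        history.append(a)         # feed the prediction back in
    return future

def top_down_lta(observed, infer_goal, plan, horizon):
    """Top-down: first infer the actor's goal, then plan toward it."""
    goal = infer_goal(observed)           # e.g., an LLM names the likely goal
    return plan(observed, goal, horizon)  # procedure to accomplish the goal

# toy stand-in for a next-action predictor
demo_next = lambda h: {"crack egg": "whisk egg", "whisk egg": "pour egg"}.get(h[-1], "stir")
print(bottom_up_lta(["crack egg"], demo_next, horizon=3))
# ['whisk egg', 'pour egg', 'stir']
```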

AdamsFormer for Spatial Action Localization in the Future

no code implementations · CVPR 2023 · Hyung-gun Chi, Kwonjoon Lee, Nakul Agarwal, Yi Xu, Karthik Ramani, Chiho Choi

Spatial action localization in the future (SALF) is challenging because it requires understanding the underlying physics of video observations to predict future action locations accurately.

Action Localization
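The name suggests an Adams-style numerical predictor for extrapolating dynamics. Purely as an illustration (the paper's actual architecture is not shown here), a two-step Adams-Bashforth update extrapolates a state, such as a predicted box center, from its recent time derivatives:

```python
import numpy as np

def adams_bashforth2(z, f_prev, f_curr, dt):
    """Two-step Adams-Bashforth update: advance state z one step using its
    current and previous time derivatives."""
    return z + dt * (1.5 * f_curr - 0.5 * f_prev)

# toy example: extrapolate a 2-D box center moving with slowly changing velocity
z = np.array([10.0, 5.0])      # current center (x, y)
f_prev = np.array([1.0, 0.4])  # velocity estimate at t-1
f_curr = np.array([1.2, 0.5])  # velocity estimate at t
print(adams_bashforth2(z, f_prev, f_curr, dt=1.0))  # predicted center at t+1
```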

ViTGAN: Training GANs with Vision Transformers

3 code implementations · ICLR 2022 · Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.

Image Generation

Dual Contradistinctive Generative Autoencoder

no code implementations · CVPR 2021 · Gaurav Parmar, Dacheng Li, Kwonjoon Lee, Zhuowen Tu

Our model, named dual contradistinctive generative autoencoder (DC-VAE), integrates an instance-level discriminative loss (maintaining the instance-level fidelity for the reconstruction/synthesis) with a set-level adversarial loss (encouraging the set-level fidelity for the reconstruction/synthesis), both being contradistinctive.

Image Generation · Image Reconstruction · +1
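The interplay of the two losses can be sketched schematically: the instance-level term is contrastive (each reconstruction must match its own input against the rest of the batch), while the set-level term is a standard adversarial loss. A simplified PyTorch sketch with a toy discriminator (`TinyDisc` and all shapes are hypothetical; this is not the authors' exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDisc(nn.Module):
    """Toy discriminator with a feature head and a real/fake logit head
    (hypothetical stand-in for a convolutional discriminator)."""
    def __init__(self, in_dim=784, feat_dim=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 1)

    def features(self, x):
        return self.body(x.flatten(1))

    def logit(self, x):
        return self.head(self.features(x))

def dc_vae_losses(x, x_rec, disc, temperature=0.1):
    """Schematic dual losses: an instance-level contrastive term tying each
    reconstruction to its own input, plus a set-level adversarial term."""
    B = x.size(0)
    h_x = F.normalize(disc.features(x), dim=1)       # (B, D) input features
    h_r = F.normalize(disc.features(x_rec), dim=1)   # (B, D) recon features
    logits = h_r @ h_x.t() / temperature             # (B, B) similarities
    inst = F.cross_entropy(logits, torch.arange(B))  # i-th recon matches i-th input
    adv = F.binary_cross_entropy_with_logits(
        disc.logit(x_rec), torch.ones(B, 1))         # recons should look real
    return inst, adv

disc = TinyDisc()
x, x_rec = torch.randn(8, 784), torch.randn(8, 784)
print(dc_vae_losses(x, x_rec, disc))
```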

Learning Instance Occlusion for Panoptic Segmentation

1 code implementation · CVPR 2020 · Justin Lazarow, Kwonjoon Lee, Kunyu Shi, Zhuowen Tu

Panoptic segmentation requires segments of both "things" (countable object instances) and "stuff" (uncountable and amorphous regions) within a single output.

Instance Segmentation · Panoptic Segmentation · +2
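The merging step that instance occlusion informs can be pictured as a greedy paster that resolves overlaps between instance masks via pairwise occlusion decisions. A toy NumPy sketch, where `occludes(i, j)` stands in for the paper's learned occlusion head:

```python
import numpy as np

def merge_instances(masks, scores, occludes):
    """Greedy panoptic paste: higher-score instances go first; where two masks
    overlap, occludes(i, j) decides which instance stays on top."""
    H, W = masks[0].shape
    canvas = np.full((H, W), -1, dtype=int)   # -1 = unassigned (later: stuff)
    for i in np.argsort(scores)[::-1]:        # paste in descending confidence
        region = masks[i] & (canvas == -1)
        # contested pixels: already owned by j, but i is judged to occlude j
        for j in np.unique(canvas[masks[i] & (canvas >= 0)]):
            if occludes(i, int(j)):
                region |= masks[i] & (canvas == j)
        canvas[region] = i
    return canvas

# toy example: two overlapping squares; instance 1 occludes instance 0
m0 = np.zeros((6, 6), bool); m0[1:4, 1:4] = True
m1 = np.zeros((6, 6), bool); m1[2:5, 2:5] = True
print(merge_instances([m0, m1], scores=[0.9, 0.8], occludes=lambda i, j: i == 1))
```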

Meta-Learning with Differentiable Convex Optimization

7 code implementations · CVPR 2019 · Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto

We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks.

Few-Shot Image Classification · Few-Shot Learning
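The paper's base learners are linear SVMs solved with a differentiable QP solver; ridge regression is a closely related convex base learner whose closed form makes the end-to-end idea easy to see. A simplified sketch (ridge in place of the paper's SVM) showing gradients flowing through the solver back into the embedding:

```python
import torch

def ridge_base_learner(support_x, support_y, num_classes, lam=1.0):
    """Closed-form ridge regression on the support set; every step is a
    differentiable torch op, so gradients reach the embedding network."""
    Y = torch.nn.functional.one_hot(support_y, num_classes).float()  # (N, C)
    X = support_x                                                    # (N, D)
    A = X.t() @ X + lam * torch.eye(X.size(1))                       # (D, D)
    return torch.linalg.solve(A, X.t() @ Y)                          # (D, C)

# episode: 5-way 1-shot with a (toy) 16-dim learnable embedding
emb = torch.randn(5, 16, requires_grad=True)   # embedded support set
W = ridge_base_learner(emb, torch.arange(5), num_classes=5)
query = torch.randn(3, 16)
meta_loss = torch.nn.functional.cross_entropy(query @ W, torch.tensor([0, 1, 2]))
meta_loss.backward()                           # gradients flow through the solver
print(emb.grad.shape)  # torch.Size([5, 16])
```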

Controllable Top-down Feature Transformer

no code implementations · 6 Dec 2017 · Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu

We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control.

Data Augmentation · Style Transfer

Wasserstein Introspective Neural Networks

1 code implementation · CVPR 2018 · Kwonjoon Lee, Weijian Xu, Fan Fan, Zhuowen Tu

We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model.

General Classification
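Introspective networks use the classifier generatively: pseudo-negative samples are synthesized by gradient ascent of the discriminator's score with respect to the input itself. A minimal sketch of that sampling step with a toy discriminator (the Wasserstein objective and the Langevin noise term are omitted for brevity):

```python
import torch
import torch.nn as nn

def introspective_sample(disc, x, steps=30, step_size=0.1):
    """Turn the discriminator into a generator: ascend its real/fake score
    with respect to the input image itself."""
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        score = disc(x).sum()
        grad, = torch.autograd.grad(score, x)
        x = (x + step_size * grad).detach().requires_grad_(True)
    return x.detach()

# toy discriminator and a batch of noise images to refine
disc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))
samples = introspective_sample(disc, torch.randn(4, 1, 28, 28))
print(samples.shape)  # torch.Size([4, 1, 28, 28])
```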
