Search Results for author: Yu-Jhe Li

Found 24 papers, 3 papers with code

Multi-Person 3D Pose Estimation from Multi-View Uncalibrated Depth Cameras

no code implementations 28 Jan 2024 Yu-Jhe Li, Yan Xu, Rawal Khirodkar, Jinhyung Park, Kris Kitani

To evaluate our proposed pipeline, we collect three sets of RGBD videos recorded from multiple sparse-view depth cameras, with manually annotated ground-truth 3D poses.

3D Human Pose Estimation 3D Pose Estimation +2

3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

no code implementations 21 Mar 2023 Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

We note that the novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code.

Contrastive Learning Text to 3D

Azimuth Super-Resolution for FMCW Radar in Autonomous Driving

no code implementations CVPR 2023 Yu-Jhe Li, Shawn Hunt, Jinhyung Park, Matthew O’Toole, Kris Kitani

We also propose a hybrid super-resolution model (Hybrid-SR) combining our ADC-SR with a standard RAD super-resolution model, and show that performance can be improved by a large margin.

Autonomous Driving object-detection +2

Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

no code implementations 16 Mar 2022 Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato

We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).

Domain Adaptation Knowledge Distillation

Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection

no code implementations CVPR 2022 Yu-Jhe Li, Jinhyung Park, Matthew O'Toole, Kris Kitani

To mitigate this problem, we propose the Self-Training Multimodal Vehicle Detection Network (ST-MVDNet) which leverages a Teacher-Student mutual learning framework and a simulated sensor noise model used in strong data augmentation for Lidar and Radar.

Autonomous Vehicles Data Augmentation

Cross-Domain Adaptive Teacher for Object Detection

2 code implementations CVPR 2022 Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda

To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.

Data Augmentation Domain Adaptation +3

Wide-Baseline Multi-Camera Calibration using Person Re-Identification

no code implementations CVPR 2021 Yan Xu, Yu-Jhe Li, Xinshuo Weng, Kris Kitani

We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.

Camera Calibration Person Re-Identification

Visio-Temporal Attention for Multi-Camera Multi-Target Association

no code implementations ICCV 2021 Yu-Jhe Li, Xinshuo Weng, Yan Xu, Kris M. Kitani

We propose an inter-tracklet (person to person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.

Semantics-Guided Representation Learning with Applications to Visual Synthesis

no code implementations 21 Oct 2020 Jia-Wei Yan, Ci-Siang Lin, Fu-En Yang, Yu-Jhe Li, Yu-Chiang Frank Wang

Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition.

Representation Learning

Transforming Multi-Concept Attention into Video Summarization

no code implementations 2 Jun 2020 Yen-Ting Liu, Yu-Jhe Li, Yu-Chiang Frank Wang

Video summarization, which aims to identify highlight frames or shots in a lengthy video input, is among the challenging tasks in computer vision.

Video Summarization

Learning Shape Representations for Clothing Variations in Person Re-Identification

no code implementations 16 Mar 2020 Yu-Jhe Li, Zhengyi Luo, Xinshuo Weng, Kris M. Kitani

To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.

Disentanglement Person Re-Identification

Learning Resolution-Invariant Deep Representations for Person Re-Identification

no code implementations 25 Jul 2019 Yun-Chun Chen, Yu-Jhe Li, Xiaofei Du, Yu-Chiang Frank Wang

Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.

Image Super-Resolution Person Re-Identification

Dual-modality seq2seq network for audio-visual event localization

2 code implementations 20 Feb 2019 Yan-Bo Lin, Yu-Jhe Li, Yu-Chiang Frank Wang

Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).

audio-visual event localization

Deep Learning for Malicious Flow Detection

no code implementations 9 Feb 2018 Yun-Chun Chen, Yu-Jhe Li, Aragorn Tseng, Tsungnan Lin

We also conduct a partial flow experiment which shows the feasibility of real-time detection and a zero-shot learning experiment which justifies the generalization capability of deep learning in cyber security.

Zero-Shot Learning
