MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

no code implementations 24 Nov 2022 Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

We present Mesh Pre-Training (MPT), a new pre-training framework that leverages 3D mesh data such as MoCap data for human pose and mesh reconstruction from a single image.

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation 14 Jun 2022 Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Language Modelling Masked Language Modeling +6
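
For readers unfamiliar with the MLM objective named above, the masking step can be sketched roughly as follows. This is a generic illustration and not LAVENDER's code: the function name `mask_tokens`, the `[MASK]` string, and the 15% default ratio are all assumptions chosen for the example.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption, not from the paper)

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace each token with MASK with probability mask_prob.

    Returns (masked_tokens, labels). labels[i] holds the original token
    at masked positions and None elsewhere, so a model would only be
    trained to predict the hidden tokens.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

if __name__ == "__main__":
    words = "a person is slicing a tomato in the kitchen".split()
    print(mask_tokens(words, mask_prob=0.3))
```

A model trained on such pairs learns to fill in the masked positions; the abstract's claim is that pre-training and downstream tasks can all be posed in this fill-in form.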

Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations CVPR 2022 Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Action Recognition Representation Learning +1

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation CVPR 2022 Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).

Question Answering Video Captioning +2

Mutual Information Continuity-constrained Estimator

no code implementations 29 Sep 2021 Tsun-An Hsieh, Cheng Yu, Ying Hung, Chung-Ching Lin, Yu Tsao

Accordingly, we propose Mutual Information Continuity-constrained Estimator (MICE).

Density Estimation

A 4-Element 800MHz-BW 29mW True-Time-Delay Spatial Signal Processor Enabling Fast Beam-Training with Data Communications

no code implementations 2 Jun 2021 Chung-Ching Lin, Chase Puglisi, Veljko Boljanovic, Soumen Mohapatra, Han Yan, Erfan Ghaderi, Deukhyoun Heo, Danijela Cabric, Subhanshu Gupta

In this work, we demonstrate a true-time-delay (TTD) array with digitally reconfigurable delay elements, enabling both fast beam-training at the receiver and wideband data communications.

VA-RED$^2$: Video Adaptive Redundancy Reduction

no code implementations ICLR 2021 Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both.

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

1 code implementation ECCV 2020 Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

Specifically, given a video frame, a policy network is used to decide what input resolution should be used for processing by the action recognition model, with the goal of improving both accuracy and efficiency.

Action Recognition

True-Time-Delay Arrays for Fast Beam Training in Wideband Millimeter-Wave Systems

no code implementations 17 Jul 2020 Veljko Boljanovic, Han Yan, Chung-Ching Lin, Soumen Mohapatra, Deukhyoun Heo, Subhanshu Gupta, Danijela Cabric

We also propose a suitable algorithm that requires a single pilot to achieve high-accuracy estimation of angle of arrival.

Video Instance Segmentation Tracking With a Modified VAE Architecture

no code implementations CVPR 2020 Chung-Ching Lin, Ying Hung, Rogerio Feris, Linglin He

We propose a modified variational autoencoder (VAE) architecture built on top of Mask R-CNN for instance-level video segmentation and tracking.

Instance Segmentation object-detection +5

A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos

no code implementations CVPR 2018 Chung-Ching Lin, Ying Hung

This paper presents a prior-less method for tracking and clustering an unknown number of human faces and maintaining their individual identities in unconstrained videos.


Collaborative Human-AI (CHAI): Evidence-Based Interpretable Melanoma Classification in Dermoscopic Images

1 code implementation 30 May 2018 Noel C. F. Codella, Chung-Ching Lin, Allan Halpern, Michael Hind, Rogerio Feris, John R. Smith

Quantitative relevance of results, according to non-expert similarity, as well as localized image regions, are also significantly improved.

General Classification

Adaptive As-Natural-As-Possible Image Stitching

no code implementations CVPR 2015 Chung-Ching Lin, Sharathchandra U. Pankanti, Karthikeyan Natesan Ramamurthy, Aleksandr Y. Aravkin

Computing the warp is fully automated and uses a combination of local homography and global similarity transformations, both of which are estimated with respect to the target.

Image Stitching
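
The two transformation types named in the image stitching abstract above can be illustrated with a short sketch. It shows how a point is mapped by a homography and by a similarity transform, both written as 3x3 matrices. The function names are hypothetical, and the linear `blend` helper is only one possible way to combine the two; the abstract does not specify how the combination is done, so treat it as an assumption.

```python
import math

def apply_transform(M, x, y):
    """Map the point (x, y) through a 3x3 matrix M in homogeneous coordinates."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w  # divide by w to return to image coordinates

def similarity_matrix(scale, theta, tx, ty):
    """Similarity transform: uniform scale, rotation by theta, translation."""
    c, s = scale * math.cos(theta), scale * math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def blend(H, S, alpha):
    """Hypothetical entry-wise blend: alpha=1 gives H, alpha=0 gives S."""
    return [[alpha * H[i][j] + (1 - alpha) * S[i][j] for j in range(3)]
            for i in range(3)]

if __name__ == "__main__":
    # A homography with a small perspective term in its last row.
    H = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]
    S = similarity_matrix(1.0, 0.0, 0.0, 0.0)  # identity similarity
    for alpha in (1.0, 0.5, 0.0):
        print(alpha, apply_transform(blend(H, S, alpha), 100.0, 50.0))
```

The homography's last row introduces perspective distortion that grows with distance from the origin, while the similarity transform preserves shape everywhere, which is why a stitching method might favor the former near the overlap and the latter far from it.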

