Search Results for author: Ximeng Sun

Found 17 papers, 7 papers with code

A Temporally-Aware Interpolation Network for Video Frame Inpainting

1 code implementation • 20 Mar 2018 • Ximeng Sun, Ryan Szeto, Jason J. Corso

We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics.

Video Editing • Video Inpainting +1

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection • Activity Detection

Domain Agnostic Learning with Disentangled Representations

1 code implementation • 28 Apr 2019 • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.

General Classification • Image Classification +1

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection • Activity Detection +1

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

To effectively transfer knowledge, we develop a dynamic block swapping method that randomly replaces blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification • Quantization +2
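
The block-swapping step described in the entry above can be sketched in a few lines. The following is a minimal illustration under assumed interfaces (per-stage `student_blocks` and `teacher_blocks` with matching shapes, and a hypothetical `swap_prob`), not the authors' implementation:

```python
import random

def forward_with_block_swapping(x, student_blocks, teacher_blocks, swap_prob=0.5):
    """Forward pass through the low-precision student, randomly substituting
    the corresponding higher-precision teacher block at each stage.

    student_blocks / teacher_blocks: equal-length sequences of callable
    modules (e.g. torch.nn.Module) whose stage-wise input/output shapes
    match -- an assumption of this sketch.
    """
    assert len(student_blocks) == len(teacher_blocks)
    for student_block, teacher_block in zip(student_blocks, teacher_blocks):
        if random.random() < swap_prob:
            # Route this stage through the teacher's higher-precision block;
            # earlier student blocks still receive gradients through x.
            x = teacher_block(x)
        else:
            x = student_block(x)
    return x
```

The sketch only illustrates how mixed student/teacher forward passes could be formed during training; at test time the student runs on its own, without swapping.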

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition
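
As a rough illustration of the per-segment selection step described above (not the AdaMML code; the module names, thresholding, and simple late fusion are assumptions of this sketch):

```python
import torch
import torch.nn as nn

class ModalitySelector(nn.Module):
    """Toy per-segment policy: predicts, for each modality, whether the
    recognition model should process it."""

    def __init__(self, feat_dim, num_modalities):
        super().__init__()
        self.policy = nn.Linear(feat_dim, num_modalities)

    def forward(self, segment_feat, threshold=0.5):
        # segment_feat: (batch, feat_dim) cheap features of the video segment
        probs = torch.sigmoid(self.policy(segment_feat))
        return probs > threshold  # boolean mask over modalities to process


def recognize_segment(segment_inputs, encoders, selector, segment_feat):
    """Run only the selected modality encoders (e.g. RGB, audio, optical flow)
    and combine their predictions by simple averaging."""
    keep = selector(segment_feat)                      # (batch, num_modalities)
    outputs = []
    for m, encoder in enumerate(encoders):
        if keep[:, m].any():                           # skip modalities the policy drops
            outputs.append(encoder(segment_inputs[m]))
    if not outputs:                                    # ensure at least one modality runs
        outputs.append(encoders[0](segment_inputs[0]))
    return torch.stack(outputs).mean(dim=0)            # naive late fusion
```

Skipping the encoders of unselected modalities is where the efficiency gain comes from; the hard threshold here stands in for whatever differentiable selection the paper actually uses during training.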

Dynamic Network Quantization for Efficient Video Inference

1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization • Video Recognition

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification
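
A generic similarity-distillation loss in the spirit described above might look like the sketch below; the snippet does not give the exact DIME-FM objective, so the loss form, temperature, and variable names are assumptions for illustration only:

```python
import torch.nn.functional as F

def similarity_distillation_loss(student_img_feat, teacher_img_feat,
                                 teacher_txt_feat, temperature=0.07):
    """Match the student's image-to-text similarity distribution to the
    teacher's, using an unpaired sentence bank as shared anchors.
    All inputs are assumed to be L2-normalized embeddings of equal dimension."""
    # (num_images, num_sentences) similarity logits against the same text bank
    t_logits = teacher_img_feat @ teacher_txt_feat.t() / temperature
    s_logits = student_img_feat @ teacher_txt_feat.t() / temperature
    # KL divergence between teacher and student similarity distributions
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction="batchmean")
```

Because the sentences only serve as anchors for comparing the two image encoders, they need not be paired with the images, which is what allows distillation from inexpensive, unpaired data.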

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViT-L/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

no code implementations • 3 Aug 2023 • Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.

Label Budget Allocation in Multi-Task Learning

no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

How should the label budget (i.e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?

Multi-Task Learning

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning • Image Captioning +5

Koala: Key frame-conditioned long video-LLM

no code implementations • 5 Apr 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition • Question Answering +2
