Search Results for author: Ximeng Sun

Found 17 papers, 7 papers with code

Koala: Key frame-conditioned long video-LLM

no code implementations • 5 Apr 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.

Action Recognition Question Answering +2

CLAMP: Contrastive LAnguage Model Prompt-tuning

no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim

Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.

Contrastive Learning Image Captioning +5

Label Budget Allocation in Multi-Task Learning

no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian

How should the label budget (i.e., the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?

Multi-Task Learning

DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition with Limited Annotations

no code implementations • 3 Aug 2023 • Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko

Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28.4M unpaired public sentences.

Image Classification

DIME-FM: DIstilling Multimodal and Efficient Foundation Models

no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia

In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.

Image Classification

DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations

1 code implementation • 20 Jun 2022 • Ximeng Sun, Ping Hu, Kate Saenko

Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications.

Ranked #2 on Multi-label Image Recognition with Partial Labels on PASCAL VOC 2007

Multi-label Image Recognition with Partial Labels

Dynamic Network Quantization for Efficient Video Inference

1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.

Quantization Video Recognition

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.

Video Recognition

Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.

Image Classification Quantization +2

Revisiting Few-shot Activity Detection with Class Similarity Control

no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.

Action Detection Activity Detection +1

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

2 code implementations • NeurIPS 2020 • Ximeng Sun, Rameswar Panda, Rogerio Feris, Kate Saenko

Multi-task learning is an open and challenging problem in computer vision.

Ranked #104 on Semantic Segmentation on NYU Depth v2

Multi-Task Learning Semantic Segmentation

Weakly-supervised Compositional Feature Aggregation for Few-shot Recognition

no code implementations • 11 Jun 2019 • Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff

Learning from a few examples is a challenging task for machine learning.

Action Recognition Few-Shot Image Classification +1

Domain Agnostic Learning with Disentangled Representations

1 code implementation • 28 Apr 2019 • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko

Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.

Ranked #4 on Multi-target Domain Adaptation on DomainNet

General Classification Image Classification +1

Similarity R-C3D for Few-shot Temporal Activity Detection

no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell

In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.

Action Detection Activity Detection

TwoStreamVAN: Improving Motion Modeling in Video Generation

1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko

Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.

Video Generation

A Temporally-Aware Interpolation Network for Video Frame Inpainting

1 code implementation • 20 Mar 2018 • Ximeng Sun, Ryan Szeto, Jason J. Corso

We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics.

Decoder Video Editing +2
