no code implementations • 5 Apr 2024 • Reuben Tan, Ximeng Sun, Ping Hu, Jui-Hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships.
no code implementations • 4 Dec 2023 • Piotr Teterwak, Ximeng Sun, Bryan A. Plummer, Kate Saenko, Ser-Nam Lim
Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way.
no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian
How should the label budget (i. e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?
no code implementations • 3 Aug 2023 • Ping Hu, Ximeng Sun, Stan Sclaroff, Kate Saenko
Previous works have focused on learning the alignment between textual and visual spaces to compensate for limited image labels, yet may suffer from reduced accuracy due to the scarcity of high-quality multi-label annotations.
no code implementations • 31 Mar 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia
We transfer the knowledge from the pre-trained CLIP-ViTL/14 model to a ViT-B/32 model, with only 40M public images and 28. 4M unpaired public sentences.
no code implementations • ICCV 2023 • Ximeng Sun, Pengchuan Zhang, Peizhao Zhang, Hardik Shah, Kate Saenko, Xide Xia
In this paper, we introduce a new distillation mechanism (DIME-FM) that allows us to transfer the knowledge contained in large VLFMs to smaller, customized foundation models using a relatively small amount of inexpensive, unpaired images and sentences.
1 code implementation • 20 Jun 2022 • Ximeng Sun, Ping Hu, Kate Saenko
Solving multi-label recognition (MLR) for images in the low-label regime is a challenging task with many real-world applications.
1 code implementation • ICCV 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko
Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition.
1 code implementation • ICCV 2021 • Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris
Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency.
no code implementations • 2 Mar 2021 • Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko
Second, to effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
no code implementations • 31 Mar 2020 • Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection based on proposal regression which detects the start and end time of the activities in untrimmed videos.
2 code implementations • NeurIPS 2020 • Ximeng Sun, Rameswar Panda, Rogerio Feris, Kate Saenko
Multi-task learning is an open and challenging problem in computer vision.
Ranked #104 on Semantic Segmentation on NYU Depth v2
no code implementations • 11 Jun 2019 • Ping Hu, Ximeng Sun, Kate Saenko, Stan Sclaroff
Learning from a few examples is a challenging task for machine learning.
1 code implementation • 28 Apr 2019 • Xingchao Peng, Zijun Huang, Ximeng Sun, Kate Saenko
Unsupervised model transfer has the potential to greatly improve the generalizability of deep models to novel domains.
Ranked #4 on Multi-target Domain Adaptation on DomainNet
no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.
1 code implementation • 3 Dec 2018 • Ximeng Sun, Huijuan Xu, Kate Saenko
Video generation is an inherently challenging task, as it requires modeling realistic temporal dynamics as well as spatial content.
1 code implementation • 20 Mar 2018 • Ximeng Sun, Ryan Szeto, Jason J. Corso
We propose the first deep learning solution to video frame inpainting, a challenging instance of the general video inpainting problem with applications in video editing, manipulation, and forensics.