no code implementations • 13 Jan 2025 • Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran
Given that video segmentation is a dense prediction task, we find that preserving the spatial structure of the memories is essential, so the queries are split into global-level and patch-level groups.
1 code implementation • 22 Oct 2024 • Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra
Given a light-weight LLM, our LongVU also scales effectively into a smaller size with state-of-the-art video understanding performance.
no code implementations • 7 Dec 2023 • Saksham Suri, Fanyi Xiao, Animesh Sinha, Sean Chang Culatana, Raghuraman Krishnamoorthi, Chenchen Zhu, Abhinav Shrivastava
In the long-tailed detection setting on LVIS, Gen2Det improves the performance on rare categories by a large margin while also significantly improving the performance on other categories; e.g., we see an improvement of 2.13 Box AP and 1.84 Mask AP over just training on real data on LVIS with Mask R-CNN.
no code implementations • 4 Dec 2023 • Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi, Fanyi Xiao, Yong Jae Lee
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images, addressing multiple challenges: class name ambiguity, lack of diversity in naive prompts, and domain shifts.
1 code implementation • CVPR 2024 • Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably, with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.
Ranked #3 on Zero-Shot Instance Segmentation on LVIS v1.0 val
1 code implementation • ICCV 2023 • Chenchen Zhu, Fanyi Xiao, Andres Alvarado, Yasmine Babaei, Jiabo Hu, Hichem El-Mohri, Sean Chang Culatana, Roshan Sumbaly, Zhicheng Yan
To bootstrap research on EgoObjects, we present a suite of 4 benchmark tasks around egocentric object understanding, including a novel instance-level and the classical category-level object detection.
no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana
Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to exploit that knowledge for semantic segmentation, as they are usually trained at the image level.
Open Vocabulary Semantic Segmentation
Open-Vocabulary Semantic Segmentation
2 code implementations • ICCV 2023 • Peize Sun, Shoufa Chen, Chenchen Zhu, Fanyi Xiao, Ping Luo, Saining Xie, Zhicheng Yan
In this paper, we propose a detector with the ability to predict both open-vocabulary objects and their part segmentation.
no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.
Open Vocabulary Semantic Segmentation
Open-Vocabulary Semantic Segmentation
1 code implementation • 13 Dec 2022 • Lorenzo Pellegrini, Chenchen Zhu, Fanyi Xiao, Zhicheng Yan, Antonio Carta, Matthias De Lange, Vincenzo Lomonaco, Roshan Sumbaly, Pau Rodriguez, David Vazquez
Continual Learning, also known as Lifelong or Incremental Learning, has recently gained renewed interest among the Artificial Intelligence research community.
no code implementations • 24 May 2022 • Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo
We propose SCVRL, a novel contrastive-based framework for self-supervised learning for videos.
no code implementations • CVPR 2022 • Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide Modolo
Most self-supervised video representation learning approaches focus on action recognition.
1 code implementation • 17 Jun 2021 • Fanyi Xiao, Joseph Tighe, Davide Modolo
We present MaCLR, a novel method to explicitly perform cross-modal self-supervised video representation learning from visual and motion modalities.
2 code implementations • 22 Dec 2020 • Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, Yong Jae Lee
We propose YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
no code implementations • 20 Sep 2020 • Ling Pei, Songpengcheng Xia, Lei Chu, Fanyi Xiao, Qi Wu, Wenxian Yu, Robert Qiu
Together with the rapid development of the Internet of Things (IoT), human activity recognition (HAR) using wearable Inertial Measurement Units (IMUs) becomes a promising technology for many research areas.
2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee
Aliasing refers to the phenomenon that high frequency signals degenerate into completely different ones after sampling.
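The definition above can be checked numerically. The following is a minimal sketch, not taken from the paper: the 10 Hz sampling rate, the 9 Hz test tone, and the helper name `sample_sine` are all illustrative assumptions. It shows that a sine above the Nyquist limit produces exactly the same samples as a much lower-frequency sine.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Sample sin(2*pi*f*t) at t = n / sample_rate for n = 0..num_samples-1.

    Hypothetical helper for illustration only.
    """
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


# Assumed toy setup: sampling at 10 Hz, so the Nyquist limit is 5 Hz.
sample_rate_hz = 10.0

# A 9 Hz sine lies above the Nyquist limit ...
high = sample_sine(9.0, sample_rate_hz, 20)
# ... and its samples coincide with those of a -1 Hz sine (9 - 10 = -1).
low = sample_sine(-1.0, sample_rate_hz, 20)

# The two sample sequences differ only by floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(high, low))
print(f"largest sample difference: {max_gap:.2e}")
```

After sampling, the 9 Hz signal cannot be told apart from the 1 Hz one (with flipped sign), which is the degeneration the abstract describes.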
no code implementations • 4 Mar 2020 • Fanyi Xiao, Ling Pei, Lei Chu, Danping Zou, Wenxian Yu, Yifan Zhu, Tao Li
The experimental results show that the proposed method can surprisingly converge in a few iterations and achieve an accuracy of 91.15% on a real IMU dataset, demonstrating the efficiency and effectiveness of the proposed method.
3 code implementations • 23 Jan 2020 • Fanyi Xiao, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual perception.
36 code implementations • 3 Dec 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #14 on Real-time Instance Segmentation on MSCOCO
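The mask-assembly step in the abstract above (linearly combining prototypes with mask coefficients) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function name `assemble_mask`, the 2x2 prototype size, and the coefficient values are hypothetical, and any post-processing applied to the combined mask is left out.

```python
def assemble_mask(prototypes, coefficients):
    """Linearly combine k prototype masks with k per-instance coefficients.

    prototypes: list of k masks, each an H x W nested list of floats.
    coefficients: list of k floats for a single instance.
    Hypothetical helper for illustration only.
    """
    height = len(prototypes[0])
    width = len(prototypes[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for proto, coeff in zip(prototypes, coefficients):
        for y in range(height):
            for x in range(width):
                # Each prototype contributes in proportion to its coefficient.
                mask[y][x] += coeff * proto[y][x]
    return mask


# Assumed toy data: two 2x2 prototypes and one instance's coefficients.
prototypes = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
coefficients = [2.0, -1.0]

mask = assemble_mask(prototypes, coefficients)
print(mask)
```

A negative coefficient subtracts a prototype's region from the instance mask, so the same shared prototypes can be reused across instances with different coefficient vectors.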
no code implementations • ICCV 2019 • Fanyi Xiao, Haotian Liu, Yong Jae Lee
We propose a novel approach that disentangles the identity and pose of objects for image generation.
1 code implementation • CVPR 2019 • Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz
In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos.
Ranked #8 on Action Detection on UCF101-24
48 code implementations • ICCV 2019 • Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
Ranked #20 on Real-time Instance Segmentation on MSCOCO
no code implementations • ECCV 2018 • Fanyi Xiao, Yong Jae Lee
We introduce Spatial-Temporal Memory Networks for video object detection.
no code implementations • 25 May 2017 • Wenjian Hu, Krishna Kumar Singh, Fanyi Xiao, Jinyoung Han, Chen-Nee Chuah, Yong Jae Lee
Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest.
no code implementations • CVPR 2017 • Fanyi Xiao, Leonid Sigal, Yong Jae Lee
We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
no code implementations • CVPR 2016 • Fanyi Xiao, Yong Jae Lee
We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video.
no code implementations • CVPR 2016 • Krishna Kumar Singh, Fanyi Xiao, Yong Jae Lee
The status quo approach to training object detectors requires expensive bounding box annotations.
no code implementations • ICCV 2015 • Fanyi Xiao, Yong Jae Lee
We present a weakly-supervised approach that discovers the spatial extent of relative attributes, given only pairs of ordered images.
no code implementations • CVPR 2014 • Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, Fanyi Xiao, Wenbo Liu, Jianzhuang Liu
We propose a very intuitive and simple approximation for the conventional spectral clustering methods.
no code implementations • 21 Oct 2013 • Fanyi Xiao, Ruikun Luo, Zhiding Yu
In this paper we propose a multi-task linear classifier learning problem called D-SVM (Dictionary SVM).