no code implementations • 4 Dec 2023 • Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin
PixelLM excels across various pixel-level image reasoning and understanding tasks, outperforming well-established methods on multiple benchmarks, including MUSE and single- and multi-referring segmentation.
no code implementations • 3 Oct 2023 • Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang
Specifically, SFA consists of external adapters and internal adapters that operate sequentially over a transformer model.
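As a rough illustration of the adapter idea (a small bottleneck module trained while the backbone stays frozen), here is a minimal sketch; the module names and bottleneck width are assumptions, not SFA's actual implementation:

```python
# Minimal adapter sketch (illustrative assumption, not SFA's actual code):
# a small bottleneck MLP trained while the transformer backbone stays frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, dim)

    def forward(self, x):
        # Residual bottleneck: only these few parameters are updated.
        return x + self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wraps a frozen transformer block with a trainable adapter."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False        # freeze the backbone block
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

# e.g. AdaptedBlock(nn.TransformerEncoderLayer(d_model=256, nhead=8,
#                                              batch_first=True), dim=256)
```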
1 code implementation • ICCV 2023 • Xiaozheng Zheng, Zhuo Su, Chao Wen, Zhou Xue, Xiaojie Jin
To bridge the physical and virtual worlds for rapidly developed VR/AR applications, the ability to realistically drive 3D full-body avatars is of great significance.
2 code implementations • 15 Jun 2023 • Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu
Due to the limited scale and quality of video-text training corpora, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.
no code implementations • 24 May 2023 • Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, Jiashi Feng
The study reveals that: 1) MIM can be viewed as an effective method to improve the model capacity when the scale of the training data is relatively small; 2) Strong reconstruction targets can endow the models with increased capacities on downstream tasks; 3) MIM pre-training is data-agnostic in most scenarios, which means that the strategy for sampling pre-training data is non-critical.
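For readers unfamiliar with MIM, a generic masked-image-modeling objective looks roughly like the following; the masking scheme and MSE target here are illustrative assumptions, not the paper's exact setup:

```python
# Generic masked-image-modeling (MIM) objective, sketched for illustration.
# The masking scheme and MSE target are assumptions, not the paper's setup.
import torch

def mim_loss(patches, encoder, decoder, mask_ratio=0.75):
    """patches: (B, N, D) patch embeddings; reconstruct the masked subset."""
    B, N, D = patches.shape
    num_masked = int(N * mask_ratio)
    ids = torch.rand(B, N).argsort(dim=1)            # random permutation per sample
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, ids[:, :num_masked], True)      # True = masked position
    visible = patches.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = decoder(encoder(visible))                 # (B, N, D) reconstruction
    return ((pred - patches)[mask] ** 2).mean()      # error on masked patches only
```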
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on TGIF-Frame on TGIF-QA (using extra training data)
no code implementations • 19 Jan 2023 • Bowen Zhang, Xiaojie Jin, Weibo Gong, Kai Xu, Zhao Zhang, Peng Wang, Xiaohui Shen, Jiashi Feng
State-of-the-art video-text retrieval (VTR) methods usually fully fine-tune the pre-trained model (e.g.
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts the start and end boundaries in a video given a text description, and text localization, which matches a subset of the texts to the video features.
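A hypothetical sketch of a moment-retrieval head of the kind described above, assuming per-frame text-conditioned features; the layer names are assumptions, not the paper's code:

```python
# Hypothetical moment-retrieval head (layer names are assumptions):
# predict start/end boundary logits over T frames from fused
# text-conditioned video features.
import torch
import torch.nn as nn

class MomentHead(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.start = nn.Linear(dim, 1)
        self.end = nn.Linear(dim, 1)

    def forward(self, fused):                         # fused: (B, T, D)
        start_logits = self.start(fused).squeeze(-1)  # (B, T)
        end_logits = self.end(fused).squeeze(-1)      # (B, T)
        return start_logits, end_logits

# At inference the predicted moment is simply
# (start_logits.argmax(-1), end_logits.argmax(-1)).
```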
no code implementations • 15 Jan 2023 • Cheng-Ze Lu, Xiaojie Jin, Zhicheng Huang, Qibin Hou, Ming-Ming Cheng, Jiashi Feng
Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential for learning expressive feature representations in visual image recognition.
1 code implementation • 16 Nov 2022 • Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen
In this paper, we revisit these training-free metrics and find that: (1) the number of parameters (#Param), which is the most straightforward training-free metric, is overlooked in previous works but is surprisingly effective, (2) recent training-free metrics largely rely on the #Param information to rank networks.
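#Param is trivial to compute, which is part of its appeal as a training-free metric; a minimal sketch:

```python
# Counting #Param, the "surprisingly effective" training-free metric.
import torch.nn as nn

def num_params(model: nn.Module) -> int:
    """Trainable parameter count, usable as a zero-cost ranking score."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Rank candidate architectures by #Param alone (a simple baseline):
candidates = [nn.Linear(16, 32), nn.Linear(16, 64)]
ranked = sorted(candidates, key=num_params, reverse=True)
```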
no code implementations • 9 Nov 2022 • Zhao Zhang, Suiyi Zhao, Xiaojie Jin, Mingliang Xu, Yi Yang, Shuicheng Yan
In this paper, we present an embarrassingly simple yet effective solution to a seemingly impossible mission, low-light image enhancement (LLIE) without access to any task-related data.
no code implementations • 4 Nov 2022 • Bo Wang, Zhao Zhang, Mingbo Zhao, Xiaojie Jin, Mingliang Xu, Meng Wang
To obtain rich features, we use the Swin Transformer to calculate multi-level features, and then feed them into a novel dynamic multi-sight embedding module to exploit both global structure and local texture of input images.
no code implementations • 12 Sep 2022 • Sen Pei, Shixiong Xu, Ye Yuan, Jiashi Feng, Xiaohui Shen, Xiaojie Jin
To the best of our knowledge, this is the first work to introduce an incremental learning setting to video highlight detection, which in turn relieves the burden of retraining on all video inputs and allows conventional neural networks to scale with both the size of the dataset and the number of domains.
1 code implementation • 27 Jul 2022 • Yaojie Shen, Libo Zhang, Kai Xu, Xiaojie Jin
First, we learn embeddings of video transitions through a video transition classification task.
1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
The target encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
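The online/target encoder pairing typically relies on an exponential-moving-average (EMA) update; a generic sketch follows (the momentum value is an assumption, and CMAE's exact schedule may differ):

```python
# Generic EMA update for an online/target encoder pair; the momentum
# value is an assumption and CMAE's exact schedule may differ.
import copy
import torch

@torch.no_grad()
def ema_update(online, target, momentum=0.996):
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(momentum).add_(p_o, alpha=1.0 - momentum)

# The target encoder starts as a frozen copy of the online one:
#   target = copy.deepcopy(online)
#   for p in target.parameters(): p.requires_grad_(False)
```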
no code implementations • 3 Nov 2021 • Yi-Wen Chen, Xiaojie Jin, Xiaohui Shen, Ming-Hsuan Yang
Video salient object detection aims to find the most visually distinctive objects in a video.
3 code implementations • NeurIPS 2021 • Bo Liu, Xingchao Liu, Xiaojie Jin, Peter Stone, Qiang Liu
The goal of multi-task learning is to enable more efficient learning than single-task learning by sharing model structures for a diverse set of tasks.
1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo
Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #169 on Image Classification on ImageNet
no code implementations • 27 Apr 2021 • Chaosheng Dong, Xiaojie Jin, Weihao Gao, Yijia Wang, Hongyi Zhang, Xiang Wu, Jianchao Yang, Xiaobing Liu
Deep learning models in large-scale machine learning systems are often continuously trained with enormous data from production environments.
6 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng
In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).
Ranked #83 on Semantic Segmentation on ADE20K
1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled to be deeper.
Ranked #409 on Image Classification on ImageNet
1 code implementation • 10 Nov 2020 • Zongheng Tang, Yue Liao, Si Liu, Guanbin Li, Xiaojie Jin, Hongxu Jiang, Qian Yu, Dong Xu
HC-STVG is a video grounding task that requires both spatial (where) and temporal (when) localization.
2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille
However, embedding NL blocks in mobile neural networks has rarely been explored, mainly due to the following challenges: 1) NL blocks generally have a heavy computation cost, which makes them difficult to apply where computational resources are limited, and 2) discovering an optimal configuration for embedding NL blocks into mobile neural networks remains an open problem.
Ranked #60 on Neural Architecture Search on ImageNet
no code implementations • 30 Dec 2019 • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng
In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.
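One common way to make a differentiable search resource-constrained is to penalize the expected cost of the sampled architecture; the penalty form and coefficient below are illustrative assumptions, not RC-DARTS's exact formulation:

```python
# Sketch of a differentiable resource penalty added to a DARTS-style
# objective; the penalty form and coefficient are assumptions, not
# RC-DARTS's exact formulation.
import torch

def resource_constrained_loss(task_loss, arch_logits, op_costs, budget, lam=0.1):
    """
    arch_logits: (E, O) architecture parameters over O candidate ops per edge
    op_costs:    (O,)   cost of each candidate op (e.g. FLOPs or #params)
    """
    probs = torch.softmax(arch_logits, dim=-1)     # (E, O)
    expected_cost = (probs * op_costs).sum()       # differentiable in arch_logits
    return task_loss + lam * torch.relu(expected_cost - budget)
```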
1 code implementation • ICLR 2020 • Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang
We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.
Ranked #61 on Neural Architecture Search on ImageNet
no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set, and it has been demonstrated to be effective for 1D convolutional neural networks (CNNs).
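The weight-sampling idea can be sketched as follows: every filter is an overlapping slice of one compact parameter vector, so the store is far smaller than the dense weight tensor. This is an illustrative reconstruction, not WSNet's released code:

```python
# Weight sampling in the spirit of WSNet (an illustrative reconstruction,
# not the released code): every 1D conv filter is an overlapping strided
# slice of one compact parameter vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampledConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, stride_in_store=2):
        super().__init__()
        flat = in_ch * kernel_size
        # The store is far smaller than a dense (out_ch * flat) weight tensor.
        self.store = nn.Parameter(torch.randn(flat + (out_ch - 1) * stride_in_store))
        self.shape = (out_ch, in_ch, kernel_size)
        self.s = stride_in_store

    def weight(self):
        out_ch, in_ch, k = self.shape
        rows = [self.store[i * self.s : i * self.s + in_ch * k]
                for i in range(out_ch)]            # overlapping slices share weights
        return torch.stack(rows).view(self.shape)

    def forward(self, x):                           # x: (B, in_ch, L)
        return F.conv1d(x, self.weight())
```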
no code implementations • ICLR 2018 • Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Jiashi Feng, Shuicheng Yan
We present a new approach and a novel architecture, termed WSNet, for learning compact and efficient deep neural networks.
no code implementations • ICML 2018 • Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Nebojsa Jojic, Jiashi Feng, Shuicheng Yan
We present a new approach and a novel architecture, termed WSNet, for learning compact and efficient deep neural networks.
no code implementations • NeurIPS 2017 • Xiaojie Jin, Huaxin Xiao, Xiaohui Shen, Jimei Yang, Zhe Lin, Yunpeng Chen, Zequn Jie, Jiashi Feng, Shuicheng Yan
The ability to predict the future is important for intelligent systems such as autonomous vehicles and robots, enabling them to plan early and make decisions accordingly.
18 code implementations • NeurIPS 2017 • Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng
In this work, we present a simple, highly efficient and modularized Dual Path Network (DPN) for image classification, which introduces a new topology of internal connection paths.
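The dual-path idea in miniature: part of each block's output is added back (residual path, feature re-use) while the rest is concatenated (dense path, new feature exploration). A hypothetical sketch, not the full DPN:

```python
# Dual-path block sketch (hypothetical, not the full DPN): one part of
# the output is added (residual path), the rest is concatenated (dense path).
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    def __init__(self, in_ch, res_ch, dense_ch):    # assumes in_ch >= res_ch
        super().__init__()
        self.res_ch = res_ch
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, res_ch + dense_ch, 3, padding=1),
            nn.BatchNorm2d(res_ch + dense_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):                            # x: (B, in_ch, H, W)
        y = self.conv(x)
        res, dense = y.split([self.res_ch, y.shape[1] - self.res_ch], dim=1)
        return torch.cat([x[:, :self.res_ch] + res,  # residual path
                          x[:, self.res_ch:],        # carried dense features
                          dense], dim=1)             # newly added features

# Each block grows the channel count by dense_ch, as in densely connected
# networks, while the residual part stays fixed-width.
```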
no code implementations • CVPR 2017 • Zequn Jie, Yunchao Wei, Xiaojie Jin, Jiashi Feng, Wei Liu
To overcome this issue, we propose a deep self-taught learning approach, which makes the detector learn object-level features that are reliable for acquiring tight positive samples, and afterwards re-train itself based on them.
Tasks: Weakly Supervised Object Detection, Weakly-Supervised Object Localization
no code implementations • NeurIPS 2016 • Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
Therefore, Tree-RL can better cover objects of various scales, which is quite appealing in the context of object proposal generation.
no code implementations • 24 Jan 2017 • Yunpeng Chen, Xiaojie Jin, Jiashi Feng, Shuicheng Yan
Learning rich and diverse representations is critical for the performance of deep convolutional neural networks (CNNs).
no code implementations • ICCV 2017 • Xiaojie Jin, Xin Li, Huaxin Xiao, Xiaohui Shen, Zhe Lin, Jimei Yang, Yunpeng Chen, Jian Dong, Luoqi Liu, Zequn Jie, Jiashi Feng, Shuicheng Yan
In this way, the network can effectively learn to capture video dynamics and temporal context, which are critical clues for video scene parsing, without requiring extra manual annotations.
no code implementations • 27 Aug 2016 • Xiaojie Jin, Yunpeng Chen, Jiashi Feng, Zequn Jie, Shuicheng Yan
In this paper, we consider the scene parsing problem and propose a novel Multi-Path Feedback recurrent neural network (MPF-RNN) for parsing scene images.
no code implementations • 19 Jul 2016 • Xiaojie Jin, Xiao-Tong Yuan, Jiashi Feng, Shuicheng Yan
In this paper, we propose an iterative hard thresholding (IHT) approach to train Skinny Deep Neural Networks (SDNNs).
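IHT alternates ordinary gradient steps with a hard-thresholding step that zeroes all but the largest-magnitude weights; a minimal sketch (the per-tensor keep ratio is an assumption):

```python
# Iterative hard thresholding sketch: alternate ordinary gradient steps
# with a pruning step that keeps only the largest-magnitude weights.
# The per-tensor keep ratio is an assumption.
import torch

@torch.no_grad()
def hard_threshold(model, keep_ratio=0.1):
    for p in model.parameters():
        k = max(1, int(p.numel() * keep_ratio))
        # Threshold at the k-th largest absolute value.
        thresh = p.abs().flatten().kthvalue(p.numel() - k + 1).values
        p.mul_((p.abs() >= thresh).to(p.dtype))

# Surviving weights are then retrained, compensating for the zeroed ones.
```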
no code implementations • 19 Jul 2016 • Xiaojie Jin, Yunpeng Chen, Jian Dong, Jiashi Feng, Shuicheng Yan
In this paper, we propose a layer-wise discriminative learning method to enhance the discriminative capability of a deep network by allowing its layers to work collaboratively for classification.
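One simple way to realize layer-wise collaborative classification is to attach an auxiliary classifier to every layer and sum the per-layer losses; this is an illustration, not the paper's exact formulation:

```python
# Layer-wise auxiliary classifiers (an illustration, not the paper's
# exact formulation): every layer contributes its own classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerwiseNet(nn.Module):
    def __init__(self, dim=64, depth=3, num_classes=10):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.heads = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(depth)])

    def forward(self, x, target):
        loss = 0.0
        for layer, head in zip(self.layers, self.heads):
            x = torch.relu(layer(x))
            loss = loss + F.cross_entropy(head(x), target)  # per-layer loss
        return loss
```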
1 code implementation • 22 Dec 2015 • Xiaojie Jin, Chunyan Xu, Jiashi Feng, Yunchao Wei, Junjun Xiong, Shuicheng Yan
Rectified linear activation units are important components for state-of-the-art deep convolutional networks.