1 code implementation • 14 Apr 2025 • Tao Zhang, Xiangtai Li, Zilong Huang, Yanwei Li, Weixian Lei, Xueqing Deng, Shihao Chen, Shunping Ji, Jiashi Feng
We present Pixel-SAIL, a single transformer for pixel-wise MLLM tasks.
no code implementations • 14 Apr 2025 • Weixian Lei, Jiacong Wang, Haochen Wang, Xiangtai Li, Jun Hao Liew, Jiashi Feng, Zilong Huang
This paper introduces SAIL, a single transformer unified multimodal large language model (MLLM) that integrates raw pixel encoding and language decoding within a singular architecture.
no code implementations • 11 Apr 2025 • Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Zhiwu Qing, Fei Xiao, Meng Wei, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi, Jiashi Li, Yuxi Ren, Rui Wang, Huixia Li, Xuefeng Xiao, Shu Liu, Feng Ling, Heng Zhang, Houmin Wei, Huafeng Kuang, Jerry Duncan, Junda Zhang, Junru Zheng, Li Sun, Manlin Zhang, Renfei Sun, Xiaobin Zhuang, Xiaojie Li, Xin Xia, Xuyan Chi, Yanghua Peng, Yuping Wang, Yuxuan Wang, Zhongkai Zhao, Zhuo Chen, Zuquan Song, Zhenheng Yang, Jiashi Feng, Jianchao Yang, Lu Jiang
This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model.
no code implementations • 11 Apr 2025 • Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu
In autoregressive (AR) image generation, visual tokenizers compress images into compact discrete latent tokens, enabling efficient training of downstream autoregressive models for visual generation via next-token prediction.
1 code implementation • 1 Apr 2025 • Haobo Yuan, Tao Zhang, Xiangtai Li, Lu Qi, Zilong Huang, Shilin Xu, Jiashi Feng, Ming-Hsuan Yang
Referring video object segmentation (RVOS) is a challenging task that requires the model to segment the object in a video given the language description.
no code implementations • 20 Mar 2025 • Yuqing Wang, Zhijie Lin, Yao Teng, Yuanzhi Zhu, Shuhuai Ren, Jiashi Feng, Xihui Liu
Autoregressive visual generation models typically rely on tokenizers to compress images into tokens that can be predicted sequentially.
no code implementations • 17 Feb 2025 • Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, YiWen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, Guosheng Lin
With the explosive growth of 3D content creation, there is an increasing demand for automatically converting static 3D models into articulation-ready versions that support realistic animation.
1 code implementation • 21 Jan 2025 • Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang
The model is trained on a joint dataset of video depth and unlabeled images, similar to Depth Anything V2.
no code implementations • 16 Jan 2025 • Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin
This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs).
1 code implementation • 7 Jan 2025 • Haobo Yuan, Xiangtai Li, Tao Zhang, Zilong Huang, Shilin Xu, Shunping Ji, Yunhai Tong, Lu Qi, Jiashi Feng, Ming-Hsuan Yang
This work presents Sa2VA, the first unified model for dense grounded understanding of both images and videos.
Ranked #1 on Referring Video Object Segmentation on MeViS
1 code implementation • arXiv 2024 • Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan
However, the widely adopted uniform point sampling strategy in Shape VAE training often leads to a significant loss of geometric details, limiting the quality of shape reconstruction and downstream generation tasks.
no code implementations • 19 Dec 2024 • Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu
Autoregressive models have emerged as a powerful approach for visual generation but suffer from slow inference speed due to their sequential token-by-token prediction process.
1 code implementation • 18 Dec 2024 • Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks.
1 code implementation • 7 Nov 2024 • Luting Wang, Yang Zhao, Zijian Zhang, Jiashi Feng, Si Liu, Bingyi Kang
Currently, pixel reconstruction (e.g., VQGAN) dominates the training objective for image tokenizers.
1 code implementation • 5 Nov 2024 • Zilong Huang, Qinghao Ye, Bingyi Kang, Jiashi Feng, Haoqi Fan
Due to the absence of the text encoding as contrastive target, SuperClass does not require a text encoder and does not need to maintain a large batch size as CLIP does.
no code implementations • 4 Nov 2024 • Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng
Our scaling experiments show perfect generalization within the distribution, measurable scaling behavior for combinatorial generalization, but failure in out-of-distribution scenarios.
1 code implementation • 4 Nov 2024 • Yang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, Gao Huang
MLLMs have demonstrated remarkable comprehension and reasoning capabilities with complex language and visual data.
1 code implementation • 14 Oct 2024 • Tianwei Xiong, Yuqing Wang, Daquan Zhou, Zhijie Lin, Jiashi Feng, Xihui Liu
The efficacy of video generation models heavily depends on the quality of their training datasets.
no code implementations • 3 Oct 2024 • Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu
Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of natural language processing, while the exploration of autoregressive LLMs for video generation is limited to generating short videos of several seconds.
no code implementations • 29 Sep 2024 • Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou
Although generating reasonable results, existing methods often overlook the need for regional supervision in crucial areas such as the face and hands, and neglect the explicit modeling for motion blur, leading to unrealistic low-quality synthesis.
no code implementations • 30 Jun 2024 • Yiqin Wang, Haoji Zhang, Yansong Tang, Yong liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin
This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA).
2 code implementations • 13 Jun 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
This work presents Depth Anything V2.
1 code implementation • 12 Jun 2024 • Haoji Zhang, Yiqin Wang, Yansong Tang, Yong liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin
Our model is able to process extremely long video streams in real-time and respond to user queries simultaneously.
Ranked #1 on Zero-Shot Video Question Answer on MSRVTT-QA
1 code implementation • 28 May 2024 • Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang
In this paper, we aim to incorporate the sub-quadratic modeling capability of Gated Linear Attention (GLA) into the 2D diffusion backbone.
1 code implementation • 22 May 2024 • Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng
Accuracy and speed are critical in image editing tasks.
1 code implementation • 13 May 2024 • Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng
We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models.
1 code implementation • 2 May 2024 • Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou
This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation.
1 code implementation • arXiv 2024 • Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng
PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.
1 code implementation • 9 Apr 2024 • Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Xiu Li, Jiashi Feng, Guosheng Lin
In detail, we first propose a novel multi-view conditioned diffusion model which extracts a 3D prior from the synthesized multi-view images to synthesize high-fidelity novel view images, and then introduce a novel iterative-update strategy that adopts it to provide precise guidance for refining the coarse generated results through a fast optimization process.
1 code implementation • 14 Feb 2024 • Ze Ma, Daquan Zhou, Chun-Hsiao Yeh, Xue-She Wang, Xiuyu Li, Huanrui Yang, Zhen Dong, Kurt Keutzer, Jiashi Feng
To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with 3D Gaussian Noise Prior for better inter-frame stability; 2) an ID module based on extended Textual Inversion trained with the cropped identity to disentangle the ID information from the background; 3) Face VCD and Tiled VCD modules to reinforce faces and upscale the video to higher resolution while preserving the identity's features.
5 code implementations • CVPR 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.
Ranked #4 on Monocular Depth Estimation on ETH3D
no code implementations • 9 Jan 2024 • Weimin WANG, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.
no code implementations • CVPR 2024 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation especially when the relative distance is longer between visual and text tokens.
1 code implementation • 22 Dec 2023 • Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang
Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Ranked #2 on Semantic Segmentation on Cityscapes test
1 code implementation • CVPR 2024 • Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang
While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.
no code implementations • 21 Dec 2023 • Chenxu Zhang, Chao Wang, Jianfeng Zhang, Hongyi Xu, Guoxian Song, You Xie, Linjie Luo, Yapeng Tian, Xiaohu Guo, Jiashi Feng
The generation of emotional talking faces from a single portrait image remains a significant challenge.
1 code implementation • 19 Dec 2023 • Jiachun Pan, Hanshu Yan, Jun Hao Liew, Jiashi Feng, Vincent Y. F. Tan
However, since the off-the-shelf pre-trained networks are trained on clean images, the one-step estimation procedure of the clean image may be inaccurate, especially in the early stages of the generation process in diffusion models.
no code implementations • 12 Dec 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.
Ranked #2 on Question Answering on NExT-QA (Open-ended VideoQA)
1 code implementation • CVPR 2024 • Zhongwei Ren, Zhicheng Huang, Yunchao Wei, Yao Zhao, Dongmei Fu, Jiashi Feng, Xiaojie Jin
PixelLM excels across various pixel-level image reasoning and understanding tasks, outperforming well-established methods in multiple benchmarks, including MUSE, single- and multi-referring segmentation.
no code implementations • 29 Nov 2023 • Jianfeng Zhang, Xuanmeng Zhang, Huichao Zhang, Jun Hao Liew, Chenxu Zhang, Yi Yang, Jiashi Feng
We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions.
3 code implementations • CVPR 2024 • Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
1 code implementation • 14 Nov 2023 • Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng
Large Language Models (LLMs) have significantly advanced natural language processing, demonstrating exceptional reasoning, tool usage, and memory capabilities.
no code implementations • 12 Nov 2023 • Chenyu Wang, Zhen Dong, Daquan Zhou, Zhenhua Zhu, Yu Wang, Jiashi Feng, Kurt Keutzer
On the hardware side, we modify the datapath of current PIM accelerators to accommodate epitomes and implement a feature map reuse technique to reduce computation cost.
no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
For MoV, we utilize the text-to-speech (TTS) algorithms with a variety of pre-defined tones and select the most matching one based on the user-provided text description automatically.
1 code implementation • 8 Oct 2023 • Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen
Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction.
no code implementations • ICCV 2023 • Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng
We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries.
no code implementations • 8 Sep 2023 • Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng
In this work, we identify that a crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning between the prompt and the output image.
no code implementations • 2 Sep 2023 • Hanshu Yan, Jun Hao Liew, Long Mai, Shanchuan Lin, Jiashi Feng
The flexibility of these techniques enables the editing of arbitrary regions within the frame.
no code implementations • 28 Aug 2023 • Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng
In this report, we present MagicEdit, a surprisingly simple yet effective solution to the text-guided video editing task.
no code implementations • 28 Aug 2023 • Jianfeng Zhang, Hanshu Yan, Zhongcong Xu, Jiashi Feng, Jun Hao Liew
This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars.
1 code implementation • ICCV 2023 • Daquan Zhou, Kai Wang, Jianyang Gu, Xiangyu Peng, Dongze Lian, Yifan Zhang, Yang You, Jiashi Feng
Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training.
1 code implementation • 20 Jul 2023 • Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts.
1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang
Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interaction with humans.
1 code implementation • 15 Jun 2023 • Sihan Chen, Xingjian He, Handong Li, Xiaojie Jin, Jiashi Feng, Jing Liu
Due to the limited scale and quality of video-text training corpus, most vision-language foundation models employ image-text datasets for pretraining and primarily focus on modeling visually semantic representations while disregarding temporal semantic representations and correlations.
Ranked #1 on TGIF-Frame on TGIF-QA (using extra training data)
no code implementations • 24 May 2023 • Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, Jiashi Feng
The study reveals that: 1) MIM can be viewed as an effective method to improve the model capacity when the scale of the training data is relatively small; 2) Strong reconstruction targets can endow the models with increased capacities on downstream tasks; 3) MIM pre-training is data-agnostic under most scenarios, which means that the strategy of sampling pre-training data is non-critical.
no code implementations • 22 May 2023 • Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.
Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)
no code implementations • 3 Apr 2023 • Yabo Zhang, ZiHao Wang, Jun Hao Liew, Jingjia Huang, Manyu Zhu, Jiashi Feng, WangMeng Zuo
In this work, we investigate performing semantic segmentation solely through the training on image-sentence pairs.
no code implementations • 1 Apr 2023 • Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou
Specifically, one branch focuses on detection representation for actor detection, and the other one for action recognition.
no code implementations • CVPR 2023 • Hongyi Xu, Guoxian Song, Zihang Jiang, Jianfeng Zhang, Yichun Shi, Jing Liu, WanChun Ma, Jiashi Feng, Linjie Luo
We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained from in-the-wild unstructured images that is capable of synthesizing diverse identity-preserved 3D heads with compelling dynamic details under full disentangled control over camera poses, facial expressions, head shapes, articulated neck and jaw poses.
no code implementations • 24 Mar 2023 • Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo
Capitalizing on the recent advancement of 3D-aware GAN models, we perform guided transfer learning on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings.
1 code implementation • CVPR 2023 • Jiacheng Wei, Hao Wang, Jiashi Feng, Guosheng Lin, Kim-Hui Yap
We conduct extensive experiments to analyze each of our proposed components and show the efficacy of our framework in generating high-fidelity 3D textured and text-relevant shapes.
1 code implementation • ICCV 2023 • Kunyang Han, Yong liu, Jun Hao Liew, Henghui Ding, Yunchao Wei, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao
Recent advancements in pre-trained vision-language models, such as CLIP, have enabled the segmentation of arbitrary concepts solely from textual inputs, a process commonly referred to as open-vocabulary semantic segmentation (OVS).
1 code implementation • CVPR 2023 • Ruyang Liu, Jingjia Huang, Ge Li, Jiashi Feng, Xinglong Wu, Thomas H. Li
In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.
Ranked #7 on Video Retrieval on MSR-VTT-1kA (using extra training data)
1 code implementation • CVPR 2024 • Xiaojie Jin, BoWen Zhang, Weibo Gong, Kai Xu, Xueqing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng
The first is a Temporal Adaptation Module that is incorporated in the video branch to introduce global and local temporal contexts.
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features.
no code implementations • 15 Jan 2023 • Cheng-Ze Lu, Xiaojie Jin, Zhicheng Huang, Qibin Hou, Ming-Ming Cheng, Jiashi Feng
Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential of learning expressive feature representations in visual image recognition.
1 code implementation • 21 Dec 2022 • Jingjia Huang, Yuanqi Chen, Jiashi Feng, Xinglong Wu
Semi-supervised learning based methods are current SOTA solutions to the noisy-label learning problem, which rely on learning an unsupervised label cleaner first to divide the training samples into a labeled set for clean data and an unlabeled set for noise data.
Ranked #3 on Image Classification on Clothing1M (using extra training data)
no code implementations • 13 Dec 2022 • Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou
While some prior works have applied such image GANs to unconditional 2D portrait video generation and static 3D portrait synthesis, there are few works successfully extending GANs for generating 3D-aware portrait videos.
no code implementations • CVPR 2023 • Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang
Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms.
1 code implementation • 26 Nov 2022 • Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng
Specifically, we decompose the generative 3D human synthesis into pose-guided mapping and canonical representation with predefined human pose and shape, such that the canonical representation can be explicitly driven to different poses and shapes with the guidance of a 3D parametric human model SMPL.
1 code implementation • NeurIPS 2023 • Yifan Zhang, Daquan Zhou, Bryan Hooi, Kai Wang, Jiashi Feng
Specifically, GIF conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model, resulting in the creation of photo-realistic images with new content.
2 code implementations • 22 Nov 2022 • Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng
This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features.
no code implementations • 20 Nov 2022 • Daquan Zhou, Weimin WANG, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng
Specifically, unlike existing works that directly train video models in the RGB space, we use a pre-trained VAE to map video clips into a low-dimensional latent space and learn the distribution of videos' latent codes via a diffusion model.
Ranked #10 on Text-to-Video Generation on UCF-101
2 code implementations • 28 Oct 2022 • Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
Unlike style transfer, where an image is stylized according to the reference style without changing the image content, semantic blending mixes two different concepts in a semantic manner to synthesize a novel concept while preserving the spatial layout and geometry.
no code implementations • 24 Oct 2022 • Kaixin Wang, Kuangqi Zhou, Jiashi Feng, Bryan Hooi, Xinchao Wang
In Reinforcement Learning (RL), Laplacian Representation (LapRep) is a task-agnostic state representation that encodes the geometry of the environment.
8 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.
Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)
1 code implementation • 17 Oct 2022 • Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang
With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy compared to the full fine-tuning but only fine-tuning about 0.3M parameters.
1 code implementation • 2 Oct 2022 • Hao Wang, Guosheng Lin, Ana García del Molino, Anran Wang, Jiashi Feng, Zhiqi Shen
In this paper we present a novel multi-attribute face manipulation method based on textual descriptions.
1 code implementation • 1 Aug 2022 • Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng
Unsupervised generation of clothed virtual humans with various appearance and animatable poses is important for creating 3D human avatars and other AR/VR applications.
1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
The momentum encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
1 code implementation • CVPR 2023 • Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji
We then introduce Clover, a Correlated Video-Language pre-training method, towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise.
Ranked #1 on Video Question Answering on LSMDC-FiB
1 code implementation • 28 May 2022 • Jianfei Yang, Xiangyu Peng, Kai Wang, Zheng Zhu, Jiashi Feng, Lihua Xie, Yang You
Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain.
1 code implementation • 27 May 2022 • Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou
Intuitively, SAF achieves this by avoiding sudden drops in the loss in the sharp local minima throughout the trajectory of the updates of the weights.
no code implementations • 23 May 2022 • Kuangqi Zhou, Kaixin Wang, Jiashi Feng, Jian Tang, Tingyang Xu, Xinchao Wang
However, existing best deep AL methods are mostly developed for a single type of learning task (e.g., single-label classification), and hence may not perform well in molecular property prediction that involves various task types.
2 code implementations • 26 Apr 2022 • Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez
Our study is motivated by the intriguing properties of the emerging visual grouping in Vision Transformers, which indicates that self-attention may promote robustness through improved mid-level representations.
Ranked #4 on Domain Generalization on ImageNet-R (using extra training data)
1 code implementation • CVPR 2022 • Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses.
Ranked #42 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • ICLR 2022 • Shoukang Hu, Ruochen Wang, Lanqing Hong, Zhenguo Li, Cho-Jui Hsieh, Jiashi Feng
Efficient performance estimation of architectures drawn from large search spaces is essential to Neural Architecture Search.
1 code implementation • 15 Feb 2022 • Tao Wang, Jun Hao Liew, Yu Li, Yunpeng Chen, Jiashi Feng
Unlike the original per grid cell object masks, SODAR is implicitly supervised to learn mask representations that encode geometric structure of nearby objects and complement adjacent representations with context.
no code implementations • 30 Jan 2022 • Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor
The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces.
no code implementations • 12 Jan 2022 • Hanshu Yan, Jingfeng Zhang, Jiashi Feng, Masashi Sugiyama, Vincent Y. F. Tan
Secondly, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training (HAT), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth.
no code implementations • 16 Dec 2021 • Jian Liang, Dapeng Hu, Jiashi Feng, Ran He
To achieve bilateral adaptation in the target domain, we further maximize localized mutual information to align known samples with the source classifier and employ an entropic loss to push unknown samples far away from the source classification boundary, respectively.
Ranked #7 on Universal Domain Adaptation on VisDA2017
1 code implementation • CVPR 2022 • Yujun Shi, Kuangqi Zhou, Jian Liang, Zihang Jiang, Jiashi Feng, Philip Torr, Song Bai, Vincent Y. F. Tan
Specifically, we experimentally show that directly encouraging CIL Learner at the initial phase to output similar representations as the model jointly trained on all classes can greatly boost the CIL performance.
no code implementations • 8 Dec 2021 • Mingfei Chen, Jianfeng Zhang, Xiangyu Xu, Lijuan Liu, Yujun Cai, Jiashi Feng, Shuicheng Yan
Meanwhile, for achieving higher rendering efficiency, we introduce a progressive rendering pipeline through geometry guidance, which leverages the geometric feature volume and the predicted density values to progressively reduce the number of sampling points and speed up the rendering process.
1 code implementation • NeurIPS 2021 • Pan Zhou, Hanshu Yan, Xiaotong Yuan, Jiashi Feng, Shuicheng Yan
Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error to achieve smaller excess risk error than vanilla SGD on (strongly) convex problems and nonconvex problems with Polyak-Łojasiewicz condition which has been observed/proved in neural networks.
1 code implementation • CVPR 2022 • Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang
This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.
18 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
Ranked #9 on Semantic Segmentation on DensePASS
2 code implementations • NeurIPS 2021 • Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng
Instead of estimating 3D joint locations from costly volumetric representation or reconstructing the per-person 3D pose from multiple detected 2D poses as in previous methods, MvP directly regresses the multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks.
Ranked #3 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)
1 code implementation • 9 Oct 2021 • Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, Jiashi Feng
Deep long-tailed learning, one of the most challenging problems in visual recognition, aims to train well-performing deep models from a large number of images that follow a long-tailed class distribution.
1 code implementation • ICLR 2022 • Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan
Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization.
no code implementations • NeurIPS Workshop ImageNet_PPF 2021 • Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng
Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.
1 code implementation • ICCV 2021 • Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
no code implementations • 13 Sep 2021 • Kaichen Zhou, Lanqing Hong, Shoukang Hu, Fengwei Zhou, Binxin Ru, Jiashi Feng, Zhenguo Li
In view of these, we propose DHA, which achieves joint optimization of Data augmentation policy, Hyper-parameter and Architecture.
1 code implementation • ICCV 2021 • Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu
We present Voxel Transformer (VoTr), a novel and effective voxel-based Transformer backbone for 3D object detection from point clouds.
Ranked #3 on 3D Object Detection on waymo vehicle (L1 mAP metric)
no code implementations • 8 Aug 2021 • Tian Yu Liu, Jiashi Feng
Brain tumor is a common and fatal form of cancer which affects both adults and children.
2 code implementations • 20 Jul 2021 • Yifan Zhang, Bryan Hooi, Lanqing Hong, Jiashi Feng
Existing long-tailed recognition methods, aiming to train class-balanced models from long-tailed data, generally assume the models would be evaluated on the uniform test class distribution.
Ranked #9 on Long-tail Learning on iNaturalist 2018
1 code implementation • 12 Jul 2021 • Kaixin Wang, Kuangqi Zhou, Qixin Zhang, Jie Shao, Bryan Hooi, Jiashi Feng
It enables learning high-quality Laplacian representations that faithfully approximate the ground truth.
1 code implementation • 5 Jul 2021 • Meng-Jiun Chiou, Henghui Ding, Hanshu Yan, Changhu Wang, Roger Zimmermann, Jiashi Feng
Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects.
Ranked #3 on Unbiased Scene Graph Generation on Visual Genome
7 code implementations • 24 Jun 2021 • Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan
Though recently the prevailing vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.
Ranked #1 on Image Classification on VizWiz-Classification
3 code implementations • 23 Jun 2021 • Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
By realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections.
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
2 code implementations • NeurIPS 2021 • Mi Luo, Fei Chen, Dapeng Hu, Yifan Zhang, Jian Liang, Jiashi Feng
Motivated by the above findings, we propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #176 on Image Classification on ImageNet
no code implementations • 31 May 2021 • Xiaoguang Tu, Yingtian Zou, Jian Zhao, Wenjie Ai, Jian Dong, Yuan YAO, Zhikang Wang, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng
Video generation from a single face image is an interesting problem and usually tackled by utilizing Generative Adversarial Networks (GANs) to integrate information from the input face image and a sequence of sparse facial landmarks.
1 code implementation • 26 May 2021 • Si Liu, Wentao Jiang, Chen Gao, Ran He, Jiashi Feng, Bo Li, Shuicheng Yan
In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and remove the makeup from the with-makeup image respectively.
1 code implementation • 25 May 2021 • Meng-Jiun Chiou, Chun-Yu Liao, Li-Wei Wang, Roger Zimmermann, Jiashi Feng
Detecting human-object interactions (HOI) is an important step toward comprehensive visual understanding by machines.
Ranked #3 on Human-Object Interaction Anticipation on VidHOI
no code implementations • 12 May 2021 • Xiaoguang Tu, Jian Zhao, Qiankun Liu, Wenjie Ai, Guodong Guo, Zhifeng Li, Wei Liu, Jiashi Feng
First, MDFR is a well-designed encoder-decoder architecture which extracts feature representation from an input face image with arbitrary low-quality factors and restores it to a high-quality counterpart.
1 code implementation • CVPR 2021 • Yujun Shi, Li Yuan, Yunpeng Chen, Jiashi Feng
Continual learning tackles the setting of learning different tasks sequentially.
1 code implementation • CVPR 2021 • Kehong Gong, Jianfeng Zhang, Jiashi Feng
To address this problem, we present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity and thus improve generalization of the trained 2D-to-3D pose estimator.
Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)
1 code implementation • CVPR 2021 • Jianfeng Zhang, Dongdong Yu, Jun Hao Liew, Xuecheng Nie, Jiashi Feng
In this work, we present a single-stage model, Body Meshes as Points (BMP), to simplify the pipeline and lift both efficiency and performance.
Ranked #9 on 3D Multi-Person Pose Estimation on MuPoTS-3D
no code implementations • ICLR 2022 • Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng
Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.
7 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng
In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).
Ranked #3 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)
3 code implementations • CVPR 2022 • Jian Liang, Dapeng Hu, Jiashi Feng, Ran He
To ease the burden of labeling, unsupervised domain adaptation (UDA) aims to transfer knowledge in previous and related labeled datasets (sources) to a new unlabeled dataset (target).
no code implementations • 30 Mar 2021 • Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Jiashi Feng
Temporal action proposal generation (TAPG) is a fundamental and challenging task in video understanding, especially in temporal action detection.
1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the performance of ViTs saturates fast when scaled to be deeper.
Ranked #464 on Image Classification on ImageNet
2 code implementations • CVPR 2021 • Qibin Hou, Daquan Zhou, Jiashi Feng
Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps.
1 code implementation • NeurIPS 2021 • Yifan Zhang, Bryan Hooi, Dapeng Hu, Jian Liang, Jiashi Feng
In this paper, we investigate whether applying contrastive learning to fine-tuning would bring further benefits, and analytically find that optimizing the contrastive loss benefits both discriminative representation learning and model optimization during fine-tuning.
2 code implementations • 10 Feb 2021 • Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
By comparing non-robust (normally trained) and robustified (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts.
13 code implementations • ICCV 2021 • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan
To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.
Ranked #437 on Image Classification on ImageNet
no code implementations • 11 Jan 2021 • Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan
Our ORDNet is able to extract more comprehensive context information and well adapt to complex spatial variance in scene images.
no code implementations • 1 Jan 2021 • Bingyi Kang, Shie Mannor, Jiashi Feng
Reinforcement Learning (RL) with safety guarantee is critical for agents performing tasks in risky environments.
no code implementations • 1 Jan 2021 • Kaichen Zhou, Lanqing Hong, Fengwei Zhou, Binxin Ru, Zhenguo Li, Trigoni Niki, Jiashi Feng
Our method performs co-optimization of the neural architectures, training hyper-parameters and data augmentation policies in an end-to-end fashion without the need of model retraining.
1 code implementation • 1 Jan 2021 • Tao Wang, Jun Hao Liew, Yu Li, Yunpeng Chen, Jiashi Feng
Recently proposed one-stage instance segmentation models (e.g., SOLO) learn to directly predict location-specific object masks with fully-convolutional networks.
no code implementations • ICLR 2021 • Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, Jiashi Feng
Motivated by this question, we conduct a series of studies on the performance of self-supervised contrastive learning and supervised learning methods over multiple datasets where training instance distributions vary from a balanced one to a long-tailed one.
Ranked #40 on Long-tail Learning on CIFAR-10-LT (ρ=10)
2 code implementations • 14 Dec 2020 • Jian Liang, Dapeng Hu, Yunbo Wang, Ran He, Jiashi Feng
Furthermore, we propose a new labeling transfer strategy, which separates the target data into two splits based on the confidence of predictions (labeling information), and then employ semi-supervised learning to improve the accuracy of less-confident predictions in the target domain.
Ranked #5 on Source-Free Domain Adaptation on VisDA-2017
1 code implementation • NeurIPS 2020 • Guilin Li, Junlei Zhang, Yunhe Wang, Chuanjian Liu, Matthias Tan, Yunfeng Lin, Wei Zhang, Jiashi Feng, Tong Zhang
In particular, we propose a novel joint-training framework to train plain CNN by leveraging the gradients of the ResNet counterpart.
no code implementations • 11 Nov 2020 • Li Yuan, Will Xiao, Giorgia Dellaferrera, Gabriel Kreiman, Francis E. H. Tay, Jiashi Feng, Margaret S. Livingstone
Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.
2 code implementations • NeurIPS 2020 • Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng
Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments.
no code implementations • 16 Oct 2020 • Li Yuan, Yichen Zhou, Shuning Chang, Ziyuan Huang, Yunpeng Chen, Xuecheng Nie, Tao Wang, Jiashi Feng, Shuicheng Yan
Prior works fail to deal with this problem in two respects: (1) they do not exploit information about the scenes; (2) they lack training data for crowded and complex scenes.
no code implementations • 16 Oct 2020 • Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
In this paper, we focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.
no code implementations • 16 Oct 2020 • Li Yuan, Shuning Chang, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Xuecheng Nie, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan
This paper presents our solution to the ACM MM challenge: Large-scale Human-centric Video Analysis in Complex Events \cite{lin2020human}; specifically, here we focus on Track 3: Crowd Pose Tracking in Complex Events.
no code implementations • NeurIPS 2020 • Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E
The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate well diminishes the anisotropic structure in gradient noise and results in larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.
1 code implementation • 10 Sep 2020 • Meng-Jiun Chiou, Roger Zimmermann, Jiashi Feng
Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years.
no code implementations • 23 Aug 2020 • Pengfei Ge, Chuan-Xian Ren, Jiashi Feng, Shuicheng Yan
By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders.
8 code implementations • NeurIPS 2020 • Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
1 code implementation • 6 Aug 2020 • Zi-Hang Jiang, Bingyi Kang, Kuangqi Zhou, Jiashi Feng
To be specific, we devise a simple and efficient meta-reweighting strategy to adapt the sample representations and generate soft attention to refine the representation such that the relevant features from the query and support samples can be extracted for a better few-shot classification.
1 code implementation • ECCV 2020 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
Specifically, we systematically investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals.
no code implementations • ECCV 2020 • Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, Jiashi Feng
Self-supervised learning (SSL) has been proved very effective at learning representations from unlabeled data in the image domain.
2 code implementations • CVPR 2021 • Jian Liang, Dapeng Hu, Jiashi Feng
ATDOC alleviates the classifier bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
4 code implementations • ECCV 2020 • Zhou Daquan, Qibin Hou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
In this paper, we rethink the necessity of such design changes and find it may bring risks of information loss and gradient confusion.
no code implementations • 4 Jul 2020 • Jianan Li, Jiashi Feng
The performance of 3D object detection models over point clouds highly depends on their capability of modeling local geometric patterns.
no code implementations • NeurIPS 2020 • Jianfeng Zhang, Xuecheng Nie, Jiashi Feng
In this work, we propose a novel framework, Inference Stage Optimization (ISO), for improving the generalizability of 3D pose models when source and target data come from different pose distributions.
Ranked #131 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)
2 code implementations • CVPR 2020 • Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, Jiashi Feng
Solving long-tail large vocabulary object detection with deep learning based models is a challenging and demanding task, which is however under-explored. In this work, we provide the first systematic analysis of the underperformance of state-of-the-art models when facing long-tail distributions.
no code implementations • 14 Jun 2020 • Kuangqi Zhou, Qibin Hou, Zun Li, Jiashi Feng
In this paper, we propose a novel multi-miner framework to perform a region mining process that adapts to diverse object sizes and is thus able to mine more integral and finer object regions.
2 code implementations • 12 Jun 2020 • Kuangqi Zhou, Yanfei Dong, Kaixin Wang, Wee Sun Lee, Bryan Hooi, Huan Xu, Jiashi Feng
In this work, we study performance degradation of GCNs by experimentally examining how stacking only TRANs or PROPs works.
no code implementations • CVPR 2020 • Aoxue Li, Weiran Huang, Xu Lan, Jiashi Feng, Zhenguo Li, Li-Wei Wang
Few-shot learning (FSL) has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty in learning to generalize from a few examples.
Ranked #1 on Few-Shot Image Classification on ImageNet (1-shot)
1 code implementation • 24 Apr 2020 • Jiawei Du, Hanshu Yan, Vincent Y. F. Tan, Joey Tianyi Zhou, Rick Siow Mong Goh, Jiashi Feng
However, similar to existing preprocessing-based methods, the randomized process will degrade the prediction accuracy.
2 code implementations • CVPR 2020 • Qibin Hou, Li Zhang, Ming-Ming Cheng, Jiashi Feng
Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing.
Ranked #32 on Semantic Segmentation on Cityscapes test
no code implementations • 30 Mar 2020 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
To successfully align the multi-modal data structures across domains, the following works exploit discriminative information in the adversarial training process, e.g., using multiple class-wise discriminators and introducing conditional information in input or output of the domain discriminator.
1 code implementation • ECCV 2020 • Jian Liang, Yunbo Wang, Dapeng Hu, Ran He, Jiashi Feng
On one hand, negative transfer results in misclassification of target samples to the classes only present in the source domain.
Ranked #2 on Partial Domain Adaptation on ImageNet-Caltech
no code implementations • 25 Feb 2020 • Zun Li, Congyan Lang, Junhao Liew, Qibin Hou, Yidong Li, Jiashi Feng
Feature pyramid network (FPN) based models, which fuse the semantics and salient details in a progressive manner, have been proven highly effective in salient object detection.
3 code implementations • ICML 2020 • Jian Liang, Dapeng Hu, Jiashi Feng
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Ranked #2 on Domain Adaptation on SVHN-to-MNIST
1 code implementation • ICLR 2020 • Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng
Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on the EASY set.
Ranked #1 on Logical Reasoning Question Answering on ReClor
4 code implementations • 9 Feb 2020 • Razvan V. Marinescu, Neil P. Oxtoby, Alexandra L. Young, Esther E. Bron, Arthur W. Toga, Michael W. Weiner, Frederik Barkhof, Nick C. Fox, Arman Eshaghi, Tina Toni, Marcin Salaterski, Veronika Lunina, Manon Ansart, Stanley Durrleman, Pascal Lu, Samuel Iddi, Dan Li, Wesley K. Thompson, Michael C. Donohue, Aviv Nahon, Yarden Levy, Dan Halbersberg, Mariya Cohen, Huiling Liao, Tengfei Li, Kaixian Yu, Hongtu Zhu, Jose G. Tamez-Pena, Aya Ismail, Timothy Wood, Hector Corrada Bravo, Minh Nguyen, Nanbo Sun, Jiashi Feng, B. T. Thomas Yeo, Gang Chen, Ke Qi, Shiyang Chen, Deqiang Qiu, Ionut Buciuman, Alex Kelner, Raluca Pop, Denisa Rimocea, Mostafa M. Ghazi, Mads Nielsen, Sebastien Ourselin, Lauge Sorensen, Vikram Venkatraghavan, Keli Liu, Christina Rabe, Paul Manser, Steven M. Hill, James Howlett, Zhiyue Huang, Steven Kiddle, Sach Mukherjee, Anais Rouanet, Bernd Taschler, Brian D. M. Tom, Simon R. White, Noel Faux, Suman Sedai, Javier de Velasco Oriol, Edgar E. V. Clemente, Karol Estrada, Leon Aksman, Andre Altmann, Cynthia M. Stonnington, Yalin Wang, Jianfeng Wu, Vivek Devadas, Clementine Fourrier, Lars Lau Raket, Aristeidis Sotiras, Guray Erus, Jimit Doshi, Christos Davatzikos, Jacob Vogel, Andrew Doyle, Angela Tam, Alex Diaz-Papkovich, Emmanuel Jammeh, Igor Koval, Paul Moore, Terry J. Lyons, John Gallacher, Jussi Tohka, Robert Ciszek, Bruno Jedynak, Kruti Pandya, Murat Bilgel, William Engels, Joseph Cole, Polina Golland, Stefan Klein, Daniel C. Alexander
TADPOLE's unique results suggest that current prediction algorithms provide sufficient accuracy to exploit biomarkers related to clinical diagnosis and ventricle volume, for cohort refinement in clinical trials for Alzheimer's disease.
no code implementations • 22 Jan 2020 • Mi Luo, Fei Chen, Pengxiang Cheng, Zhenhua Dong, Xiuqiang He, Jiashi Feng, Zhenguo Li
Recommender systems often face heterogeneous datasets containing highly personalized historical data of users, where no single model could give the best recommendation for every user.
1 code implementation • CVPR 2020 • Yue Liao, Si Liu, Fei Wang, Yanjie Chen, Chen Qian, Jiashi Feng
Human and object points are the center of the detection boxes, and the interaction point is the midpoint of the human and object points.
Ranked #25 on Human-Object Interaction Detection on V-COCO
no code implementations • 30 Dec 2019 • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng
In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.
no code implementations • 16 Dec 2019 • Hao Du, Jiashi Feng, Mengling Feng
In clinical practice, human radiologists actually review medical images with high-resolution monitors and zoom into regions of interest (ROIs) for a close-up examination.
no code implementations • 10 Dec 2019 • Shoufa Chen, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
We demonstrate the effectiveness of our search strategy by conducting extensive experiments.
no code implementations • NeurIPS 2019 • Pan Zhou, Xiao-Tong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng
We address the problem of meta-learning which learns a prior over hypothesis from a sample of meta-training tasks for fast adaptation on meta-testing tasks.
1 code implementation • 29 Oct 2019 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find a major cause is the inaccurate classification of object proposals.
4 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem.
Ranked #3 on Long-tail learning with class descriptors on CUB-LT
2 code implementations • ICLR 2020 • Hanshu Yan, Jiawei Du, Vincent Y. F. Tan, Jiashi Feng
We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are non-intersecting.
no code implementations • 6 Oct 2019 • Haoyuan Cao, Shining Yu, Jiashi Feng
Although CNN has reached satisfactory performance in image-related tasks, using CNN to process videos is much more challenging due to the enormous size of raw video streams.
1 code implementation • 27 Sep 2019 • Mingjie Sun, Jimin Xiao, Eng Gee Lim, Yanchu Xie, Jiashi Feng
In this paper, we aim to tackle the task of semi-supervised video object segmentation across a sequence of frames where only the ground-truth segmentation of the first frame is provided.
1 code implementation • 26 Sep 2019 • Guilin Li, Xing Zhang, Zitong Wang, Matthias Tan, Jiashi Feng, Zhenguo Li, Tong Zhang
Recently, the efficiency of automatic neural architecture design has been significantly improved by gradient-based search methods such as DARTS.
2 code implementations • CVPR 2020 • Li Yuan, Francis E. H. Tay, Guilin Li, Tao Wang, Jiashi Feng
Without any extra computation cost, Tf-KD achieves up to 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization.
no code implementations • 25 Sep 2019 • Yujun Shi, Benben Liao, Guangyong Chen, Yun Liu, Ming-Ming Cheng, Jiashi Feng
Then, we show by experiments that DNNs under standard training rely heavily on optimizing the non-robust component in achieving decent performance.
no code implementations • 25 Sep 2019 • Mengmi Zhang, Tao Wang, Joo Hwee Lim, Jiashi Feng
Without tampering with the performance on initial tasks, our method learns novel concepts given a few training examples of each class in new tasks.
no code implementations • 25 Sep 2019 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Jiashi Feng
Previous adversarial learning methods condition domain alignment only on pseudo labels, but noisy and inaccurate pseudo labels may perturb the multi-class distribution embedded in probabilistic predictions, hence bringing insufficient alleviation to the latent mismatch problem.
no code implementations • 16 Sep 2019 • Shenglan Liu, Yang Yu, Yang Liu, Hong Qiao, Lin Feng, Jiashi Feng
Manifold learning now plays a very important role in machine learning and many relevant applications.
1 code implementation • CVPR 2020 • Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan
In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image.
1 code implementation • ICCV 2019 • Xuecheng Nie, Jianfeng Zhang, Shuicheng Yan, Jiashi Feng
Based on SPR, we develop the SPM model that can directly predict structured poses for multiple persons in a single stage, and thus offer a more compact pipeline and attractive efficiency advantage over two-stage methods.
Ranked #3 on Keypoint Detection on MPII Multi-Person
no code implementations • ICCV 2019 • Xuecheng Nie, Yuncheng Li, Linjie Luo, Ning Zhang, Jiashi Feng
Existing video-based human pose estimation methods extensively apply large networks onto every frame in the video to localize body joints, which suffer high computational cost and hardly meet the low-latency requirement in realistic applications.
Ranked #4 on 2D Human Pose Estimation on JHMDB (2D poses only)
5 code implementations • ICCV 2019 • Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, Jiashi Feng
In this paper, we tackle the challenging few-shot segmentation problem from a metric learning perspective and present PANet, a novel prototype alignment network to better utilize the information of the support set.
1 code implementation • CVPR 2020 • Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng
In this work, we propose a new global similarity metric, termed as central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those for dissimilar pairs to converge to different centers, to improve hash learning efficiency and retrieval accuracy.
no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set and has been demonstrated to be effective for 1D convolutional neural networks (CNNs).
no code implementations • 15 Jun 2019 • Hongsong Wang, Jiashi Feng
Action anticipation, which aims to recognize the action with a partial observation, becomes increasingly popular due to a wide range of applications.
1 code implementation • 15 Jun 2019 • Hongsong Wang, Jian Dong, Bin Cheng, Jiashi Feng
We therefore propose a novel Position-Velocity Recurrent Encoder-Decoder (PVRED) for human motion prediction, which makes full use of pose velocities and temporal positional information.
no code implementations • 13 Jun 2019 • Hanshu Yan, Xuan Chen, Vincent Y. F. Tan, Wenhan Yang, Joe Wu, Jiashi Feng
They jointly facilitate unsupervised learning of a noise model for various noise types.
3 code implementations • CVPR 2019 • Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng
To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.
1 code implementation • ICLR 2020 • Jiawei Du, Hu Zhang, Joey Tianyi Zhou, Yi Yang, Jiashi Feng
Black-box attack methods aim to infer suitable attack patterns to targeted DNN models by only using output feedback of the models and the corresponding input queries.
no code implementations • 6 Jun 2019 • Yujun Shi, Benben Liao, Guangyong Chen, Yun Liu, Ming-Ming Cheng, Jiashi Feng
Despite many previous works studying the reason behind such adversarial behavior, the relationship between the generalization performance and adversarial behavior of DNNs is still little understood.
no code implementations • 3 Jun 2019 • Yuan Hu, Yingtian Zou, Jiashi Feng
In this work, we address a new finer-grained task, termed panoptic edge detection (PED), which aims at predicting semantic-level boundaries for stuff categories and instance-level boundaries for instance categories, in order to provide more comprehensive and unified scene understanding from the perspective of edges. We then propose a versatile framework, Panoptic Edge Network (PEN), which aggregates different tasks of object detection, semantic and instance edge detection into a single holistic network with multiple branches.
no code implementations • 3 Jun 2019 • Jayashree Karlekar, Jiashi Feng, Zi Sian Wong, Sugiri Pranata
However, deploying such high-performing models to resource-constrained devices or real-time applications is challenging.
no code implementations • 26 May 2019 • Hanyang Kong, Jian Zhao, Xiaoguang Tu, Junliang Xing, ShengMei Shen, Jiashi Feng
Recent deep learning based face recognition methods have achieved great performance, but it still remains challenging to recognize very low-resolution query faces, such as 28x28 pixels, when the CCTV camera is far from the captured subject.
1 code implementation • 23 May 2019 • Mengmi Zhang, Tao Wang, Joo Hwee Lim, Gabriel Kreiman, Jiashi Feng
In each classification task, our method learns a set of variational prototypes with their means and variances, where embedding of the samples from the same class can be represented in a prototypical distribution and class-representative prototypes are separated apart.
5 code implementations • CVPR 2019 • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang
We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway.
Ranked #1 on RGB Salient Object Detection on SOD
no code implementations • 19 Apr 2019 • Yingtian Zou, Jiashi Feng
Extensive experiments on few-shot classification and regression problems clearly demonstrate the superiority of HML over fine-tuning and state-of-the-art meta learning approaches in terms of generalization across heterogeneous tasks.
no code implementations • 17 Apr 2019 • Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng
The evaluator defines a learnable information preserving metric between original video and summary video and "supervises" the selector to identify the most informative frames to form the summary video.
Ranked #7 on Unsupervised Video Summarization on TvSum