1 code implementation • 19 Aug 2024 • Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.
1 code implementation • 11 Jul 2024 • Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen
We further propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner.
1 code implementation • 26 Jun 2024 • Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia
Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy.
Ranked #11 on Arithmetic Reasoning on GSM8K (using extra training data)
no code implementations • 20 Jun 2024 • Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes.
1 code implementation • CVPR 2024 • Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost.
Ranked #5 on 3D Semantic Segmentation on SemanticKITTI (val mIoU metric)
no code implementations • 29 Feb 2024 • Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.
4 code implementations • 25 Jan 2024 • Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang
We introduce Grounded SAM, which combines Grounding DINO, an open-set object detector, with the Segment Anything Model (SAM).
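The pipeline described in the excerpt above can be sketched as follows. This is a minimal, hypothetical illustration, not the project's real API: `detect_boxes` and `segment_box` are trivial stand-ins for Grounding DINO's text-prompted box detection and SAM's box-prompted mask prediction.

```python
# Hypothetical sketch of the Grounded SAM pipeline: an open-set detector turns
# a text prompt into boxes, and a segmenter turns each box into a mask.
# Both functions below are dummy stand-ins, not the real model APIs.

def detect_boxes(image, text_prompt):
    """Stand-in for Grounding DINO: return (box, phrase) pairs for the prompt."""
    h, w = image["height"], image["width"]
    return [((0, 0, w // 2, h // 2), text_prompt)]  # one dummy box

def segment_box(image, box):
    """Stand-in for SAM's box-prompted mask prediction."""
    x0, y0, x1, y1 = box
    return {"area": (x1 - x0) * (y1 - y0)}  # dummy mask summary

def grounded_sam(image, text_prompt):
    # Detect boxes from text, then segment each box.
    return [(phrase, segment_box(image, box))
            for box, phrase in detect_boxes(image, text_prompt)]

masks = grounded_sam({"height": 480, "width": 640}, "a running dog")
print(masks)  # [('a running dog', {'area': 76800})]
```

The design point is the composition itself: any open-set detector that maps text to boxes can feed any box-promptable segmenter.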
no code implementations • 13 Jan 2024 • Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng
Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.
no code implementations • CVPR 2024 • Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan
We empirically show that sparse attention not only reduces computational demands but also enhances model performance in both NLP and multi-modal tasks.
no code implementations • CVPR 2024 • Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.
1 code implementation • 5 Oct 2023 • Shuai Yang, Yukang Chen, Luozhou Wang, Shu Liu, Yingcong Chen
Denoising Diffusion Probabilistic Models (DDPMs) have garnered popularity for data generation across various domains.
4 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
For example, training with a context length of 8192 requires 16x the computational cost in self-attention layers compared to a context length of 2048.
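The 16x figure follows from the quadratic cost of self-attention in sequence length. The sketch below is illustrative arithmetic only (the hidden-size defaults are assumptions, not taken from the paper), but the ratio is independent of them:

```python
def self_attention_flops(seq_len, head_dim=64, num_heads=32):
    """Rough multiply-add count for one attention layer's score computation:
    QK^T and (softmax @ V) each cost about seq_len^2 * hidden_dim."""
    hidden = head_dim * num_heads
    return 2 * (seq_len ** 2) * hidden  # QK^T plus attention @ V

ratio = self_attention_flops(8192) / self_attention_flops(2048)
print(ratio)  # (8192 / 2048) ** 2 = 16.0
```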
1 code implementation • ICCV 2023 • Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia
Therefore, we abandon the mask attention design and resort to an auxiliary center regression task instead.
1 code implementation • 8 Aug 2023 • Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez
For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.
Ranked #8 on 3D Object Detection on nuScenes
2 code implementations • CVPR 2024 • Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia
In this work, we propose a new segmentation task -- reasoning segmentation.
1 code implementation • ICCV 2023 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category.
2 code implementations • CVPR 2023 • Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia
In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones.
Ranked #1 on Semantic Segmentation on KITTI Semantic Segmentation
2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Ranked #1 on 3D Multi-Object Tracking on nuScenes LiDAR only
no code implementations • 28 Sep 2022 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi
3D scenes are dominated by a large number of background points, which are redundant for a detection task that mainly needs to focus on foreground objects.
2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
Recent advances in 2D CNNs have revealed that large kernels are important.
1 code implementation • CVPR 2022 • Yanwei Li, Xiaojuan Qi, Yukang Chen, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.
2 code implementations • CVPR 2022 • Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia
In this paper, we introduce two new modules to enhance the capability of sparse CNNs, both of which make feature sparsity learnable through position-wise importance prediction.
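The general idea of position-wise importance prediction can be sketched as below. This is an illustrative toy, not the paper's implementation: a small learned predictor scores each active position, and low-score positions are dropped so downstream sparse convolutions process fewer points.

```python
import math

def importance(feature, weights, bias=0.0):
    """Toy importance predictor: a one-unit linear map followed by a sigmoid."""
    score = sum(f * w for f, w in zip(feature, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def prune_sparse_features(coords, feats, weights, threshold=0.5):
    """Keep only active positions whose predicted importance clears a threshold."""
    return [(c, f) for c, f in zip(coords, feats)
            if importance(f, weights) >= threshold]

coords = [(0, 0, 0), (1, 2, 3), (4, 4, 4)]
feats = [[0.1, -2.0], [1.5, 0.5], [-3.0, -3.0]]
kept = prune_sparse_features(coords, feats, weights=[1.0, 1.0])
# Only the middle position scores above 0.5 and survives pruning.
```

In practice the predictor would be trained end to end so sparsity adapts to the input, whereas here the weights are fixed for illustration.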
1 code implementation • 11 Apr 2022 • Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun
TNAS performs a modified bi-level Breadth-First Search in the proposed trees to discover a high-performance architecture.
2 code implementations • CVPR 2021 • Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia
However, this option traditionally hurts detection performance considerably.
no code implementations • 26 Aug 2021 • Ruihang Chu, Yukang Chen, Tao Kong, Lu Qi, Lei Li
Separating 3D point clouds into individual instances is an important task for 3D vision.
no code implementations • 18 Aug 2021 • Pengfei Hou, Ying Jin, Yukang Chen
Differentiable architecture search (DARTS) marks a milestone in Neural Architecture Search (NAS), boasting simplicity and small search costs.
1 code implementation • 17 Aug 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, Liwei Wang, Zeming Li, Jian Sun, Jiaya Jia
In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
1 code implementation • CVPR 2021 • Yukang Chen, Yanwei Li, Tao Kong, Lu Qi, Ruihang Chu, Lei Li, Jiaya Jia
We propose Scale-aware AutoAug to learn data augmentation policies for object detection.
no code implementations • CUHK Course IERG5350 2020 • Yukang Chen, Ruihang Chu
In this project, we plan to develop a reinforcement learning model for the opening phase of the StarCraft II game, instead of the full-length game.
no code implementations • 6 Oct 2020 • Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun
In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion: it first detects instances and then obtains their segmentation.
4 code implementations • 26 Apr 2020 • Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale-variation challenge in object detection.
1 code implementation • CVPR 2020 • Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun
To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.
no code implementations • 13 Mar 2020 • Lu Qi, Yi Wang, Yukang Chen, Yingcong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia
In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features.
no code implementations • CVPR 2019 • Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang
This architecture achieves a competitive result on CIFAR-10.
2 code implementations • NeurIPS 2019 • Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun
In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection.
1 code implementation • 17 Jan 2019 • Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, Xinggang Wang
In our implementations, architectures are first searched on a small dataset, e.g., CIFAR-10.
no code implementations • 23 Nov 2018 • Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, Chunhong Pan
Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices.
1 code implementation • 1 Aug 2018 • Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang
To address this issue, we propose Reinforced Evolutionary Neural Architecture Search (RE-NAS), an evolutionary method with reinforced mutation for NAS.