Search Results for author: Yukang Chen

Found 39 papers, 26 papers with code

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

1 code implementation 19 Aug 2024 Fuzhao Xue, Yukang Chen, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.

Video Captioning Video Understanding

SEED-Story: Multimodal Long Story Generation with Large Language Model

1 code implementation 11 Jul 2024 Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen

We further propose a multimodal attention sink mechanism to enable the generation of stories with up to 25 sequences (only 10 for training) in a highly efficient autoregressive manner.

Image Generation Language Modelling +3

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

1 code implementation 26 Jun 2024 Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia

Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy.

Ranked #11 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +2

MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs

no code implementations 20 Jun 2024 Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes.

Decision Making

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

1 code implementation CVPR 2024 Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia

This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost.

Ranked #5 on 3D Semantic Segmentation on SemanticKITTI (val mIoU metric)

3D Semantic Segmentation LIDAR Semantic Segmentation

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

no code implementations 29 Feb 2024 Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent.

Minecraft reinforcement-learning +2

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

4 code implementations 25 Jan 2024 Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

no code implementations 13 Jan 2024 Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.

4k Position

Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

no code implementations CVPR 2024 Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan

We empirically show that sparse attention not only reduces computational demands but also enhances model performance in both NLP and multi-modal tasks.

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

no code implementations CVPR 2024 Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia

We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.

Denoising Diffusion Step-aware Models

1 code implementation 5 Oct 2023 Shuai Yang, Yukang Chen, Luozhou Wang, Shu Liu, Yingcong Chen

Denoising Diffusion Probabilistic Models (DDPMs) have garnered popularity for data generation across various domains.

Denoising

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

4 code implementations 21 Sep 2023 Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training with a context length of 8192 requires 16x the computational cost in self-attention layers compared with a context length of 2048.

4k Instruction Following +3
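
The 16x figure in the LongLoRA snippet above can be checked with a short calculation. This is a minimal sketch, assuming self-attention cost grows with the square of the context length (the usual behavior of standard full attention); the function name `attention_cost_ratio` is hypothetical and not taken from the paper or its code.

```python
def attention_cost_ratio(long_ctx: int, short_ctx: int) -> float:
    """Relative self-attention cost of two context lengths.

    Assumption: cost scales quadratically with context length,
    as in standard (full) self-attention.
    """
    return (long_ctx / short_ctx) ** 2


# Context grows 4x (2048 -> 8192), so the assumed quadratic cost grows 4^2 = 16x.
print(attention_cost_ratio(8192, 2048))  # 16.0
```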

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation 8 Aug 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +3

IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation

1 code implementation ICCV 2023 Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi

Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category.

6D Pose Estimation

Spherical Transformer for LiDAR-based 3D Recognition

2 code implementations CVPR 2023 Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, Jiaya Jia

In this work, we study the varying-sparsity distribution of LiDAR points and present SphereFormer to directly aggregate information from dense close points to the sparse distant ones.

3D Object Detection 3D Semantic Segmentation +3

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

1 code implementation ICCV 2023 Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez

For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall.

3D Object Detection Autonomous Driving +3

Spatial Pruned Sparse Convolution for Efficient 3D Object Detection

no code implementations 28 Sep 2022 Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi

3D scenes are dominated by a large number of background points, which are redundant for the detection task that mainly needs to focus on foreground objects.

3D Object Detection Object +1

Voxel Field Fusion for 3D Object Detection

1 code implementation CVPR 2022 Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.

3D Object Detection Data Augmentation +2

Focal Sparse Convolutional Networks for 3D Object Detection

2 code implementations CVPR 2022 Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we introduce two new modules to enhance the capability of Sparse CNNs, both based on making feature sparsity learnable with position-wise importance prediction.

3D Object Detection Object +1

Single-DARTS: Towards Stable Architecture Search

no code implementations 18 Aug 2021 Pengfei Hou, Ying Jin, Yukang Chen

Differentiable architecture search (DARTS) marks a milestone in Neural Architecture Search (NAS), boasting simplicity and small search costs.

Neural Architecture Search

Fully Convolutional Networks for Panoptic Segmentation with Point-based Supervision

1 code implementation 17 Aug 2021 Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia

In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.

Panoptic Segmentation Segmentation +1

Reinforcement Learning for the Beginning of Starcraft II Game

no code implementations CUHK Course IERG5350 2020 Yukang Chen, Ruihang Chu

In this project, we plan to develop a reinforcement learning model for the beginning of a StarCraft II game, instead of the full-length game.

reinforcement-learning Reinforcement Learning +3

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track

no code implementations 6 Oct 2020 Zeming Li, Yuchen Ma, Yukang Chen, Xiangyu Zhang, Jian Sun

In this report, we present our object detection/instance segmentation system, MegDetV2, which works in a two-pass fashion, first to detect instances then to obtain segmentation.

Instance Segmentation object-detection +3

Dynamic Scale Training for Object Detection

4 code implementations 26 Apr 2020 Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu Qi, Jian Sun, Jiaya Jia

We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.

Instance Segmentation Model Optimization +4

Learning Dynamic Routing for Semantic Segmentation

1 code implementation CVPR 2020 Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation Semantic Segmentation

PointINS: Point-based Instance Segmentation

no code implementations 13 Mar 2020 Lu Qi, Yi Wang, Yukang Chen, Yingcong Chen, Xiangyu Zhang, Jian Sun, Jiaya Jia

In this paper, we explore the mask representation in instance segmentation with Point-of-Interest (PoI) features.

Instance Segmentation Object Detection +3

DetNAS: Backbone Search for Object Detection

2 code implementations NeurIPS 2019 Yukang Chen, Tong Yang, Xiangyu Zhang, Gaofeng Meng, Xinyu Xiao, Jian Sun

In this work, we present DetNAS to use Neural Architecture Search (NAS) for the design of better backbones for object detection.

General Classification Image Classification +4

Joint Neural Architecture Search and Quantization

no code implementations 23 Nov 2018 Yukang Chen, Gaofeng Meng, Qian Zhang, Xinbang Zhang, Liangchen Song, Shiming Xiang, Chunhong Pan

Here our goal is to automatically find a compact neural network model with high performance that is suitable for mobile devices.

Model Compression Neural Architecture Search +1

Reinforced Evolutionary Neural Architecture Search

1 code implementation 1 Aug 2018 Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, Xinggang Wang

To address this issue, we propose the Reinforced Evolutionary Neural Architecture Search (RE-NAS), which is an evolutionary method with the reinforced mutation for NAS.

Neural Architecture Search Semantic Segmentation
