SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

1 code implementation 22 Apr 2024 Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

YOLO-World: Real-Time Open-Vocabulary Object Detection

1 code implementation 30 Jan 2024 Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

The You Only Look Once (YOLO) series of detectors has established itself as a family of efficient and practical tools.

Instance Segmentation, Language Modelling

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

2 code implementations 27 Nov 2023 Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

We propose four architectural guidelines for designing large-kernel ConvNets. Their core idea is to exploit the essential characteristic that distinguishes large kernels from small ones: they can see wide without going deep.

Ranked #1 on Object Detection on COCO 2017 (mAP metric)

Image Classification, Object Detection
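Simple receptive-field arithmetic makes the claim that large kernels "see wide without going deep" concrete. This Python sketch assumes plain stride-1, dilation-1 convolutions. It uses 31 only as an illustrative kernel size and is not taken from the UniRepLKNet code.

```python
def receptive_field(kernel_sizes: list[int]) -> int:
    """One-axis receptive field (pixels) of stacked stride-1, dilation-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        # Each stride-1 layer with a k-wide kernel widens the field by k - 1 pixels.
        rf += k - 1
    return rf


def layers_needed(target_rf: int, kernel_size: int) -> int:
    """Smallest number of stacked layers of `kernel_size` whose receptive field reaches `target_rf`."""
    depth = 0
    while receptive_field([kernel_size] * depth) < target_rf:
        depth += 1
    return depth


if __name__ == "__main__":
    big = 31  # illustrative large-kernel size
    print(receptive_field([big]))   # 31
    print(layers_needed(big, 3))    # 15
```

Under these assumptions, a single 31x31 layer covers the same one-axis receptive field as 15 stacked 3x3 layers.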

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

1 code implementation NeurIPS 2023 Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan

Without bells and whistles, our approach outperforms the state-of-the-art online few-shot learning method by an average of 3.6% on eight image classification datasets with higher inference speed.

Few-Shot Learning, Image Classification

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

1 code implementation 8 Oct 2023 Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song

To encompass common detection expressions, we employ emerging vision-language models (VLMs) and large language models (LLMs) to generate instructions guided by text prompts and object bounding boxes (bboxes). The generalization ability of foundation models makes them effective at producing human-like expressions (e.g., describing object properties, categories, and relationships).

Language Modelling, Large Language Model

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

no code implementations 30 Jun 2023 Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie

In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.

3D Object Detection, Depth Estimation

BoxSnake: Polygonal Instance Segmentation with Box Supervision

1 code implementation ICCV 2023 Rui Yang, Lin Song, Yixiao Ge, Xiu Li

Box-supervised instance segmentation has gained much attention as it requires only simple box annotations instead of costly mask or polygon annotations.

Box-supervised Instance Segmentation, Segmentation

Dynamic Grained Encoder for Vision Transformers

1 code implementation NeurIPS 2021 Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region.

Image Classification, Language Modelling
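A toy example can illustrate the idea of assigning a different number of queries to each spatial region. Everything in this sketch is hypothetical: the names `dynamic_grained_queries` and `pool`, the fixed square regions, and the variance threshold. The threshold stands in for the learned gate a real encoder would use. The sketch only shows how a flat region can be summarized by one query while a busy region keeps more.

```python
from statistics import pvariance

Grid = list[list[float]]


def pool(grid: Grid, top: int, left: int, size: int, g: int) -> list[float]:
    """Average-pool a size x size region into g x g cells (size must be divisible by g)."""
    cell = size // g
    out = []
    for i in range(g):
        for j in range(g):
            vals = [grid[top + i * cell + di][left + j * cell + dj]
                    for di in range(cell) for dj in range(cell)]
            out.append(sum(vals) / len(vals))
    return out


def dynamic_grained_queries(grid: Grid, region: int, threshold: float,
                            granularities: tuple[int, int] = (1, 2)) -> list[list[float]]:
    """Return the queries for each region: coarse for flat regions, finer for varied ones.

    Assumes the grid height and width are multiples of `region`. The variance
    test is a hypothetical stand-in for a learned gating module.
    """
    h, w = len(grid), len(grid[0])
    queries = []
    for top in range(0, h, region):
        for left in range(0, w, region):
            vals = [grid[top + di][left + dj] for di in range(region) for dj in range(region)]
            g = granularities[1] if pvariance(vals) > threshold else granularities[0]
            queries.append(pool(grid, top, left, region, g))
    return queries


if __name__ == "__main__":
    grid = [
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [5, 5, 2, 2],
        [5, 5, 2, 2],
    ]
    queries = dynamic_grained_queries(grid, region=2, threshold=0.1)
    print([len(q) for q in queries])  # [1, 4, 1, 1]
```

On this 4x4 grid, each of the three uniform 2x2 regions yields one query and the varied region yields four, so 7 queries replace 16 pixels.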

Safety Embedded Stochastic Optimal Control of Networked Multi-Agent Systems via Barrier States

no code implementations 8 Oct 2022 Lin Song, Pan Zhao, Neng Wan, Naira Hovakimyan

This paper presents a novel approach for achieving safe stochastic optimal control in networked multi-agent systems (MASs).

DBQ-SSD: Dynamic Ball Query for Efficient 3D Object Detection

1 code implementation 22 Jul 2022 Jinrong Yang, Lin Song, Songtao Liu, Weixin Mao, Zeming Li, Xiaoping Li, Hongbin Sun, Jian Sun, Nanning Zheng

Many point-based 3D detectors adopt point-feature sampling strategies to drop some points for efficient inference.

3D Object Detection, Object Detection
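For context on point grouping in point-based detectors, this sketch shows a standard ball query: for each center, gather up to `nsample` point indices within a radius. A score-based gate is added on top that skips the search for low-importance centers. The gate, its scores and `keep_threshold` are hypothetical assumptions for illustration. They are not DBQ-SSD's actual mechanism.

```python
import math

Point = tuple[float, float, float]


def ball_query(points: list[Point], centers: list[Point], radius: float,
               nsample: int) -> list[list[int]]:
    """For each center, return indices of up to `nsample` points within `radius`.

    Following the common PointNet++ convention, a group with fewer than
    `nsample` neighbours is padded by repeating its first neighbour.
    """
    groups = []
    for c in centers:
        idx = [i for i, p in enumerate(points) if math.dist(p, c) <= radius][:nsample]
        if idx:
            idx += [idx[0]] * (nsample - len(idx))
        groups.append(idx)
    return groups


def dynamic_ball_query(points: list[Point], centers: list[Point], scores: list[float],
                       radius: float, nsample: int, keep_threshold: float) -> list[list[int]]:
    """Run ball query only for centers whose hypothetical importance score passes the threshold.

    Skipped centers get an empty group, so their neighbour search is never performed.
    """
    kept = [i for i, s in enumerate(scores) if s >= keep_threshold]
    kept_groups = ball_query(points, [centers[i] for i in kept], radius, nsample)
    groups: list[list[int]] = [[] for _ in centers]
    for i, g in zip(kept, kept_groups):
        groups[i] = g
    return groups
```

The trade-off is the usual one for sampling strategies: skipped centers save computation, but lose whatever local context their groups would have provided.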

Simplified Analysis on Filtering Sensitivity Trade-offs in Continuous- and Discrete-Time Systems

no code implementations 8 Apr 2022 Neng Wan, Dapeng Li, Lin Song, Naira Hovakimyan

A simplified analysis is performed on the Bode-type filtering sensitivity trade-off integrals, which capture the sensitivity characteristics of the estimate and estimation error with respect to the process input and estimated signal in continuous- and discrete-time linear time-invariant filtering systems.

Generalization of Safe Optimal Control Actions on Networked Multi-Agent Systems

no code implementations 21 Sep 2021 Lin Song, Neng Wan, Aditya Gahlawat, Chuyuan Tao, Naira Hovakimyan, Evangelos A. Theodorou

The control action composition is achieved by taking a weighted mixture of the existing controllers according to the contribution of each component task.
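The composition step above, a weighted mixture of existing controllers, can be sketched in a few lines. In this sketch each controller's action is a plain list of floats, and the task weights are assumed to be fixed, non-negative, and normalized to sum to one. In the paper the weights follow each component task's contribution, which this sketch does not model.

```python
def compose_controls(controls: list[list[float]], weights: list[float]) -> list[float]:
    """Blend component control actions u_i into u = sum_i (w_i / sum_j w_j) * u_i.

    Assumes every controller outputs an action of the same dimension and that
    the (hypothetical, fixed) task weights have a positive sum.
    """
    if not controls or len(controls) != len(weights):
        raise ValueError("need one weight per controller and at least one controller")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(controls[0])
    return [sum(w * u[k] for w, u in zip(weights, controls)) / total for k in range(dim)]


if __name__ == "__main__":
    # Two component controllers for a 2-D action, weighted 3:1.
    print(compose_controls([[1.0, 0.0], [0.0, 2.0]], [3.0, 1.0]))  # [0.75, 0.5]
```

Normalizing the weights keeps the blended action on the same scale as the component actions, whatever the raw task weights are.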

Fine-Grained Dynamic Head for Object Detection

1 code implementation NeurIPS 2020 Lin Song, Yanwei Li, Zhengkai Jiang, Zeming Li, Hongbin Sun, Jian Sun, Nanning Zheng

To this end, we propose a fine-grained dynamic head that conditionally selects a pixel-level combination of FPN features from different scales for each instance, further strengthening multi-scale feature representation.

Object, Object Detection
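The idea of conditionally selecting a pixel-level combination of features from different scales can be illustrated with a toy gate. The sketch makes several assumptions: features from every scale are already resized to a common resolution, each pixel is a single scalar, and a sigmoid gate with a fixed threshold decides which scales contribute. The gate logits and threshold are hypothetical. This is not the paper's gating function.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scales(features: list[list[float]], gate_logits: list[list[float]],
                threshold: float = 0.5) -> list[float]:
    """Per-pixel gated fusion of multi-scale features.

    features[s][p]: feature of scale s at pixel p (scales assumed pre-aligned).
    gate_logits[s][p]: hypothetical gate logit for scale s at pixel p.
    A scale contributes at a pixel only if its gate exceeds `threshold`;
    contributing scales are summed, each weighted by its gate value.
    """
    num_pixels = len(features[0])
    fused = []
    for p in range(num_pixels):
        total = 0.0
        for s in range(len(features)):
            g = sigmoid(gate_logits[s][p])
            if g > threshold:
                total += g * features[s][p]
        fused.append(total)
    return fused
```

Because gates are evaluated per pixel, neighbouring pixels can draw on different scales. That per-pixel freedom is what separates a pixel-level combination from a single per-level choice.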

Compositionality of Linearly Solvable Optimal Control in Networked Multi-Agent Systems

no code implementations 28 Sep 2020 Lin Song, Neng Wan, Aditya Gahlawat, Naira Hovakimyan, Evangelos A. Theodorou

The proposed approach simultaneously achieves compositionality and optimality of control actions within the cooperative MAS framework. It does so in both discrete and continuous time and in a sample-efficient manner, which reduces the burden of recomputing optimal control solutions for new tasks on the MASs.

Contraction $\mathcal{L}_1$-Adaptive Control using Gaussian Processes

no code implementations 8 Sep 2020 Aditya Gahlawat, Arun Lakshmanan, Lin Song, Andrew Patterson, Zhuohuan Wu, Naira Hovakimyan, Evangelos Theodorou

We present $\mathcal{CL}_1$-$\mathcal{GP}$, a control framework that enables safe simultaneous learning and control for systems subject to uncertainties.

Gaussian Processes

Learning Dynamic Routing for Semantic Segmentation

1 code implementation CVPR 2020 Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun

To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space.

Segmentation, Semantic Segmentation

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection

no code implementations CVPR 2019 Lin Song, Shiwei Zhang, Gang Yu, Hongbin Sun

In this paper, we define these ambiguous samples as "transitional states", and propose a Transition-Aware Context Network (TACNet) to distinguish transitional states.

Action Detection

