no code implementations • 23 Apr 2025 • Ying Li, Xiaobao Wei, Xiaowei Chi, Yuming Li, Zhongyu Zhao, Hao Wang, Ningning Ma, Ming Lu, Shanghang Zhang
Based on the action tree and visual guidance, ManipDreamer significantly boosts the instruction-following ability and visual quality.
no code implementations • CVPR 2025 • Jiajun Cao, Yuan Zhang, Tao Huang, Ming Lu, Qizhe Zhang, Ruichuan An, Ningning Ma, Shanghang Zhang
Visual encoders are fundamental components in vision-language models (VLMs), each showcasing unique strengths derived from various pre-trained visual foundation models.
no code implementations • 23 Nov 2024 • Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao, Zhuangzhe Wu, Nan Huang, Ming Lu, Ningning Ma, Shanghang Zhang
To address this, we propose Explicit Motion Decomposition (EMD), which models the motions of dynamic objects by introducing learnable motion embeddings to the Gaussians, enhancing the decomposition in street scenes.
no code implementations • 7 Feb 2024 • Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang
First, a depth estimation (DE) scheme leverages relative depth information to realize the effective feature lifting from 2D to 3D spaces.
1 code implementation • ICCV 2023 • Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang
In this paper, we propose a novel training strategy called SupFusion, which provides auxiliary feature-level supervision for effective LiDAR-Camera fusion and significantly boosts detection performance.
no code implementations • CVPR 2023 • Leheng Li, Qing Lian, Luozhou Wang, Ningning Ma, Ying-Cong Chen
This work explores the use of 3D generative models to synthesize training data for 3D vision tasks.
25 code implementations • CVPR 2021 • Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
We present a simple but powerful convolutional neural network architecture, which has a VGG-like inference-time body composed of nothing but a stack of 3×3 convolution and ReLU, while the training-time model has a multi-branch topology.
Ranked #49 on Semantic Segmentation on Cityscapes val
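The abstract above contrasts a multi-branch training-time model with a plain 3×3 inference-time body. The sketch below illustrates why such a conversion can be exact, because convolution is linear. It assumes (this is not stated in the listing) that the extra branches are a 1×1 convolution and an identity shortcut, uses a single channel, and omits batch normalization. All function names are hypothetical.

```python
def conv3x3_same(image, kernel):
    """Single-channel 3x3 cross-correlation with zero padding (same-size output)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * image[y][x]
    return out


def multi_branch(image, k3, k1):
    """Assumed training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    y3 = conv3x3_same(image, k3)
    return [
        [y3[i][j] + k1 * v + v for j, v in enumerate(row)]
        for i, row in enumerate(image)
    ]


def merge_branches(k3, k1):
    """Fold the 1x1 and identity branches into the centre tap of one 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1 + 1.0  # a 1x1 conv and the identity both act only on the centre pixel
    return merged
```

Under these assumptions, `conv3x3_same(image, merge_branches(k3, k1))` matches `multi_branch(image, k3, k1)` up to floating-point error, so the merged kernel can replace the three branches at inference time.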
5 code implementations • CVPR 2021 • Ningning Ma, Xiangyu Zhang, Ming Liu, Jian Sun
We present a simple, effective, and general activation function we term ACON which learns to activate the neurons or not.
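As a rough illustration of "learning to activate or not", the sketch below uses a smooth-maximum form with a switching factor `beta`. The formula and the parameter names `p1`, `p2`, `beta` are assumptions recalled from the ACON-C variant and are not given in this listing. In the paper's setting these parameters would be learned, whereas here they are plain arguments.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """Assumed ACON-C form: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x
```

With `beta = 0` the function reduces to the linear map `(p1 + p2) / 2 * x` (no activation), and for large `beta` it approaches `max(p1 * x, p2 * x)`, which is ReLU when `p1 = 1` and `p2 = 0`.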
7 code implementations • ECCV 2020 • Ningning Ma, Xiangyu Zhang, Jian Sun
We present a conceptually simple but effective activation for image recognition tasks, called Funnel activation (FReLU), that extends ReLU and PReLU to a 2D activation by adding a spatial condition at negligible overhead.
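A minimal single-channel sketch of the idea, assuming the spatial condition is a learned 3×3 window applied per channel and the activation takes the form `max(x, T(x))`. The batch normalization that would normally follow the convolution is omitted, and the function names are hypothetical.

```python
def spatial_condition(image, kernel):
    """T(x): single-channel 3x3 cross-correlation with zero padding."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di + 1][dj + 1] * image[y][x]
    return out


def frelu(image, kernel):
    """Funnel-style activation: element-wise max of the input and its spatial condition."""
    t = spatial_condition(image, kernel)
    return [
        [max(v, t[i][j]) for j, v in enumerate(row)]
        for i, row in enumerate(image)
    ]
```

With an all-zero kernel the condition is 0 everywhere and `frelu` reduces to ReLU. A kernel with a single centre tap `a` gives `max(x, a * x)`, which is the PReLU form when `a` is less than 1.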
2 code implementations • ECCV 2020 • Ningning Ma, Xiangyu Zhang, Jiawei Huang, Jian Sun
WeightNet is easy and memory-efficient to train because it operates on the kernel space instead of the feature space.
35 code implementations • ECCV 2018 • Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, Jian Sun
Ranked #948 on Image Classification on ImageNet