no code implementations • 18 Nov 2024 • Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu
Our idea is simple: We tackle the monocular depth estimation (MDE) task with an autoregressive prediction paradigm, based on two core designs.
no code implementations • 26 Sep 2024 • Song Wang, Zhongdao Wang, Jiawei Yu, Wentong Li, Bailan Feng, Junbo Chen, Jianke Zhu
In this paper, we conduct a comprehensive evaluation of existing semantic occupancy prediction models from a reliability perspective for the first time.
1 code implementation • 8 Jul 2024 • Qijun Gan, Wentong Li, Jinwei Ren, Jianke Zhu
Reconstructing high-fidelity hand models with intricate textures plays a crucial role in enhancing human-object interaction and advancing real-world applications.
1 code implementation • 2 Jul 2024 • Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang
However, visual tokens are redundant, and their number grows considerably with high-resolution images, significantly impairing the efficiency of MLLMs.
Ranked #135 on Visual Question Answering on MM-Vet
1 code implementation • 24 May 2024 • Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu
Semantic scene completion aims to infer 3D geometric structures with semantic classes from camera or LiDAR data, providing essential occupancy information for autonomous driving.
1 code implementation • CVPR 2024 • Song Wang, Jiawei Yu, Wentong Li, Wenyu Liu, Xiaolu Liu, Junbo Chen, Jianke Zhu
Furthermore, the voxels in the boundary region are more challenging to differentiate than those in the interior.
1 code implementation • CVPR 2024 • Xiaolu Liu, Song Wang, Wentong Li, Ruizi Yang, Junbo Chen, Jianke Zhu
Currently, high-definition (HD) map construction is trending towards lightweight online generation, which aims to preserve timely and reliable road scene information.
2 code implementations • CVPR 2024 • Yuqian Yuan, Wentong Li, Jian Liu, Dongqi Tang, Xinjie Luo, Chi Qin, Lei Zhang, Jianke Zhu
In this paper, we propose Osprey, a mask-text instruction tuning approach, to extend MLLMs by incorporating fine-grained mask regions into language instruction, aiming at achieving pixel-wise visual understanding.
1 code implementation • NeurIPS 2023 • Wentong Li, Yuqian Yuan, Song Wang, Wenyu Liu, Dongqi Tang, Jian Liu, Jianke Zhu, Lei Zhang
In this work, we formulate affinity modeling as an affinity propagation process, and propose local and global pairwise affinity terms to generate accurate soft pseudo labels.
1 code implementation • ICCV 2023 • Wentong Li, Yuqian Yuan, Song Wang, Jianke Zhu, Jianshu Li, Jian Liu, Lei Zhang
Weakly-supervised image segmentation has recently attracted increasing research attention, aiming to avoid expensive pixel-wise labeling.
1 code implementation • CVPR 2023 • Song Wang, Wentong Li, Wenyu Liu, Xiaolu Liu, Jianke Zhu
To mitigate the defects caused by lacking semantic cues in LiDAR data, we present an online Camera-to-LiDAR distillation scheme to facilitate the semantic learning from image to point cloud.
2 code implementations • 3 Dec 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Risheng Yu, Xiansheng Hua, Lei Zhang
In contrast to fully supervised methods using pixel-wise mask labels, box-supervised instance segmentation takes advantage of simple box annotations, which has recently attracted increasing research attention.
3 code implementations • 13 Oct 2022 • Xue Yang, Gefan Zhang, Wentong Li, Xuehui Wang, Yue Zhou, Junchi Yan
Oriented object detection emerges in many applications, from aerial images to autonomous driving. However, many existing detection benchmarks are annotated only with horizontal bounding boxes, which are less costly than fine-grained rotated boxes, leading to a gap between the readily available training corpus and the rising demand for oriented object detection.
1 code implementation • 19 Jul 2022 • Wentong Li, Wenyu Liu, Jianke Zhu, Miaomiao Cui, Xiansheng Hua, Lei Zhang
A simple mask-supervised SOLOv2 model is adapted to predict the instance-aware mask map as the level set for each instance.
2 code implementations • 4 Jul 2022 • Wenyu Liu, Wentong Li, Jianke Zhu, Miaomiao Cui, Xuansong Xie, Lei Zhang
With DIAL-Filters, we design both unsupervised and supervised frameworks for nighttime driving-scene segmentation, which can be trained in an end-to-end manner.
no code implementations • 7 Dec 2021 • Wentong Li, Yijie Chen, Wenyu Liu, Jianke Zhu
Instead of learning pairwise affinity, the level set method, with carefully designed energy functions, treats object segmentation as curve evolution, which can accurately recover object boundaries and prevent interference from indistinguishable backgrounds and similar objects.
no code implementations • 2 Jun 2021 • Hantang Liu, Wentong Li, Jianke Zhu
Although deep learning methods achieve promising results on semantic parsing, they cannot directly make use of architectural rules, which play an important role in man-made structures.
2 code implementations • CVPR 2022 • Wentong Li, Yijie Chen, Kaixuan Hu, Jianke Zhu
In contrast to generic objects, aerial targets are often non-axis-aligned, with arbitrary orientations and cluttered surroundings.
Ranked #8 on Oriented Object Detection on DOTA 1.0
no code implementations • 18 Jun 2015 • Jian Xu, Kuang-Chih Lee, Wentong Li, Hang Qi, Quan Lu
In this paper, we propose a smart pacing approach in which the delivery pace of each campaign is learned from both offline and online data to achieve smooth delivery and optimal performance goals.