Search Results for author: Wenwei Zhang

Found 56 papers, 49 papers with code

F-LMM: Grounding Frozen Large Multimodal Models

1 code implementation 9 Jun 2024 Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.

General Knowledge Instruction Following +5

ANAH: Analytical Annotation of Hallucinations in Large Language Models

1 code implementation 30 May 2024 Ziwei Ji, Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

Reducing the 'hallucination' problem of Large Language Models (LLMs) is crucial for their wide applications.

Generative Question Answering Hallucination +1

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

1 code implementation 23 May 2024 Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments.

Autonomous Driving Benchmarking +3

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

1 code implementation 20 May 2024 Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen

MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills.

College Mathematics GSM8K +1

Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

1 code implementation 8 May 2024 Lingdong Kong, Xiang Xu, Jiawei Ren, Wenwei Zhang, Liang Pan, Kai Chen, Wei Tsang Ooi, Ziwei Liu

Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods.

Autonomous Driving LIDAR Semantic Segmentation +2

InternLM2 Technical Report

2 code implementations 26 Mar 2024 Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

4k Long-Context Understanding

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

1 code implementation 25 Mar 2024 Lingdong Kong, Xiang Xu, Jun Cen, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.

Data Augmentation Scene Understanding

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

1 code implementation 19 Mar 2024 Zehui Chen, Kuikun Liu, Qiuchen Wang, Wenwei Zhang, Jiangning Liu, Dahua Lin, Kai Chen, Feng Zhao

Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.

Hallucination

CriticBench: Evaluating Large Language Models as Critic

1 code implementation 21 Feb 2024 Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai Chen, Xian-Ling Mao

Critique ability is crucial in the scalable oversight and self-improvement of Large Language Models (LLMs).

Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

no code implementations 20 Feb 2024 Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

Programming skill is a crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs).

Data Augmentation

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

1 code implementation 9 Feb 2024 Huaiyuan Ying, Shuo Zhang, Linyang Li, Zhejian Zhou, Yunfan Shao, Zhaoye Fei, Yichuan Ma, Jiawei Hong, Kuikun Liu, Ziyi Wang, Yudong Wang, Zijian Wu, Shuaibin Li, Fengzhe Zhou, Hongwei Liu, Songyang Zhang, Wenwei Zhang, Hang Yan, Xipeng Qiu, Jiayu Wang, Kai Chen, Dahua Lin

We further explore how to use LEAN to solve math problems and study its performance under the setting of multi-task learning, which shows the possibility of using LEAN as a unified platform for solving and proving in math.

Data Augmentation GSM8K +3

Can AI Assistants Know What They Don't Know?

1 code implementation 24 Jan 2024 Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, ShiMin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu

To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets.

Math Open-Domain Question Answering +1

CLIM: Contrastive Language-Image Mosaic for Region Representation

1 code implementation 18 Dec 2023 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

Our experimental results demonstrate that CLIM improves different baseline open-vocabulary object detectors by a large margin on both OV-COCO and OV-LVIS benchmarks.

Object object-detection +1

Mixed Pseudo Labels for Semi-Supervised Object Detection

1 code implementation 12 Dec 2023 Zeming Chen, Wenwei Zhang, Xinjiang Wang, Kai Chen, Zhi Wang

While the pseudo-label method has demonstrated considerable success in semi-supervised object detection tasks, this paper uncovers notable limitations within this approach.

 Ranked #1 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Object object-detection +3

Fake Alignment: Are LLMs Really Aligned Well?

1 code implementation 10 Nov 2023 Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, Wenwei Zhang, Xingjun Ma, Yu-Gang Jiang, Yu Qiao, Yingchun Wang

The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety.

Multiple-choice

OV-PARTS: Towards Open-Vocabulary Part Segmentation

1 code implementation NeurIPS 2023 Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang

Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world.

Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation +1

DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

1 code implementation 2 Oct 2023 Shilin Xu, Xiangtai Li, Size Wu, Wenwei Zhang, Yunhai Tong, Chen Change Loy

We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training.

Novel Object Detection Object +5

CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction

1 code implementation 2 Oct 2023 Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Xiangtai Li, Wentao Liu, Chen Change Loy

However, when transferring the vision-language alignment of CLIP from global image representation to local region representation for the open-vocabulary dense prediction tasks, CLIP ViTs suffer from the domain shift from full images to local image regions.

Image Classification Image Segmentation +7

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

no code implementations 18 Sep 2023 Chenming Zhu, Wenwei Zhang, Tai Wang, Xihui Liu, Kai Chen

Instead of leveraging 2D images, we propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.

3D Object Detection 3D Open-Vocabulary Object Detection +4

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

1 code implementation 14 Sep 2023 Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang

Based on the definition, UniHSI constitutes a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution.

Language Modelling Large Language Model

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

1 code implementation 8 May 2023 Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, Kai Chen

To further enhance MultiModal-GPT's ability to chat with humans, we utilize language-only instruction-following data to train MultiModal-GPT jointly.

Instruction Following Language Modelling

Transformer-Based Visual Segmentation: A Survey

2 code implementations 19 Apr 2023 Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks.

Autonomous Driving Point Cloud Segmentation +1

RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions

1 code implementation 13 Apr 2023 Shaoyuan Xie, Lingdong Kong, Wenwei Zhang, Jiawei Ren, Liang Pan, Kai Chen, Ziwei Liu

Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness.

Robust Camera Only 3D Object Detection

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

1 code implementation CVPR 2023 Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset.

3D Object Detection object-detection

Aligning Bag of Regions for Open-Vocabulary Object Detection

1 code implementation CVPR 2023 Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy

The embeddings of regions in a bag are treated as embeddings of words in a sentence, and they are sent to the text encoder of a VLM to obtain the bag-of-regions embedding, which is learned to be aligned to the corresponding features extracted by a frozen VLM.

Ranked #8 on Open Vocabulary Object Detection on MSCOCO (using extra training data)

Object object-detection +2

RTMDet: An Empirical Study of Designing Real-Time Object Detectors

10 code implementations 14 Dec 2022 Chengqi Lyu, Wenwei Zhang, Haian Huang, Yue Zhou, Yudong Wang, Yanyi Liu, Shilong Zhang, Kai Chen

In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection.

Object object-detection +7

MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

1 code implementation 26 Jul 2022 Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang

In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in Waymo Open Dataset Challenge 2022.

object-detection Object Detection +1

MMRotate: A Rotated Object Detection Benchmark using PyTorch

1 code implementation 28 Apr 2022 Yue Zhou, Xue Yang, Gefan Zhang, Jiabao Wang, Yanyi Liu, Liping Hou, Xue Jiang, Xingzhao Liu, Junchi Yan, Chengqi Lyu, Wenwei Zhang, Kai Chen

We present an open-source toolbox, named MMRotate, which provides a coherent algorithm framework for training, inference, and evaluation of popular rotated object detection algorithms based on deep learning.

Object object-detection +1

Dense Siamese Network for Dense Unsupervised Learning

1 code implementation 21 Mar 2022 Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

It also extracts a batch of region embeddings that correspond to some sub-regions in the overlapped area to be contrasted for region consistency.

Self-Supervised Learning Unsupervised Semantic Segmentation

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

1 code implementation 14 Aug 2021 Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

Integrated Satellite-HAP-Terrestrial Networks for Dual-Band Connectivity

no code implementations 6 Jul 2021 Wenwei Zhang, Ruoqi Deng, Boya Di, Lingyang Song

The recent development of high-altitude platforms (HAPs) has attracted increasing attention since they can serve as a promising communication method to assist satellite-terrestrial networks.

K-Net: Towards Unified Image Segmentation

1 code implementation NeurIPS 2021 Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class.

Image Segmentation Instance Segmentation +2

Exploring Data Augmentation for Multi-Modality 3D Object Detection

8 code implementations 23 Dec 2020 Wenwei Zhang, Zhe Wang, Chen Change Loy

Because multi-modality data augmentation must maintain consistency between point cloud and images, recent methods in this field typically use relatively insufficient data augmentation.

3D Object Detection Autonomous Driving +3

More Information Supervised Probabilistic Deep Face Embedding Learning

no code implementations ICML 2020 Ying Huang, Shangfeng Qiu, Wenwei Zhang, Xianghui Luo, Jinzhuo Wang

Research using margin-based comparison loss demonstrates the effectiveness of penalizing the distance between face features and their corresponding class centers.

Face Recognition Open Set Learning

Deep Frequent Spatial Temporal Learning for Face Anti-Spoofing

no code implementations 20 Jan 2020 Ying Huang, Wenwei Zhang, Jinzhuo Wang

Face anti-spoofing is crucial for the security of face recognition systems, protecting them from being compromised by presentation attacks.

Face Anti-Spoofing Face Recognition

EcoNAS: Finding Proxies for Economical Neural Architecture Search

no code implementations CVPR 2020 Dongzhan Zhou, Xinchi Zhou, Wenwei Zhang, Chen Change Loy, Shuai Yi, Xuesen Zhang, Wanli Ouyang

While many methods have been proposed to improve the efficiency of NAS, the search process is still laborious because training and evaluating plausible architectures over a large search space is time-consuming.

Neural Architecture Search

Side-Aware Boundary Localization for More Precise Object Detection

3 code implementations ECCV 2020 Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.

Object object-detection +2

Robust Multi-Modality Multi-Object Tracking

1 code implementation ICCV 2019 Wenwei Zhang, Hui Zhou, Shuyang Sun, Zhe Wang, Jianping Shi, Chen Change Loy

Multi-sensor perception is crucial to ensure reliability and accuracy in autonomous driving systems, while multi-object tracking (MOT) improves that by tracing the sequential movement of dynamic objects.

Autonomous Driving Multi-Object Tracking +2
