Search Results for author: Conghui He

Found 49 papers, 35 papers with code

KeyVideoLLM: Towards Large-scale Video Keyframe Selection

no code implementations3 Jul 2024 Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important.

Data Compression Management +3

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

1 code implementation24 Jun 2024 Tong Zhu, Xiaoye Qu, Daize Dong, Jiacheng Ruan, Jingqi Tong, Conghui He, Yu Cheng

Motivated by this limit, we investigate building MoE models from existing dense large language models.

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

no code implementations4 Jun 2024 Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin

The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications.

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

1 code implementation28 May 2024 Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He

In the era of artificial intelligence, the diversity of data modalities and annotation formats often renders data unusable directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs.

Diversity

A Survey of Multimodal Large Language Model from A Data-centric Perspective

1 code implementation26 May 2024 Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Ping Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang

Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities, including text, vision, audio, video, and 3D environments.

Language Modelling Large Language Model +1

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

no code implementations29 Apr 2024 Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture.

Common Sense Reasoning Multiple-choice

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

1 code implementation23 Apr 2024 Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios.

Diversity Image Augmentation

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

1 code implementation CVPR 2024 Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He

3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications.

3D Reconstruction

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation

1 code implementation CVPR 2024 Junyan Ye, Qiyan Luo, Jinhua Yu, Huaping Zhong, Zhimeng Zheng, Conghui He, Weijia Li

This paper aims at achieving fine-grained building attribute segmentation in a cross-view scenario, i. e., using satellite and street-view image pairs.

Attribute Semantic Segmentation

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

1 code implementation29 Mar 2024 Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

The generic large Vision-Language Models (VLMs) is rapidly developing, but still perform poorly in Remote Sensing (RS) domain, which is due to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs.

Hallucination Language Modelling +2

InternLM2 Technical Report

2 code implementations26 Mar 2024 Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

4k Long-Context Understanding

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations

1 code implementation21 Mar 2024 Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He

We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense.

Benchmarking Memorization

LOCR: Location-Guided Transformer for Optical Character Recognition

no code implementations4 Mar 2024 Yu Sun, Dongzhan Zhou, Chen Lin, Conghui He, Wanli Ouyang, Han-sen Zhong

Academic documents are packed with texts, equations, tables, and figures, requiring comprehensive understanding for accurate Optical Character Recognition (OCR).

Marketing Optical Character Recognition +1

LongWanjuan: Towards Systematic Measurement for Long Text Quality

1 code implementation21 Feb 2024 Kai Lv, Xiaoran Liu, Qipeng Guo, Hang Yan, Conghui He, Xipeng Qiu, Dahua Lin

The quality of training data are crucial for enhancing the long-text capabilities of foundation models.

Diversity Language Modelling

Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora

no code implementations26 Jan 2024 Zhaoye Fei, Yunfan Shao, Linyang Li, Zhiyuan Zeng, Conghui He, Hang Yan, Dahua Lin, Xipeng Qiu

Large language models have demonstrated remarkable potential in various tasks, however, there remains a significant scarcity of open-source models and data for specific domains.

Language Modelling Large Language Model

Parrot Captions Teach CLIP to Spot Text

1 code implementation21 Dec 2023 Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou

Despite CLIP being the foundation model in numerous vision-language applications, the CLIP suffers from a severe text spotting bias.

Representation Learning text similarity +1

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

1 code implementation CVPR 2024 Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

Based on the observation, OPERA introduces a penalty term on the model logits during the beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospects the presence of summary tokens in the previously generated tokens, and re-allocate the token selection if necessary.

Hallucination

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

1 code implementation28 Nov 2023 Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images.

Hallucination

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

1 code implementation21 Nov 2023 Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin

In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.

Descriptive visual instruction following +2

IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts

1 code implementation9 Oct 2023 Bohan Zeng, Shanglin Li, Yutang Feng, Ling Yang, Hong Li, Sicheng Gao, Jiaming Liu, Conghui He, Wentao Zhang, Jianzhuang Liu, Baochang Zhang, Shuicheng Yan

However, the appearance of 3D objects produced by these text-to-3D models is unpredictable, and it is hard for the single-image-to-3D methods to deal with complex images, thus posing a challenge in generating appearance-controllable 3D objects.

3D Generation Image to 3D +2

MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models

no code implementations21 Sep 2023 Yidong Liu, FuKai Shang, Fang Wang, Rui Xu, Jun Wang, Wei Li, Yao Li, Conghui He

With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have demonstrated exceptional capabilities across various domains.

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

1 code implementation25 Aug 2023 Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes current MLLMs hard to further improve their capability under the guidance of evaluation results with a relatively low human cost.

Benchmarking

VIGC: Visual Instruction Generation and Correction

2 code implementations24 Aug 2023 Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He

A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.

Hallucination Image Captioning +1

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

1 code implementation21 Aug 2023 Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin

The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models(LLMs) and multimodal large language models (MLLMs).

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

3 code implementations28 Apr 2023 Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao

This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.

Instruction Following Optical Character Recognition (OCR) +7

V3Det: Vast Vocabulary Visual Detection Dataset

1 code implementation ICCV 2023 Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin

2) Hierarchical Category Organization: The vast vocabulary of V3Det is organized by a hierarchical category tree which annotates the inclusion relationship among categories, encouraging the exploration of category relationships in vast and open vocabulary object detection.

Chatbot Object +2

SEPT: Towards Scalable and Efficient Visual Pre-Training

no code implementations11 Dec 2022 Yiqi Lin, Huabin Zheng, Huaping Zhong, Jinjing Zhu, Weijia Li, Conghui He, Lin Wang

To address these issues, we build a task-specific self-supervised pre-training framework from a data selection perspective based on a simple hypothesis that pre-training on the unlabeled samples with similar distribution to the target task can bring substantial performance gains.

Retrieval

OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images

no code implementations CVPR 2023 Weijia Li, Yawen Lai, Linning Xu, Yuanbo Xiangli, Jinhua Yu, Conghui He, Gui-Song Xia, Dahua Lin

More precisely, the OmniCity contains multi-view satellite images as well as street-level panorama and mono-view images, constituting over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City.

Instance Segmentation Segmentation +1

Exploring the Interactive Guidance for Unified and Effective Image Matting

1 code implementation17 May 2022 Dinghao Yang, Bin Wang, Weijia Li, Yiqi Lin, Conghui He

Although avoiding the extensive labors of trimap annotation, existing methods still suffer from two limitations: (1) For the single image with multiple objects, it is essential to provide extra interaction information to help determining the matting target; (2) For transparent objects, the accurate regression of alpha matte from RGB image is much more difficult compared with the opaque ones.

Foreground Segmentation Image Matting +1

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark

2 code implementations21 Mar 2022 Li Chen, Chonghao Sima, Yang Li, Zehan Zheng, Jiajie Xu, Xiangwei Geng, Hongyang Li, Conghui He, Jianping Shi, Yu Qiao, Junchi Yan

Methods for 3D lane detection have been recently proposed to address the issue of inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.).

3D Lane Detection Autonomous Driving +1

INTERN: A New Learning Paradigm Towards General Vision

no code implementations16 Nov 2021 Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

Influence Selection for Active Learning

1 code implementation ICCV 2021 Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, Conghui He

To obtain the Influence of the unlabeled sample in the active learning scenario, we design the Untrained Unlabeled sample Influence Calculation(UUIC) to estimate the unlabeled sample's expected gradient with which we calculate its Influence.

Active Learning Diversity

3D Building Reconstruction From Monocular Remote Sensing Images

no code implementations ICCV 2021 Weijia Li, Lingxuan Meng, Jinwang Wang, Conghui He, Gui-Song Xia, Dahua Lin

3D building reconstruction from monocular remote sensing imagery is an important research problem and an economic solution to large-scale city modeling, compared with reconstruction from LiDAR data and multi-view imagery.

3D Reconstruction Model Optimization

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-Based Point Clouds

no code implementations20 Nov 2020 Tai Wang, Conghui He, Zhe Wang, Jianping Shi, Dahua Lin

Recent years have witnessed the rapid progress of perception algorithms on top of LiDAR, a widely adopted sensor for autonomous driving systems.

Autonomous Driving

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight

no code implementations16 Mar 2019 Jiarui Fang, Liandeng Li, Haohuan Fu, Jinlei Jiang, Wenlai Zhao, Conghui He, Xin You, Guangwen Yang

Second, we propose a set of optimization strategies for redesigning a variety of neural network layers based on Caffe.

Cannot find the paper you are looking for? You can Submit a new open access paper.