Search Results for author: Hao Feng

Found 53 papers, 21 papers with code

LiRCDepth: Lightweight Radar-Camera Depth Estimation via Knowledge Distillation and Uncertainty Guidance

1 code implementation • 20 Dec 2024 • Huawei Sun, Nastassia Vysotskaya, Tobias Sukianto, Hao Feng, Julius Ott, Xiangyuan Peng, Lorenzo Servadei, Robert Wille

Recently, radar-camera fusion algorithms have gained significant attention as radar sensors provide geometric information that complements the limitations of cameras.

Computational Efficiency Depth Estimation +1

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

no code implementations • 20 Oct 2024 • Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen Tao

Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as the growing training overhead and memory usage.

Quantization

EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models

no code implementations • 20 Oct 2024 • Junhao Hu, Wenrui Huang, Haoyi Wang, Weidong Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie

Large Language Models (LLMs) are critical for a wide range of applications, but serving them efficiently becomes increasingly challenging as inputs become more complex.

Chunking Few-Shot Learning +1

GET-UP: GEomeTric-aware Depth Estimation with Radar Points UPsampling

1 code implementation • 2 Sep 2024 • Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille

However, existing algorithms process the inherently noisy and sparse radar data by projecting 3D points onto the image plane for pixel-level feature extraction, overlooking the valuable geometric information contained within the radar point cloud.

Autonomous Driving Depth Estimation +1

AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding

1 code implementation • 30 Aug 2024 • Yonghui Wang, Wengang Zhou, Hao Feng, Houqiang Li

We hypothesize that the requisite number of visual tokens for the model is contingent upon both the resolution and content of the input image.

Language Modelling Large Language Model +2

LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation

1 code implementation • 25 Aug 2024 • Keyi Zhou, Li Li, Wengang Zhou, Yonghui Wang, Hao Feng, Houqiang Li

In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context.

Lane Detection

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

no code implementations • 5 Jul 2024 • Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao

To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training.

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

1 code implementation • 2 Jul 2024 • Jinghui Lu, Haiyang Yu, Yanjie Wang, YongJie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks.

document understanding Key Information Extraction +6

RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation

no code implementations • 27 Jun 2024 • Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li

Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification.

Data Augmentation Local Distortion +1

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

1 code implementation • 3 Jun 2024 • Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, YongJie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts.

Language Modelling Question Answering +3

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

1 code implementation • 20 May 2024 • Jingqun Tang, Qi Liu, YongJie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang

Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding.

Benchmarking Question Answering +4

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination Hallucination Evaluation +2

Progressive Multi-modal Conditional Prompt Tuning

1 code implementation • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li

Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects text features similar to image.

TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding

1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li

The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.

Question Answering Visual Question Answering (VQA)

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

In this work, we present DeepEraser, an effective deep network for generic text removal.

UCE-FID: Using Large Unlabeled, Medium Crowdsourced-Labeled, and Small Expert-Labeled Tweets for Foodborne Illness Detection

no code implementations • 2 Dec 2023 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, Elke Rundensteiner

To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses small expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.

Scalable AI Generative Content for Vehicular Network Semantic Communication

no code implementations • 23 Nov 2023 • Hao Feng, Yi Yang, Zhu Han

Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.

Decoder Semantic Communication

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.

document understanding Instruction Following +3

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding

no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao Liu, Jingqun Tang, Wengang Zhou, Houqiang Li, Can Huang

This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution.

document understanding Language Modeling +3

Progressive Recurrent Network for Shadow Removal

no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li

To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).

Image Shadow Removal Shadow Removal

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

However, existing advanced algorithms are limited in how effectively they utilize the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks within the context of text-rich scenarios have not been sufficiently explored.

Instruction Following Text Detection +1

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning

no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li

To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.

Representation Learning

Mobile Supply: The Last Piece of Jigsaw of Recommender System

no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu

In order to address the problem of pagination trigger mechanism, we propose a completely new module in the pipeline of recommender system named Mobile Supply.

Recommendation Systems Re-Ranking

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan

We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.

Decision Making Recommendation Systems +1

Multi-Task Cross-Modality Attention-Fusion for 2D Object Detection

no code implementations • 17 Jul 2023 • Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille

In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks.

Autonomous Driving Object +2

Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions

no code implementations • 16 Jun 2023 • Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai

We provide a gradient backpropagation highway for low-rank adapters which eliminates the need for expensive backpropagation through the frozen pre-trained model, resulting in substantial savings of training memory and training time.

Transfer Learning

Active RIS-Assisted mmWave Indoor Signal Enhancement Based on Transparent RIS

no code implementations • 16 May 2023 • Hao Feng, Yuping Zhao

In this paper, a novel RIS-assisted mmWave indoor enhancement scheme is proposed, in which a transparent RIS is deployed on the glass to enhance mmWave indoor signals, and three assisted transmission scenarios, namely passive RIS (PRIS), active RIS (ARIS), and a novel hybrid RIS (HRIS) are proposed.

Model-Based Monitoring and State Estimation for Digital Twins: The Kalman Filter

no code implementations • 29 Apr 2023 • Hao Feng, Cláudio Gomes, Peter Gorm Larsen

A digital twin (DT) monitors states of the physical twin (PT) counterpart and provides a number of benefits such as advanced visualizations, fault detection capabilities, and reduced maintenance cost.

Anomaly Detection Fault Detection

mmWave RIS Phase Shift Feedback Based on Knowledge Base Autoencoder Framework

no code implementations • 27 Apr 2023 • Hao Feng, Yuting Xu, Yuping Zhao

Then the knowledge base vectors index is obtained by calculating the similarity between feature vectors and knowledge base vectors and transmitted to the RIS.

Diffusion Probabilistic Model Based Accurate and High-Degree-of-Freedom Metasurface Inverse Design

no code implementations • 25 Apr 2023 • Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li

Inverse design methods based on optimization algorithms, such as evolutionary algorithms, and topological optimizations, have been introduced to design metamaterials.

Evolutionary Algorithms

DocMAE: Document Image Rectification via Self-supervised Representation Learning

1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu

Tremendous efforts have been made on document image rectification, but how to learn effective representation of such distorted images is still under-explored.

Representation Learning Self-Supervised Learning

Deep Unrestricted Document Image Rectification

1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

To our best knowledge, this is the first learning-based method for the rectification of unrestricted document images.

Local Distortion

PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation

1 code implementation • 20 Feb 2023 • Mingzhe Liu, Han Huang, Hao Feng, Leilei Sun, Bowen Du, Yanjie Fu

Our proposed framework provides a conditional feature extraction module first to extract the coarse yet effective spatiotemporal dependencies from conditional information as the global context prior.

Imputation Missing Values +1

Geometric Representation Learning for Document Image Rectification

2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.

Representation Learning

Utilizing Explainable AI for improving the Performance of Neural Networks

no code implementations • 7 Oct 2022 • Huawei Sun, Lorenzo Servadei, Hao Feng, Michael Stephan, Robert Wille, Avik Santra

To address this, Explainable Artificial Intelligence (XAI) has been developing as a field that aims to improve the transparency of the model and increase their trustworthiness.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI)

Towards Efficient Modularity in Industrial Drying: A Combinatorial Optimization Viewpoint

no code implementations • 5 Oct 2022 • Alisina Bayati, Amber Srivastava, Amir Malvandi, Hao Feng, Srinivasa Salapaka

The industrial drying process consumes approximately 12% of the total energy used in manufacturing, with the potential for a 40% reduction in energy usage through improved process controls and the development of new drying technologies.

Combinatorial Optimization

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search

no code implementations • 4 Oct 2022 • Ziyang Liu, Chaokun Wang, Hao Feng, Lingfei Wu, Liqun Yang

In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models.

Knowledge Distillation

TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks

no code implementations • LREC 2022 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner

To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks.

slot-filling Slot Filling

DocScanner: Robust Document Image Rectification with Progressive Learning

3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.

Optical Character Recognition (OCR)

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction

2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li

Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.

Optical Character Recognition (OCR)

Rethinking Temperature in Graph Contrastive Learning

no code implementations • 29 Sep 2021 • Ziyang Liu, Hao Feng, Chaokun Wang

In this paper, we investigate and discuss what a good representation should be for a general loss (InfoNCE) in graph contrastive learning.

Contrastive Learning Inductive Learning +1

ES-Net: Erasing Salient Parts to Learn More in Re-Identification

no code implementations • 10 Mar 2021 • Dong Shen, Shuai Zhao, Jinming Hu, Hao Feng, Deng Cai, Xiaofei He

In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.

Complementary Pseudo Labels For Unsupervised Domain Adaptation On Person Re-identification

no code implementations • 29 Jan 2021 • Hao Feng, Minghao Chen, Jinming Hu, Dong Shen, Haifeng Liu, Deng Cai

In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels.

Person Re-Identification Unsupervised Domain Adaptation

High-Performance Discriminative Tracking With Transformers

no code implementations • ICCV 2021 • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu

End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares based regression.

Decoder Object +2

STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

1 code implementation • 18 Jun 2020 • Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang

We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.

Taxonomy Expansion

Discovering Protagonist of Sentiment with Aspect Reconstructed Capsule Network

no code implementations • 23 Dec 2019 • Chi Xu, Hao Feng, Guoxin Yu, Min Yang, Xiting Wang, Xiang Ao

In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown.

Sentence Sentiment Analysis
