1 code implementation • 31 Dec 2024 • Ling Fu, Biao Yang, Zhebin Kuang, Jiajun Song, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Mingxin Huang, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently.
1 code implementation • 20 Dec 2024 • Huawei Sun, Nastassia Vysotskaya, Tobias Sukianto, Hao Feng, Julius Ott, Xiangyuan Peng, Lorenzo Servadei, Robert Wille
Recently, radar-camera fusion algorithms have gained significant attention as radar sensors provide geometric information that complements the limitations of cameras.
no code implementations • 20 Oct 2024 • Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen Tao
Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as the growing training overhead and memory usage.
no code implementations • 20 Oct 2024 • Junhao Hu, Wenrui Huang, Haoyi Wang, Weidong Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
Large Language Models (LLMs) are critical for a wide range of applications, but serving them efficiently becomes increasingly challenging as inputs become more complex.
1 code implementation • 2 Sep 2024 • Huawei Sun, Zixu Wang, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille
However, existing algorithms process the inherently noisy and sparse radar data by projecting 3D points onto the image plane for pixel-level feature extraction, overlooking the valuable geometric information contained within the radar point cloud.
1 code implementation • 30 Aug 2024 • Yonghui Wang, Wengang Zhou, Hao Feng, Houqiang Li
We hypothesize that the requisite number of visual tokens for the model is contingent upon both the resolution and content of the input image.
1 code implementation • 25 Aug 2024 • Keyi Zhou, Li Li, Wengang Zhou, Yonghui Wang, Hao Feng, Houqiang Li
In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context.
no code implementations • 5 Jul 2024 • Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao
To mitigate this, we introduce a method that employs error-bounded lossy compression to reduce the communication data size and accelerate DLRM training.
1 code implementation • 2 Jul 2024 • Jinghui Lu, Haiyang Yu, Yanjie Wang, YongJie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao liu, Can Huang
Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks.
1 code implementation • 30 Jun 2024 • Huawei Sun, Hao Feng, Julius Ott, Lorenzo Servadei, Robert Wille
Depth estimation is critical in autonomous driving for interpreting 3D scenes accurately.
no code implementations • 27 Jun 2024 • Zhaokang Liao, Hao Feng, Shaokai Liu, Wengang Zhou, Houqiang Li
Existing rectification methods are limited to central fisheye images, while this paper proposes a novel method that extends to deviated fisheye image rectification.
1 code implementation • 3 Jun 2024 • Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, YongJie Ye, Hao liu, Wengang Zhou, Houqiang Li, Can Huang
In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts.
1 code implementation • 20 May 2024 • Jingqun Tang, Qi Liu, YongJie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao liu, Xiang Bai, Can Huang
Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding.
no code implementations • 19 Apr 2024 • Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao liu, Yuan Xie, Xiang Bai, Can Huang
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.
1 code implementation • 18 Apr 2024 • Xiaoyu Qiu, Hao Feng, Yuechen Wang, Wengang Zhou, Houqiang Li
Initialization is responsible for encoding images and text using a VLM, followed by a feature filter that selects text features similar to the image.
no code implementations • 16 Apr 2024 • Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee
Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts.
1 code implementation • 15 Apr 2024 • Bozhi Luan, Hao Feng, Hong Chen, Yonghui Wang, Wengang Zhou, Houqiang Li
The image overview stage provides a comprehensive understanding of the global scene information, and the coarse localization stage approximates the image area containing the answer based on the question asked.
no code implementations • 9 Apr 2024 • Huawei Sun, Hao Feng, Gianfranco Mauro, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille
Radar and camera fusion yields robustness in perception tasks by leveraging the strength of both sensors.
1 code implementation • 29 Feb 2024 • Hao Feng, Wendi Wang, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li
In this work, we present DeepEraser, an effective deep network for generic text removal.
no code implementations • 2 Dec 2023 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Huayi Zhang, Hao Feng, Elke Rundensteiner
To overcome these challenges, we propose EGAL, a deep learning framework for foodborne illness detection that uses a small set of expert-labeled tweets augmented by crowdsourced-labeled and massive unlabeled data.
no code implementations • 23 Nov 2023 • Hao Feng, Yi Yang, Zhu Han
Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.
1 code implementation • 22 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li
Moreover, we curate a collection of text-rich images and prompt the text-only GPT-4 to generate 12K high-quality conversations, featuring textual locations within text-rich scenarios.
no code implementations • 20 Nov 2023 • Hao Feng, Qi Liu, Hao liu, Jingqun Tang, Wengang Zhou, Houqiang Li, Can Huang
This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution.
no code implementations • 1 Nov 2023 • Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li
To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet).
no code implementations • ICCV 2023 • Huijie Yao, Wengang Zhou, Hao Feng, Hezhen Hu, Hao Zhou, Houqiang Li
Technically, IP-SLT consists of feature extraction, prototype initialization, and iterative prototype refinement.
Ranked #6 on Sign Language Translation on CSL-Daily
no code implementations • 19 Aug 2023 • Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang
However, existing advanced algorithms are limited in how effectively they utilize the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks within the context of text-rich scenarios have not been sufficiently explored.
no code implementations • ICCV 2023 • Hao Feng, Wendi Wang, Jiajun Deng, Wengang Zhou, Li Li, Houqiang Li
To make the best of such rectification cues, we introduce SimFIR, a simple framework for fisheye image rectification based on self-supervised representation learning.
no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
To address the problem of the pagination trigger mechanism, we propose a completely new module in the recommender system pipeline, named Mobile Supply.
no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan
We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.
no code implementations • 17 Jul 2023 • Huawei Sun, Hao Feng, Georg Stettinger, Lorenzo Servadei, Robert Wille
In addition, we introduce a Multi-Task Cross-Modality Attention-Fusion Network (MCAF-Net) for object detection, which includes two new fusion blocks.
no code implementations • 16 Jun 2023 • Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai
We provide a gradient backpropagation highway for low-rank adapters which eliminates the need for expensive backpropagation through the frozen pre-trained model, resulting in substantial savings of training memory and training time.
no code implementations • 16 May 2023 • Hao Feng, Yuping Zhao
In this paper, a novel RIS-assisted mmWave indoor enhancement scheme is proposed, in which a transparent RIS is deployed on the glass to enhance mmWave indoor signals, and three assisted transmission scenarios, namely passive RIS (PRIS), active RIS (ARIS), and a novel hybrid RIS (HRIS) are proposed.
no code implementations • 29 Apr 2023 • Hao Feng, Cláudio Gomes, Peter Gorm Larsen
A digital twin (DT) monitors states of the physical twin (PT) counterpart and provides a number of benefits such as advanced visualizations, fault detection capabilities, and reduced maintenance cost.
no code implementations • 27 Apr 2023 • Hao Feng, Yuting Xu, Yuping Zhao
Then the index of the knowledge base vectors is obtained by calculating the similarity between feature vectors and knowledge base vectors, and this index is transmitted to the RIS.
no code implementations • 25 Apr 2023 • Zezhou Zhang, Chuanchuan Yang, Yifeng Qin, Hao Feng, Jiqiang Feng, Hongbin Li
Inverse design methods based on optimization algorithms, such as evolutionary algorithms and topological optimization, have been introduced to design metamaterials.
1 code implementation • 20 Apr 2023 • Shaokai Liu, Hao Feng, Wengang Zhou, Houqiang Li, Cong Liu, Feng Wu
Tremendous efforts have been made on document image rectification, but how to learn effective representations of such distorted images is still under-explored.
1 code implementation • 18 Apr 2023 • Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li
To the best of our knowledge, this is the first learning-based method for the rectification of unrestricted document images.
Ranked #1 on Local Distortion on DocUNet
1 code implementation • 20 Feb 2023 • Mingzhe Liu, Han Huang, Hao Feng, Leilei Sun, Bowen Du, Yanjie Fu
Our proposed framework provides a conditional feature extraction module first to extract the coarse yet effective spatiotemporal dependencies from conditional information as the global context prior.
1 code implementation • 21 Jan 2023 • Hao Feng, Keyi Zhou, Wengang Zhou, Yufei Yin, Jiajun Deng, Qi Sun, Houqiang Li
It maintains a single estimate of the contour that is progressively deformed toward the object boundary.
Ranked #1 on Semantic Contour Prediction on Sbd val
2 code implementations • 15 Oct 2022 • Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li
In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one.
no code implementations • 7 Oct 2022 • Huawei Sun, Lorenzo Servadei, Hao Feng, Michael Stephan, Robert Wille, Avik Santra
To address this, Explainable Artificial Intelligence (XAI) has been developing as a field that aims to improve the transparency of models and increase their trustworthiness.
no code implementations • 5 Oct 2022 • Alisina Bayati, Amber Srivastava, Amir Malvandi, Hao Feng, Srinivasa Salapaka
The industrial drying process consumes approximately 12% of the total energy used in manufacturing, with the potential for a 40% reduction in energy usage through improved process controls and the development of new drying technologies.
no code implementations • 4 Oct 2022 • Ziyang Liu, Chaokun Wang, Hao Feng, Lingfei Wu, Liqun Yang
In this paper, we design an efficient knowledge distillation framework for e-commerce relevance matching to integrate the respective advantages of Transformer-style models and classical relevance matching models.
no code implementations • LREC 2022 • Ruofan Hu, Dongyu Zhang, Dandan Tao, Thomas Hartvigsen, Hao Feng, Elke Rundensteiner
To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks.
no code implementations • 31 Mar 2022 • Souvik Hazra, Hao Feng, Gamze Naz Kiprit, Michael Stephan, Lorenzo Servadei, Robert Wille, Robert Weigel, Avik Santra
Gesture recognition is one of the most intuitive ways of interaction and has gathered particular attention for human-computer interaction.
3 code implementations • 28 Oct 2021 • Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li
The iterative refinements make DocScanner converge to a robust and superior rectification performance, while the lightweight recurrent architecture ensures the running efficiency.
2 code implementations • 25 Oct 2021 • Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li
Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer.
no code implementations • 29 Sep 2021 • Ziyang Liu, Hao Feng, Chaokun Wang
In this paper, we investigate and discuss what a good representation should be for a general loss (InfoNCE) in graph contrastive learning.
no code implementations • 10 Mar 2021 • Dong Shen, Shuai Zhao, Jinming Hu, Hao Feng, Deng Cai, Xiaofei He
In this paper, we propose a novel network, Erasing-Salient Net (ES-Net), to learn comprehensive features by erasing the salient areas in an image.
no code implementations • 29 Jan 2021 • Hao Feng, Minghao Chen, Jinming Hu, Dong Shen, Haifeng Liu, Deng Cai
In this paper, to complement these low recall neighbor pseudo labels, we propose a joint learning framework to learn better feature embeddings via high precision neighbor pseudo labels and high recall group pseudo labels.
no code implementations • ICCV 2021 • Bin Yu, Ming Tang, Linyu Zheng, Guibo Zhu, Jinqiao Wang, Hao Feng, Xuetao Feng, Hanqing Lu
End-to-end discriminative trackers improve the state of the art significantly, yet the improvement in robustness and efficiency is restricted by the conventional discriminative model, i.e., least-squares based regression.
1 code implementation • 18 Jun 2020 • Yue Yu, Yinghao Li, Jiaming Shen, Hao Feng, Jimeng Sun, Chao Zhang
We propose a self-supervised taxonomy expansion model named STEAM, which leverages natural supervision in the existing taxonomy for expansion.
no code implementations • 23 Dec 2019 • Chi Xu, Hao Feng, Guoxin Yu, Min Yang, Xiting Wang, Xiang Ao
In this paper, we aim to improve ATSA by discovering the potential aspect terms of the predicted sentiment polarity when the aspect terms of a test sentence are unknown.