Search Results for author: Haoran Wei

Found 20 papers, 9 papers with code

Ensemble Chinese End-to-End Spoken Language Understanding for Abnormal Event Detection from audio stream

no code implementations 19 Oct 2020 Haoran Wei, Fei Tao, Runze Su, Sen yang, Ji Liu

Previous end-to-end SLU models are primarily used in English environments because large-scale Chinese SLU datasets are lacking, and they use only one ASR model to extract features from speech.

Automatic Speech Recognition (ASR) +4

Arbitrary-Oriented Object Detection in Remote Sensing Images Based on Polar Coordinates

no code implementations IEEE Access 2020 Lin Zhou, Haoran Wei, Hao Li, Wenzhe Zhao, Yi Zhang, Yue Zhang

In this article, we introduce the polar coordinate system to the deep learning detector for the first time, and propose an anchor-free Polar Remote Sensing Object Detector (P-RSDet), which can achieve competitive detection accuracy by using a simpler object representation model and fewer regression parameters.

Object Detection +4

WB-DETR: Transformer-Based Detector Without Backbone

no code implementations ICCV 2021 Fanfan Liu, Haoran Wei, Wenzhe Zhao, Guozhen Li, Jingquan Peng, Zihao Li

In this paper, we propose WB-DETR (DETR-based detector Without Backbone) to prove that reliance on CNN feature extraction is not necessary for a transformer-based detector.

Object Detection

Non-Homogeneous Haze Removal via Artificial Scene Prior and Bidimensional Graph Reasoning

1 code implementation 5 Apr 2021 Haoran Wei, Qingbo Wu, Hui Li, King Ngi Ngan, Hongliang Li, Fanman Meng, Linfeng Xu

In this paper, we propose a Non-Homogeneous Haze Removal Network (NHRN) via artificial scene prior and bidimensional graph reasoning.

Image Dehazing Single Image Dehazing

ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition

1 code implementation 19 Aug 2021 Xia Gong, Yuxiang Zhu, Haidi Zhu, Haoran Wei

This paper proposes ChMusic, a traditional Chinese music dataset for model training and performance evaluation.

Information Retrieval Instrument Recognition +2

Non-Parametric Online Learning from Human Feedback for Neural Machine Translation

1 code implementation 23 Sep 2021 Dongqi Wang, Haoran Wei, Zhirui Zhang, ShuJian Huang, Jun Xie, Jiajun Chen

We study the problem of online learning with human feedback in human-in-the-loop machine translation, in which human translators revise the machine-generated translations and the corrected translations are then used to improve the neural machine translation (NMT) system.

Machine Translation NMT +1

Continuous Human Action Detection Based on Wearable Inertial Data

no code implementations 11 Dec 2021 Xia Gong, Yan Lu, Haoran Wei

Human action detection is an active research topic with wide use in video surveillance, human-machine interfaces, healthcare monitoring, gaming, dance training, and musical instrument teaching.

Action Detection Gesture Recognition

Phonetic-assisted Multi-Target Units Modeling for Improving Conformer-Transducer ASR system

no code implementations 3 Nov 2022 Li Li, Dongxing Xu, Haoran Wei, Yanhua Long

Exploiting effective target modeling units is very important and has always been a concern in end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +2

DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual Information for Real-time Semantic Segmentation

no code implementations 2 Dec 2022 Haoran Wei, Xu Liu, Shouchun Xu, Zhongjian Dai, Yaping Dai, Xiangyang Xu

In this method, the multi-rate depth-wise dilated convolutions take a simpler role in feature extraction: in the second step, each performs simple semantic-based morphological filtering with one desired receptive field on the concise region-form feature maps provided by the first step, which improves their efficiency.

Real-Time Semantic Segmentation

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

1 code implementation 7 Jun 2023 Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation.

Autonomous Driving Segmentation +2

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

no code implementations 20 Jun 2023 Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

Moreover, we propose to train the Aformer in a multi-pass manner, and investigate three cross-information fusion methods to effectively combine the information from both general and accent encoders.

Accented Speech Recognition speech-recognition

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations 18 Jul 2023 Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following Language Modelling +1

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation 20 Sep 2023 Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, the first learning framework to achieve versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation.

 Ranked #1 on Visual Question Answering on MMBench (GPT-3.5 score metric)

multimodal generation Visual Question Answering +2

Merlin: Empowering Multimodal LLMs with Foresight Minds

no code implementations 30 Nov 2023 En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao

Then, FIT requires MLLMs to first predict trajectories of related objects and then reason about potential future events based on them.

Visual Question Answering

Small Language Model Meets with Reinforced Vision Vocabulary

no code implementations 23 Jan 2024 Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, En Yu, Jianjian Sun, Chunrui Han, Xiangyu Zhang

In Vary-toy, we introduce an improved vision vocabulary, allowing the model not only to retain all features of Vary but also to gain greater generality.

Language Modelling Large Language Model +3
