Search Results for author: Dong Wang

Found 322 papers, 118 papers with code

Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language

1 code implementation · 1 Feb 2025 · Turi Abu, Ying Shi, Thomas Fang Zheng, Dong Wang

We present a novel Automatic Speech Recognition (ASR) dataset for the Oromo language, a widely spoken language in Ethiopia and neighboring regions.

Automatic Speech Recognition (ASR)

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

no code implementations · 27 Jan 2025 · Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Yan Ding, Zhigang Wang, Jiayuan Gu, Bin Zhao, Dong Wang, Xuelong Li

Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating learning generalizable and transferrable spatial action knowledge for cross-robot control.

Robot Manipulation

Gradient-Free Adversarial Purification with Diffusion Models

no code implementations · 23 Jan 2025 · Xuelong Dai, Dong Wang, Duan Mingxing, Bin Xiao

In this paper, we propose an effective and efficient adversarial defense method that counters both perturbation-based and unrestricted adversarial attacks.

Adversarial Purification Super-Resolution

Modality Interactive Mixture-of-Experts for Fake News Detection

no code implementations · 21 Jan 2025 · Yifan Liu, Yaokun Liu, Zelin Li, Ruichen Yao, Yang Zhang, Dong Wang

The proliferation of fake news on social media platforms disproportionately impacts vulnerable populations, eroding trust, exacerbating inequality, and amplifying harmful narratives.

Fake News Detection Misinformation

A Foundational Generative Model for Breast Ultrasound Image Analysis

no code implementations · 12 Jan 2025 · Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, LiWei Wang

Additionally, we characterized the scaling effect of using generated data which was as effective as the collected real-world data for training diagnostic models.

Prognosis

SUTrack: Towards Simple and Unified Single Object Tracking

1 code implementation · 26 Dec 2024 · Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu

It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session.

Object Tracking

TimeRAG: BOOSTING LLM Time Series Forecasting via Retrieval-Augmented Generation

no code implementations · 21 Dec 2024 · Silin Yang, Dong Wang, Haoqi Zheng, Ruochun Jin

Experiments on datasets from various domains show that the integration of RAG improved the prediction accuracy of the original model by 2.97% on average.

Dynamic Time Warping RAG +3

Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models

no code implementations · 18 Dec 2024 · Xinghang Li, Peiyan Li, Minghuan Liu, Dong Wang, Jirong Liu, Bingyi Kang, Xiao Ma, Tao Kong, Hanbo Zhang, Huaping Liu

The obtained results convince us firmly to explain why we need VLA and develop a new family of VLAs, RoboVLMs, which require very few manual designs and achieve a new state-of-the-art performance in three simulation tasks and real-world experiments.

Representation Learning

Controlling the Latent Diffusion Model for Generative Image Shadow Removal via Residual Generation

no code implementations · 3 Dec 2024 · Xinjie Li, Yang Zhao, Dong Wang, Yuan Chen, Li Cao, Xiaoping Liu

Large-scale generative models have achieved remarkable advancements in various visual tasks, yet their application to shadow removal in images remains challenging.

Image Reconstruction Image Shadow Removal +1

ContextGNN: Beyond Two-Tower Recommendation Systems

1 code implementation · 29 Nov 2024 · Yiwen Yuan, Zecheng Zhang, Xinwei He, Akihiro Nitta, Weihua Hu, Dong Wang, Manan Shah, Shenyang Huang, Blaž Stojanovič, Alan Krumholz, Jan Eric Lenssen, Jure Leskovec, Matthias Fey

Recommendation systems predominantly utilize two-tower architectures, which evaluate user-item rankings through the inner product of their respective embeddings.

Link Prediction Recommendation Systems

Improving Transferable Targeted Attacks with Feature Tuning Mixup

no code implementations · 23 Nov 2024 · Kaisheng Liang, Xuelong Dai, YanJie Li, Dong Wang, Bin Xiao

Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations.

Night-to-Day Translation via Illumination Degradation Disentanglement

no code implementations · 21 Nov 2024 · Guanzhou Lan, YuQi Yang, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

Specifically, our method comprises a degradation disentanglement module and a degradation-aware contrastive learning module.

Contrastive Learning Disentanglement +1

Zero-shot Dynamic MRI Reconstruction with Global-to-local Diffusion Model

1 code implementation · 6 Nov 2024 · Yu Guan, Kunlong Zhang, Qi Qi, Dong Wang, Ziwen Ke, Shaoyu Wang, Dong Liang, Qiegen Liu

Diffusion models have recently demonstrated considerable advancement in the generation and reconstruction of magnetic resonance imaging (MRI) data.

MRI Reconstruction

Transferable Sequential Recommendation via Vector Quantized Meta Learning

no code implementations · 4 Nov 2024 · Zhenrui Yue, Huimin Zeng, Yang Zhang, Julian McAuley, Dong Wang

Without requiring additional modalities or shared information across domains, our approach leverages user-item interactions from multiple source domains to improve the target domain performance.

Meta-Learning Quantization +1

Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration

no code implementations · 30 Oct 2024 · Yanchu Guan, Dong Wang, Yan Wang, Haiqing Wang, Renen Sun, Chenyi Zhuang, Jinjie Gu, Zhixuan Chu

In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning demonstrations to create intelligent and explainable agents for autonomous mobile app interaction.

Code Generation Language Modeling +3

FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

no code implementations · 29 Oct 2024 · Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li

Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints.

Optical Flow Estimation

An Actor-Critic Approach to Boosting Text-to-SQL Large Language Model

no code implementations · 28 Oct 2024 · Ziyang Zheng, Haipeng Jing, Canyu Rui, Askar Hamdulla, Dong Wang

In this paper, we propose a simple, general, and performance guaranteed T2S enhancement approach called Actor-Critic (AC).

Language Modeling Language Modelling +2

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

1 code implementation · 26 Oct 2024 · Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen

Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding.

Continual Learning multimodal interaction

AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition

1 code implementation · 21 Oct 2024 · Zehua Liu, Xiaolou Li, Chen Chen, Li Guo, Lantian Li, Dong Wang

Then, based on the temporal correspondence between audio and video, a frame-level local alignment loss is introduced to refine the global alignment, improving the utility of the audio information.

cross-modal alignment speech-recognition +1

NeuralMAG: Fast and Generalizable Micromagnetic Simulation with Deep Neural Nets

1 code implementation · 19 Oct 2024 · Yunqi Cai, Jiangnan Li, Dong Wang

Micromagnetics has made significant strides, particularly due to its wide-ranging applications in magnetic storage design.

Efficient Diffusion as Low Light Enhancer

no code implementations · 16 Oct 2024 · Guanzhou Lan, Qianli Ma, YuQi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao

In this paper, we identify two primary factors contributing to performance degradation: fitting errors and the inference gap.

Low-Light Image Enhancement

Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning

no code implementations · 11 Oct 2024 · Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao

Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues.

Language Modeling Language Modelling +4

Inference Scaling for Long-Context Retrieval Augmented Generation

no code implementations · 6 Oct 2024 · Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky

Our observations reveal that increasing inference computation leads to nearly linear gains in RAG performance when optimally allocated, a relationship we describe as the inference scaling laws for RAG.

In-Context Learning RAG +1

Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

no code implementations · 29 Sep 2024 · Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang

In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention.

Audio-Visual Speech Recognition Lip Reading +3

Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

1 code implementation · 25 Sep 2024 · Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley

Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.

Multimodal Recommendation Representation Learning +1

COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models

1 code implementation · 23 Sep 2024 · Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors.

Task Planning

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

no code implementations · 18 Sep 2024 · Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li

To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings.

Task Planning

Full-text Error Correction for Chinese Speech Recognition with Large Language Model

no code implementations · 12 Sep 2024 · Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

no code implementations · 27 Aug 2024 · Yifan Liu, Yike Li, Dong Wang

Prior research has often focused on isolated media bias dimensions such as political bias or racial bias, neglecting the complex interrelationships among various bias dimensions across different topic domains.

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

no code implementations · 23 Aug 2024 · Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao

3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics.

Human-Object Interaction Detection Object

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

no code implementations · 15 Aug 2024 · Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture.

Mamba Rgb-T Tracking

KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

no code implementations · 6 Aug 2024 · Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li

Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the mechanisms of "how to do".

Efficient Exploration Imitation Learning +1

Flexible Beam Coverage Optimization for Movable-Antenna Array

no code implementations · 1 Aug 2024 · Dong Wang, Weidong Mei, Boyu Ning, Zhi Chen

Fluid antennas (FAs) and movable antennas (MAs) have attracted increasing attention in wireless communications recently.

Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

no code implementations · 23 Jul 2024 · Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, LiWei Wang

In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity with the same sensitivity, improving their performance by providing predictions with an interpretable decision-making process.

Decision Making Specificity

Serialized Output Training by Learned Dominance

no code implementations · 4 Jul 2024 · Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.

Decoder speech-recognition +1

WANCO: Weak Adversarial Networks for Constrained Optimization problems

no code implementations · 4 Jul 2024 · Gang Bao, Dong Wang, Boyi Zou

This paper focuses on integrating the networks and adversarial training into constrained optimization problems to develop a framework algorithm for constrained optimization problems.

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

1 code implementation · 2 Jul 2024 · Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang

Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing

no code implementations · 28 Jun 2024 · Xin Wei, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen

However, such an optimization problem is difficult to be optimally solved due to the highly nonlinear functions of the received signal/interference power at the SR/all PRs in terms of the MA positions.

Position

Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

no code implementations · 14 Jun 2024 · Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang

Upon input claims, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources.

Decision Making Fact Verification +2

CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge

no code implementations · 14 Jun 2024 · Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang

The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers.

speech-recognition Visual Speech Recognition

AD-H: Autonomous Driving with Hierarchical Agents

no code implementations · 5 Jun 2024 · Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers.

Autonomous Driving Text Generation

Your Causal Self-Attentive Recommender Hosts a Lonely Neighborhood

3 code implementations · 4 Jun 2024 · Yueqi Wang, Zhankui He, Zhenrui Yue, Julian McAuley, Dong Wang

In the context of sequential recommendation, a pivotal issue pertains to the comparative analysis between bi-directional/auto-encoding (AE) and uni-directional/auto-regressive (AR) attention mechanisms, where the conclusions regarding architectural and performance superiority remain inconclusive.

feature selection Inductive Bias +1

MOSEAC: Streamlined Variable Time Step Reinforcement Learning

1 code implementation · 3 Jun 2024 · Dong Wang, Giovanni Beltrame

This validation shows that MOSEAC streamlines RL algorithm deployment by automatically tuning the agent control loop frequency using a single parameter.

reinforcement-learning Reinforcement Learning +1

Learning Manipulation by Predicting Interaction

1 code implementation · 1 Jun 2024 · Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively.

Representation Learning

Lane Segmentation Refinement with Diffusion Models

no code implementations · 1 May 2024 · Antonio Ruiz, Andrew Melnik, Dong Wang, Helge Ritter

The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning.

Autonomous Driving Segmentation

Other Tokens Matter: Exploring Global and Local Features of Vision Transformers for Object Re-Identification

no code implementations · 23 Apr 2024 · Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu

In this work, we first explore the influence of global and local features of ViT and then further propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID.

Object

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

7 code implementations · 11 Apr 2024 · Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.

3D geometry parameter-efficient fine-tuning

Open-Vocabulary Federated Learning with Multimodal Prototyping

1 code implementation · 1 Apr 2024 · Huimin Zeng, Zhenrui Yue, Dong Wang

A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defect such FL systems.

Federated Learning

HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

no code implementations · CVPR 2024 · Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.

Image Reconstruction Segmentation +2

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

2 code implementations · CVPR 2024 · Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He

Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset.

Continual Learning Incremental Learning +1

Federated Recommendation via Hybrid Retrieval Augmented Generation

1 code implementation · 7 Mar 2024 · Huimin Zeng, Zhenrui Yue, Qian Jiang, Dong Wang

To this end, we propose GPT-FedRec, a federated recommendation framework leveraging ChatGPT and a novel hybrid Retrieval Augmented Generation (RAG) mechanism.

Hallucination Privacy Preserving +3

Reinforcement Learning with Elastic Time Steps

2 code implementations · 22 Feb 2024 · Dong Wang, Giovanni Beltrame

Traditional Reinforcement Learning (RL) policies are typically implemented with fixed control rates, often disregarding the impact of control rate selection.

reinforcement-learning Reinforcement Learning +1

Adversarial Data Augmentation for Robust Speaker Verification

no code implementations · 5 Feb 2024 · Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations.

Data Augmentation Speaker Verification

Off-Policy Primal-Dual Safe Reinforcement Learning

2 code implementations · 26 Jan 2024 · Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang

Results on benchmark tasks show that our method not only achieves an asymptotic performance comparable to state-of-the-art on-policy methods while using much fewer samples, but also significantly reduces constraint violation during training.

reinforcement-learning Reinforcement Learning +1

Deployable Reinforcement Learning with Variable Control Rate

1 code implementation · 17 Jan 2024 · Dong Wang, Giovanni Beltrame

Unfortunately, the system should be controlled at the highest, worst-case frequency to ensure stability, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware.

reinforcement-learning Reinforcement Learning +1

RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender Systems

no code implementations · 27 Dec 2023 · Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang

The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints.

channel selection Model Selection +2

Noise Distribution Decomposition based Multi-Agent Distributional Reinforcement Learning

no code implementations · 12 Dec 2023 · Wei Geng, Baidi Xiao, Rongpeng Li, Ning Wei, Dong Wang, Zhifeng Zhao

In this paper, we propose a novel decomposition-based multi-agent distributional RL method by approximating the globally shared noisy reward by a Gaussian mixture model (GMM) and decomposing it into the combination of individual distributional local rewards, with which each agent can be updated locally through distributional RL.

Distributional Reinforcement Learning Multi-agent Reinforcement Learning +3

Calibration-free quantitative phase imaging in multi-core fiber endoscopes using end-to-end deep learning

no code implementations · 12 Dec 2023 · Jiawei Sun, Bin Zhao, Dong Wang, Zhigang Wang, Jie Zhang, Nektarios Koukourakis, Juergen W. Czarske, Xuelong Li

Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness.

Retrieval

Intelligent Virtual Assistants with LLM-based Process Automation

no code implementations · 4 Dec 2023 · Yanchu Guan, Dong Wang, Zhixuan Chu, Shiyu Wang, Feiyue Ni, Ruihua Song, Longfei Li, Jinjie Gu, Chenyi Zhuang

This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests.

Language Modelling Large Language Model

GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

no code implementations · CVPR 2024 · Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li

This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods.

Pose Tracking Simultaneous Localization and Mapping

Implicit Event-RGBD Neural SLAM

no code implementations · CVPR 2024 · Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li

To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.

Analysis and Applications of Deep Learning with Finite Samples in Full Life-Cycle Intelligence of Nuclear Power Generation

no code implementations · 7 Nov 2023 · Chenwei Tang, Wenqiang Zhou, Dong Wang, Caiyang Yu, Zhenan He, Jizhe Zhou, Shudong Huang, Yi Gao, Jianming Chen, Wentao Feng, Jiancheng Lv

The advent of Industry 4.0 has precipitated the incorporation of Artificial Intelligence (AI) methods within industrial contexts, aiming to realize intelligent manufacturing, operation as well as maintenance, also known as industrial intelligence.

Few-Shot Learning Open Set Learning +1

LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking

2 code implementations · 25 Oct 2023 · Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, Even Oldridge

Recently, large language models (LLMs) have exhibited significant progress in language understanding and generation.

Movie Recommendation

A Glance is Enough: Extract Target Sentence By Looking at A keyword

no code implementations · 9 Oct 2023 · Ying Shi, Dong Wang, Lantian Li, Jiqing Han

This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input.

Sentence

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

7 code implementations · 4 Oct 2023 · Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.

parameter-efficient fine-tuning

Linear Recurrent Units for Sequential Recommendation

1 code implementation · 3 Oct 2023 · Zhenrui Yue, Yueqi Wang, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang

State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models.

Language Modeling Language Modelling +1

Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text

no code implementations · 20 Sep 2023 · Xuyang Chen, Dong Wang, Konrad Schindler, Mingwei Sun, Yongliang Wang, Nicolo Savioli, Liqiu Meng

Recently, Transformer-based text detection techniques have sought to predict polygons by encoding the coordinates of individual boundary vertices using distinct query features.

regression Text Detection

Leveraging the Power of Data Augmentation for Transformer-based Tracking

no code implementations · 15 Sep 2023 · Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu

Due to long-distance correlation and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance.

Data Augmentation Visual Object Tracking

BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model with Non-textual Features for CTR Prediction

no code implementations · 17 Aug 2023 · Dong Wang, Kavé Salamatian, Yunqing Xia, Weiwei Deng, Qi Zhiang

Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging.

Click-Through Rate Prediction Dimensionality Reduction +2

Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking

1 code implementation · ICCV 2023 · Ben Kang, Xin Chen, Dong Wang, Houwen Peng, Huchuan Lu

The Bridge Module incorporates the high-level information of deep features into the shallow large-resolution features.

Position Visual Tracking

Tracking Anything in High Quality

1 code implementation · 26 Jul 2023 · Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.

Object Semantic Segmentation +3

Topology-Preserving Automatic Labeling of Coronary Arteries via Anatomy-aware Connection Classifier

1 code implementation · 22 Jul 2023 · Zhixing Zhang, Ziwei Zhao, Dong Wang, Shishuang Zhao, Yuhang Liu, Jia Liu, LiWei Wang

Automatic labeling of coronary arteries is an essential task in the practical diagnosis process of cardiovascular diseases.

Anatomy

Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model

no code implementations · 17 Jul 2023 · Rongke Liu, Dong Wang, Yizhi Ren, Zhen Wang, Kaitian Guo, Qianqian Qin, Xiaolei Liu

Therefore, the attack models in existing MIAs are difficult to effectively train with the knowledge of the target model, resulting in sub-optimal attacks.

model

A Collaborative Transfer Learning Framework for Cross-domain Recommendation

no code implementations · 26 Jun 2023 · Wei Zhang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang

The disadvantage of the former is that the data from other domains is not utilized by a single domain model, while the latter leverages all the data from different domains, but the fine-tuned model of transfer learning may trap the model in a local optimum of the source domain, making it difficult to fit the target domain.

Click-Through Rate Prediction Recommendation Systems +1

HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models

no code implementations · 21 Jun 2023 · Chanyue Wu, Dong Wang, Hanyu Mao, Ying Li

Despite the proven significance of hyperspectral images (HSIs) in performing various computer vision tasks, its potential is adversely affected by the low-resolution (LR) property in the spatial domain, resulting from multiple physical factors.

Denoising Image Super-Resolution

Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers

no code implementations · 12 Jun 2023 · AnLan Sun, Zhao Zhang, Meng Lei, Yuting Dai, Dong Wang, LiWei Wang

The coherence loss uses the feature centers generated by the static images to guide the frame attention in the video model.

Video Classification

Graph Based Long-Term And Short-Term Interest Model for Click-Through Rate Prediction

no code implementations · 5 Jun 2023 · Huinan Sun, Guangliang Yu, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang

It consists of a multi-interest graph structure for capturing long-term user behavior, a multi-scenario heterogeneous sequence model for modeling short-term information, and an adaptive fusion mechanism to fuse information from long-term and short-term behaviors.

Click-Through Rate Prediction

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

1 code implementation1 Jun 2023 Qian Lin, Bo Tang, Zifan Wu, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang

Aiming at promoting the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years.

reinforcement-learning Reinforcement Learning +1

Mining Negative Temporal Contexts For False Positive Suppression In Real-Time Ultrasound Lesion Detection

1 code implementation29 May 2023 Haojun Yu, Youcheng Li, Quanlin Wu, Ziwei Zhao, Dengbo Chen, Dong Wang, LiWei Wang

To address this issue, we propose to extract contexts from previous frames, including NTC, with the guidance of inverse optical flow.

Lesion Detection object-detection +2

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

1 code implementation NeurIPS 2023 Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li

Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.

Reinforcement Learning (RL)

Spot keywords from very noisy and mixed speech

no code implementations28 May 2023 Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin

We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech.

Data Augmentation Keyword Spotting

Zero- and Few-Shot Event Detection via Prompt-Based Meta Learning

1 code implementation27 May 2023 Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang

With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training.

Event Detection Meta-Learning

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

no code implementations25 May 2023 Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang

A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research.

Person Recognition

Ordered and Binary Speaker Embedding

no code implementations25 May 2023 Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang

Modern speaker recognition systems represent utterances by embedding vectors.

Clustering Retrieval +2

Neural Image Re-Exposure

1 code implementation23 May 2023 Xinyu Zhang, Hefei Huang, Xu Jia, Dong Wang, Huchuan Lu

In this work, we aim to re-expose the captured photo in post-processing to provide a more flexible way of addressing those issues within a unified framework.

Ranked #5 on Deblurring on GoPro (using extra training data)

Deblurring Decoder +6

Subspace-Configurable Networks

1 code implementation22 May 2023 Dong Wang, Olga Saukh, Xiaoxi He, Lothar Thiele

The obtained subspace is low-dimensional and has a surprisingly simple structure even for complex, non-invertible transformations of the input, leading to an exceptionally high efficiency of subspace-configurable networks (SCNs) when limited storage and computing resources are at stake.

Audio Signal Processing Data Augmentation

MetaAdapt: Domain Adaptive Few-Shot Misinformation Detection via Meta Learning

1 code implementation22 May 2023 Zhenrui Yue, Huimin Zeng, Yang Zhang, Lanyu Shang, Dong Wang

As such, MetaAdapt can learn how to adapt the misinformation detection model and exploit the source data for improved performance in the target domain.

Meta-Learning Misinformation +1

Label-free timing analysis of SiPM-based modularized detectors with physics-constrained deep learning

no code implementations24 Apr 2023 Pengcheng Ai, Le Xiao, Zhi Deng, Yi Wang, Xiangming Sun, Guangming Huang, Dong Wang, Yulei Li, Xinchi Ran

We mathematically demonstrate the existence of the optimal function desired by the method, and give a systematic algorithm for training and calibration of the model.

MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

no code implementations17 Apr 2023 Xiaowen Shi, Ze Wang, Yuanying Cai, Xiaoxu Wu, Fan Yang, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang

There are two types of data employed to train the reinforcement learning (RL) model for position allocation, named strategy data and random data.

Imitation Learning Position +2

Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results

3 code implementations12 Apr 2023 Dong Wang, Jia Guo, Qiqi Shao, Haochi He, Zhian Chen, Chuanbao Xiao, Ajian Liu, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Jun Wan, Jiankang Deng

Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR 2023 workshop.

Diversity Face Anti-Spoofing +1

Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

1 code implementation ICCV 2023 Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion.

Rolling Shutter Correction

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

7 code implementations29 Mar 2023 Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

3D visual grounding

Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

no code implementations CVPR 2023 Yihao Wang, Zhigang Wang, Bin Zhao, Dong Wang, Mulin Chen, Xuelong Li

In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security.

Visual Prompt Multi-Modal Tracking

1 code implementation CVPR 2023 Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu

To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters.

Object Tracking Rgb-T Tracking

Fully Self-Supervised Depth Estimation from Defocus Clue

1 code implementation CVPR 2023 Haozhe Si, Bin Zhao, Dong Wang, Yunpeng Gao, Mulin Chen, Zhigang Wang, Xuelong Li

We show that our framework circumvents the need for depth and AIF image ground truth and achieves superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world.

Depth Estimation

Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation

1 code implementation17 Mar 2023 Dongsheng Wang, Xu Jia, Yang Zhang, Xinyu Zhang, Yaoyuan Wang, Ziyang Zhang, Dong Wang, Huchuan Lu

To fully exploit the information in event streams to detect objects, a dual-memory aggregation network (DMANet) is proposed to leverage both long and short memory along event streams to aggregate effective information for object detection.

Object object-detection +1

Universal Instance Perception as Object Discovery and Retrieval

1 code implementation CVPR 2023 Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.

Described Object Detection Generalized Referring Expression Comprehension +15

PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce

1 code implementation6 Feb 2023 Xiaowen Shi, Fan Yang, Ze Wang, Xiaoxu Wu, Muzhi Guan, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang

Then we design a novel omnidirectional attention mechanism in OCPM to capture the context information in the permutation.

Re-Ranking

A Deep Behavior Path Matching Network for Click-Through Rate Prediction

no code implementations1 Feb 2023 Jian Dong, Yisong Yu, Yapeng Zhang, Yimin Lv, Shuli Wang, Beihong Jin, Yongkang Wang, Xingxing Wang, Dong Wang

User behaviors on an e-commerce app not only contain different kinds of feedback on items but also sometimes imply the cognitive clue of the user's decision-making.

Click-Through Rate Prediction Contrastive Learning +1

HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models

no code implementations ICCV 2023 Chanyue Wu, Dong Wang, Yunpeng Bai, Hanyu Mao, Ying Li, Qiang Shen

Despite the proven significance of hyperspectral images (HSIs) in performing various computer vision tasks, their potential is adversely affected by the low-resolution (LR) property in the spatial domain, resulting from multiple physical factors.

Denoising Hyperspectral Image Super-Resolution +1

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding

no code implementations ICCV 2023 Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

3D visual grounding

Direct Heterogeneous Causal Learning for Resource Allocation Problems in Marketing

no code implementations28 Nov 2022 Hao Zhou, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Dong Wang

Our key intuition is that we introduce the decision factor to establish a bridge between ML and OR such that the solution can be directly obtained in OR by only performing the sorting or comparison operations on the decision factor.

Decision Making Marketing

Wasserstein Archetypal Analysis

no code implementations25 Oct 2022 Katy Craig, Braxton Osting, Dong Wang, Yiming Xu

We prove a consistency result for the regularized problem, ensuring that if the data are iid samples from a probability measure, then as the number of samples is increased, a subsequence of the archetype points converges to the archetype points for the limiting data distribution, almost surely.

QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation

1 code implementation19 Oct 2022 Zhenrui Yue, Huimin Zeng, Bernhard Kratzwald, Stefan Feuerriegel, Dong Wang

Unlike existing approaches, we generate pseudo labels and propose to train the model via a novel attention-based contrastive adaptation method.

Contrastive Learning Data Augmentation +2

Unsupervised Domain Adaptation for COVID-19 Information Service with Contrastive Adversarial Domain Mixup

no code implementations6 Oct 2022 Huimin Zeng, Zhenrui Yue, Ziyi Kou, Lanyu Shang, Yang Zhang, Dong Wang

Moreover, we leverage the power of domain adversarial examples to establish an intermediate domain mixup, where the latent representations of the input text from both domains could be mixed during the training process.

Contrastive Learning Misinformation +1

On Attacking Out-Domain Uncertainty Estimation in Deep Neural Networks

no code implementations3 Oct 2022 Huimin Zeng, Zhenrui Yue, Yang Zhang, Ziyi Kou, Lanyu Shang, Dong Wang

In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimation for the predictions made by the AI decision systems.

Adversarial Attack

Multiscale Latent-Guided Entropy Model for LiDAR Point Cloud Compression

no code implementations26 Sep 2022 Tingyu Fan, Linyao Gao, Yiling Xu, Dong Wang, Zhu Li

Besides, we propose a residual coding framework for the compression of the latent variable, which explores the spatial correlation of each layer by progressive downsampling and models the corresponding residual with a fully-factorized entropy model.