no code implementations • ACL 2022 • Shuang Liu, Dong Wang, Xiaoguang Li, Minghui Huang, Meizhen Ding
Open-domain question answering is a challenging task with a wide variety of practical applications.
1 code implementation • 1 Feb 2025 • Turi Abu, Ying Shi, Thomas Fang Zheng, Dong Wang
We present a novel Automatic Speech Recognition (ASR) dataset for the Oromo language, a widely spoken language in Ethiopia and neighboring regions.
no code implementations • 27 Jan 2025 • Delin Qu, Haoming Song, Qizhi Chen, Yuanqi Yao, Xinyi Ye, Yan Ding, Zhigang Wang, Jiayuan Gu, Bin Zhao, Dong Wang, Xuelong Li
Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating learning generalizable and transferrable spatial action knowledge for cross-robot control.
no code implementations • 23 Jan 2025 • Xuelong Dai, Dong Wang, Duan Mingxing, Bin Xiao
In this paper, we propose an effective and efficient adversarial defense method that counters both perturbation-based and unrestricted adversarial attacks.
no code implementations • 21 Jan 2025 • Yifan Liu, Yaokun Liu, Zelin Li, Ruichen Yao, Yang Zhang, Dong Wang
The proliferation of fake news on social media platforms disproportionately impacts vulnerable populations, eroding trust, exacerbating inequality, and amplifying harmful narratives.
no code implementations • 12 Jan 2025 • Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, LiWei Wang
Additionally, we characterized the scaling effect of using generated data which was as effective as the collected real-world data for training diagnostic models.
1 code implementation • 26 Dec 2024 • Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu
It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session.
no code implementations • 21 Dec 2024 • Silin Yang, Dong Wang, Haoqi Zheng, Ruochun Jin
Experiments on datasets from various domains show that the integration of RAG improved the prediction accuracy of the original model by 2.97% on average.
no code implementations • 18 Dec 2024 • Xinghang Li, Peiyan Li, Minghuan Liu, Dong Wang, Jirong Liu, Bingyi Kang, Xiao Ma, Tao Kong, Hanbo Zhang, Huaping Liu
The obtained results convince us firmly to explain why we need VLA and develop a new family of VLAs, RoboVLMs, which require very few manual designs and achieve a new state-of-the-art performance in three simulation tasks and real-world experiments.
no code implementations • 17 Dec 2024 • Lanyu Shang, Bozhang Chen, Shiwei Liu, Yang Zhang, Ruohan Zong, Anav Vora, Ximing Cai, Na Wei, Dong Wang
Drought has become a critical global threat with significant societal impact.
1 code implementation • AAAI2025 2024 • Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang
Contextual information at the video level has become increasingly crucial for visual object tracking.
Ranked #1 on Video Object Tracking on NT-VOT211
no code implementations • 3 Dec 2024 • Xinjie Li, Yang Zhao, Dong Wang, Yuan Chen, Li Cao, Xiaoping Liu
Large-scale generative models have achieved remarkable advancements in various visual tasks, yet their application to shadow removal in images remains challenging.
1 code implementation • 29 Nov 2024 • Yiwen Yuan, Zecheng Zhang, Xinwei He, Akihiro Nitta, Weihua Hu, Dong Wang, Manan Shah, Shenyang Huang, Blaž Stojanovič, Alan Krumholz, Jan Eric Lenssen, Jure Leskovec, Matthias Fey
Recommendation systems predominantly utilize two-tower architectures, which evaluate user-item rankings through the inner product of their respective embeddings.
no code implementations • 25 Nov 2024 • Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, Bin Zhao, Xuelong Li
Open-vocabulary 3D scene understanding is indispensable for embodied agents.
no code implementations • 23 Nov 2024 • Kaisheng Liang, Xuelong Dai, YanJie Li, Dong Wang, Bin Xiao
Recent clean feature mixup methods use random clean features to perturb the feature space but lack optimization for disrupting adversarial examples, overlooking the advantages of attack-specific perturbations.
no code implementations • 21 Nov 2024 • Guanzhou Lan, YuQi Yang, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, our method comprises a degradation disentanglement module and a degradation-aware contrastive learning module.
1 code implementation • 6 Nov 2024 • Yu Guan, Kunlong Zhang, Qi Qi, Dong Wang, Ziwen Ke, Shaoyu Wang, Dong Liang, Qiegen Liu
Diffusion models have recently demonstrated considerable advancement in the generation and reconstruction of magnetic resonance imaging (MRI) data.
no code implementations • 4 Nov 2024 • Zhenrui Yue, Huimin Zeng, Yang Zhang, Julian McAuley, Dong Wang
Without requiring additional modalities or shared information across domains, our approach leverages user-item interactions from multiple source domains to improve the target domain performance.
no code implementations • 30 Oct 2024 • Yanchu Guan, Dong Wang, Yan Wang, Haiqing Wang, Renen Sun, Chenyi Zhuang, Jinjie Gu, Zhixuan Chu
In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning demonstrations to create intelligent and explainable agents for autonomous mobile app interaction.
no code implementations • 29 Oct 2024 • Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li
Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints.
no code implementations • 28 Oct 2024 • Ziyang Zheng, Haipeng Jing, Canyu Rui, Askar Hamdulla, Dong Wang
In this paper, we propose a simple, general, and performance-guaranteed T2S enhancement approach called Actor-Critic (AC).
1 code implementation • 26 Oct 2024 • Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen
Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding.
no code implementations • 22 Oct 2024 • Ruixin Lia, Guoxu Zhaoa, Dylan Richard Muir, Yuya Ling, Karla Burelo, Mina Khoei, Dong Wang, Yannan Xing, Ning Qiao
In this study, we aimed to detect interictal and ictal periods of epileptic seizures using a spiking neural network (SNN).
1 code implementation • 21 Oct 2024 • Zehua Liu, Xiaolou Li, Chen Chen, Li Guo, Lantian Li, Dong Wang
Then, based on the temporal correspondence between audio and video, a frame-level local alignment loss is introduced to refine the global alignment, improving the utility of the audio information.
1 code implementation • 19 Oct 2024 • Yunqi Cai, Jiangnan Li, Dong Wang
Micromagnetics has made significant strides, particularly due to its wide-ranging applications in magnetic storage design.
no code implementations • 16 Oct 2024 • Guanzhou Lan, Qianli Ma, YuQi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao
In this paper, we identify two primary factors contributing to performance degradation: fitting errors and the inference gap.
no code implementations • 11 Oct 2024 • Yunpeng Gao, Zhigang Wang, Linglin Jing, Dong Wang, Xuelong Li, Bin Zhao
Aerial Vision-and-Language Navigation (VLN) is a novel task enabling Unmanned Aerial Vehicles (UAVs) to navigate in outdoor environments through natural language instructions and visual cues.
no code implementations • 6 Oct 2024 • Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky
Our observations reveal that increasing inference computation leads to nearly linear gains in RAG performance when optimally allocated, a relationship we describe as the inference scaling laws for RAG.
no code implementations • 29 Sep 2024 • Chen Chen, Xiaolou Li, Zehua Liu, Lantian Li, Dong Wang
In the field of spoken language processing, audio-visual speech processing is receiving increasing research attention.
1 code implementation • 25 Sep 2024 • Yueqi Wang, Zhenrui Yue, Huimin Zeng, Dong Wang, Julian McAuley
Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.
1 code implementation • 23 Sep 2024 • Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
Specifically, a Proposal-Execution-Feedback-Adjustment (PEFA) mechanism is designed to decompose and assign actions for individual robots, where a centralized task assigner makes a task planning proposal to decompose the complex task into subtasks, and then assigns subtasks to robot executors.
no code implementations • 18 Sep 2024 • Zhaxizhuoma, Pengan Chen, Ziniu Wu, Jiawei Sun, Dong Wang, Peng Zhou, Nieqing Cao, Yan Ding, Bin Zhao, Xuelong Li
To validate the effectiveness of AlignBot, experiments are conducted in real-world household environments, which are constructed within the laboratory to replicate typical household settings.
no code implementations • 12 Sep 2024 • Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang
This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings.
Automatic Speech Recognition (ASR)
no code implementations • 27 Aug 2024 • Yifan Liu, Yike Li, Dong Wang
Prior research has often focused on isolated media bias dimensions such as political bias or racial bias, neglecting the complex interrelationships among various bias dimensions across different topic domains.
no code implementations • 23 Aug 2024 • Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao
3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics.
no code implementations • 15 Aug 2024 • Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu
Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture.
Ranked #1 on Rgb-T Tracking on GTOT
no code implementations • 6 Aug 2024 • Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the mechanisms of "how to do".
no code implementations • 1 Aug 2024 • Dong Wang, Weidong Mei, Boyu Ning, Zhi Chen
Fluid antennas (FAs) and movable antennas (MAs) have attracted increasing attention in wireless communications recently.
no code implementations • 23 Jul 2024 • Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, LiWei Wang
In the prospective external evaluation, our diagnostic model outperforms the average performance of nine radiologists by 33.5% in specificity with the same sensitivity, improving their performance by providing predictions with an interpretable decision-making process.
no code implementations • 17 Jul 2024 • Pengyu Zhang, Hao Yin, Zeren Wang, Wenyue Chen, Shengming Li, Dong Wang, Huchuan Lu, Xu Jia
Sign language is one of the most effective communication tools for people with hearing difficulties.
no code implementations • 4 Jul 2024 • Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han
Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.
no code implementations • 4 Jul 2024 • Gang Bao, Dong Wang, Boyi Zou
This paper focuses on integrating the networks and adversarial training into constrained optimization problems to develop a framework algorithm for constrained optimization problems.
1 code implementation • 2 Jul 2024 • Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang
Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges.
Automatic Speech Recognition (ASR)
no code implementations • 28 Jun 2024 • Xin Wei, Weidong Mei, Dong Wang, Boyu Ning, Zhi Chen
However, such an optimization problem is difficult to solve optimally due to the highly nonlinear functions of the received signal/interference power at the SR/all PRs in terms of the MA positions.
no code implementations • 23 Jun 2024 • Delin Qu, Qizhi Chen, Pingrui Zhang, Xianqiang Gao, Bin Zhao, Dong Wang, Xuelong Li
This paper scales object-level reconstruction to complex scenes, advancing interactive scene reconstruction.
no code implementations • 14 Jun 2024 • Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang
Upon input claims, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources.
no code implementations • 14 Jun 2024 • Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang
The first Chinese Continuous Visual Speech Recognition Challenge aimed to probe the performance of Large Vocabulary Continuous Visual Speech Recognition (LVC-VSR) on two tasks: (1) Single-speaker VSR for a particular speaker and (2) Multi-speaker VSR for a set of registered speakers.
1 code implementation • 10 Jun 2024 • Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar
Causal discovery aims to recover the DAG structure using observational data.
no code implementations • 5 Jun 2024 • Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu
However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers.
3 code implementations • 4 Jun 2024 • Yueqi Wang, Zhankui He, Zhenrui Yue, Julian McAuley, Dong Wang
In the context of sequential recommendation, a pivotal issue pertains to the comparative analysis between bi-directional/auto-encoding (AE) and uni-directional/auto-regressive (AR) attention mechanisms, where the conclusions regarding architectural and performance superiority remain inconclusive.
1 code implementation • 3 Jun 2024 • Dong Wang, Giovanni Beltrame
This validation shows that MOSEAC streamlines RL algorithm deployment by automatically tuning the agent control loop frequency using a single parameter.
1 code implementation • 1 Jun 2024 • Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li
To this end, we propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction (MPI) and enhances the visual representation. Given a pair of keyframes representing the initial and final states, along with language instructions, our algorithm predicts the transition frame and detects the interaction object, respectively.
no code implementations • 1 May 2024 • Antonio Ruiz, Andrew Melnik, Dong Wang, Helge Ritter
The lane graph is a key component for building high-definition (HD) maps and crucial for downstream tasks such as autonomous driving or navigation planning.
no code implementations • 23 Apr 2024 • Yingquan Wang, Pingping Zhang, Dong Wang, Huchuan Lu
In this work, we first explore the influence of global and local features of ViT and then further propose a novel Global-Local Transformer (GLTrans) for high-performance object Re-ID.
7 code implementations • 11 Apr 2024 • Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li
The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.
1 code implementation • 1 Apr 2024 • Huimin Zeng, Zhenrui Yue, Dong Wang
A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defect such FL systems.
no code implementations • 26 Mar 2024 • Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu
For instance, DyTrack obtains 64.9% AUC on LaSOT with a speed of 256 fps.
no code implementations • CVPR 2024 • Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li
In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.
no code implementations • 22 Mar 2024 • Zhenrui Yue, Huimin Zeng, Yimeng Lu, Lanyu Shang, Yang Zhang, Dong Wang
The proliferation of online misinformation has posed significant threats to public interest.
2 code implementations • CVPR 2024 • Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He
Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset.
1 code implementation • 7 Mar 2024 • Huimin Zeng, Zhenrui Yue, Qian Jiang, Dong Wang
To this end, we propose GPT-FedRec, a federated recommendation framework leveraging ChatGPT and a novel hybrid Retrieval Augmented Generation (RAG) mechanism.
2 code implementations • 22 Feb 2024 • Dong Wang, Giovanni Beltrame
Traditional Reinforcement Learning (RL) policies are typically implemented with fixed control rates, often disregarding the impact of control rate selection.
no code implementations • 5 Feb 2024 • Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang
This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations.
2 code implementations • 26 Jan 2024 • Zifan Wu, Bo Tang, Qian Lin, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang
Results on benchmark tasks show that our method not only achieves an asymptotic performance comparable to state-of-the-art on-policy methods while using much fewer samples, but also significantly reduces constraint violation during training.
1 code implementation • 17 Jan 2024 • Dong Wang, Giovanni Beltrame
Unfortunately, the system should be controlled at the highest, worst-case frequency to ensure stability, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware.
no code implementations • 29 Dec 2023 • Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang
Online display advertising platforms service numerous advertisers by providing real-time bidding (RTB) for the scale of billions of ad requests every day.
no code implementations • 27 Dec 2023 • Jiahong Zhou, Shunhui Mao, Guoliang Yang, Bo Tang, Qianlong Xie, Lebin Lin, Xingxing Wang, Dong Wang
The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints.
no code implementations • 12 Dec 2023 • Wei Geng, Baidi Xiao, Rongpeng Li, Ning Wei, Dong Wang, Zhifeng Zhao
In this paper, we propose a novel decomposition-based multi-agent distributional RL method by approximating the globally shared noisy reward by a Gaussian mixture model (GMM) and decomposing it into the combination of individual distributional local rewards, with which each agent can be updated locally through distributional RL.
Distributional Reinforcement Learning
Multi-agent Reinforcement Learning
no code implementations • 12 Dec 2023 • Jiawei Sun, Bin Zhao, Dong Wang, Zhigang Wang, Jie Zhang, Nektarios Koukourakis, Juergen W. Czarske, Xuelong Li
Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness.
1 code implementation • 12 Dec 2023 • Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li
The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.
no code implementations • 4 Dec 2023 • Yanchu Guan, Dong Wang, Zhixuan Chu, Shiyu Wang, Feiyue Ni, Ruihua Song, Longfei Li, Jinjie Gu, Chenyi Zhuang
This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests.
no code implementations • CVPR 2024 • Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li
This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods.
no code implementations • CVPR 2024 • Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li
To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.
no code implementations • 7 Nov 2023 • Chenwei Tang, Wenqiang Zhou, Dong Wang, Caiyang Yu, Zhenan He, Jizhe Zhou, Shudong Huang, Yi Gao, Jianming Chen, Wentao Feng, Jiancheng Lv
The advent of Industry 4.0 has precipitated the incorporation of Artificial Intelligence (AI) methods within industrial contexts, aiming to realize intelligent manufacturing, operation as well as maintenance, also known as industrial intelligence.
2 code implementations • 6 Nov 2023 • Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li
Generalizable articulated object manipulation is essential for home-assistant robots.
2 code implementations • 25 Oct 2023 • Zhenrui Yue, Sara Rabhi, Gabriel de Souza Pereira Moreira, Dong Wang, Even Oldridge
Recently, large language models (LLMs) have exhibited significant progress in language understanding and generation.
no code implementations • 9 Oct 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han
This paper investigates the possibility of extracting a target sentence from multi-talker speech using only a keyword as input.
7 code implementations • 4 Oct 2023 • Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.
1 code implementation • 3 Oct 2023 • Zhenrui Yue, Yueqi Wang, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang
State-of-the-art sequential recommendation relies heavily on self-attention-based recommender models.
no code implementations • 20 Sep 2023 • Xuyang Chen, Dong Wang, Konrad Schindler, Mingwei Sun, Yongliang Wang, Nicolo Savioli, Liqiu Meng
Recently, Transformer-based text detection techniques have sought to predict polygons by encoding the coordinates of individual boundary vertices using distinct query features.
no code implementations • 15 Sep 2023 • Jie Zhao, Johan Edstedt, Michael Felsberg, Dong Wang, Huchuan Lu
Due to long-distance correlation and powerful pretrained models, transformer-based methods have initiated a breakthrough in visual object tracking performance.
no code implementations • 17 Aug 2023 • Dong Wang, Kavé Salamatian, Yunqing Xia, Weiwei Deng, Qi Zhiang
Although deep pre-trained language models have shown promising benefit in a large set of industrial scenarios, including Click-Through-Rate (CTR) prediction, how to integrate pre-trained language models that handle only textual signals into a prediction pipeline with non-textual features is challenging.
1 code implementation • ICCV 2023 • Ben Kang, Xin Chen, Dong Wang, Houwen Peng, Huchuan Lu
The Bridge Module incorporates the high-level information of deep features into the shallow large-resolution features.
2 code implementations • 1 Aug 2023 • Mingzhan Yang, Guangxin Han, Bin Yan, Wenhua Zhang, Jinqing Qi, Huchuan Lu, Dong Wang
Also, our method shows strong generalization for diverse trackers and scenarios in a plug-and-play and training-free manner.
Ranked #11 on Multi-Object Tracking on DanceTrack
1 code implementation • 26 Jul 2023 • Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li
To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.
Ranked #5 on Semi-Supervised Video Object Segmentation on YouTube-VOS 2019 (using extra training data)
1 code implementation • 22 Jul 2023 • Zhixing Zhang, Ziwei Zhao, Dong Wang, Shishuang Zhao, Yuhang Liu, Jia Liu, LiWei Wang
Automatic labeling of coronary arteries is an essential task in the practical diagnosis process of cardiovascular diseases.
no code implementations • 17 Jul 2023 • Rongke Liu, Dong Wang, Yizhi Ren, Zhen Wang, Kaitian Guo, Qianqian Qin, Xiaolei Liu
Therefore, the attack models in existing MIAs are difficult to effectively train with the knowledge of the target model, resulting in sub-optimal attacks.
no code implementations • 4 Jul 2023 • Wei zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
The effectiveness of ad creatives is greatly influenced by their visual appearance.
no code implementations • 26 Jun 2023 • Wei zhang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
The disadvantage of the former is that the data from other domains is not utilized by a single domain model, while the latter leverages all the data from different domains, but the fine-tuned model of transfer learning may trap the model in a local optimum of the source domain, making it difficult to fit the target domain.
no code implementations • 21 Jun 2023 • Chanyue Wu, Dong Wang, Hanyu Mao, Ying Li
Despite the proven significance of hyperspectral images (HSIs) in performing various computer vision tasks, their potential is adversely affected by the low-resolution (LR) property in the spatial domain, resulting from multiple physical factors.
no code implementations • 12 Jun 2023 • AnLan Sun, Zhao Zhang, Meng Lei, Yuting Dai, Dong Wang, LiWei Wang
The coherence loss uses the feature centers generated by the static images to guide the frame attention in the video model.
no code implementations • 5 Jun 2023 • Huinan Sun, Guangliang Yu, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
It consists of a multi-interest graph structure for capturing long-term user behavior, a multi-scenario heterogeneous sequence model for modeling short-term information, and an adaptive fusion mechanism to fuse information from long-term and short-term behaviors.
1 code implementation • 1 Jun 2023 • Qian Lin, Bo Tang, Zifan Wu, Chao Yu, Shangqin Mao, Qianlong Xie, Xingxing Wang, Dong Wang
Aiming at promoting the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years.
1 code implementation • 29 May 2023 • Haojun Yu, Youcheng Li, Quanlin Wu, Ziwei Zhao, Dengbo Chen, Dong Wang, LiWei Wang
To address this issue, we propose to extract contexts from previous frames, including NTC, with the guidance of inverse optical flow.
1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
no code implementations • 28 May 2023 • Ying Shi, Dong Wang, Lantian Li, Jiqing Han, Shi Yin
We propose a novel Mix Training (MT) strategy that encourages the model to discover low-energy keywords from noisy and mixed speech.
1 code implementation • 27 May 2023 • Zhenrui Yue, Huimin Zeng, Mengfei Lan, Heng Ji, Dong Wang
With emerging online topics as a source for numerous new events, detecting unseen / rare event types presents an elusive challenge for existing event detection methods, where only limited data access is provided for training.
no code implementations • 25 May 2023 • Lantian Li, Xiaolou Li, Haoyu Jiang, Chen Chen, Ruihai Hou, Dong Wang
A comprehensive study was conducted to compare CN-Celeb-AV with two popular public AVPR benchmark datasets, and the results demonstrated that CN-Celeb-AV is more in line with real-world scenarios and can be regarded as a new benchmark dataset for AVPR research.
no code implementations • 25 May 2023 • Jiaying Wang, Xianglong Wang, Namin Wang, Lantian Li, Dong Wang
Modern speaker recognition systems represent utterances by embedding vectors.
1 code implementation • 23 May 2023 • Xinyu Zhang, Hefei Huang, Xu Jia, Dong Wang, Huchuan Lu
In this work, we aim to re-expose the captured photo in post-processing to provide a more flexible way of addressing those issues within a unified framework.
Ranked #5 on Deblurring on GoPro (using extra training data)
1 code implementation • 22 May 2023 • Dong Wang, Olga Saukh, Xiaoxi He, Lothar Thiele
The obtained subspace is low-dimensional and has a surprisingly simple structure even for complex, non-invertible transformations of the input, leading to an exceptionally high efficiency of subspace-configurable networks (SCNs) when limited storage and computing resources are at stake.
1 code implementation • 22 May 2023 • Zhenrui Yue, Huimin Zeng, Yang Zhang, Lanyu Shang, Dong Wang
As such, MetaAdapt can learn how to adapt the misinformation detection model and exploit the source data for improved performance in the target domain.
1 code implementation • CVPR 2023 • Simin Li, Shuing Zhang, Gujun Chen, Dong Wang, Pu Feng, Jiakai Wang, Aishan Liu, Xin Yi, Xianglong Liu
First, to benchmark attack naturalness, we contribute the first Physical Attack Naturalness (PAN) dataset with human rating and gaze.
1 code implementation • CVPR 2023 • Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu
In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking.
Ranked #1 on Rgb-T Tracking on LasHeR
no code implementations • 24 Apr 2023 • Pengcheng Ai, Le Xiao, Zhi Deng, Yi Wang, Xiangming Sun, Guangming Huang, Dong Wang, Yulei Li, Xinchi Ran
We mathematically demonstrate the existence of the optimal function desired by the method, and give a systematic algorithm for training and calibration of the model.
no code implementations • 17 Apr 2023 • Xiaowen Shi, Ze Wang, Yuanying Cai, Xiaoxu Wu, Fan Yang, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang
There are two types of data employed to train a reinforcement learning (RL) model for position allocation, named strategy data and random data.
3 code implementations • 12 Apr 2023 • Dong Wang, Jia Guo, Qiqi Shao, Haochi He, Zhian Chen, Chuanbao Xiao, Ajian Liu, Sergio Escalera, Hugo Jair Escalante, Zhen Lei, Jun Wan, Jiankang Deng
Leveraging the WFAS dataset and Protocol 1 (Known-Type), we host the Wild Face Anti-Spoofing Challenge at the CVPR2023 workshop.
no code implementations • CVPR 2023 • Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li
Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image.
1 code implementation • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks.
1 code implementation • ICCV 2023 • Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li
This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion.
7 code implementations • 29 Mar 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.
no code implementations • CVPR 2023 • Yihao Wang, Zhigang Wang, Bin Zhao, Dong Wang, Mulin Chen, Xuelong Li
In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security.
1 code implementation • CVPR 2023 • Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu
To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters.
Ranked #27 on RGB-T Tracking on LasHeR
1 code implementation • CVPR 2023 • Haozhe Si, Bin Zhao, Dong Wang, Yunpeng Gao, Mulin Chen, Zhigang Wang, Xuelong Li
We show that our framework circumvents the need for depth and AIF image ground truth and produces superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world.
1 code implementation • 17 Mar 2023 • Dongsheng Wang, Xu Jia, Yang Zhang, Xinyu Zhang, Yaoyuan Wang, Ziyang Zhang, Dong Wang, Huchuan Lu
To fully exploit information with event streams to detect objects, a dual-memory aggregation network (DMANet) is proposed to leverage both long and short memory along event streams to aggregate effective information for object detection.
1 code implementation • CVPR 2023 • Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu
All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.
Described Object Detection
Generalized Referring Expression Comprehension
1 code implementation • 6 Feb 2023 • Xiaowen Shi, Fan Yang, Ze Wang, Xiaoxu Wu, Muzhi Guan, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang
Then we design a novel omnidirectional attention mechanism in OCPM to capture the context information in the permutation.
no code implementations • 1 Feb 2023 • Jian Dong, Yisong Yu, Yapeng Zhang, Yimin Lv, Shuli Wang, Beihong Jin, Yongkang Wang, Xingxing Wang, Dong Wang
User behaviors on an e-commerce app not only contain different kinds of feedback on items but also sometimes imply the cognitive clue of the user's decision-making.
no code implementations • 29 Jan 2023 • Xiang Li, Shuwei Chen, Jian Dong, Jin Zhang, Yongkang Wang, Xingxing Wang, Dong Wang
Click-through rate (CTR) prediction is crucial in recommendation and online advertising systems.
no code implementations • ICCV 2023 • Chanyue Wu, Dong Wang, Yunpeng Bai, Hanyu Mao, Ying Li, Qiang Shen
Despite the proven significance of hyperspectral images (HSIs) in performing various computer vision tasks, their potential is adversely affected by the low-resolution (LR) property in the spatial domain, resulting from multiple physical factors.
no code implementations • ICCV 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li
In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.
1 code implementation • CVPR 2023 • Haojie Zhao, Dong Wang, Huchuan Lu
However, for the template, we make the decoder reconstruct the target appearance within the search region.
no code implementations • 28 Nov 2022 • Hao Zhou, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Dong Wang
Our key intuition is that we introduce the decision factor to establish a bridge between ML and OR such that the solution can be directly obtained in OR by only performing the sorting or comparison operations on the decision factor.
no code implementations • 25 Oct 2022 • Katy Craig, Braxton Osting, Dong Wang, Yiming Xu
We prove a consistency result for the regularized problem, ensuring that if the data are iid samples from a probability measure, then as the number of samples is increased, a subsequence of the archetype points converges to the archetype points for the limiting data distribution, almost surely.
1 code implementation • 19 Oct 2022 • Zhenrui Yue, Huimin Zeng, Bernhard Kratzwald, Stefan Feuerriegel, Dong Wang
Unlike existing approaches, we generate pseudo labels and propose to train the model via a novel attention-based contrastive adaptation method.
no code implementations • 6 Oct 2022 • Huimin Zeng, Zhenrui Yue, Ziyi Kou, Lanyu Shang, Yang Zhang, Dong Wang
Moreover, we leverage the power of domain adversarial examples to establish an intermediate domain mixup, where the latent representations of the input text from both domains could be mixed during the training process.
no code implementations • 3 Oct 2022 • Huimin Zeng, Zhenrui Yue, Yang Zhang, Ziyi Kou, Lanyu Shang, Dong Wang
In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimation for the predictions made by the AI decision systems.
no code implementations • 26 Sep 2022 • Tingyu Fan, Linyao Gao, Yiling Xu, Dong Wang, Zhu Li
Besides, we propose a residual coding framework for the compression of the latent variable, which explores the spatial correlation of each layer by progressive downsampling and models the corresponding residual with a fully-factorized entropy model.