no code implementations • 16 Mar 2025 • Ruoyu Wang, Yukai Ma, Yi Yao, Sheng Tao, Haoang Li, Zongzhi Zhu, Yong liu, Xingxing Zuo
Semantic Scene Completion (SSC) constitutes a pivotal element in autonomous driving perception systems, tasked with inferring the 3D semantic occupancy of a scene from sensory data.
no code implementations • 17 Feb 2025 • Junda Wu, Yuxin Xiong, Xintong Li, Yu Xia, Ruoyu Wang, Yu Wang, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Jingbo Shang, Julian McAuley
By explicitly disentangling the optimization of visual understanding from task-specific alignment, MDGD preserves pre-trained visual knowledge while enabling efficient task adaptation.
1 code implementation • 7 Feb 2025 • Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Jun Du, Kewei Li, Ruoyu Wang, Jiefeng Ma, Lei Sun, Jianqing Gao
To address this issue, we propose Self-Loop Latent Swap, a frame-level bidirectional swap applied to the overlapping region of adjacent views.
no code implementations • 6 Dec 2024 • Ruoyu Wang, Huayang Huang, Ye Zhu, Olga Russakovsky, Yu Wu
Text-to-image synthesis (T2I) has advanced remarkably with the emergence of large-scale diffusion models.
no code implementations • 28 Nov 2024 • Yuhan Pei, Ruoyu Wang, Yongqi Yang, Ye Zhu, Olga Russakovsky, Yu Wu
Originating from the diffusion phenomenon in physics, which describes the random movement and collisions of particles, diffusion generative models simulate a random walk in the data space along the denoising trajectory.
no code implementations • 27 Nov 2024 • Xiaoxuan Li, Yao Liu, Ruoyu Wang, Lina Yao
As the significance of understanding the cause-and-effect relationships among variables increases in the development of modern systems and algorithms, learning causality from observational data has become a preferred and efficient approach over conducting randomized control trials.
no code implementations • 31 Oct 2024 • Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley
To overcome the reasoning heterogeneity and grounding problems, we leverage on-policy KG exploration and RL to model a KG policy that generates token-level likelihood distributions for LLM-generated chain-of-thought reasoning paths, simulating KG reasoning preference.
no code implementations • 21 Oct 2024 • Jianjun Gao, Chen Cai, Ruoyu Wang, Wenyang Liu, Kim-Hui Yap, Kratika Garg, Boon-Siew Han
Human-object interaction (HOI) detection has seen advancements with Vision Language Models (VLMs), but these methods often depend on extensive manual annotations.
1 code implementation • 27 Sep 2024 • Shuo Wang, Binbin Huang, Ruoyu Wang, Shenghua Gao
To tackle the dynamic contents and the occlusions in complex scenes, we present a space-time 2D Gaussian Splatting approach.
no code implementations • 25 Sep 2024 • Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan
This paper proposes a three-stage modular system to enhance single-channel neural speaker diarization systems and recognition performance by utilizing spatial cues from multi-channel speech to provide more accurate initialization for each stage of neural speaker diarization (NSD) decoding: (1) Overlap detection and continuous speech separation (CSS) on multi-channel speech are used to obtain cleaner single speaker speech segments for clustering, followed by the first NSD decoding pass.
1 code implementation • 14 Sep 2024 • Ruoyu Wang, Jiachen Sun, Shaowei Hua, Quan Fang
Additionally, we compare ASFT to DPO and its latest variants, such as the single-step approach ORPO, using the latest instruction-tuned model Llama3, which has been fine-tuned on UltraFeedback and HH-RLHF.
no code implementations • 4 Sep 2024 • Ruoyu Wang, Xiaoxuan Li, Lina Yao
Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions, but studies reveal that they often struggle with tasks requiring reasoning, such as math or physics.
no code implementations • 4 Sep 2024 • Ruoyu Wang, Yao Liu, Yuanjiang Cao, Lina Yao
We address these constraints by initially exploring the unique differences between Navigation tasks and other sequential data tasks through the lens of Causality, presenting a causal framework to elucidate the inadequacies of conventional sequential methods for Navigation.
no code implementations • 4 Sep 2024 • Ruoyu Wang, Lina Yao
Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process.
no code implementations • 3 Sep 2024 • Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao
This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge.
no code implementations • 24 Aug 2024 • Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren
We conducted comprehensive experiments on the nuScenes dataset, demonstrating significant improvements over existing methods.
no code implementations • 4 Aug 2024 • Xiang Mei, Pulkit Singh Singaria, Jordi Del Castillo, Haoran Xi, Abdelouahab, Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt
High-quality datasets of real-world vulnerabilities are enormously valuable for downstream research in software security, but existing datasets are typically small, require extensive manual effort to update, and are missing crucial features that such research needs.
no code implementations • 3 Aug 2024 • Ruoyu Wang, Wenqian Wang, Jianjun Gao, Dan Lin, Kim-Hui Yap, Bingbing Li
Driver action recognition, aiming to accurately identify drivers' behaviours, is crucial for enhancing driver-vehicle interactions and ensuring driving safety.
no code implementations • 17 Jun 2024 • Ruoyu Wang, Chen Cai, Wenqian Wang, Jianjun Gao, Dan Lin, Wenyang Liu, Kim-Hui Yap
Therefore, previous works have suggested independently learning each non-RGB modality by fine-tuning a model pre-trained on RGB videos, but these methods are less effective in extracting informative features when faced with newly-incoming modalities due to large domain gaps.
no code implementations • 31 May 2024 • Zhipeng Yang, Ruoyu Wang, Yang Tan, Liping Xie
In this way, output features transition from coarse to fine as the branches deepen.
1 code implementation • 24 May 2024 • Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions.
Ranked #1 on
Music Generation
on Song Describer Dataset
no code implementations • 11 May 2024 • Yao Liu, Ruoyu Wang, Yuanjiang Cao, Quan Z. Sheng, Lina Yao
The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation.
no code implementations • CVPR 2024 • Su Sun, Cheng Zhao, Yuliang Guo, Ruoyu Wang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
The 3D Inpainter with abstract representation at coarse levels is trained offline using various scenes to complete occluded surfaces.
no code implementations • 3 Apr 2024 • Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren
Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data.
1 code implementation • 23 Mar 2024 • Yuliang Guo, Abhinav Kumar, Cheng Zhao, Ruoyu Wang, Xinyu Huang, Liu Ren
While gradient-based optimization in a NeRF framework updates the initial pose, this paper highlights that scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose.
no code implementations • 17 Mar 2024 • Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang
This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples.
1 code implementation • 14 Mar 2024 • Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang Vu
Stain normalization algorithms aim to transform the color and intensity characteristics of a source multi-gigapixel histology image to match those of a target image, mitigating inconsistencies in the appearance of stains used to highlight cellular components in the images.
1 code implementation • CVPR 2024 • Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee
In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.
1 code implementation • 5 Mar 2024 • Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, Samuel Marks, Oam Patel, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam A. Hunt, Justin Tienken-Harder, Kevin Y. Shih, Kemper Talley, John Guan, Russell Kaplan, Ian Steneker, David Campbell, Brad Jokubaitis, Alex Levinson, Jean Wang, William Qian, Kallol Krishna Karmakar, Steven Basart, Stephen Fitz, Mindy Levine, Ponnurangam Kumaraguru, Uday Tupakula, Vijay Varadharajan, Ruoyu Wang, Yan Shoshitaishvili, Jimmy Ba, Kevin M. Esvelt, Alexandr Wang, Dan Hendrycks
To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs.
no code implementations • 26 Jan 2024 • Dan Lin, Philip Hann Yung Lee, Yiming Li, Ruoyu Wang, Kim-Hui Yap, Bingbing Li, You Shing Ngim
Driver Action Recognition (DAR) is crucial in vehicle cabin monitoring systems.
no code implementations • 16 Dec 2023 • Sai Wang, Ye Zhu, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu
While face swapping and attribute editing are performed on similar face regions such as eyes and nose, the inpainting operation can be performed on random image regions, removing the spurious correlations of previous datasets.
1 code implementation • 10 Nov 2023 • Adam J Shephard, Mostafa Jahanifar, Ruoyu Wang, Muhammad Dawood, Simon Graham, Kastytis Sidlauskas, Syed Ali Khurram, Nasir M Rajpoot, Shan E Ahmed Raza
Tumour-infiltrating lymphocytes (TILs) are considered as a valuable prognostic markers in both triple-negative and human epidermal growth factor receptor 2 (HER2) positive breast cancer.
1 code implementation • 17 Sep 2023 • Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee
We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the ac-curacy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.
no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee
This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.
no code implementations • 19 Jun 2023 • Ruoyu Wang, Yanfei Xue, Bharath Surianarayanan, Dong Tian, Chen Feng
We propose Concavity-induced Distance (CID) as a novel way to measure the dissimilarity between a pair of points in an unoriented point cloud.
1 code implementation • 14 Jun 2023 • Ruoyu Wang, Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu
In this work, we investigate the diffusion (physics) in diffusion (machine learning) properties and propose our Cyclic One-Way Diffusion (COW) method to control the direction of diffusion phenomenon given a pre-trained frozen diffusion model for versatile customization application scenarios, where the low-level pixel information from the conditioning needs to be preserved.
no code implementations • 1 Mar 2023 • Jing Li, Jinpeng Yu, Ruoyu Wang, Zhengxin Li, Zhengyu Zhang, Lina Cao, Shenghua Gao
As the unsupervised plane segments are usually noisy and inaccurate, we propose to assign different weights to the sampled points on the plane in plane estimation as well as the regularization loss.
no code implementations • 2 Nov 2022 • Mohsin Bilal, Robert Jewsbury, Ruoyu Wang, Hammam M. AlGhamdi, Amina Asif, Mark Eastwood, Nasir Rajpoot
Image analysis and machine learning algorithms operating on multi-gigapixel whole-slide images (WSIs) often process a large number of tiles (sub-images) and require aggregating predictions from the tiles in order to predict WSI-level labels.
1 code implementation • CVPR 2023 • Ruoyu Wang, Zehao Yu, Shenghua Gao
PlaneDepth estimates the depth distribution using a Laplacian Mixture Model based on orthogonal planes for an input image.
1 code implementation • 19 Aug 2022 • Chao Chen, Zegang Cheng, Xinhao Liu, Yiming Li, Li Ding, Ruoyu Wang, Chen Feng
Inspired by noisy label learning, we propose a novel self-supervised framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiachen Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by the spurious attributes that correlate to the class label, as the correlation between them may vary in training and test data.
1 code implementation • 23 Jun 2022 • Adam Shephard, Mostafa Jahanifar, Ruoyu Wang, Muhammad Dawood, Simon Graham, Kastytis Sidlauskas, Syed Ali Khurram, Nasir Rajpoot, Shan E Ahmed Raza
The Tumor InfiltratinG lymphocytes in breast cancER (TiGER) challenge, aims to assess the prognostic significance of computer-generated TILs scores for predicting survival as part of a Cox proportional hazards model.
no code implementations • 16 Jun 2022 • Ruoyu Wang, Syed Ali Khurram, Amina Asif, Lawrence Young, Nasir Rajpoot
Unique gene expression profiles were also identified with respect to HPV infection status, and is in line with existing findings.
no code implementations • CVPR 2022 • Ruoyu Wang, Mingyang Yi, Zhitang Chen, Shengyu Zhu
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
1 code implementation • 8 Feb 2022 • Zhenkun Shi, Qianqian Yuan, Ruoyu Wang, Hoaran Li, Xiaoping Liao, Hongwu Ma
Take UniPort protein "A0A0U5GJ41" as an example (1. 14.-.-), ECRECer annotated it with "1. 14. 11. 38", which supported by further protein structure analysis based on AlphaFold2.
no code implementations • 29 Dec 2021 • Ruoyu Wang
Active learning (AL) is a machine learning algorithm that can achieve greater accuracy with fewer labeled training instances, for having the ability to ask oracles to label the most valuable unlabeled data chosen iteratively and heuristically by query strategies.
no code implementations • 29 Sep 2021 • Ruoyu Wang, Mingyang Yi, Shengyu Zhu, Zhitang Chen
In this work, we obviate these assumptions and tackle the OOD problem without explicitly recovering the causal feature.
1 code implementation • Findings (ACL) 2021 • Kuntal Kumar Pal, Kazuaki Kashihara, Pratyay Banerjee, Swaroop Mishra, Ruoyu Wang, Chitta Baral
We must read the whole text to identify the relevant information or identify the instruction flows to complete a task, which is prone to failures.
no code implementations • 10 Apr 2021 • Ruoyu Wang, Xuchu Xu, Li Ding, Yang Huang, Chen Feng
PoseNet can map a photo to the position where it is taken, which is appealing in robotics.
1 code implementation • 4 Dec 2020 • Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
Our bounds underscore that with locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems, and $d$ is large.
1 code implementation • CVPR 2020 • Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, Chen Feng
Our experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses.
1 code implementation • 8 Apr 2019 • Ruoyu Wang, Shiheng Wang, Songyu Du, Erdong Xiao, Wenzhen Yuan, Chen Feng
Soft bodies made from flexible and deformable materials are popular in many robotics applications, but their proprioceptive sensing has been a long-standing challenge.
Robotics