Search Results for author: Yi Wang

Found 294 papers, 137 papers with code

Chinese Grammatical Error Correction Based on Hybrid Models with Data Augmentation

no code implementations AACL (NLP-TEA) 2020 Yi Wang, Ruibin Yuan, Yan'gen Luo, Yufang Qin, NianYong Zhu, Peng Cheng, Lihuan Wang

A better Chinese Grammatical Error Diagnosis (CGED) system for automatic Grammatical Error Correction (GEC) can benefit foreign Chinese learners and lower Chinese learning barriers.

Data Augmentation Grammatical Error Correction

DoTAT: A Domain-oriented Text Annotation Tool

1 code implementation ACL 2022 Yupian Lin, Tong Ruan, Ming Liang, Tingting Cai, Wen Du, Yi Wang

Secondly, the tool provides annotation of events, nested events, and nested entities, which are frequently required in domain-related text structuring tasks.

text annotation

Make Your Training Flexible: Towards Deployment-Efficient Video Models

1 code implementation 18 Mar 2025 Chenting Wang, Kunchang Li, Tianxiang Jiang, Xiangyu Zeng, Yi Wang, LiMin Wang

By making the sampling grid flexible and leveraging token selection, it is easily adopted in most popular video training frameworks, boosting model robustness with nearly no additional cost.

Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation

1 code implementation 13 Mar 2025 Leonard Waldmann, Ando Shah, Yi Wang, Nils Lehmann, Adam J. Stewart, Zhitong Xiong, Xiao Xiang Zhu, Stefan Bauer, John Chuang

Earth observation (EO) data features diverse sensing platforms with varying spectral bands, spatial resolutions, and sensing modalities.

Earth Observation

GeoLangBind: Unifying Earth Observation with Agglomerative Vision-Language Foundation Models

1 code implementation 8 Mar 2025 Zhitong Xiong, Yi Wang, Weikang Yu, Adam J Stewart, Jie Zhao, Nils Lehmann, Thomas Dujardin, Zhenghang Yuan, Pedram Ghamisi, Xiao Xiang Zhu

Earth observation (EO) data, collected from diverse sensors with varying imaging principles, present significant challenges in creating unified analytical frameworks.

Earth Observation

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

no code implementations 7 Mar 2025 Guanghao Zhang, Tao Zhong, Yan Xia, Zhelun Yu, Haoyuan Li, Wanggui He, Fangxun Shu, Mushui Liu, Dong She, Yi Wang, Hao Jiang

The construction of interleaved multimodal multi-step reasoning chains, which utilize critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals.

Image Comprehension Memorization

WhatELSE: Shaping Narrative Spaces at Configurable Level of Abstraction for AI-bridged Interactive Storytelling

no code implementations 25 Feb 2025 Zhuoran Lu, Qian Zhou, Yi Wang

Generative AI significantly enhances player agency in interactive narratives (IN) by enabling just-in-time content generation that adapts to player actions.

OccProphet: Pushing Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with Observer-Forecaster-Refiner Framework

1 code implementation 21 Feb 2025 Junliang Chen, Huaiyuan Xu, Yi Wang, Lap-Pui Chau

OccProphet reduces 58% to 78% of the computational cost with a 2.6× speedup compared with the state-of-the-art Cam4DOcc.

Autonomous Driving

Towards Efficient Pre-training: Exploring FP4 Precision in Large Language Models

no code implementations 17 Feb 2025 Jiecheng Zhou, Ding Tang, Rong Fu, Boni Hu, Haoran Xu, Yi Wang, Zhilin Pei, Zhongling Su, Liang Liu, Xingcheng Zhang, Weiming Zhang

The burgeoning computational demands for training large language models (LLMs) necessitate efficient methods, including quantized training, which leverages low-bit arithmetic operations to reduce costs.

Quantization

DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

no code implementations 17 Feb 2025 Yi Wang, Fenghua Weng, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang

Large Language Models (LLMs) are widely applied in decision making, but their deployment is threatened by jailbreak attacks, where adversarial users manipulate model behavior to bypass safety measures.

Decision Making Language Modeling +5

CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers

no code implementations 10 Feb 2025 D. She, Mushui Liu, Jingxuan Pang, Jin Wang, Zhen Yang, Wanggui He, Guanghao Zhang, Yi Wang, Qihan Huang, Haobin Tang, Yunlong Yu, Siming Fu

Customized generation has achieved significant progress in image synthesis, yet personalized video generation remains challenging due to temporal inconsistencies and quality degradation.

Image Generation Video Generation

Learning Generalizable Features for Tibial Plateau Fracture Segmentation Using Masked Autoencoder and Limited Annotations

no code implementations 5 Feb 2025 Peiyan Yue, Die Cai, Chu Guo, Mengxing Liu, Jun Xia, Yi Wang

Accurate automated segmentation of tibial plateau fractures (TPF) from computed tomography (CT) requires large amounts of annotated data to train deep learning models, but obtaining such annotations presents unique challenges.

Computed Tomography (CT)

Fuzzy-aware Loss for Source-free Domain Adaptation in Visual Emotion Recognition

no code implementations 26 Jan 2025 Ying Zheng, Yiyi Zhang, Yi Wang, Lap-Pui Chau

Source-free domain adaptation in visual emotion recognition (SFDA-VER) is a highly challenging task that requires adapting VER models to the target domain without relying on source data, which is of great significance for data privacy protection.

Emotion Recognition Image Classification +1

DiffVSR: Enhancing Real-World Video Super-Resolution with Diffusion Models for Advanced Visual Quality and Temporal Consistency

no code implementations 17 Jan 2025 Xiaohui Li, Yihao Liu, Shuo Cao, Ziyan Chen, Shaobin Zhuang, Xiangyu Chen, Yinan He, Yi Wang, Yu Qiao

Diffusion models have demonstrated exceptional capabilities in image generation and restoration, yet their application to video super-resolution faces significant challenges in maintaining both high fidelity and temporal consistency.

Decoder Image Generation +1

MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning

no code implementations 13 Jan 2025 Tieyuan Chen, Huabin Liu, Yi Wang, Yihang Chen, Tianyao He, Chaofan Gan, Huanyu He, Weiyao Lin

Given visual segments and textual descriptions of events, MECD identifies the causal associations between these events to derive a comprehensive and structured event-level video causal graph explaining why and how the result event occurred.

Causal Discovery counterfactual +4

Salient Region Matching for Fully Automated MR-TRUS Registration

1 code implementation 7 Jan 2025 Zetian Feng, Dong Ni, Yi Wang

The registration of magnetic resonance (MR) and transrectal ultrasound (TRUS) can provide guidance for the targeted biopsy of prostate cancer.

Segmentation

SELMA3D challenge: Self-supervised learning for 3D light-sheet microscopy image segmentation

no code implementations 7 Jan 2025 Ying Chen, Rami Al-Maskari, Izabela Horvath, Mayar Ali, Luciano Hoher, Kaiyuan Yang, Zengming Lin, Zhiwei Zhai, Mengzhe Shen, Dejin Xun, Yi Wang, Tony Xu, Maged Goubran, Yunheng Wu, Kensaku MORI, Johannes C. Paetzold, Ali Erturk

Combined with the progress in large-scale data analysis, driven by deep learning, these innovations empower researchers to rapidly investigate the morphological and functional properties of diverse biological samples.

Image Segmentation Self-Supervised Learning +1

Stochastically Constrained Best Arm Identification with Thompson Sampling

no code implementations 7 Jan 2025 Le Yang, Siyang Gao, Cheng Li, Yi Wang

We consider the problem of the best arm identification in the presence of stochastic constraints, where there is a finite number of arms associated with multiple performance measures.

Thompson Sampling

Interpretable Load Forecasting via Representation Learning of Geo-distributed Meteorological Factors

no code implementations 4 Jan 2025 Yangze Zhou, Guoxin Lin, Gonghao Zhang, Yi Wang

However, the difference in MF collected in various locations within a region may be significant, which poses a challenge in selecting the appropriate MF from numerous locations.

Load Forecasting Representation Learning

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

2 code implementations 31 Dec 2024 Xinhao Li, Yi Wang, Jiashuo Yu, Xiangyu Zeng, Yuhan Zhu, Haian Huang, Jianfei Gao, Kunchang Li, Yinan He, Chenting Wang, Yu Qiao, Yali Wang, LiMin Wang

This paper introduces a Hierarchical visual token Compression (HiCo) method designed for high-fidelity representation and a practical context modeling system VideoChat-Flash tailored for multimodal long-sequence processing.

Memorization

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

1 code implementation 26 Dec 2024 Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, LiMin Wang, Yi Wang

Current multimodal large language models (MLLMs) struggle with fine-grained or precise understanding of visuals though they give comprehensive perception and reasoning in a spectrum of vision applications.

Tracking the Feature Dynamics in LLM Training: A Mechanistic Study

no code implementations 23 Dec 2024 Yang Xu, Yi Wang, Hao Wang

Understanding training dynamics and feature evolution is crucial for the mechanistic interpretability of large language models (LLMs).

LLM Agent for Fire Dynamics Simulations

no code implementations 22 Dec 2024 Leidong Xu, Danyal Mohaddes, Yi Wang

FoamPilot provides three core functionalities: code insight, case configuration and simulation evaluation.

RAG

QSM-RimDS: A detection and segmentation tool for paramagnetic rim lesions in multiple sclerosis

no code implementations 13 Dec 2024 Ha Luu, Mert Sisman, Ilhami Kovanlikaya, Tam Vu, Pascal Spincemaille, Yi Wang, Francesca Bagnato, Susan Gauthier, Thanh Nguyen

Deep learning-based QSM-RimNet can provide automated PRL detection, but this method does not provide rim segmentation for microglial density quantification and requires precise QSM lesion masks.

Lesion Detection Segmentation

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

1 code implementation 11 Dec 2024 Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, LiMin Wang

In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation.

SyncVIS: Synchronized Video Instance Segmentation

1 code implementation 1 Dec 2024 Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

Recent DETR-based methods have advanced the development of Video Instance Segmentation (VIS) through transformers' efficiency and capability in modeling spatial and temporal information.

Instance Segmentation Segmentation +2

Point Cloud Understanding via Attention-Driven Contrastive Learning

no code implementations 22 Nov 2024 Yi Wang, Jiaze Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng

Recently, Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms; however, these methods often overlook latent information in less prominent regions, leading to increased sensitivity to perturbations and limited global comprehension.

Contrastive Learning Few-Shot Learning

Analysis of the impact of heterogeneous platoon for mixed traffic flow: control strategy, fuel consumption and emissions

no code implementations 22 Nov 2024 Yunxia Wu, Le Li, Zhihong Yao, Yi Wang, Gen Li, Yangsheng Jiang

Finally, numerical experiments were conducted to calculate the average fuel consumption and pollutant emissions of mixed traffic flow under different spacing control strategies, and the impact of platoon spacing control strategies on traffic flow fuel consumption and pollutant emissions was further analyzed.

SignEye: Traffic Sign Interpretation from Vehicle First-Person View

no code implementations 18 Nov 2024 Chuang Yang, Xu Han, Tao Han, Yuejiao Su, Junyu Gao, Hongyuan Zhang, Yi Wang, Lap-Pui Chau

Meanwhile, we develop a traffic guidance assistant (TGA) scenario application to re-explore the role of traffic signs in ADS as a complement to popular autonomous technologies (such as obstacle perception).

Autonomous Driving

Unsupervised Congestion Status Identification Using LMP Data

no code implementations 15 Nov 2024 Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, Le Xie

The congestion part of LMPs is spanned by certain row vectors of the power transfer distribution factor (PTDF) matrix, and the subspace attributes of an LMP vector are found to uniquely reflect the instantaneous congestion status of all the transmission lines.

A Novel Combined Data-Driven Approach for Electricity Theft Detection

no code implementations 11 Nov 2024 Kedi Zheng, Qixin Chen, Yi Wang, Chongqing Kang, Qing Xia

One technique is the Maximum Information Coefficient (MIC), which can find the correlations between the non-technical loss (NTL) and a certain electricity behavior of the consumer.

Coherent Hierarchical Probabilistic Forecasting of Electric Vehicle Charging Demand

no code implementations 1 Nov 2024 Kedi Zheng, Hanwei Xu, Zeyang Long, Yi Wang, Qixin Chen

The growing penetration of electric vehicles (EVs) significantly changes typical load curves in smart grids.

quantile regression

ByteNet: Rethinking Multimedia File Fragment Classification through Visual Perspectives

1 code implementation 28 Oct 2024 Wenyang Liu, Kejun Wu, Tianyi Liu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

By looking inside bytes, the bit-level details of file fragments can be accessed, enabling a more accurate classification.

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

no code implementations 25 Oct 2024 Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, LiMin Wang

This paper proposes TimeSuite, a collection of new designs to adapt the existing short-form video MLLMs for long video understanding, including a simple yet efficient framework to process long video sequence, a high-quality video dataset for grounded tuning of MLLMs, and a carefully-designed instruction tuning task to explicitly incorporate the grounding supervision in the traditional QA format.

Ranked #7 on Moment Retrieval on Charades-STA (using extra training data)

EgoSchema Hallucination +2

Multi-Level Speaker Representation for Target Speaker Extraction

1 code implementation 21 Oct 2024 Ke Zhang, Junjie Li, Shuai Wang, Yangjie Wei, Yi Wang, Yannan Wang, Haizhou Li

In this work, we propose a multi-level speaker representation approach, from raw features to neural embeddings, to serve as the speaker reference cue.

Target Speaker Extraction

Open World Object Detection: A Survey

no code implementations 15 Oct 2024 Yiming Li, Yi Wang, Wenqian Wang, Dan Lin, Bingbing Li, Kim-Hui Yap

Exploring new knowledge is a fundamental human ability that can be mirrored in the development of deep neural networks, especially in the field of object detection.

Incremental Learning Object +4

FiRework: Field Refinement Framework for Efficient Enhancement of Deformable Registration

1 code implementation 12 Oct 2024 Haiqiao Wang, Dong Ni, Yi Wang

In FiRework, we redesign the continuous deformation framework to mitigate the aforementioned errors.

Image Registration

Task-oriented Time Series Imputation Evaluation via Generalized Representers

1 code implementation 9 Oct 2024 Zhixian Wang, Linxiao Yang, Liang Sun, Qingsong Wen, Yi Wang

Time series analysis is widely used in many fields such as power energy, economics, and transportation, including different tasks such as forecasting, anomaly detection, classification, etc.

Anomaly Detection Imputation +3

XTRUST: On the Multilingual Trustworthiness of Large Language Models

1 code implementation 24 Sep 2024 Yahan Li, Yi Wang, Yi Chang, Yuan Wu

Large language models (LLMs) have demonstrated remarkable capabilities across a range of natural language processing (NLP) tasks, capturing the attention of both practitioners and the broader public.

Ethics Fairness +2

Towards Real-world Deployment of NILM Systems: Challenges and Practices

no code implementations 23 Sep 2024 Junyu Xue, Yu Zhang, Xudong Wang, Yi Wang, Guoming Tang

Non-intrusive load monitoring (NILM), as a key load monitoring technology, can greatly reduce the deployment cost of traditional power sensors.

Non-Intrusive Load Monitoring

Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras

1 code implementation 7 Sep 2024 Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Li Ma, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang

However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D to 2D projection calculation.

3DGS

NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

1 code implementation 21 Aug 2024 Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

To alleviate the labor-intensive requirement of manual prompts, we introduce a Gaussian-Kernel Prompt Encoder (GKP-Encoder) to generate density maps driven by a single point, which guides segmentation predictions by mixing position prompts and semantic prompts.

Decoder Domain Generalization +4

A Survey of Embodied Learning for Object-Centric Robotic Manipulation

1 code implementation 21 Aug 2024 Ying Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau

Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in embodied AI.

Imitation Learning Object +1

Xinyu: An Efficient LLM-based System for Commentary Generation

no code implementations 21 Aug 2024 Yiquan Wu, Bo Tang, Chenyang Xi, Yu Yu, Pengyu Wang, Yifei Liu, Kun Kuang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Jie Hu, Peng Cheng, Zhonghao Wang, Yi Wang, Yi Luo, MingChuan Yang

To address the advanced requirements, we present an argument ranking model for arguments and establish a comprehensive evidence database that includes up-to-date events and classic books, thereby strengthening the substantiation of the evidence with retrieval augmented generation (RAG) technology.

RAG Text Generation

HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model

no code implementations 21 Aug 2024 Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Kun Li

Specifically, to achieve layer-wise clothing generation, we propose a dual-representation decoupling framework for generating clothing decoupled from the human body, in conjunction with an innovative multi-layer fusion volume rendering method.

Human Animation Virtual Try-on

G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

1 code implementation 18 Aug 2024 Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

This paper introduces G²Face, which leverages both generative and geometric priors to enhance identity manipulation, achieving high-quality reversible face anonymization without compromising data utility.

Decoder Face Anonymization +1

PADetBench: Towards Benchmarking Physical Attacks against Object Detection

2 code implementations 17 Aug 2024 Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Lap-Pui Chau, Shaohui Mei

Moreover, physical dynamics and cross-domain transformation are challenging to strictly regulate in the real world, leading to unaligned evaluation and comparison, severely hindering the development of physically robust models.

Adversarial Robustness Benchmarking +5

FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

1 code implementation 15 Aug 2024 Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations.

Computational Efficiency Scheduling

SpectralEarth: Training Hyperspectral Foundation Models at Scale

1 code implementation 15 Aug 2024 Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, Xiao Xiang Zhu

To close this gap, we introduce SpectralEarth, a large-scale multi-temporal dataset designed to pretrain hyperspectral foundation models leveraging data from the Environmental Mapping and Analysis Program (EnMAP).

Computational Efficiency Crop Type Mapping +1

Context-aware knowledge graph framework for traffic speed forecasting using graph neural network

1 code implementation 25 Jul 2024 Yatao Zhang, Yi Wang, Song Gao, Martin Raubal

This study proposes a novel context-aware knowledge graph (CKG) framework to enhance traffic speed forecasting by effectively modeling spatial and temporal contexts.

Graph Neural Network Knowledge Graphs

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

1 code implementation 19 Jul 2024 Qing Xu, Jiaxuan Li, Xiangjian He, Ziyu Liu, Zhen Chen, Wenting Duan, Chenxin Li, Maggie M. He, Fiseha B. Tesema, Wooi P. Cheah, Yi Wang, Rong Qu, Jonathan M. Garibaldi

Finally, we design the Query-Decoupled Modality Decoder (QDMD) that leverages a one-to-one strategy to provide an independent decoding channel for every modality.

Decoder Image Segmentation +5

Internal Consistency and Self-Feedback in Large Language Models: A Survey

1 code implementation 19 Jul 2024 Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Yi Wang, Zhonghao Wang, Feiyu Xiong, Zhiyu Li

In this paper, we use a unified perspective of internal consistency, offering explanations for reasoning deficiencies and hallucinations.

SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

1 code implementation 16 Jul 2024 Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau

Specifically, the principle of our SMQ initialization scheme is to leverage the predicted voxel-wise semantic information to implicitly generate the scene-aware query, yielding adequate scene prior and compensating for the learnable query set.

3D Instance Segmentation Decoder +1

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

1 code implementation 10 Jul 2024 Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, Leilei Gan, Hao Jiang

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis.

Image Generation Text Generation

ORMNet: Object-centric Relationship Modeling for Egocentric Hand-object Segmentation

1 code implementation 8 Jul 2024 Yuejiao Su, Yi Wang, Lap-Pui Chau

Egocentric hand-object segmentation (EgoHOS) is a promising new task aiming at segmenting hands and interacting objects in egocentric images.

Decoder Object +3

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

no code implementations 3 Jul 2024 Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

Experiments are conducted on four tasks: the English UASpeech and TORGO dysarthric speech corpora; and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech datasets.

Alzheimer's Disease Detection Self-Supervised Learning +2

A Narrative Review of Image Processing Techniques Related to Prostate Ultrasound

no code implementations 30 Jun 2024 Haiqiao Wang, Hong Wu, Zhuoyuan Wang, Peiyan Yue, Dong Ni, Pheng-Ann Heng, Yi Wang

In consequence, this survey provides a narrative analysis of this field, outlining the evolution of image processing methods in the context of TRUS image analysis and meanwhile highlighting their relevant contributions.

Image Registration Prognosis +1

Encoding Matching Criteria for Cross-domain Deformable Image Registration

1 code implementation 18 Jun 2024 Zhuoyuan Wang, Haiqiao Wang, Yi Wang

Most existing deep learning-based registration methods are trained on single-type images to address same-domain tasks. However, cross-domain deformable registration remains challenging. We argue that the tailor-made matching criteria in traditional registration methods are one of the main reasons they are applicable in different domains. Motivated by this, we devise a registration-oriented encoder to model the matching criteria of image features and structural features, which helps boost registration accuracy and adaptability. Specifically, a general feature encoder (Encoder-G) is proposed to capture comprehensive medical image features, while a structural feature encoder (Encoder-S) is designed to encode the structural self-similarity into the global representation. Extensive experiments on images from three different domains prove the efficacy of the proposed method.

Image Registration One-Shot Learning

Data-driven Power Flow Linearization: Simulation

no code implementations 10 Jun 2024 Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

Subsequently, this paper evaluates a total of 44 methods, containing over 30 existing DPFL approaches, some innovative DPFL techniques, and several classic physics-driven power flow linearization methods for benchmarking.

Benchmarking Computational Efficiency

Data-driven Power Flow Linearization: Theory

no code implementations 10 Jun 2024 Mengshuo Jia, Gabriela Hug, Ning Zhang, Zhaojian Wang, Yi Wang, Chongqing Kang

Further, this tutorial implements extensive numerical comparisons of all existing DPFL methods (40 methods in total) and four classic physics-driven approaches, focusing on their generalizability, applicability, accuracy, and computational efficiency.

Computational Efficiency

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining

1 code implementation 30 May 2024 Yi Wang, Conrad M Albrecht, Xiao Xiang Zhu

Second, we revisit and explore cross-domain continual pretraining for both multispectral and SAR imagery, building efficient EO foundation models from the strongest vision models such as DINOv2.

Continual Pretraining Contrastive Learning +1

StoryVerse: Towards Co-authoring Dynamic Plot with LLM-based Character Simulation via Narrative Planning

no code implementations 17 May 2024 Yi Wang, Qian Zhou, David Ledo

The process creates "living stories" that dynamically adapt to various game world states, resulting in narratives co-created by the author, character simulation, and player.

Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training

no code implementations 9 May 2024 Sheng Yan, Xin Du, Zongying Li, Yi Wang, Hongcang Jin, Mengyuan Liu

Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments.

A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

1 code implementation 8 May 2024 Huaiyuan Xu, Junliang Chen, Shiyu Meng, Yi Wang, Lap-Pui Chau

A comprehensive list of studies in this survey is publicly available in an active repository that continuously collects the latest work: https://github.com/HuaiyuanXu/3D-Occupancy-Perception.

Autonomous Driving Survey

On the Foundations of Earth and Climate Foundation Models

no code implementations 7 May 2024 Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model.

CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation

no code implementations 2 May 2024 Chenying Liu, Conrad Albrecht, Yi Wang, Xiao Xiang Zhu

We study the potential of noisy labels y to pretrain semantic segmentation models in a multi-modal learning framework for geospatial applications.

Image Segmentation Semantic Segmentation +1

Federated Graph Learning for EV Charging Demand Forecasting with Personalization Against Cyberattacks

no code implementations 30 Apr 2024 Yi Li, Renyou Xie, Chaojie Li, Yi Wang, ZhaoYang Dong

To address these challenges, a federated graph learning approach involving multiple charging stations is proposed to collaboratively train a more generalized deep learning model for demand forecasting while capturing spatial correlations among various stations and enhancing robustness against potential attacks.

Demand Forecasting Federated Learning +2

A Survey on Diffusion Models for Time Series and Spatio-Temporal Data

1 code implementation 29 Apr 2024 Yiyuan Yang, Ming Jin, Haomin Wen, Chaoli Zhang, Yuxuan Liang, Lintao Ma, Yi Wang, Chenghao Liu, Bin Yang, Zenglin Xu, Jiang Bian, Shirui Pan, Qingsong Wen

Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided for both predictive and generative tasks.

Anomaly Detection Imputation +1

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

1 code implementation 28 Apr 2024 Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.

In-Context Learning Music Generation

Exploring Kinetic Curves Features for the Classification of Benign and Malignant Breast Lesions in DCE-MRI

1 code implementation 22 Apr 2024 Zixian Li, Yuming Zhong, Yi Wang

In this study, we propose to fully leverage the dynamic characteristics from the kinetic curves as well as the radiomic features to boost the classification accuracy of benign and malignant breast lesions.

Prognosis

Texture Classification Network Integrating Adaptive Wavelet Transform

no code implementations 8 Apr 2024 Su-Xi Yu, Jing-Yuan He, Yi Wang, Yu-Jiao Cai, Jun Yang, Bo Lin, Wei-Bin Yang, Jian Ruan

Graves' disease is a common condition that is diagnosed clinically by determining the smoothness of the thyroid texture and its morphology in ultrasound images.

Classification Texture Classification

Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation

1 code implementation 2 Apr 2024 Zhuoyuan Wang, Dong Sun, Xiangyun Zeng, Ruodai Wu, Yi Wang

Accordingly, we propose a contextual embedding learning approach to facilitate 2D CNNs capturing spatial information properly.

Image Segmentation Segmentation +1

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

2 code implementations 22 Mar 2024 Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Chenting Wang, Guo Chen, Baoqi Pei, Ziang Yan, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei HUANG, Yu Qiao, Yali Wang, LiMin Wang

We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue.

Action Classification Action Recognition +13

Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation

1 code implementation22 Mar 2024 Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, Xiao Xiang Zhu

The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data.

Earth Observation

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

1 code implementation14 Mar 2024 Yunfei Cheng, Aonan Zhang, Xuanyu Zhang, Chong Wang, Yi Wang

We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language models (LLMs) inference.

Benchmarking Knowledge Distillation

Non-Intrusive Load Monitoring in Smart Grids: A Comprehensive Review

no code implementations11 Mar 2024 Yinyan Liu, Yi Wang, Jin Ma

Non-Intrusive Load Monitoring (NILM) is pivotal in today's energy landscape, offering vital solutions for energy conservation and efficient management.

Management Non-Intrusive Load Monitoring

VideoMamba: State Space Model for Efficient Video Understanding

2 code implementations11 Mar 2024 Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, LiMin Wang, Yu Qiao

Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain.

Action Classification Mamba +2

Learning to Maximize Mutual Information for Chain-of-Thought Distillation

1 code implementation5 Mar 2024 Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding

Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment.

Knowledge Distillation Language Modeling +2

AIO2: Online Correction of Object Labels for Deep Learning with Incomplete Annotation in Remote Sensing Image Segmentation

1 code implementation3 Mar 2024 Chenying Liu, Conrad M Albrecht, Yi Wang, Qingyu Li, Xiao Xiang Zhu

AIO2 utilizes a mean teacher model to enhance training robustness with noisy labels to both stabilize the training accuracy curve for fitting in ACT and provide pseudo labels for correction in O2C.

Earth Observation Image Segmentation +1

Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation

no code implementations25 Feb 2024 Chenying Liu, Conrad M Albrecht, Yi Wang, Xiao Xiang Zhu

Compared to supervised deep learning, self-supervision provides remote sensing a tool to reduce the amount of exact, human-crafted geospatial annotations.

Image Segmentation Segmentation +2

Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer

1 code implementation14 Feb 2024 Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zhou, Jianhua Zhou, Yi Wang

With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos.

Video Classification

Pyramid Attention Network for Medical Image Registration

1 code implementation14 Feb 2024 Zhuoyuan Wang, Haiqiao Wang, Yi Wang

The advent of deep-learning-based registration networks has addressed the time-consuming challenge in traditional iterative methods. However, the potential of current registration networks for comprehensively capturing spatial relationships has not been fully explored, leading to inadequate performance in large-deformation image registration. The pure convolutional neural networks (CNNs) neglect feature enhancement, while current Transformer-based networks are susceptible to information redundancy. To alleviate these issues, we propose a pyramid attention network (PAN) for deformable medical image registration. Specifically, the proposed PAN incorporates a dual-stream pyramid encoder with channel-wise attention to boost the feature representation. Moreover, a multi-head local attention Transformer is introduced as decoder to analyze motion patterns and generate deformation fields. Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets and one abdominal MRI dataset demonstrate that our method achieves favorable registration performance, while outperforming several CNN-based and Transformer-based registration networks. Our code is publicly available at https://github.com/JuliusWang-7/PAN.

Decoder Image Registration +1

Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks

no code implementations8 Feb 2024 Wei Wang, Huilong Ning, Gaowei Zhang, Libo Liu, Yi Wang

Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers and motivates the need for novel interaction mechanisms that help developers effectively work with large language models to achieve desired outcomes.

Learning the Market: Sentiment-Based Ensemble Trading Agents

no code implementations2 Feb 2024 Andrew Ye, James Xu, Vidyut Veedgav, Yi Wang, Yifan Yu, Daniel Yan, Ryan Chen, Vipin Chaudhary, Shuai Xu

We propose and study the integration of sentiment analysis and deep reinforcement learning ensemble algorithms for stock trading by evaluating strategies capable of dynamically altering their active agent given the concurrent market environment.

Deep Reinforcement Learning Sentiment Analysis

Explaining Time Series via Contrastive and Locally Sparse Perturbations

1 code implementation16 Jan 2024 Zichuan Liu, Yingying Zhang, Tianchun Wang, Zefan Wang, Dongsheng Luo, Mengnan Du, Min Wu, Yi Wang, Chunlin Chen, Lunting Fan, Qingsong Wen

Explaining multivariate time series is a compound challenge, as it requires identifying important locations in the time series and matching complex temporal patterns.

Contrastive Learning counterfactual +1

One for All: Toward Unified Foundation Models for Earth Vision

no code implementations15 Jan 2024 Zhitong Xiong, Yi Wang, Fahong Zhang, Xiao Xiang Zhu

Current remote sensing foundation models typically specialize in a single modality or a specific spatial resolution range, limiting their versatility for downstream datasets.

All

Seamless and multi-resolution energy forecasting

1 code implementation28 Dec 2023 Chenxi Wang, Pierre Pinson, Yi Wang

The relationship between (i) errors in both time and frequency domains and (ii) operational value of the forecasts is analysed.

Scheduling

Guidelines in Wastewater-based Epidemiology of SARS-CoV-2 with Diagnosis

no code implementations26 Dec 2023 Madiha Fatima, Zhihua Cao, Aichun Huang, Shengyuan Wu, Xinxian Fan, Yi Wang, Liu Jiren, Ziyun Zhu, Qiongrou Ye, Yuan Ma, Joseph K. F Chow, Peng Jia, Yangshou Liu, Yubin Lin, Manjun Ye, Tong Wu, ZHIXUN LI, Cong Cai, Wenhai Zhang, Cheris H. Q. Ding, Yuanzhe Cai, Feijuan Huang

With the global spread and increasing transmission rate of SARS-CoV-2, more and more laboratories and researchers are turning their attention to wastewater-based epidemiology (WBE), hoping it can become an effective tool for large-scale testing and provide more accurate predictions of the number of infected individuals.

Diagnostic Epidemiology

Dataset Distillation via Adversarial Prediction Matching

1 code implementation14 Dec 2023 Mingyang Chen, Bo Huang, Junda Lu, Bing Li, Yi Wang, Minhao Cheng, Wei Wang

This ensures the memory efficiency of our method and provides a flexible tradeoff between time and memory budgets, allowing us to distil ImageNet-1K using a minimum of only 6.5GB of GPU memory.

Dataset Distillation Prediction

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

1 code implementation NeurIPS 2023 Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao

What we possess are numerous isolated field-specific datasets; thus, it is appealing to jointly train models across the aggregation of datasets to enhance data volume and diversity.

Instance Segmentation Semantic Segmentation +1

QuickQuakeBuildings: Post-earthquake SAR-Optical Dataset for Quick Damaged-building Detection

1 code implementation11 Dec 2023 Yao Sun, Yi Wang, Michael Eineder

Quick and automated earthquake-damaged building detection from post-event satellite imagery is crucial, yet it is challenging due to the scarcity of training data required to develop robust algorithms.

Anomaly Detection Damaged Building Detection +1

Layered 3D Human Generation via Semantic-Aware Diffusion Model

no code implementations10 Dec 2023 Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-Kun Lai, Yebin Liu, Kun Li

To keep the generated clothing consistent with the target text, we propose a semantic-confidence strategy for clothing that can eliminate the non-clothing content generated by the model.

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

3 code implementations CVPR 2024 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Diagnostic Fairness +11

Multi-delay arterial spin-labeled perfusion estimation with biophysics simulation and deep learning

no code implementations17 Nov 2023 Renjiu Hu, Qihao Zhang, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang

The trained network was further tested in a synthetic brain ASL image based on vasculature network extracted from magnetic resonance (MR) angiography.

Load Data Valuation in Multi-Energy Systems: An End-to-End Approach

no code implementations16 Nov 2023 Yangze Zhou, Qingsong Wen, Jie Song, Xueyuan Cui, Yi Wang

Accurate load forecasting serves as the foundation for the flexible operation of multi-energy systems (MES).

Data Valuation Load Forecasting

Goal-Oriented Wireless Communication Resource Allocation for Cyber-Physical Systems

no code implementations6 Nov 2023 Cheng Feng, Kedi Zheng, Yi Wang, Kaibin Huang, Qixin Chen

We formulate a bandwidth allocation problem aimed at maximizing the information utility gain of transmitted data brought to CPS operation goals.

Decision Making Distributed Optimization +1

Harvest Video Foundation Models via Efficient Post-Pretraining

1 code implementation30 Oct 2023 Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, LiMin Wang, Yu Qiao, Ping Luo

Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets.

Question Answering Text Retrieval +2

Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing

1 code implementation28 Oct 2023 Yi Wang, Hugo Hernández Hernández, Conrad M Albrecht, Xiao Xiang Zhu

Self-supervised learning guided by masked image modelling, such as Masked AutoEncoder (MAE), has attracted wide attention for pretraining vision transformers in remote sensing.

Multi-Label Image Classification Self-Supervised Learning

Video Referring Expression Comprehension via Transformer with Content-conditioned Query

no code implementations25 Oct 2023 Ji Jiang, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou

Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language.

cross-modal alignment Referring Expression +2

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook

6 code implementations16 Oct 2023 Ming Jin, Qingsong Wen, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, XiaoLi Li, Shirui Pan, Vincent S. Tseng, Yu Zheng, Lei Chen, Hui Xiong

In this survey, we offer a comprehensive and up-to-date review of large models tailored (or adapted) for time series and spatio-temporal data, spanning four key facets: data types, model categories, model scopes, and application areas/tasks.

Time Series Time Series Analysis

Boosting High Resolution Image Classification with Scaling-up Transformers

1 code implementation26 Sep 2023 Yi Wang

We present a holistic approach for high resolution image classification that won second place in the ICCV/CVPPA2023 Deep Nutrient Deficiency Challenge.

Classification Data Augmentation +3

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations26 Sep 2023 Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Text-to-Video Generation Video Generation +1

PlotMap: Automated Layout Design for Building Game Worlds

1 code implementation26 Sep 2023 Yi Wang, Jieliang Luo, Adam Gaier, Evan Atherton, Hilmar Koch

We develop a method of generating datasets of facility layout tasks, create a gym-like environment for experimenting with and evaluating different methods, and further analyze the two methods with comprehensive experiments, aiming to provide insights for solving facility layout tasks.

Decision Making Layout Design +1

Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method

1 code implementation NeurIPS 2023 Tianyi Liu, Kejun Wu, Yi Wang, Wenyang Liu, Kim-Hui Yap, Lap-Pui Chau

The past decade has witnessed great strides in video recovery by specialist technologies, like video inpainting, completion, and error concealment.

Video Inpainting

OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking

no code implementations19 Sep 2023 Jianjun Gao, Yi Wang, Kim-Hui Yap, Kratika Garg, Boon Siew Han

Particularly, the improvements on IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of our OccluTrack on tracking and association performance.

Motion Estimation

Representation Learning for Sequential Volumetric Design Tasks

no code implementations5 Sep 2023 Md Ferdous Alam, Yi Wang, Chin-Yi Cheng, Jieliang Luo

We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation.

Representation Learning

Joint Oscillation Damping and Inertia Provision Service for Converter-Interfaced Generation

no code implementations4 Sep 2023 Cheng Feng, Linbin Huang, Xiuqiang He, Yi Wang, Florian Dörfler, Qixin Chen

To address this gap, this paper defines the joint oscillation damping and inertia provision services at the system level, seeking to encourage converter-interfaced generation to provide enhanced damping and fast frequency response capabilities.

Deep Semantic Model Fusion for Ancient Agricultural Terrace Detection

1 code implementation4 Aug 2023 Yi Wang, Chenying Liu, Arti Tiwari, Micha Silver, Arnon Karnieli, Xiao Xiang Zhu, Conrad M Albrecht

Discovering ancient agricultural terraces in desert regions is important for the monitoring of long-term climate changes on the Earth's surface.

Segmentation Semantic Segmentation

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation ICCV 2023 Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning Vision and Language Navigation +1

SimPLe: Similarity-Aware Propagation Learning for Weakly-Supervised Breast Cancer Segmentation in DCE-MRI

1 code implementation29 Jun 2023 Yuming Zhong, Yi Wang

The network first utilizes the pseudo-masks generated using the extreme points to train itself, by minimizing a contrastive loss, which encourages the network to learn more representative features for cancerous voxels.

Prognosis Segmentation

Semi-Supervised Learning for hyperspectral images by non parametrically predicting view assignment

no code implementations19 Jun 2023 Shivam Pande, Nassim Ait Ali Braham, Yi Wang, Conrad M Albrecht, Biplab Banerjee, Xiao Xiang Zhu

Recently, to effectively train the deep learning models with minimal labelled samples, the unlabeled samples are also being leveraged in self-supervised and semi-supervised setting.

Pseudo Label

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

no code implementations15 Jun 2023 Junting Pan, Ziyi Lin, Yuying Ge, Xiatian Zhu, Renrui Zhang, Yi Wang, Yu Qiao, Hongsheng Li

Video Question Answering (VideoQA) has been significantly advanced from the scaling of recent Large Language Models (LLMs).

Ranked #3 on Temporal/Casual QA on NExT-QA (using extra training data)

cross-modal alignment Domain Generalization +3

SaDI: A Self-adaptive Decomposed Interpretable Framework for Electric Load Forecasting under Extreme Events

no code implementations14 Jun 2023 Hengbo Liu, Ziqing Ma, Linxiao Yang, Tian Zhou, Rui Xia, Yi Wang, Qingsong Wen, Liang Sun

In this paper, we propose a novel forecasting framework, named Self-adaptive Decomposed Interpretable framework (SaDI), which ensembles long-term trend, short-term trend, and period modelings to capture temporal characteristics in different components.

Load Forecasting Management

Top-Down Framework for Weakly-supervised Grounded Image Captioning

1 code implementation13 Jun 2023 Chen Cai, Suchen Wang, Kim-Hui Yap, Yi Wang

Weakly-supervised grounded image captioning (WSGIC) aims to generate the caption and ground (localize) predicted object words in the input image without using bounding box supervision.

Image Captioning Multi-Label Classification +3

ModeT: Learning Deformable Image Registration via Motion Decomposition Transformer

1 code implementation9 Jun 2023 Haiqiao Wang, Dong Ni, Yi Wang

The Transformer structures have been widely used in computer vision and have recently made an impact in the area of medical image registration.

Image Registration Medical Image Registration

DiffLoad: Uncertainty Quantification in Electrical Load Forecasting with the Diffusion Model

1 code implementation31 May 2023 Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Yi Wang

The uncertainties in load forecasting can be divided into two types: epistemic uncertainty and aleatoric uncertainty.

Decision Making energy management +3

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

1 code implementation24 May 2023 Zhitong Xiong, Sining Chen, Yi Wang, Lichao Mou, Xiao Xiang Zhu

Towards a fair and comprehensive analysis of existing methods, the proposed benchmark consists of 1) a large-scale dataset including co-registered RGB and nDSM pairs and pixel-wise semantic labels; 2) a comprehensive evaluation and analysis of existing multi-modal fusion strategies for both convolutional and Transformer-based networks on remote sensing data.

Segmentation Semantic Segmentation

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation22 May 2023 Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Decoder Video Understanding

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

2 code implementations9 May 2023 Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, LiMin Wang, Ping Luo, Jifeng Dai, Yu Qiao

Different from existing interactive systems that rely on pure language, by incorporating pointing instructions, the proposed iGPT significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots in vision-centric tasks, especially in complicated visual scenarios where the number of objects is greater than 2.

Language Modelling

Physics-based network fine-tuning for robust quantitative susceptibility mapping from high-pass filtered phase

no code implementations5 May 2023 Jinwei Zhang, Alexey Dimov, Chao Li, Hang Zhang, Thanh D. Nguyen, Pascal Spincemaille, Yi Wang

Purpose: To improve the generalization ability of convolutional neural network (CNN) based prediction of quantitative susceptibility mapping (QSM) from high-pass filtered phase (HPFP) image.

SSIM

ScatterFormer: Locally-Invariant Scattering Transformer for Patient-Independent Multispectral Detection of Epileptiform Discharges

1 code implementation26 Apr 2023 Ruizhe Zheng, Jun Li, Yi Wang, Tian Luo, Yuguo Yu

Patient-independent detection of epileptic activities based on visual spectral representation of continuous EEG (cEEG) has been widely used for diagnosing epilepsy.

EEG Seizure Detection

Label-free timing analysis of SiPM-based modularized detectors with physics-constrained deep learning

no code implementations24 Apr 2023 Pengcheng Ai, Le Xiao, Zhi Deng, Yi Wang, Xiangming Sun, Guangming Huang, Dong Wang, Yulei Li, Xinchi Ran

We mathematically demonstrate the existence of the optimal function desired by the method, and give a systematic algorithm for training and calibration of the model.

SSN: Stockwell Scattering Network for SAR Image Change Detection

no code implementations22 Apr 2023 Gong Chen, Yanan Zhao, Yi Wang, Kim-Hui Yap

Recently, synthetic aperture radar (SAR) image change detection has become an interesting yet challenging direction due to the presence of speckle noise.

Change Detection Computational Efficiency

Maximum Spherical Mean Value (mSMV) Filtering for Whole Brain Quantitative Susceptibility Mapping

1 code implementation22 Apr 2023 Alexandra G. Roberts, Dominick J. Romano, Mert Şişman, Alexey V. Dimov, Pascal Spincemaille, Thanh D. Nguyen, Ilhami Kovanlikaya, Susan A. Gauthier, Yi Wang

To develop a tissue field filtering algorithm, called maximum Spherical Mean Value (mSMV), for reducing shadow artifacts in quantitative susceptibility mapping (QSM) of the brain without requiring brain tissue erosion. Residual background field is a major source of shadow artifacts in QSM.

A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

1 code implementation14 Apr 2023 Wenyang Liu, Yi Wang, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau

File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security.

Data Augmentation

mcLARO: Multi-Contrast Learned Acquisition and Reconstruction Optimization for simultaneous quantitative multi-parametric mapping

no code implementations7 Apr 2023 Jinwei Zhang, Thanh D. Nguyen, Eddy Solomon, Chao Li, Qihao Zhang, Jiahao Li, Hang Zhang, Pascal Spincemaille, Yi Wang

Results: The retrospective ablation study showed improved image sharpness of mcLARO compared to the baseline network without multi-contrast sampling pattern optimization or image feature fusion, and negligible bias and narrow 95% limits of agreement on regional T1, T2, T2* and QSM values were obtained by the under-sampled reconstructions compared to the fully sampled reconstruction.

Image Reconstruction

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

1 code implementation CVPR 2023 LiMin Wang, Bingkun Huang, Zhiyu Zhao, Zhan Tong, Yinan He, Yi Wang, Yali Wang, Yu Qiao

Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).

Ranked #1 on Self-Supervised Action Recognition on UCF101 (using extra training data)

Action Classification Action Recognition In Videos +4

PointPatchMix: Point Cloud Mixing with Patch Scoring

no code implementations12 Mar 2023 Yi Wang, Jiaze Wang, Jinpeng Li, Zixu Zhao, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng

With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40.

Data Augmentation

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

no code implementations28 Feb 2023 Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng

Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest TDNN and Conformer ASR systems integrated domain adapted wav2vec2.0 models consistently outperform the standalone wav2vec2.0 models by statistically significant WER reductions of 8.22% and 3.43% absolute (26.71% and 15.88% relative) on the two tasks respectively.

speech-recognition Speech Recognition

Rate-Perception Optimized Preprocessing for Video Coding

no code implementations25 Jan 2023 Chengqian Ma, Zhiqiang Wu, Chunlei Cai, Pengwei Zhang, Yi Wang, Long Zheng, Chao Chen, Quan Zhou

In the past decades, much progress has been made in the video compression field, including traditional video codecs and learning-based video codecs.

Image Quality Assessment Video Compression

Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

1 code implementation CVPR 2023 Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie

The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities.

Open Vocabulary Semantic Segmentation Open-Vocabulary Semantic Segmentation +1

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding

no code implementations ICCV 2023 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

The prolific performances of Vision Transformers (ViTs) in image tasks have prompted research into adapting the image ViTs for video tasks.

Video Understanding

Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation

1 code implementation CVPR 2023 Bo Huang, Mingyang Chen, Yi Wang, Junda Lu, Minhao Cheng, Wei Wang

Thus, recent studies focus on adversarial distillation (AD), which aims to inherit not only the prediction accuracy but also the adversarial robustness of a robust teacher model under the paradigm of robust optimization.

Adversarial Robustness Knowledge Distillation

NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views

no code implementations CVPR 2023 Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang

In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that corresponds well with the given reference image.

Denoising Depth Estimation +1

Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection

1 code implementation CVPR 2023 Yi Wang, Ruili Wang, Xin Fan, Tianzhu Wang, Xiangjian He

A multi-level hybrid loss is firstly designed to guide the network to learn pixel-level, region-level, and object-level features.

Decoder object-detection +2

A Survey of Face Recognition

no code implementations26 Dec 2022 Xinyi Wang, Jianteng Peng, Sufang Zhang, Bihui Chen, Yi Wang, Yandong Guo

Recent years witnessed the breakthrough of face recognition with deep convolutional neural networks.

Face Recognition Survey

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

2 code implementations6 Dec 2022 Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8

NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views

1 code implementation29 Nov 2022 Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Yi Wang, Zhangyang Wang

In this work, we study the challenging task of lifting a single image to a 3D object and, for the first time, demonstrate the ability to generate a plausible 3D object with 360° views that correspond well with the given reference image.

3D Reconstruction Image to 3D +4

CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors

no code implementations26 Nov 2022 Junlin Hou, Jilan Xu, Nan Zhang, Yi Wang, Yuejie Zhang, Xiaobo Zhang, Rui Feng

This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop at the European Conference on Computer Vision (ECCV 2022).

COVID-19 Diagnosis Representation Learning

A Particle-based Sparse Gaussian Process Optimizer

no code implementations26 Nov 2022 Chandrajit Bajaj, Omatharv Bharat Vaidya, Yi Wang

Task learning in neural networks typically requires finding a globally optimal minimizer to a loss function objective.

Image Classification

Adjacent Slice Feature Guided 2.5D Network for Pulmonary Nodule Segmentation

no code implementations19 Nov 2022 Xinwei Xue, Gaoyu Wang, Long Ma, Qi Jia, Yi Wang

In this paper, we design an adjacent slice feature fusion model to introduce information from adjacent slices.

Segmentation

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

3 code implementations17 Nov 2022 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, LiMin Wang, Yu Qiao

UniFormer has successfully alleviated this issue, by unifying convolution and self-attention as a relation aggregator in the transformer format.

Video Understanding

LARO: Learned Acquisition and Reconstruction Optimization to accelerate Quantitative Susceptibility Mapping

1 code implementation1 Nov 2022 Jinwei Zhang, Pascal Spincemaille, Hang Zhang, Thanh D. Nguyen, Chao Li, Jiahao Li, Ilhami Kovanlikaya, Mert R. Sabuncu, Yi Wang

In this paper, we present our new framework, called Learned Acquisition and Reconstruction Optimization (LARO), which aims to accelerate the multi-echo gradient echo (mGRE) pulse sequence for QSM.

Non-Iterative Scribble-Supervised Learning with Pacing Pseudo-Masks for Medical Image Segmentation

1 code implementation20 Oct 2022 Zefan Yang, Di Lin, Dong Ni, Yi Wang

To address these issues, we propose a non-iterative method where a stream of varying (pacing) pseudo-masks teach a network via consistency training, named PacingPseudo.

Image Segmentation Medical Image Segmentation +2

EarthNets: Empowering AI in Earth Observation

no code implementations10 Oct 2022 Zhitong Xiong, Fahong Zhang, Yi Wang, Yilei Shi, Xiao Xiang Zhu

Furthermore, a new platform for EO, termed EarthNets, is released to achieve a fair and consistent evaluation of deep learning methods on remote sensing data.

Deep Learning Earth Observation +2

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

2 code implementations15 Sep 2022 Yi Wang, Zhiwen Fan, Tianlong Chen, Hehe Fan, Zhangyang Wang

Vision Transformers (ViTs) have proven to be effective, in solving 2D image understanding tasks by training over large-scale image datasets; and meanwhile as a somehow separate track, in modeling the 3D visual world too such as voxels or point clouds.

Point Cloud Segmentation

A multi view multi stage and multi window framework for pulmonary artery segmentation from CT scans

no code implementations 8 Sep 2022 Zeyu Liu, Yi Wang, Jing Wen, Yong Zhang, Hao Yin, Chao Guo, Zhongyu Wang

In addition, in order to improve the segmentation performance, we adopt multi-view and multi-window level method, at the same time we employ a fine-tune strategy to mitigate the impact of inconsistent labeling.

Segmentation

PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals

no code implementations 2 Sep 2022 Pengcheng Ai, Zhi Deng, Yi Wang, Hui Gong, Xinchi Ran, Zijian Lang

Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors.

Deep Learning Quantization

Quality-Constant Per-Shot Encoding by Two-Pass Learning-based Rate Factor Prediction

no code implementations 23 Aug 2022 Chunlei Cai, Yi Wang, Xiaobo Li, Tianxiao Ye

With the help of first pass predicted RF and corresponding actual quality as feedback, the second pass prediction will be highly accurate.

Parameter Prediction Prediction

Self-supervised Learning in Remote Sensing: A Review

3 code implementations 27 Jun 2022 Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Lichao Mou, Xiao Xiang Zhu

In deep learning research, self-supervised learning (SSL) has received great attention triggering interest within both the computer vision and remote sensing communities.

Earth Observation Multi-Label Image Classification +1

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

1 code implementation 23 Jun 2022 Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao

Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller.

Data Augmentation Vision and Language Navigation

WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

no code implementations 20 Jun 2022 Yi Wang, Yi Si

Recently, GAN-based neural vocoders such as Parallel WaveGAN, MelGAN, HiFiGAN, and UnivNet have become popular due to their lightweight and parallel structure, resulting in a real-time synthesized waveform with high fidelity, even on a CPU.

Speech Synthesis Vocal Bursts Intensity Prediction

Monitoring Urban Forests from Auto-Generated Segmentation Maps

no code implementations 14 Jun 2022 Conrad M Albrecht, Chenying Liu, Yi Wang, Levente Klein, Xiao Xiang Zhu

We present and evaluate a weakly-supervised methodology to quantify the spatio-temporal distribution of urban forests based on remotely sensed data with close-to-zero human interaction.

Semantic Segmentation

UMSNet: An Universal Multi-sensor Network for Human Activity Recognition

no code implementations 24 May 2022 Jialiang Wang, Haotian Wei, Yi Wang, Shu Yang, Chi Li

Human activity recognition (HAR) based on multimodal sensors has become a rapidly growing branch of biometric recognition and artificial intelligence.

Human Activity Recognition Time Series +2

Beam Training and Tracking in MmWave Communication: A Survey

no code implementations 20 May 2022 Yi Wang, Zhiqing Wei, Zhiyong Feng

This article provides an overview of the beam training and tracking technologies on mmWave bands and reveals the insights for future research in the 6th Generation (6G) mobile network.

Survey

Long-run User Value Optimization in Recommender Systems through Content Creation Modeling

no code implementations 25 Apr 2022 Akos Lada, Xiaoxuan Liu, Jens Rischbieth, Yi Wang, Yuwen Zhang

Content recommender systems are generally adept at maximizing immediate user satisfaction but to optimize for the long-run user value, we need more statistically sophisticated solutions than off-the-shelf simple recommender algorithms.

BIG-bench Machine Learning Recommendation Systems

Self-supervised Vision Transformers for Joint SAR-optical Representation Learning

2 code implementations 11 Apr 2022 Yi Wang, Conrad M Albrecht, Xiao Xiang Zhu

Experimental results employing the BigEarthNet-MM dataset demonstrate the benefits of both the ViT backbones and the proposed multimodal SSL algorithm DINO-MM.

Data Augmentation Earth Observation +2

A Global Modeling Approach for Load Forecasting in Distribution Networks

no code implementations 1 Apr 2022 Miha Grabner, Yi Wang, Qingsong Wen, Boštjan Blažič, Vitomir Štruc

Efficient load forecasting is needed to ensure better observability in the distribution networks, whereas such forecasting is made possible by an increasing number of smart meter installations.

Load Forecasting
